[ https://issues.apache.org/jira/browse/FLINK-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
swy updated FLINK-9597:
-----------------------
Attachment: sample.png

> Flink fail to scale!
> --------------------
>
> Key: FLINK-9597
> URL: https://issues.apache.org/jira/browse/FLINK-9597
> Project: Flink
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.5.0
> Reporter: swy
> Priority: Major
> Attachments: JM.png, TM.png, flink_app_parser_git.zip, sample.png, scaleNotWork.png
>
>
> Hi, we found that our Flink application, which has simple logic built on a process function, does not scale beyond a parallelism of 8, even with sufficient resources. Throughput is capped at ~250k TPS, as the results below show. No matter how we tune the parallelism of the operators, it does not scale, and increasing the source parallelism does not help either.
> Please refer to "scaleNotWork.png":
> 1. Source parallelism fixed at 4, other operators at parallelism 8
> 2. Source parallelism fixed at 4, other operators at parallelism 16
> 3. Source parallelism fixed at 4, other operators at parallelism 32
> 4. Source parallelism fixed at 6, other operators at parallelism 8
> 5. Source parallelism fixed at 6, other operators at parallelism 16
> 6. Source parallelism fixed at 6, other operators at parallelism 32
> 7. Source parallelism fixed at 6, other operators at parallelism 64 (performance worse than with parallelism 32)
>
> Sample source code is attached (flink_app_parser_git.zip). It is a simple program that parses JSON records into objects and passes them to a Flink process function with empty logic. RocksDB is used as the state backend, and the source data is generated by the program itself. The problem can be reproduced easily.
> We chose Flink because of its scalability, but that is not what we are seeing. We would appreciate any help, as this is impacting our projects. Thank you.
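A quick way to see what the reported ceiling means: if aggregate throughput stays near ~250k TPS regardless of parallelism, each extra parallel instance only receives a smaller share of the same total. The sketch below is a hypothetical helper (not part of the attached project) that assumes a constant ~250k TPS total and divides it by the operator parallelism values tested above.

```java
// Hypothetical illustration, not part of flink_app_parser_git.zip.
// Assumes the aggregate throughput stays at the reported ~250k TPS ceiling
// and is spread evenly across all parallel instances.
public class PerSubtaskThroughput {

    // Share of the total throughput handled by each parallel instance.
    static double perSubtask(double totalTps, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        return totalTps / parallelism;
    }

    public static void main(String[] args) {
        double cappedTps = 250_000.0;          // approximate ceiling from the report
        int[] parallelisms = {8, 16, 32, 64};  // operator parallelism values tested
        for (int p : parallelisms) {
            System.out.printf("parallelism %d -> %.1f TPS per subtask%n",
                    p, perSubtask(cappedTps, p));
        }
    }
}
```

If the pipeline scaled linearly, these per-subtask figures would stay roughly constant as parallelism grows instead of halving with each doubling.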
> To run the program, use sample parameters such as:
> "aggrinterval=6000000 loop=7500000 statsd=1 psrc=4 pJ2R=32 pAggr=72 URL=do36.comptel.com:8127"
> * aggrinterval: time in ms for the timer to trigger
> * loop: number of rows of data to feed
> * statsd: send results to statsd
> * psrc: source parallelism
> * pJ2R: parallelism of the map operator (JsonRecTranslator)
> * pAggr: parallelism of the process + timer operator (AggregationDuration)
>
> We are running on VMware with 5 Task Managers, each with 32 slots.
>
> lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 32
> On-line CPU(s) list: 0-31
> Thread(s) per core: 1
> Core(s) per socket: 1
> Socket(s): 32
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 63
> Model name: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> Stepping: 2
> CPU MHz: 2593.993
> BogoMIPS: 5187.98
> Hypervisor vendor: VMware
> Virtualization type: full
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 20480K
> NUMA node0 CPU(s): 0-31
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm epb fsgsbase smep dtherm ida arat pln pts
>
>        total   used   free   shared   buff/cache   available
> Mem:      98     24     72        0            1          72
> Swap:      3      0      3
>
> Please refer to TM.png and JM.png for further details.
> The test was run without checkpointing enabled.
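The sample parameters are space-separated key=value pairs. The sketch below uses hypothetical class and method names (the attached project's own argument handling is not shown here) to parse such a string and compare the slots the job needs against the cluster's 5 × 32 = 160 slots. It assumes all operators stay in Flink's default slot sharing group, under which a job needs as many slots as its most parallel operator, and it only looks at the three parallelism parameters listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from flink_app_parser_git.zip.
public class JobParams {

    // Parses "key=value key=value ..." into an ordered map.
    // Splits on the first '=' so values may contain ':' (e.g. host:port).
    static Map<String, String> parse(String args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String token : args.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + token);
            }
            params.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return params;
    }

    // Assumes default slot sharing: required slots = highest operator parallelism.
    static int requiredSlots(Map<String, String> params) {
        int psrc = Integer.parseInt(params.get("psrc"));
        int pJ2R = Integer.parseInt(params.get("pJ2R"));
        int pAggr = Integer.parseInt(params.get("pAggr"));
        return Math.max(psrc, Math.max(pJ2R, pAggr));
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(
                "aggrinterval=6000000 loop=7500000 statsd=1 psrc=4 pJ2R=32 pAggr=72 URL=do36.comptel.com:8127");
        int availableSlots = 5 * 32; // 5 Task Managers with 32 slots each
        int needed = requiredSlots(params);
        System.out.println("statsd endpoint: " + params.get("URL"));
        System.out.println("slots needed: " + needed + " of " + availableSlots + " available");
    }
}
```

For the sample values (psrc=4, pJ2R=32, pAggr=72) this reports 72 of 160 slots, so under the stated assumption the configuration fits comfortably in the cluster.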