Not knowing much about Flume anymore, this smells to me like a case of only a subset of CPU cores being utilized. Top should show this.
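(For anyone wanting to confirm that quickly: assuming the sysstat package is installed, a per-core breakdown can be had with something like

    mpstat -P ALL 1

or by pressing '1' inside top to toggle the per-CPU view. One or two cores pinned near 100% while the rest sit idle would support that theory.)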
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Dec 13, 2013 9:22 PM, "Roshan Naik" <[email protected]> wrote:

> Some of the folks on this dev list may be aware that I am doing some flume
> performance measurements.
>
> Here is some preliminary data:
>
> I initially started with Avro source + FC + 4 HDFS sinks. Measurements
> indicated the agent was only able to reach around 20k events per second. I
> tried with event sizes of 1kB and 500 bytes.
>
> I replaced the HDFS sinks with null sinks just to narrow down the source of
> the bottleneck. For the same reason I replaced the source with an exec
> source that basically cats the same 1GB input file in a loop many times.
>
> *SYSTEM STATS:*
> There is a single disk on the machine, but its utilization is very low, as
> can be seen from the iostat output below:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            2.37    0.00    0.44    0.04    0.00   97.16
>
> Device:         tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
> sda           95.98       655.31      6603.58   1348373762  13587517606
>
> Top output also shows CPU & memory are not the bottleneck:
>
> top - 17:21:57 up 23 days, 19:34, 2 users, load average: 3.44, 3.17, 2.72
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s):  5.9%us,  3.3%sy,  0.0%ni, 90.7%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  65937984k total, 22648200k used, 43289784k free,   198448k buffers
> Swap:  1048568k total,    14268k used,  1034300k free, 19619416k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>  6255 root      20   0 12.3g 1.4g 125m S  219.4  2.2  19:57.64 java
>
> *FLUME MEASUREMENTS*
>
> Since there was spare CPU, memory & disk available, I ran a 2nd agent and
> noticed that it was able to independently deliver approx. 20k events/sec.
> With a third agent the same perf was observed.
> So the system does not seem to be the bottleneck.
>
> The channel size remains small and steady, so the ingestion rate is the
> bottleneck, not the drain rate.
>
> Varying the batch size on the exec source between 20, 100, 500 & 1000 yielded
> the following ingestion rates with an event size of 1024 bytes:
>
> FC + exec (batch size 20)   + 4 null sinks = 18k events/sec
> FC + exec (batch size 100)  + 4 null sinks = 24.2k eps
> FC + exec (batch size 500)  + 4 null sinks = 24k eps
> FC + exec (batch size 1000) + 4 null sinks = 23.2k eps
>
> Just for the heck of it, I replaced FC with MemCh:
>
> MemCh + exec (batch size 1000) + 4 null sinks = 123.4k eps
>
> A few runs with an event size of 500 bytes also gave me numbers in the same
> ballpark.
>
> Here is my FC config:
>
> nontx_agent01.channels.fc.checkpointDir = /flume/checkpoint/agent1
> nontx_agent01.channels.fc.dataDirs = /flume/data/agent1
> nontx_agent01.channels.fc.capacity = 140000000
> nontx_agent01.channels.fc.transactionCapacity = 240000
>
> In this setup, these numbers seem to indicate that events/sec is the primary
> bottleneck in FC, and not so much the event size, batch size, or CPU/disk
> capacity.
>
> -Roshan
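For context, the post only includes the file channel settings, so here is a rough sketch of what the full benchmark agent config might look like. The source/sink names, the exec command, and the batchSize value are illustrative guesses on my part, not taken from the original setup; only the channel lines are verbatim from the email.

    nontx_agent01.sources  = ex1
    nontx_agent01.channels = fc
    nontx_agent01.sinks    = ns1 ns2 ns3 ns4

    # exec source looping over the same 1GB input file (command is illustrative)
    nontx_agent01.sources.ex1.type = exec
    nontx_agent01.sources.ex1.shell = /bin/sh -c
    nontx_agent01.sources.ex1.command = while true; do cat /data/input-1gb.log; done
    nontx_agent01.sources.ex1.batchSize = 1000
    nontx_agent01.sources.ex1.channels = fc

    # file channel settings as posted above
    nontx_agent01.channels.fc.type = file
    nontx_agent01.channels.fc.checkpointDir = /flume/checkpoint/agent1
    nontx_agent01.channels.fc.dataDirs = /flume/data/agent1
    nontx_agent01.channels.fc.capacity = 140000000
    nontx_agent01.channels.fc.transactionCapacity = 240000

    # four null sinks draining the same channel
    nontx_agent01.sinks.ns1.type = null
    nontx_agent01.sinks.ns1.channel = fc
    nontx_agent01.sinks.ns2.type = null
    nontx_agent01.sinks.ns2.channel = fc
    nontx_agent01.sinks.ns3.type = null
    nontx_agent01.sinks.ns3.channel = fc
    nontx_agent01.sinks.ns4.type = null
    nontx_agent01.sinks.ns4.channel = fc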
