Measuring Flume perf

Roshan Naik Fri, 13 Dec 2013 18:22:38 -0800

Some of the folks on this dev list may be aware that I am doing some Flume
performance measurements.


Here is some preliminary data:


I initially started with an Avro source + file channel (FC) + 4 HDFS sinks.
Measurements indicated the agent could reach only about 20k events per second.
I tried event sizes of 1 kB and 500 bytes.


I replaced the HDFS sinks with null sinks to narrow down the source of
the bottleneck. For the same reason I replaced the Avro source with an exec
source that cats the same 1 GB input file in a loop, many times over.
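
For anyone wanting to reproduce a similar load, the "cat the same file in a loop" idea can be sketched as below. This is only an illustration, not the command I actually gave the exec source; the function name and arguments are made up for the example.

```python
import shutil

def replay_file(path, repeats, out):
    """Write the contents of `path` to the binary stream `out`, `repeats` times.

    Hypothetical helper: it mimics running `cat` on one input file in a loop.
    """
    for _ in range(repeats):
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)

# Example (hypothetical file name): replay an input file 100 times to stdout.
# import sys
# replay_file("input.txt", 100, sys.stdout.buffer)
```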

*SYSTEM STATS:*
There is a single disk on the machine, but its utilization is very low, as
can be seen from the *iostat* output below:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.37    0.00    0.44    0.04    0.00   97.16

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              95.98       655.31      6603.58 1348373762 13587517606
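
As a rough sanity check on the iostat figures, the block counters can be converted to more familiar units. The sketch below assumes iostat's usual 512-byte block size; it also divides the cumulative Blk_wrtn counter by the per-second rate, which suggests the rates shown are averages over roughly the machine's whole uptime rather than over the test run alone.

```python
BLOCK_SIZE_BYTES = 512             # assumption: iostat blocks are 512-byte sectors

blk_wrtn_per_sec = 6603.58         # Blk_wrtn/s for sda, from the output above
blk_wrtn_total = 13_587_517_606    # cumulative Blk_wrtn for sda

# Average write rate in MB/s (1 MB = 10**6 bytes).
write_mb_per_sec = blk_wrtn_per_sec * BLOCK_SIZE_BYTES / 1e6

# Time window implied by the cumulative counter, in days.
window_days = blk_wrtn_total / blk_wrtn_per_sec / 86_400

print(f"average write rate: {write_mb_per_sec:.2f} MB/s")
print(f"implied averaging window: {window_days:.1f} days")
```

A per-interval sample (for example, iostat run with an interval argument while the agent is under load) would be a more direct measure of disk activity during the test.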


The top output also shows that CPU and memory are not the bottleneck:

top - 17:21:57 up 23 days, 19:34,  2 users,  load average: 3.44, 3.17, 2.72
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  *5.9%us,*  3.3%sy,  0.0%ni, 90.7%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65937984k total, 22648200k used, 43289784k free,   198448k buffers
Swap:  1048568k total,    14268k used,  1034300k free, 19619416k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6255 root      20   0 12.3g 1.4g 125m S *219.4*  2.2  19:57.64 java



*FLUME MEASUREMENTS*

Since there was spare CPU, memory and disk capacity, I ran a second agent and
noticed that it independently delivered approx. 20k events/sec.
A third agent showed the same performance.
So the system itself does not seem to be the bottleneck.

The channel size remains small and steady, so the ingestion rate is the
bottleneck, not the drain rate.

Varying the batch size on the exec source between 20, 100, 500 and 1000 yielded
the following ingestion rates with an event size of 1024 bytes:

FC + exec (batch size   20) + 4 null sinks = 18k events/sec (eps)
FC + exec (batch size  100) + 4 null sinks = 24.2k eps
FC + exec (batch size  500) + 4 null sinks = 24k eps
FC + exec (batch size 1000) + 4 null sinks = 23.2k eps
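
To put these ingestion rates in terms of data volume, the event rates can be converted to payload bytes per second. This is a rough sketch that assumes each event carries exactly 1024 bytes of payload and ignores headers and any on-disk overhead of the file channel; the names are illustrative only.

```python
EVENT_SIZE_BYTES = 1024  # assumption: payload only, headers ignored

# Measured FC ingestion rates from the runs above: batch size -> events/sec.
rates_eps = {20: 18_000, 100: 24_200, 500: 24_000, 1000: 23_200}

def payload_mb_per_sec(events_per_sec, event_size_bytes=EVENT_SIZE_BYTES):
    """Convert an event rate to payload throughput in MB/s (1 MB = 10**6 bytes)."""
    return events_per_sec * event_size_bytes / 1e6

for batch_size, eps in rates_eps.items():
    mbps = payload_mb_per_sec(eps)
    print(f"batch size {batch_size:>4}: {eps:>6} eps = {mbps:.1f} MB/s")
```

Under these assumptions even the best FC run (24.2k eps) moves only about 25 MB/s of payload.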

Just for the heck of it, I replaced FC with a memory channel (MemCh):

MemCh + exec (batch size 1000) + 4 null sinks = 123.4k eps
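
For comparison, the memory channel run and the file channel run at batch size 1000 can be expressed as a ratio and as an average time budget per event. This is plain arithmetic on the two measured rates, nothing more:

```python
fc_eps = 23_200       # file channel, exec batch size 1000
memch_eps = 123_400   # memory channel, exec batch size 1000

# How many times faster the memory channel run was.
speedup = memch_eps / fc_eps

# Average wall-clock time per event, in microseconds.
fc_us_per_event = 1e6 / fc_eps
memch_us_per_event = 1e6 / memch_eps

print(f"memory channel vs file channel: {speedup:.1f}x")
print(f"file channel:   {fc_us_per_event:.1f} us/event")
print(f"memory channel: {memch_us_per_event:.1f} us/event")
```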


A few runs with an event size of 500 bytes also gave me numbers in the same
ballpark.

Here is my FC config:

nontx_agent01.channels.fc.checkpointDir = /flume/checkpoint/agent1
nontx_agent01.channels.fc.dataDirs = /flume/data/agent1
nontx_agent01.channels.fc.capacity = 140000000
nontx_agent01.channels.fc.transactionCapacity = 240000
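
To give a feel for how generous these limits are relative to the measured rates, here is a back-of-the-envelope sketch. It assumes 1024-byte payloads, ignores the file channel's on-disk overhead, and takes 24k eps as the approximate FC ingestion rate; the variable names are only illustrative.

```python
CAPACITY_EVENTS = 140_000_000   # channels.fc.capacity
TXN_CAPACITY_EVENTS = 240_000   # channels.fc.transactionCapacity
INGEST_EPS = 24_000             # approx. FC ingestion rate measured above
EVENT_SIZE_BYTES = 1024         # assumption: payload only
LARGEST_BATCH = 1000            # largest exec source batch size tried

# Time to fill the channel if nothing were drained.
seconds_to_fill = CAPACITY_EVENTS / INGEST_EPS

# Payload volume of a completely full channel (1 GB = 10**9 bytes).
payload_gb = CAPACITY_EVENTS * EVENT_SIZE_BYTES / 1e9

# How many of the largest batches fit within one transaction's capacity.
batches_per_txn = TXN_CAPACITY_EVENTS // LARGEST_BATCH

print(f"time to fill: {seconds_to_fill / 60:.1f} minutes")
print(f"full channel payload: {payload_gb:.2f} GB")
print(f"largest batches per transaction capacity: {batches_per_txn}")
```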


In this setup, these numbers appear to indicate that the event rate
(events/sec) is the primary bottleneck in FC, rather than the event size,
batch size or CPU/disk capacity.


-Roshan

