Re: Flume performance measurements

Hari Shreedharan Thu, 02 Apr 2015 15:08:29 -0700

Arvind, could you please grant Roshan write access to the wiki?

Thanks, Hari

On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <[email protected]>
wrote:

> Could you grant me write access to the wiki?
> username: roshannaik
> On 4/2/15 2:53 PM, "Hari Shreedharan" <[email protected]> wrote:
>>Roshan, 
>>
>>Could you update the performance measurements page on our wiki with this
>>info? That would be more useful to reference.
>>
>>Thanks, Hari
>>
>>On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <[email protected]>
>>wrote:
>>
>>> Sample Flume v1.4 Measurements for reference:
>>> Here are some sample measurements taken with a single agent and
>>> 500-byte events.
>>> Cluster Config: 20-node Hadoop cluster (1 name node and 19 data nodes).
>>> Machine Config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>>> 1. File Channel with HDFS Sink (Sequence File):
>>> Source: 4 x Exec Source, 100k batchSize
>>> HDFS Sink Batch size: 500,000
>>> Channel: File
>>> Number of data dirs: 8
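For anyone reproducing this setup, the parameters above map onto a Flume agent properties file. The Python sketch below generates one. Several values in it are hypothetical placeholders and are not the values used in these runs: the function name `flume_config`, the agent name `a1`, the component names (`s1..`, `c1`, `k1..`), all `/flume/...` paths, the `cat` exec command, the channel capacities and the roll settings. The property keys themselves are standard Flume 1.x keys for the exec source, the File Channel and the HDFS sink.

```python
def flume_config(num_sources=4, num_sinks=8, num_data_dirs=8,
                 source_batch=100_000, sink_batch=500_000, agent="a1"):
    """Render a hypothetical exec -> File Channel -> HDFS sink agent config.

    Agent/component names, paths and the exec command are placeholders.
    """
    sources = [f"s{i}" for i in range(1, num_sources + 1)]
    sinks = [f"k{i}" for i in range(1, num_sinks + 1)]
    data_dirs = ",".join(f"/flume/data{d}" for d in range(1, num_data_dirs + 1))
    lines = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = c1",
        f"{agent}.sinks = {' '.join(sinks)}",
        # File Channel: dataDirs is the knob varied in the table below.
        f"{agent}.channels.c1.type = file",
        f"{agent}.channels.c1.checkpointDir = /flume/checkpoint",
        f"{agent}.channels.c1.dataDirs = {data_dirs}",
        # Assumed sizing: one transaction must hold a full sink batch.
        f"{agent}.channels.c1.transactionCapacity = {sink_batch}",
        f"{agent}.channels.c1.capacity = {2 * sink_batch}",
    ]
    for s in sources:
        lines += [
            f"{agent}.sources.{s}.type = exec",
            f"{agent}.sources.{s}.command = cat /flume/input/{s}.txt",  # placeholder input
            f"{agent}.sources.{s}.batchSize = {source_batch}",
            f"{agent}.sources.{s}.channels = c1",
        ]
    for k in sinks:
        lines += [
            f"{agent}.sinks.{k}.type = hdfs",
            f"{agent}.sinks.{k}.channel = c1",
            f"{agent}.sinks.{k}.hdfs.path = /flume/events",
            # Distinct prefix per sink so parallel sinks never share a file name.
            f"{agent}.sinks.{k}.hdfs.filePrefix = {k}",
            f"{agent}.sinks.{k}.hdfs.fileType = SequenceFile",
            f"{agent}.sinks.{k}.hdfs.batchSize = {sink_batch}",
            # Assumed: roll by time only (defaults roll after 1 KB or 10 events).
            f"{agent}.sinks.{k}.hdfs.rollInterval = 30",
            f"{agent}.sinks.{k}.hdfs.rollSize = 0",
            f"{agent}.sinks.{k}.hdfs.rollCount = 0",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(flume_config(), end="")
```

The output could be saved as, for example, `agent.properties` and started with `flume-ng agent --conf conf --conf-file agent.properties --name a1`. Check the batch sizes and channel capacities against your own memory and disk budget before reusing them.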
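>>> Events/sec by number of HDFS sinks (rows) and number of File Channel
>>> data dirs (columns). Only the 8-sink row was measured at every data-dir
>>> count:
>>>
>>> Sinks | 1 dir  | 2 dirs | 4 dirs | 6 dirs | 8 dirs | 10 dirs
>>> ------+--------+--------+--------+--------+--------+---------
>>> 8     | 24.8 k | 43.8 k | 72.5 k | 77 k   | 78.6 k | 76.6 k
>>>
>>> Other sink counts (measured at one or two data-dir settings only; the
>>> data-dir count for these values is not indicated):
>>>   1 sink:   14.3 k
>>>   2 sinks:  21.9 k
>>>   4 sinks:  35.8 k
>>>   10 sinks: 58 k
>>>   12 sinks: 49.3 k, 49 k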
>>> I was looking for the performance sweet spot, so I did not take
>>> measurements for every point on the grid, only for the ones that made
>>> sense. For example, when performance dropped after adding more sinks, I
>>> did not take further measurements for those rows.
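One way to make "sweet spot" concrete is to pick the smallest data-dir count whose throughput is within some tolerance of the row's best. The sketch below applies that rule to the fully measured 8-sink row. The rule, its 5% default tolerance and the names `EIGHT_SINKS` and `sweet_spot` are a hypothetical heuristic, not the method used for these runs.

```python
# Events/sec for 8 HDFS sinks, keyed by number of File Channel data dirs
# (the only fully measured row of the File Channel table).
EIGHT_SINKS = {1: 24_800, 2: 43_800, 4: 72_500, 6: 77_000, 8: 78_600, 10: 76_600}


def sweet_spot(row, tolerance=0.05):
    """Smallest data-dir count whose throughput is within `tolerance` of the best."""
    best = max(row.values())
    return min(dirs for dirs, eps in row.items() if eps >= (1 - tolerance) * best)


if __name__ == "__main__":
    best_dirs = max(EIGHT_SINKS, key=EIGHT_SINKS.get)
    print("peak:", best_dirs, "dirs,", EIGHT_SINKS[best_dirs], "events/sec")
    print("within 5% of peak from:", sweet_spot(EIGHT_SINKS), "dirs")
    print(f"gain over 1 dir: {EIGHT_SINKS[best_dirs] / EIGHT_SINKS[1]:.2f}x")
```

On these numbers the peak is 8 data dirs at 78.6k events/sec, and 6 data dirs is already within 5% of it. The gain over a single data dir is about 3.2x.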
>>> 2. HDFS Sink:
>>> Channel: Memory
>>> Events/sec by number of HDFS sinks; columns give output format and
>>> batch size:
>>>
>>> HDFS sinks | Snappy, batch 1.2M | Snappy, batch 1.4M | Sequence File, batch 1.2M
>>> -----------+--------------------+--------------------+--------------------------
>>> 1          | 34.3 k             | 33 k               | 33 k
>>> 2          | 71 k               | 75 k               | 69 k
>>> 4          | 141 k              | 145 k              | 141 k
>>> 8          | 271 k              | 273 k              | 251 k
>>> 12         | 382 k              | 380 k              | 370 k
>>> 16         | 478 k              | 538 k              | 486 k
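Scaling across sinks can be quantified as efficiency = throughput(n) / (n x throughput(1)), where 1.0 means perfectly linear scaling. The sketch below computes this for each column of the memory-channel table. The names `TABLE2` and `scaling_efficiency` and the column keys are illustrative labels only.

```python
# Events/sec from the memory-channel table, keyed by number of HDFS sinks.
TABLE2 = {
    "snappy_1.2M": {1: 34_300, 2: 71_000, 4: 141_000, 8: 271_000, 12: 382_000, 16: 478_000},
    "snappy_1.4M": {1: 33_000, 2: 75_000, 4: 145_000, 8: 273_000, 12: 380_000, 16: 538_000},
    "seqfile_1.2M": {1: 33_000, 2: 69_000, 4: 141_000, 8: 251_000, 12: 370_000, 16: 486_000},
}


def scaling_efficiency(row):
    """Throughput with n sinks divided by n times the single-sink throughput."""
    base = row[1]
    return {n: eps / (n * base) for n, eps in sorted(row.items())}


if __name__ == "__main__":
    for column, row in TABLE2.items():
        eff = scaling_efficiency(row)
        print(column, " ".join(f"{n}:{e:.2f}" for n, e in eff.items()))
```

On these numbers every column keeps an efficiency of about 0.87 or better out to 16 sinks.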
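>>> Some simple observations:
>>>   *   Increasing the number of dataDirs helps File Channel (FC)
>>>       performance, even on single-disk systems.
>>>   *   Increasing the number of sinks helps, although with the File
>>>       Channel throughput dropped again beyond a certain sink count.
>>>   *   Max throughput observed was about 538k events/sec for the HDFS
>>>       sink, which is approx. 269 MB/s of 500-byte events
>>>       (538,000 x 500 bytes).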
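The bandwidth figure in the last observation is plain arithmetic: events/sec multiplied by the event size. The helper below does that conversion for the 500-byte events used here. It counts event payload only, so the bytes actually written to HDFS will differ because of SequenceFile framing, compression and replication. The name `payload_bandwidth` is illustrative.

```python
EVENT_BYTES = 500  # event size used in these measurements


def payload_bandwidth(events_per_sec, event_bytes=EVENT_BYTES):
    """Return event payload rate as (MB/s, MiB/s); excludes file-format overhead."""
    bytes_per_sec = events_per_sec * event_bytes
    return bytes_per_sec / 1_000_000, bytes_per_sec / 2**20


if __name__ == "__main__":
    mb, mib = payload_bandwidth(538_000)
    print(f"538000 events/sec = {mb:.0f} MB/s ({mib:.0f} MiB/s) of payload")
```

For the 538k events/sec peak this works out to 269 MB/s (about 257 MiB/s) of event payload.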
