Roshan,

Could you update the performance measurements page on our wiki with this info? That would make it easier to reference.

Thanks, Hari

On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <ros...@hortonworks.com> wrote:

> Sample Flume v1.4 measurements for reference:
> Here are some sample measurements taken with a single agent and 500-byte events.
> Cluster config: 20-node Hadoop cluster (1 NameNode and 19 DataNodes).
> Machine config: 24 cores (Xeon E5-2640 v2 @ 2.00 GHz), 164 GB RAM.
> 1. File Channel with HDFS Sink (Sequence File):
>    Source: 4 x Exec Source, batchSize 100k
>    HDFS Sink batch size: 500,000
>    Channel: File
>    Number of data dirs: 8
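For anyone reproducing this setup, here is a minimal sketch that renders a Flume agent configuration (Java properties format) for N exec sources feeding one File Channel drained by M HDFS sinks. The agent name `a1`, the component names, the `tail -F` command and every path are hypothetical placeholders. The property keys (`type`, `command`, `batchSize`, `checkpointDir`, `dataDirs`, `transactionCapacity`, `hdfs.path`, `hdfs.fileType`, `hdfs.batchSize`, `hdfs.filePrefix`) are standard Flume 1.x settings, but they should be checked against the Flume User Guide for the version in use. This is not necessarily the configuration used for the measurements.

```python
def flume_file_channel_config(num_sources=4, num_sinks=8, num_data_dirs=8,
                              source_batch=100_000, sink_batch=500_000):
    """Render a Flume agent config: N exec sources -> one File Channel -> M HDFS sinks.

    The agent/component names, exec command and all paths are hypothetical placeholders.
    """
    agent = "a1"
    sources = [f"r{i}" for i in range(1, num_sources + 1)]
    sinks = [f"k{i}" for i in range(1, num_sinks + 1)]
    # Hypothetical layout: one data dir per mount point, passed as a comma-separated list.
    data_dirs = ",".join(f"/data{i}/flume/data" for i in range(1, num_data_dirs + 1))

    lines = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = c1",
        f"{agent}.sinks = {' '.join(sinks)}",
        f"{agent}.channels.c1.type = file",
        f"{agent}.channels.c1.checkpointDir = /data1/flume/checkpoint",
        f"{agent}.channels.c1.dataDirs = {data_dirs}",
        # A single transaction must be able to hold the largest source or sink batch.
        f"{agent}.channels.c1.transactionCapacity = {max(source_batch, sink_batch)}",
    ]
    for src in sources:
        lines += [
            f"{agent}.sources.{src}.type = exec",
            f"{agent}.sources.{src}.command = tail -F /var/log/input/{src}.log",
            f"{agent}.sources.{src}.batchSize = {source_batch}",
            f"{agent}.sources.{src}.channels = c1",
        ]
    for snk in sinks:
        lines += [
            f"{agent}.sinks.{snk}.type = hdfs",
            f"{agent}.sinks.{snk}.channel = c1",
            f"{agent}.sinks.{snk}.hdfs.path = /flume/events",
            f"{agent}.sinks.{snk}.hdfs.fileType = SequenceFile",
            f"{agent}.sinks.{snk}.hdfs.batchSize = {sink_batch}",
            # Distinct prefixes keep parallel sinks from colliding on file names.
            f"{agent}.sinks.{snk}.hdfs.filePrefix = events-{snk}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(flume_file_channel_config())
```

Calling it with `num_sinks=8` and `num_data_dirs` set to 1, 2, 4, 6, 8 and 10 in turn yields one config per cell of the 8-sink row in the grid that follows.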
> Events/sec by sink count and number of data dirs:
>
>   8 sinks, varying the number of data dirs:
>     Data dirs    1        2        4        6        8        10
>     Events/sec   24.8 k   43.8 k   72.5 k   77 k     78.6 k   76.6 k
>
>   Other sink counts (partial measurements):
>     1 sink    14.3 k
>     2 sinks   21.9 k
>     4 sinks   35.8 k
>     10 sinks  58 k
>     12 sinks  49.3 k, 49 k (two data-dir settings)
> I was looking for the sweet spot in performance, so I did not take measurements
> for every data point on the grid, only for the ones that made sense. For example,
> when performance dropped after adding more sinks, I did not take further
> measurements for those rows.
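The "sweet spot" search amounts to picking the best setting along one row of the grid, and perhaps also the cheapest setting that comes close to it. Here is a minimal sketch over the fully measured 8-sink row, with values copied from the table. The 5% tolerance is an arbitrary illustrative choice, not something the measurements prescribe.

```python
# Events/sec (in thousands) for 8 sinks, keyed by number of File Channel data dirs.
eight_sinks = {1: 24.8, 2: 43.8, 4: 72.5, 6: 77.0, 8: 78.6, 10: 76.6}


def sweet_spot(row, tolerance=0.05):
    """Return (best_setting, smallest_setting_within_tolerance_of_best)."""
    best = max(row, key=row.get)
    threshold = row[best] * (1 - tolerance)
    # Fewest data dirs whose throughput is within `tolerance` of the best one.
    cheapest = min(k for k, v in row.items() if v >= threshold)
    return best, cheapest


best, cheapest = sweet_spot(eight_sinks)
print(f"best: {best} dirs ({eight_sinks[best]} k events/sec)")
print(f"within 5% of best: {cheapest} dirs ({eight_sinks[cheapest]} k events/sec)")
```

With these numbers, 8 data dirs is the peak, but 6 dirs already comes within 5% of it.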
> 2. HDFS Sink:
>    Channel: Memory
>
>    Events/sec:
>    # of HDFS   Snappy            Snappy            Sequence File
>    sinks       batch size 1.2M   batch size 1.4M   batch size 1.2M
>    1           34.3 k            33 k              33 k
>    2           71 k              75 k              69 k
>    4           141 k             145 k             141 k
>    8           271 k             273 k             251 k
>    12          382 k             380 k             370 k
>    16          478 k             538 k             486 k
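One way to quantify how well the Memory Channel setup scales with sinks is to compare each measurement against perfect linear scaling from the 1-sink figure. Here is a small sketch over the Snappy, batch size 1.2M column, with values copied from the table. The "efficiency" metric (measured / (sinks x 1-sink rate)) is an illustrative choice, not one used in the original measurements.

```python
# Events/sec (in thousands) for the Snappy, batch size 1.2M column, keyed by sink count.
snappy_1_2m = {1: 34.3, 2: 71, 4: 141, 8: 271, 12: 382, 16: 478}


def scaling_efficiency(row):
    """Measured throughput divided by ideal linear scaling from the 1-sink baseline."""
    base = row[1]
    return {n: v / (n * base) for n, v in row.items()}


for n, eff in scaling_efficiency(snappy_1_2m).items():
    print(f"{n:>2} sinks: {snappy_1_2m[n]:>6} k events/sec, efficiency {eff:.0%}")
```

On these numbers, efficiency stays close to 100% through 8 sinks, then drops to about 93% at 12 sinks and about 87% at 16.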
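> Some simple observations:
>   *   Increasing the number of dataDirs helps File Channel (FC) performance, even
>       on single-disk systems.
>   *   Increasing the number of sinks helps, up to a point: with the File Channel,
>       throughput peaked at 8 sinks, while with the Memory Channel it kept rising
>       up to 16 sinks.
>   *   Max throughput observed was about 538k events/sec with the HDFS sink, which
>       is approx. 240 MB/s.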
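Converting events/sec to bandwidth is simple arithmetic. Here is a sketch that assumes exactly 500 bytes of payload per event and ignores event headers, SequenceFile framing and compression. The ~240 MB/s figure quoted above may have been derived differently, for example from the bytes actually written to HDFS.

```python
def payload_bandwidth(events_per_sec, event_bytes=500):
    """Raw payload rate in decimal MB/s and binary MiB/s (no headers, framing or compression)."""
    bytes_per_sec = events_per_sec * event_bytes
    return bytes_per_sec / 1e6, bytes_per_sec / 2**20


mb, mib = payload_bandwidth(538_000)
print(f"{mb:.0f} MB/s ({mib:.0f} MiB/s)")
```

This prints `269 MB/s (257 MiB/s)` for the 538k events/sec peak.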
