Roshan,
Could you update the performance measurements page on our wiki with this info? That would be more useful to reference.

Thanks,
Hari

On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <ros...@hortonworks.com> wrote:

> Sample Flume v1.4 Measurements for reference:
>
> Here are some sample measurements taken with a single agent and 500 byte
> events.
>
> Cluster Config: 20-node Hadoop cluster (1 name node and 19 data nodes).
> Machine Config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>
> 1. File channel with HDFS Sink (Sequence File):
>
> Source: 4 x Exec Source, 100k batchSize
> HDFS Sink Batch size: 500,000
> Channel: File
> Number of data dirs: 8
>
> Events/Sec by sink count. Grid columns: 1, 2, 4, 6, 8, 10 data dirs.
> [Only the 8-sink row has all six cells; the column positions of the
> values in the other rows are not preserved in this copy.]
>
> Sink Count   Events/Sec
> 1            14.3 k
> 2            21.9 k
> 4            35.8 k
> 8            24.8 k (1 dir), 43.8 k (2 dirs), 72.5 k (4 dirs),
>              77 k (6 dirs), 78.6 k (8 dirs), 76.6 k (10 dirs)
> 10           58 k
> 12           49.3 k, 49 k
>
> I was looking for the sweet spot in perf, so I did not take measurements for
> all data points on the grid, only for the ones that made sense. For example,
> when perf dropped after adding more sinks, I did not take more measurements
> for those rows.
>
> 2. HDFS Sink:
>
> Channel: Memory
>
> Events/Sec:
>
> # of HDFS   Snappy            Snappy            Sequence File
> Sinks       BatchSz:1.2mill   BatchSz:1.4mill   BatchSz:1.2mill
> 1           34.3 k            33 k              33 k
> 2           71 k              75 k              69 k
> 4           141 k             145 k             141 k
> 8           271 k             273 k             251 k
> 12          382 k             380 k             370 k
> 16          478 k             538 k             486 k
>
> Some simple observations:
>
> * Increasing the number of dataDirs helps FC perf even on single-disk systems
> * Increasing the number of sinks helps
> * Max throughput observed was about 538k events/sec for the HDFS sink, which
>   is approx 240MB/s
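
For anyone who wants to redo the arithmetic behind the observations, here is a minimal Python sketch. It assumes every event is exactly 500 bytes of payload and ignores headers, compression and HDFS overhead. The function names and the `snappy_1_4m` dictionary are made up for illustration and are not part of Flume. The rates are copied from the Snappy, BatchSz 1.4mill column of the memory channel measurements.

```python
EVENT_SIZE_BYTES = 500  # event size stated in the measurements


def events_to_mb_per_sec(events_per_sec, event_size=EVENT_SIZE_BYTES):
    """Convert an event rate to payload throughput in decimal MB/s."""
    return events_per_sec * event_size / 1_000_000


def events_to_mib_per_sec(events_per_sec, event_size=EVENT_SIZE_BYTES):
    """Convert an event rate to payload throughput in binary MiB/s."""
    return events_per_sec * event_size / (1024 * 1024)


def per_sink_rate(table):
    """Events/sec handled per sink, for each sink count in the table."""
    return {sinks: rate / sinks for sinks, rate in table.items()}


# Memory channel, Snappy, batch size 1.4 million: sink count -> events/sec
snappy_1_4m = {
    1: 33_000,
    2: 75_000,
    4: 145_000,
    8: 273_000,
    12: 380_000,
    16: 538_000,
}

if __name__ == "__main__":
    peak = max(snappy_1_4m.values())
    print(f"peak: {events_to_mb_per_sec(peak):.1f} MB/s, "
          f"{events_to_mib_per_sec(peak):.1f} MiB/s")
    for sinks, rate in per_sink_rate(snappy_1_4m).items():
        print(f"{sinks:>2} sinks: {rate:,.0f} events/sec per sink")
```

Under those assumptions the 538k events/sec peak comes to roughly 269 MB/s (about 257 MiB/s), the same ballpark as the approx 240MB/s quoted. The per-sink rate stays between about 31k and 38k events/sec from 1 to 16 sinks, which fits the observation that adding sinks helps.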