Arvind - please could you grant Roshan access to the wiki.
Thanks, Hari

On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <ros...@hortonworks.com> wrote:

> Could u grant me write access to wiki ?
> username: roshannaik
>
> On 4/2/15 2:53 PM, "Hari Shreedharan" <hshreedha...@cloudera.com> wrote:
>
>>Roshan,
>>
>>Could you update the performance measurements page on our wiki with this
>>info? That would be more useful to reference.
>>
>>Thanks, Hari
>>
>>On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <ros...@hortonworks.com>
>>wrote:
>>
>>> Sample Flume v1.4 Measurements for reference:
>>> Here are some sample measurements taken with a single agent and 500
>>>byte events.
>>> Cluster Config: 20-node Hadoop cluster (1 name node and 19 data nodes).
>>> Machine Config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>>>
>>> 1. File channel with HDFS Sink (Sequence File):
>>> Source: 4 x Exec Source, 100k batchSize
>>> HDFS Sink Batch size: 500,000
>>> Channel: File
>>> Number of data dirs: 8
>>>
>>> Events/Sec
>>> Sink Count | 1 data dirs | 2 data dirs | 4 data dirs | 6 data dirs | 8 data dirs | 10 data dirs
>>> 1:  14.3 k
>>> 2:  21.9 k
>>> 4:  35.8 k
>>> 8:  24.8 k | 43.8 k | 72.5 k | 77 k | 78.6 k | 76.6 k
>>> 10: 58 k
>>> 12: 49.3 k | 49 k
>>>
>>> Was looking for sweet spot in perf. So did not take measurements for
>>>all data points on grid. Only took for the ones that made sense. For
>>>example: when perf dropped by adding more sinks, did not take more
>>>measurements for those rows.
>>>
>>> 2. HDFS Sink:
>>> Channel: Memory
>>>
>>> # of HDFS Sinks | Snappy BatchSz:1.2mill | Snappy BatchSz:1.4mill | Sequence File BatchSz:1.2mill
>>> 1               | 34.3 k                 | 33 k                   | 33 k
>>> 2               | 71 k                   | 75 k                   | 69 k
>>> 4               | 141 k                  | 145 k                  | 141 k
>>> 8               | 271 k                  | 273 k                  | 251 k
>>> 12              | 382 k                  | 380 k                  | 370 k
>>> 16              | 478 k                  | 538 k                  | 486 k
>>>
>>> Some simple observations :
>>> * Increasing number of dataDirs helps FC perf even on single disk
>>>systems
>>> * Increasing number of sinks helps
>>> * Max throughput observed was about 538k events/sec for HDFS sink
>>>which is approx 240MB/s
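
For anyone who wants to redo the arithmetic behind the observations above, here is a rough Python sketch. It is not part of the original measurements: the helper names are made up for illustration, it assumes every event is exactly 500 bytes of payload with no header or file-format overhead, and it takes 1 MB to mean 10^6 bytes. Under those assumptions the raw payload product for the peak figure comes out somewhat higher than the "approx 240MB/s" quoted above, which may just reflect rounding or a different way of counting bytes; the sketch does not try to resolve that.

```python
# Hypothetical helpers for sanity-checking the numbers in the thread.
EVENT_SIZE_BYTES = 500  # event size stated in the measurements


def events_to_mb_per_sec(events_per_sec, event_size_bytes=EVENT_SIZE_BYTES):
    """Payload throughput in MB/s (1 MB = 10**6 bytes), ignoring any overhead."""
    return events_per_sec * event_size_bytes / 1e6


def scaling_efficiency(throughput_by_sinks):
    """Throughput per sink relative to the smallest sink count (1.0 = linear)."""
    base_sinks = min(throughput_by_sinks)
    per_sink_base = throughput_by_sinks[base_sinks] / base_sinks
    return {
        sinks: rate / (sinks * per_sink_base)
        for sinks, rate in throughput_by_sinks.items()
    }


# Memory channel, "Snappy BatchSz:1.4mill" column, events/sec per sink count.
snappy_1_4m = {1: 33_000, 2: 75_000, 4: 145_000, 8: 273_000, 12: 380_000, 16: 538_000}

if __name__ == "__main__":
    print("peak payload MB/s:", events_to_mb_per_sec(snappy_1_4m[16]))
    for sinks, eff in sorted(scaling_efficiency(snappy_1_4m).items()):
        print(f"{sinks:>2} sinks: {eff:.2f}x of linear scaling")
```

The efficiency ratio is only a quick way to see how close the memory-channel runs stay to linear scaling as sinks are added; the same function can be pointed at the other two columns.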