roshan_naik is my login to cwiki.apache.org
On 4/8/15 3:55 PM, "Arvind Prabhakar" <arv...@apache.org> wrote:

>Added Hari to the wiki.
>
>Roshan, I could not look you up in the wiki users, so can you please tell me
>your username? If you don't have one yet, please register and let me know.
>
>Regards,
>Arvind Prabhakar
>
>On Wed, Apr 8, 2015 at 3:26 PM, Roshan Naik <ros...@hortonworks.com> wrote:
>
>> Arvind,
>>     Please do let me know once you have granted me permission to the
>> wiki.
>> -roshan
>>
>> From: Hari Shreedharan <hshreedha...@cloudera.com>
>> Date: Thursday, April 2, 2015 3:06 PM
>> To: Roshan Naik <ros...@hortonworks.com>
>> Cc: "dev@flume.apache.org" <dev@flume.apache.org>
>> Subject: Re: Flume performance measurements
>>
>> Arvind - could you please grant Roshan access to the wiki?
>>
>> Thanks,
>> Hari
>>
>>
>> On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <ros...@hortonworks.com> wrote:
>>
>> Could you grant me write access to the wiki?
>> username: roshannaik
>>
>>
>> On 4/2/15 2:53 PM, "Hari Shreedharan" <hshreedha...@cloudera.com> wrote:
>>
>> >Roshan,
>> >
>> >Could you update the performance measurements page on our wiki with this
>> >info? That would be more useful to reference.
>> >
>> >Thanks, Hari
>> >
>> >On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <ros...@hortonworks.com>
>> >wrote:
>> >
>> >> Sample Flume v1.4 measurements for reference:
>> >>
>> >> Here are some sample measurements taken with a single agent and
>> >> 500-byte events.
>> >>
>> >> Cluster config: 20-node Hadoop cluster (1 name node and 19 data nodes).
>> >> Machine config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>> >>
>> >> 1. File channel with HDFS Sink (Sequence File):
>> >>    Source: 4 x Exec Source, 100k batchSize
>> >>    HDFS Sink batch size: 500,000
>> >>    Channel: File
>> >>    Number of data dirs: 8
>> >>
>> >> Events/sec by HDFS sink count (the grid was only partially measured):
>> >>
>> >>    1 sink  : 14.3 k
>> >>    2 sinks : 21.9 k
>> >>    4 sinks : 35.8 k
>> >>    8 sinks : 24.8 k (1 data dir), 43.8 k (2 data dirs), 72.5 k (4),
>> >>              77 k (6), 78.6 k (8), 76.6 k (10)
>> >>   10 sinks : 58 k
>> >>   12 sinks : 49.3 k, 49 k
>> >>
>> >> I was looking for the performance sweet spot, so I did not take
>> >> measurements for every point on the grid -- only for the ones that
>> >> made sense. For example, when performance dropped after adding more
>> >> sinks, I did not take further measurements for those rows.
>> >>
>> >> 2. HDFS Sink:
>> >>    Channel: Memory
>> >>
>> >> Events/sec:
>> >>
>> >>  # of HDFS | Snappy       | Snappy       | Sequence File
>> >>    sinks   | batch: 1.2M  | batch: 1.4M  | batch: 1.2M
>> >>  ----------+--------------+--------------+--------------
>> >>      1     | 34.3 k       | 33 k         | 33 k
>> >>      2     | 71 k         | 75 k         | 69 k
>> >>      4     | 141 k        | 145 k        | 141 k
>> >>      8     | 271 k        | 273 k        | 251 k
>> >>     12     | 382 k        | 380 k        | 370 k
>> >>     16     | 478 k        | 538 k        | 486 k
>> >>
>> >> Some simple observations:
>> >> * Increasing the number of dataDirs helps file channel performance,
>> >>   even on single-disk systems.
>> >> * Increasing the number of sinks helps.
>> >> * Max throughput observed was about 538k events/sec for the HDFS sink,
>> >>   which is approx 240 MB/s.
>> >>
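
For anyone who wants to reproduce something close to setup 1 above, a minimal
single-agent config would look roughly like the sketch below. The component
names, commands, paths, capacities, and roll settings here are illustrative
assumptions, not the exact config behind the numbers in the table:

  # 4 exec sources -> 1 file channel -> N HDFS sinks (SequenceFile output)
  agent.sources = r1 r2 r3 r4
  agent.channels = fc
  agent.sinks = k1 k2 k3 k4 k5 k6 k7 k8

  # Exec source with a 100k batch size (repeat for r2..r4 with their own commands)
  agent.sources.r1.type = exec
  agent.sources.r1.command = cat /path/to/input1.log
  agent.sources.r1.batchSize = 100000
  agent.sources.r1.channels = fc

  # File channel; dataDirs is a comma-separated list -- add or remove entries
  # to vary the "data dirs" dimension from the first table
  agent.channels.fc.type = file
  agent.channels.fc.checkpointDir = /flume/checkpoint
  agent.channels.fc.dataDirs = /flume/data1,/flume/data2,/flume/data3,/flume/data4,/flume/data5,/flume/data6,/flume/data7,/flume/data8
  agent.channels.fc.capacity = 100000000
  # transactionCapacity must cover the largest source/sink batch size
  agent.channels.fc.transactionCapacity = 500000

  # HDFS sink writing SequenceFiles with a 500k batch size
  # (repeat for k2..k8, pointing each sink at its own hdfs.path)
  agent.sinks.k1.type = hdfs
  agent.sinks.k1.channel = fc
  agent.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/perf/k1
  agent.sinks.k1.hdfs.fileType = SequenceFile
  agent.sinks.k1.hdfs.batchSize = 500000
  agent.sinks.k1.hdfs.rollInterval = 0
  agent.sinks.k1.hdfs.rollSize = 0
  agent.sinks.k1.hdfs.rollCount = 5000000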
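Setup 2 differs only in the channel and the sink output format. A sketch of
the changed pieces is below; again the names and sizes are illustrative, and
using CompressedStream with the snappy codec is just one way to get snappy
output from the HDFS sink, not necessarily what was run for the table above:

  # Setup 2 variant: memory channel feeding N parallel HDFS sinks
  agent.channels = mc
  agent.channels.mc.type = memory
  agent.channels.mc.capacity = 5000000
  # must be >= the largest sink batch size (1.4M in the second table)
  agent.channels.mc.transactionCapacity = 1400000

  # One of N HDFS sinks; scale the sink list out to 8/12/16 and repeat this
  # block, giving each sink its own hdfs.path
  agent.sinks.k1.type = hdfs
  agent.sinks.k1.channel = mc
  agent.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/perf2/k1
  agent.sinks.k1.hdfs.batchSize = 1200000
  # "Snappy" columns: compressed output stream using the snappy codec
  agent.sinks.k1.hdfs.fileType = CompressedStream
  agent.sinks.k1.hdfs.codeC = snappy
  # "Sequence File" column: hdfs.fileType = SequenceFile and no codeC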