Done. Please let me know if you run into any issues.

Regards,
Arvind

On Wed, Apr 8, 2015 at 3:58 PM, Roshan Naik <ros...@hortonworks.com> wrote:

> roshan_naik is my login to cwiki.apache.org
>
> On 4/8/15 3:55 PM, "Arvind Prabhakar" <arv...@apache.org> wrote:
>
> >Added Hari to the wiki.
> >
> >Roshan, I could not look you up on the wiki users, can you please tell me
> >your username? If you don't have one yet, please register and let me know.
> >
> >Regards,
> >Arvind Prabhakar
> >
> >On Wed, Apr 8, 2015 at 3:26 PM, Roshan Naik <ros...@hortonworks.com> wrote:
> >
> >> Arvind,
> >> Please do let me know once you have granted me permission to the wiki.
> >> -roshan
> >>
> >> From: Hari Shreedharan <hshreedha...@cloudera.com>
> >> Date: Thursday, April 2, 2015 3:06 PM
> >> To: Roshan Naik <ros...@hortonworks.com>
> >> Cc: "dev@flume.apache.org" <dev@flume.apache.org>
> >> Subject: Re: Flume performance measurements
> >>
> >> Arvind - please could you grant Roshan access to the wiki.
> >>
> >> Thanks,
> >> Hari
> >>
> >> On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <ros...@hortonworks.com> wrote:
> >>
> >> Could u grant me write access to wiki ?
> >> username: roshannaik
> >>
> >> On 4/2/15 2:53 PM, "Hari Shreedharan" <hshreedha...@cloudera.com> wrote:
> >>
> >> >Roshan,
> >> >
> >> >Could you update the performance measurements page on our wiki with this
> >> >info? That would be more useful to reference.
> >> >
> >> >Thanks, Hari
> >> >
> >> >On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <ros...@hortonworks.com> wrote:
> >> >
> >> >> Sample Flume v1.4 Measurements for reference:
> >> >> Here are some sample measurements taken with a single agent and 500
> >> >> byte events.
> >> >>
> >> >> Cluster Config: 20-node Hadoop cluster (1 name node and 19 data nodes).
> >> >> Machine Config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
> >> >>
> >> >> 1. File channel with HDFS Sink (Sequence File):
> >> >> Source: 4 x Exec Source, 100k batchSize
> >> >> HDFS Sink Batch size: 500,000
> >> >> Channel: File
> >> >> Number of data dirs: 8
> >> >>
> >> >> Events/Sec, by sink count and number of data dirs
> >> >> (data dir columns: 1, 2, 4, 6, 8, 10):
> >> >>
> >> >> Sink Count   Events/Sec
> >> >> 1            14.3 k
> >> >> 2            21.9 k
> >> >> 4            35.8 k
> >> >> 8            24.8 k, 43.8 k, 72.5 k, 77 k, 78.6 k, 76.6 k
> >> >> 10           58 k
> >> >> 12           49.3 k, 49 k
> >> >>
> >> >> Was looking for the sweet spot in perf, so did not take measurements
> >> >> for all data points on the grid. Only took them for the ones that made
> >> >> sense. For example: when perf dropped by adding more sinks, did not
> >> >> take more measurements for those rows.
> >> >>
> >> >> 2. HDFS Sink:
> >> >> Channel: Memory
> >> >>
> >> >> Events/Sec:
> >> >>
> >> >> # of HDFS   Snappy            Snappy            Sequence File
> >> >> Sinks       BatchSz:1.2mill   BatchSz:1.4mill   BatchSz:1.2mill
> >> >> 1           34.3 k            33 k              33 k
> >> >> 2           71 k              75 k              69 k
> >> >> 4           141 k             145 k             141 k
> >> >> 8           271 k             273 k             251 k
> >> >> 12          382 k             380 k             370 k
> >> >> 16          478 k             538 k             486 k
> >> >>
> >> >> Some simple observations:
> >> >> * Increasing the number of dataDirs helps FC perf even on single disk
> >> >>   systems
> >> >> * Increasing the number of sinks helps
> >> >> * Max throughput observed was about 538k events/sec for HDFS sink,
> >> >>   which is approx 240MB/s
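
The last observation converts events/sec into a byte rate. A minimal sketch of that arithmetic follows, assuming every event carries exactly 500 bytes of payload with no header or framing overhead (the thread only says "500 byte events"). The function and variable names are hypothetical, chosen for illustration. Under these assumptions the result comes out somewhat above the quoted "approx 240MB/s", so that figure is best read as a rounded estimate whose exact value depends on the assumed event size and on whether megabytes are counted as 10^6 or 2^20 bytes.

```python
# Illustrative arithmetic only; names are hypothetical, not from Flume.

def throughput_mb_per_sec(events_per_sec: float, event_size_bytes: int) -> float:
    """Payload throughput in decimal megabytes (10**6 bytes) per second."""
    return events_per_sec * event_size_bytes / 1_000_000


def throughput_mib_per_sec(events_per_sec: float, event_size_bytes: int) -> float:
    """Payload throughput in binary mebibytes (2**20 bytes) per second."""
    return events_per_sec * event_size_bytes / (1024 * 1024)


# Peak rate reported in the thread (16 HDFS sinks, Snappy, BatchSz:1.4mill).
peak_events_per_sec = 538_000
# Assumption: exactly 500 payload bytes per event, no per-event overhead.
event_size_bytes = 500

mb = throughput_mb_per_sec(peak_events_per_sec, event_size_bytes)
mib = throughput_mib_per_sec(peak_events_per_sec, event_size_bytes)
print(f"{mb:.1f} MB/s, {mib:.1f} MiB/s")
```

The same two helpers can be applied to any other cell in the tables above, for example the 78.6 k events/sec file channel peak, to compare configurations in bytes rather than events.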