roshan_naik is my login to cwiki.apache.org
On 4/8/15 3:55 PM, "Arvind Prabhakar" <arv...@apache.org> wrote:

>Added Hari to the wiki.
>
>Roshan, I could not look you up in the wiki users, so can you please tell me
>your username? If you don't have one yet, please register and let me know.
>
>Regards,
>Arvind Prabhakar
>
>On Wed, Apr 8, 2015 at 3:26 PM, Roshan Naik <ros...@hortonworks.com> wrote:
>
>> Arvind,
>>     Please do let me know once you have granted me permission to the
>> wiki.
>> -roshan
>>
>> From: Hari Shreedharan <hshreedha...@cloudera.com>
>> Date: Thursday, April 2, 2015 3:06 PM
>> To: Roshan Naik <ros...@hortonworks.com>
>> Cc: "dev@flume.apache.org" <dev@flume.apache.org>
>> Subject: Re: Flume performance measurements
>>
>> Arvind - could you please grant Roshan access to the wiki?
>>
>> Thanks,
>> Hari
>>
>>
>> On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <ros...@hortonworks.com> wrote:
>>
>> Could you grant me write access to the wiki?
>> username: roshannaik
>>
>>
>> On 4/2/15 2:53 PM, "Hari Shreedharan" <hshreedha...@cloudera.com> wrote:
>>
>> >Roshan,
>> >
>> >Could you update the performance measurements page on our wiki with this
>> >info? That would be more useful to reference.
>> >
>> >Thanks, Hari
>> >
>> >On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <ros...@hortonworks.com>
>> >wrote:
>> >
>> >> Sample Flume v1.4 measurements for reference:
>> >>
>> >> Here are some sample measurements taken with a single agent and
>> >> 500-byte events.
>> >>
>> >> Cluster config: 20-node Hadoop cluster (1 name node and 19 data nodes).
>> >> Machine config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>> >>
>> >> 1. File channel with HDFS Sink (Sequence File):
>> >>    Source: 4 x Exec Source, 100k batchSize
>> >>    HDFS Sink batch size: 500,000
>> >>    Channel: File
>> >>    Number of data dirs: 8
>> >>
>> >> Events/sec by HDFS sink count (the grid was only partially measured):
>> >>
>> >>    1 sink  : 14.3 k
>> >>    2 sinks : 21.9 k
>> >>    4 sinks : 35.8 k
>> >>    8 sinks : 24.8 k (1 data dir), 43.8 k (2 data dirs), 72.5 k (4),
>> >>              77 k (6), 78.6 k (8), 76.6 k (10)
>> >>   10 sinks : 58 k
>> >>   12 sinks : 49.3 k, 49 k
>> >>
>> >> I was looking for the performance sweet spot, so I did not take
>> >> measurements for every point on the grid -- only for the ones that
>> >> made sense. For example, when performance dropped after adding more
>> >> sinks, I did not take further measurements for those rows.
>> >>
>> >> 2. HDFS Sink:
>> >>    Channel: Memory
>> >>
>> >> Events/sec:
>> >>
>> >>  # of HDFS | Snappy       | Snappy       | Sequence File
>> >>    sinks   | batch: 1.2M  | batch: 1.4M  | batch: 1.2M
>> >>  ----------+--------------+--------------+--------------
>> >>      1     | 34.3 k       | 33 k         | 33 k
>> >>      2     | 71 k         | 75 k         | 69 k
>> >>      4     | 141 k        | 145 k        | 141 k
>> >>      8     | 271 k        | 273 k        | 251 k
>> >>     12     | 382 k        | 380 k        | 370 k
>> >>     16     | 478 k        | 538 k        | 486 k
>> >>
>> >> Some simple observations:
>> >> * Increasing the number of dataDirs helps file channel performance,
>> >>   even on single-disk systems.
>> >> * Increasing the number of sinks helps.
>> >> * Max throughput observed was about 538k events/sec for the HDFS sink,
>> >>   which is approx 240 MB/s.
>> >>
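
For anyone who wants to reproduce something close to setup 1 above, a minimal
single-agent config would look roughly like the sketch below. The component
names, commands, paths, capacities, and roll settings here are illustrative
assumptions, not the exact config behind the numbers in the table:

  # 4 exec sources -> 1 file channel -> N HDFS sinks (SequenceFile output)
  agent.sources = r1 r2 r3 r4
  agent.channels = fc
  agent.sinks = k1 k2 k3 k4 k5 k6 k7 k8

  # Exec source with a 100k batch size (repeat for r2..r4 with their own commands)
  agent.sources.r1.type = exec
  agent.sources.r1.command = cat /path/to/input1.log
  agent.sources.r1.batchSize = 100000
  agent.sources.r1.channels = fc

  # File channel; dataDirs is a comma-separated list -- add or remove entries
  # to vary the "data dirs" dimension from the first table
  agent.channels.fc.type = file
  agent.channels.fc.checkpointDir = /flume/checkpoint
  agent.channels.fc.dataDirs = /flume/data1,/flume/data2,/flume/data3,/flume/data4,/flume/data5,/flume/data6,/flume/data7,/flume/data8
  agent.channels.fc.capacity = 100000000
  # transactionCapacity must cover the largest source/sink batch size
  agent.channels.fc.transactionCapacity = 500000

  # HDFS sink writing SequenceFiles with a 500k batch size
  # (repeat for k2..k8, pointing each sink at its own hdfs.path)
  agent.sinks.k1.type = hdfs
  agent.sinks.k1.channel = fc
  agent.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/perf/k1
  agent.sinks.k1.hdfs.fileType = SequenceFile
  agent.sinks.k1.hdfs.batchSize = 500000
  agent.sinks.k1.hdfs.rollInterval = 0
  agent.sinks.k1.hdfs.rollSize = 0
  agent.sinks.k1.hdfs.rollCount = 5000000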
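Setup 2 differs only in the channel and the sink output format. A sketch of
the changed pieces is below; again the names and sizes are illustrative, and
using CompressedStream with the snappy codec is just one way to get snappy
output from the HDFS sink, not necessarily what was run for the table above:

  # Setup 2 variant: memory channel feeding N parallel HDFS sinks
  agent.channels = mc
  agent.channels.mc.type = memory
  agent.channels.mc.capacity = 5000000
  # must be >= the largest sink batch size (1.4M in the second table)
  agent.channels.mc.transactionCapacity = 1400000

  # One of N HDFS sinks; scale the sink list out to 8/12/16 and repeat this
  # block, giving each sink its own hdfs.path
  agent.sinks.k1.type = hdfs
  agent.sinks.k1.channel = mc
  agent.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/perf2/k1
  agent.sinks.k1.hdfs.batchSize = 1200000
  # "Snappy" columns: compressed output stream using the snappy codec
  agent.sinks.k1.hdfs.fileType = CompressedStream
  agent.sinks.k1.hdfs.codeC = snappy
  # "Sequence File" column: hdfs.fileType = SequenceFile and no codeC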