[ 
https://issues.apache.org/jira/browse/HADOOP-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Lord updated HADOOP-9198:
------------------------------

    Component/s: documentation
    
> Update Flume Wiki and User Guide to provide clearer explanation of BatchSize, 
> ChannelCapacity and ChannelTransactionCapacity properties.
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9198
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Jeff Lord
>
> It would be good if we refined our wiki and user guide to explain the 
> following more clearly:
> 1) Batch Size 
>   1.a) When configured by client code using the flume-core-sdk to send 
> events to a Flume avro source.
> The Flume client SDK has an appendBatch method, which takes a list of 
> events and sends them to the source as a batch. Batch size is the number 
> of events passed to the source at one time.
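The client-side batching pattern above can be sketched with a small, stdlib-only Java class. This is a hypothetical illustration of the buffering behavior, not the actual Flume SDK API (the real SDK exposes `RpcClient.appendBatch(List<Event>)`); the class and method names here are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: events are buffered locally and handed off as one
// list once batchSize is reached, mirroring what a client does before
// calling RpcClient.appendBatch in the real Flume SDK.
public class BatchingClient {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> sentBatches = new ArrayList<>();

    public BatchingClient(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffer one event; flush automatically when the batch is full.
    public void append(String event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // In real client code, this is where appendBatch would be invoked.
    public void flush() {
        if (!buffer.isEmpty()) {
            sentBatches.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public List<List<String>> getSentBatches() {
        return sentBatches;
    }
}
```

A larger batch size means fewer round trips to the source, at the cost of more events being re-sent if a batch fails.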
>   1.b) When set as a parameter on the HDFS sink (or other sinks which 
> support the batchSize parameter).
> This is the number of events written to the file before it is flushed to HDFS.
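In agent-configuration terms, this is set in the agent's properties file. A minimal sketch, assuming placeholder agent and sink names (a1, k1) and an illustrative path:

```properties
# Hypothetical agent/sink names (a1, k1).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Number of events written to the file before a flush to HDFS.
a1.sinks.k1.hdfs.batchSize = 1000
```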
> 2)
>   2.a) Channel Capacity
> This is the maximum number of events the channel can hold.
>   2.b) Channel Transaction Capacity.
> This is the maximum number of events that can be put into or taken from 
> the channel in a single transaction.
> How will setting these parameters to different values affect throughput 
> and latency in event flow?
> In general you will see better throughput by using the memory channel as 
> opposed to the file channel, at the cost of durability.
> The channel capacity is going to need to be sized such that it is large 
> enough to hold as many events as will be added to it by upstream agents. 
> In an ideal flow, the sink drains events from the channel faster than the 
> source adds them.
> The channel transaction capacity will need to be smaller than the channel 
> capacity.
> e.g. If your channel capacity is set to 10000, then the channel 
> transaction capacity should be set to something like 100.
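That sizing guideline maps onto the agent's properties file roughly as follows (a sketch using placeholder agent/channel names a1 and c1):

```properties
# Hypothetical agent/channel names (a1, c1).
a1.channels.c1.type = memory
# Maximum number of events the channel can hold.
a1.channels.c1.capacity = 10000
# Maximum number of events per transaction; kept well below capacity.
a1.channels.c1.transactionCapacity = 100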
> Specifically, if we have clients with varying frequencies of event 
> generation, i.e. some clients generating thousands of events/sec while 
> others generate at a much slower rate, what effect will different values 
> of these params have on these clients?
> Transaction capacity is going to be what throttles or limits how many 
> events the source can put into the channel. This is going to vary 
> depending on how many tiers of agents/collectors you have set up.
> In general, though, this should probably be equal to whatever batch size 
> you have set in your client.
> With regard to the HDFS batch size, the larger your batch size, the 
> better performance will be. However, keep in mind that if a transaction 
> fails, the entire transaction will be replayed, which could result in 
> duplicate events downstream.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
