[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735711#comment-13735711
 ] 

Hari Shreedharan commented on FLUME-2155:
-----------------------------------------

In fact, since all of the takes would be at the top of the queue, it might be a 
smarter idea to move events down than up. Another idea would be to start from 
the middle of the queue, and pull first half events down and the second half 
events up (this will complicate things, and I am not entirely sure this would 
yield a significant advantage over moving things down - since if there are 
takes in 2nd half of the queue, the queue is likely to be small enough to cause 
any real latency in the first approach mentioned in this comment).
                
> Improve replay time
> -------------------
>
>                 Key: FLUME-2155
>                 URL: https://issues.apache.org/jira/browse/FLUME-2155
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: SmartCompaction.pdf
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to