[ 
https://issues.apache.org/jira/browse/FLUME-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Hobbs updated FLUME-2180:
------------------------------

    Attachment: FLUME-2180-0.patch

The attached patch partially addresses the problem by replacing an 
O(N log N) sort with an O(N) scan for the minimum element. A more effective 
change might be to cache the sorted list and re-fetch the list of files from 
the operating system only when the cached copy is exhausted.
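
To make the two ideas concrete, here is a minimal sketch. It is not the code 
from FLUME-2180-0.patch. The class name SpoolQueueSketch, its methods, and the 
choice to order files by modification time are assumptions made only for 
illustration.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration only; names and ordering are assumptions. */
final class SpoolQueueSketch {

    /** Assumed ordering for spooled files: oldest modification time first. */
    static final Comparator<File> OLDEST_FIRST =
            Comparator.comparingLong(File::lastModified);

    /**
     * Single O(N) pass that returns the smallest element under the given
     * ordering, or null if there are no candidates. No full sort is needed
     * when only the next file is wanted.
     */
    static <T> T oldest(List<T> candidates, Comparator<? super T> order) {
        return candidates.isEmpty() ? null : Collections.min(candidates, order);
    }

    /**
     * Sorts one listing and hands out its entries one at a time. The lister
     * is called again only after the cached entries have all been handed out.
     */
    static final class CachedQueue<T> {
        private final Supplier<List<T>> lister;
        private final Comparator<? super T> order;
        private final Deque<T> pending = new ArrayDeque<>();

        CachedQueue(Supplier<List<T>> lister, Comparator<? super T> order) {
            this.lister = lister;
            this.order = order;
        }

        /** Returns the next entry, or null if a fresh listing is empty. */
        T next() {
            if (pending.isEmpty()) {
                // Copy before sorting so the lister's own list is not mutated.
                List<T> snapshot = new ArrayList<>(lister.get());
                snapshot.sort(order);
                pending.addAll(snapshot);
            }
            return pending.poll();
        }
    }
}
```

With a real directory, the lister could wrap File.listFiles() and pass 
OLDEST_FIRST as the ordering. Two caveats apply to the cached variant: the 
lister has to exclude files that were already consumed, and a cached entry 
may have been removed or renamed before it is handed out, so the caller 
should check that it still exists.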
                
> SpoolDirectorySource performs poorly when there are thousands of files
> ----------------------------------------------------------------------
>
>                 Key: FLUME-2180
>                 URL: https://issues.apache.org/jira/browse/FLUME-2180
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>         Environment: Flume 1.4.0 running on Ubuntu 12.04.2 LTS x86_64
>            Reporter: Mike Hobbs
>            Priority: Trivial
>              Labels: patch, performance
>         Attachments: FLUME-2180-0.patch
>
>
> org.apache.flume.client.avro.ReliableSpoolingFileEventReader.getNextFile() 
> spikes the CPU when there are thousands of files in the spoolDirectory, to 
> the point that the source was unable to keep up and even more files 
> accumulated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
