[ https://issues.apache.org/jira/browse/FLUME-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Hobbs updated FLUME-2180:
------------------------------
Attachment: FLUME-2180-0.patch
The attached patch partially fixes the problem by replacing the O(N log N)
sort used to pick the next file with an O(N) minimum search. A more effective
change might be to keep a copy of the sorted list and re-fetch the list of
files from the operating system only when the saved copy is exhausted.
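
To make the two approaches concrete, here is a minimal sketch in Java. It is
not the code from the attached patch and not Flume's actual implementation:
the class SpoolFilePicker, its methods, and the use of plain file names with
a name-based ordering are all hypothetical stand-ins for the real directory
listing and whatever ordering getNextFile() applies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Flume: illustrates both strategies.
final class SpoolFilePicker<T> {
    private final Supplier<List<T>> lister;      // assumed to wrap a directory listing
    private final Comparator<? super T> order;   // assumed ordering, e.g. oldest first
    private final Deque<T> pending = new ArrayDeque<>();
    private int listings = 0;                    // how many times the lister was called

    SpoolFilePicker(Supplier<List<T>> lister, Comparator<? super T> order) {
        this.lister = lister;
        this.order = order;
    }

    // Strategy 1: one O(N) pass to find the minimum instead of sorting
    // the whole listing just to take its first element.
    static <T> T pickMin(List<T> candidates, Comparator<? super T> order) {
        return candidates.isEmpty() ? null : Collections.min(candidates, order);
    }

    // Strategy 2: sort once, hand out entries from the saved copy, and
    // list the directory again only when the saved copy is exhausted.
    T next() {
        if (pending.isEmpty()) {
            List<T> fresh = new ArrayList<>(lister.get());
            listings++;
            fresh.sort(order);
            pending.addAll(fresh);
        }
        return pending.poll(); // null when the listing came back empty
    }

    int listings() {
        return listings;
    }

    public static void main(String[] args) {
        // File names stand in for java.io.File objects.
        List<String> spool = new ArrayList<>(Arrays.asList("c.log", "a.log", "b.log"));
        Comparator<String> byName = Comparator.naturalOrder();

        System.out.println("min: " + pickMin(spool, byName));

        SpoolFilePicker<String> picker = new SpoolFilePicker<>(() -> spool, byName);
        String file;
        while ((file = picker.next()) != null) {
            spool.remove(file); // stands in for renaming or deleting the consumed file
            System.out.println("next: " + file);
        }
        System.out.println("directory listings: " + picker.listings());
    }
}
```

With the saved copy, the cost of listing and sorting is paid once per batch of
files rather than once per file. The trade-off is that files arriving between
listings are not seen until the saved copy runs out, and the caller has to
rename or delete each consumed file so that it does not reappear in the next
listing.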
> SpoolDirectorySource performs poorly when there are thousands of files
> ----------------------------------------------------------------------
>
> Key: FLUME-2180
> URL: https://issues.apache.org/jira/browse/FLUME-2180
> Project: Flume
> Issue Type: Improvement
> Components: Sinks+Sources
> Affects Versions: v1.4.0
> Environment: Flume 1.4.0 running on Ubuntu 12.04.2 LTS x86_64
> Reporter: Mike Hobbs
> Priority: Trivial
> Labels: patch, performance
> Attachments: FLUME-2180-0.patch
>
>
> org.apache.flume.client.avro.ReliableSpoolingFileEventReader.getNextFile()
> spikes the CPU when there are thousands of files in the spoolDirectory, so
> much so that the source cannot keep up and even more files accumulate.