[ https://issues.apache.org/jira/browse/CRUNCH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wills resolved CRUNCH-256.
-------------------------------
Resolution: Fixed
> SequentialFileNamingScheme should cache the # of files in the target
> directory after the first read
> ---------------------------------------------------------------------------------------------------
>
> Key: CRUNCH-256
> URL: https://issues.apache.org/jira/browse/CRUNCH-256
> Project: Crunch
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.8.0
>
> Attachments: CRUNCH-256b.patch, CRUNCH-256.patch
>
>
> After a job finishes running, the post-job hooks rename the files from a temp
> output directory to the target output directory. When there are lots of files,
> this move can take a long time. I traced the performance issue to the fact
> that SequentialFileNamingScheme does a listStatus() on the output directory
> for every file that gets moved, so each listing gets more expensive as the
> target directory fills up. If SequentialFileNamingScheme does this check once
> and then increments an internal counter, we can significantly reduce the
> overhead of the move.
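
The idea in the description above can be sketched roughly as follows. This is
only an illustration, not the attached patch and not Crunch's actual
SequentialFileNamingScheme: the class name CachedSequentialNamer, the
IntSupplier standing in for the listStatus() call, and the "part-%05d" name
format are all assumptions made for the example.

```java
import java.util.Locale;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch of a naming scheme that lists the target directory
 * once and then counts locally. Not Crunch's real class.
 */
class CachedSequentialNamer {
    // Assumed stand-in for "listStatus(targetDir).length"; injected so the
    // sketch runs without a Hadoop FileSystem.
    private final IntSupplier existingFileCount;
    private boolean initialized = false;
    private int nextIndex;

    CachedSequentialNamer(IntSupplier existingFileCount) {
        this.existingFileCount = existingFileCount;
    }

    /** Returns the next sequential file name, listing the directory at most once. */
    synchronized String nextName() {
        if (!initialized) {
            // The only directory listing: seed the counter from what is already there.
            nextIndex = existingFileCount.getAsInt();
            initialized = true;
        }
        // Later calls just increment the cached counter.
        // The "part-%05d" pattern is an assumed format for illustration.
        return String.format(Locale.ROOT, "part-%05d", nextIndex++);
    }
}
```

One caveat with this kind of cache: it assumes nothing else writes files into
the target directory while the move is in progress, since the counter is never
refreshed after the first read.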
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira