[ https://issues.apache.org/jira/browse/CRUNCH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-256.
-------------------------------

    Resolution: Fixed
    
> SequentialFileNamingScheme should cache the # of files in the target directory after the first read
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-256
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-256
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.8.0
>
>         Attachments: CRUNCH-256b.patch, CRUNCH-256.patch
>
>
> After a job finishes running, the post-job hooks rename the files from a temp 
> output directory to the target output directory. When there are many files, 
> this move can take a long time. I traced the slowdown to the fact that 
> SequentialFileNamingScheme calls listStatus() on the output directory for 
> every file that gets moved. If SequentialFileNamingScheme performs this check 
> only once and then increments an internal counter, we can significantly 
> reduce the overhead of the move.
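
The caching idea described in the issue can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class name CachingSequentialNamer, the "part-%05d" name format, and the listings() counter are hypothetical, and java.nio.file stands in for the Hadoop FileSystem listStatus() call so that the example is self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical stand-in for a naming scheme that lists the target directory only once. */
public class CachingSequentialNamer {
    private final Path targetDir;
    // Next sequence number; -1 means the directory has not been listed yet.
    private int next = -1;
    // Number of directory listings performed (kept only to make the saving visible).
    private int listings = 0;

    public CachingSequentialNamer(Path targetDir) {
        this.targetDir = targetDir;
    }

    /** Returns the next sequential file name, listing the directory on the first call only. */
    public synchronized String nextName() throws IOException {
        if (next < 0) {
            // The expensive directory listing happens once, on the first call.
            try (Stream<Path> entries = Files.list(targetDir)) {
                next = (int) entries.count();
            }
            listings++;
        }
        // Later calls just increment the cached counter.
        return String.format(Locale.ROOT, "part-%05d", next++);
    }

    public int listings() {
        return listings;
    }
}
```

With N files to move, this performs one listing instead of N. The trade-off is that the cached count is only valid as long as nothing else writes into the target directory while the move is in progress.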

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
