[ 
https://issues.apache.org/jira/browse/HCATALOG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462061#comment-13462061
 ] 

Greg Malewicz commented on HCATALOG-506:
----------------------------------------

As you suggested, I waited a few days for thoughts from others. A revised 
patch is attached. Please let me know if you have any further comments.
                
> desired number of input splits for large files
> ----------------------------------------------
>
>                 Key: HCATALOG-506
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-506
>             Project: HCatalog
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Greg Malewicz
>              Labels: performance
>         Attachments: HCATALOG-506.patch, HCATALOG-506-revised.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Allow the user to specify the desired number of input splits through a new 
> configuration parameter hcatalog.desiredNumInputSplits. Two existing 
> parameters may also need to be specified: mapred.min.split.size and 
> mapred.max.split.size. This is useful when the input consists of a few 
> large files that should be divided into many splits, which increases the 
> parallelism of loading the data.
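
To illustrate how a desired split count might interact with the two size
bounds named in the description, here is a minimal sketch. It is not taken
from the attached patch: the class SplitSizeSketch and its methods are
hypothetical, and the clamping rule (divide the total input size by the
desired count, then keep the result within the minimum and maximum split
sizes) is an assumption about one reasonable behaviour.

```java
// Hypothetical sketch; not the logic of HCATALOG-506.patch.
final class SplitSizeSketch {

    // Derives a per-split byte size from a desired split count, clamped to
    // [minSplitSize, maxSplitSize]. The three tuning inputs stand in for
    // hcatalog.desiredNumInputSplits, mapred.min.split.size and
    // mapred.max.split.size (how the patch combines them is assumed here).
    static long targetSplitSize(long totalInputBytes, int desiredNumSplits,
                                long minSplitSize, long maxSplitSize) {
        if (totalInputBytes < 0 || desiredNumSplits <= 0) {
            throw new IllegalArgumentException("invalid input size or split count");
        }
        if (minSplitSize < 1 || minSplitSize > maxSplitSize) {
            throw new IllegalArgumentException("invalid split size bounds");
        }
        // Ceiling division, written to avoid overflow for very large inputs.
        long goal = totalInputBytes / desiredNumSplits
                + (totalInputBytes % desiredNumSplits == 0 ? 0 : 1);
        return Math.max(minSplitSize, Math.min(maxSplitSize, goal));
    }

    // Number of splits produced when the input is cut into splitSize chunks.
    static long splitCount(long totalInputBytes, long splitSize) {
        return totalInputBytes / splitSize
                + (totalInputBytes % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;

        // With loose bounds, the desired count of 100 is honoured.
        long loose = targetSplitSize(tenGiB, 100, 1L, Long.MAX_VALUE);
        System.out.println(splitCount(tenGiB, loose));

        // A 256 MiB minimum forces larger splits, so fewer are produced.
        long bounded = targetSplitSize(tenGiB, 100, 256L * 1024 * 1024, Long.MAX_VALUE);
        System.out.println(splitCount(tenGiB, bounded));
    }
}
```

Under this assumed rule, the size bounds can override the desired count,
which would explain why the description notes that the two existing
parameters may also need to be specified.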

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
