[
https://issues.apache.org/jira/browse/HCATALOG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459265#comment-13459265
]
Travis Crawford commented on HCATALOG-506:
------------------------------------------
Thanks for the patch, Greg! A few comments:
(a) Please put the configuration key and default value in HCatConstants, which
is where we've consolidated our options. Combined with useful javadocs, this
gives users a single place to see what knobs are available. I hope to
automatically publish javadoc from CI at some point, which would make it easy
to find for people searching.
http://svn.apache.org/repos/asf/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/HCatConstants.java
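
As a rough illustration of (a), the key and its default might be declared along
these lines. Only the key string comes from the issue description; the class
name, the constant names, and the default of 0 (meaning "not set") are
assumptions for this sketch, not the actual contents of HCatConstants.

```java
/**
 * Hypothetical stand-in for HCatConstants, used only to illustrate the
 * suggestion. Names and the default value are assumptions.
 */
final class HCatConstantsSketch {

    private HCatConstantsSketch() {
        // Constants holder; not meant to be instantiated.
    }

    /**
     * Desired number of input splits for a job. Useful when there are few
     * but large input files that should be read by many tasks.
     * The key string is the one named in the issue description.
     */
    static final String HCAT_DESIRED_NUM_INPUT_SPLITS =
            "hcatalog.desiredNumInputSplits";

    /** Assumed default: 0 is treated here as "no preference". */
    static final int HCAT_DESIRED_NUM_INPUT_SPLITS_DEFAULT = 0;
}
```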
(b) This flag could cause issues for jobs that load multiple data sets, where
the flag helps one data set but harms another. Since the value of this flag
comes from the jobContext, supporting such a case (a Pig loader, for example)
would require some thought about setting the property correctly for each
loader. I think this is okay as-is, but I'd be interested to hear if anyone
else has thoughts.
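
To make the concern in (b) concrete, one possible direction is a per-loader
override that falls back to the job-wide value. This is only a sketch of the
idea, not something the patch does: the per-loader key suffix, the
loaderSignature parameter, and the use of a plain Map in place of the job
configuration are all assumptions.

```java
import java.util.Map;

/** Hypothetical lookup that prefers a per-loader setting over the job-wide one. */
final class PerLoaderSplitsSketch {

    /** Key named in the issue description. */
    static final String KEY = "hcatalog.desiredNumInputSplits";

    private PerLoaderSplitsSketch() {
    }

    /**
     * Looks up KEY + "." + loaderSignature first (an assumed naming scheme),
     * then the job-wide KEY, and finally falls back to defaultValue.
     */
    static int desiredSplitsFor(Map<String, String> conf,
                                String loaderSignature,
                                int defaultValue) {
        String value = conf.get(KEY + "." + loaderSignature);
        if (value == null) {
            value = conf.get(KEY);
        }
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}
```

With a scheme like this, a job loading two data sets could set a value for one
loader without affecting the other, at the cost of each loader needing a
stable identifier.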
> desired number of input splits for large files
> ----------------------------------------------
>
> Key: HCATALOG-506
> URL: https://issues.apache.org/jira/browse/HCATALOG-506
> Project: HCatalog
> Issue Type: Improvement
> Affects Versions: 0.4
> Reporter: Greg Malewicz
> Labels: performance
> Attachments: HCATALOG-506.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Allow the user to specify the desired number of input splits through a new
> configuration parameter hcatalog.desiredNumInputSplits. Two existing
> parameters may also need to be specified: mapred.min.split.size and
> mapred.max.split.size. This is useful when there are few but large input
> files that we want to split into many splits, so as to enhance the
> parallelism of loading the splits.
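
The interaction between the desired split count and the two existing size
bounds described above can be sketched as follows. This is not the logic of
the attached patch, which is not shown here; it assumes the target split size
is the total input size divided by the desired count, clamped to the
configured minimum and maximum. The class and method names are hypothetical.

```java
/** Hypothetical helper showing how a desired split count maps to a split size. */
final class SplitSizeSketch {

    private SplitSizeSketch() {
    }

    /**
     * Returns ceil(totalBytes / desiredNumSplits), clamped to the range
     * [minSplitSize, maxSplitSize]. The bounds stand in for
     * mapred.min.split.size and mapred.max.split.size.
     */
    static long targetSplitSize(long totalBytes,
                                int desiredNumSplits,
                                long minSplitSize,
                                long maxSplitSize) {
        if (desiredNumSplits <= 0) {
            throw new IllegalArgumentException("desiredNumSplits must be positive");
        }
        // Ceiling division so the split count does not exceed the desired number.
        long goal = (totalBytes + desiredNumSplits - 1) / desiredNumSplits;
        return Math.max(minSplitSize, Math.min(maxSplitSize, goal));
    }
}
```

Under this assumption, a minimum split size larger than the computed goal
yields fewer splits than requested, and a maximum smaller than the goal yields
more, which is one reason the two existing parameters may also need to be set.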