[ https://issues.apache.org/jira/browse/PIG-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208363#comment-14208363 ]
Cheolsoo Park commented on PIG-3346:
------------------------------------

[~rohini], thank you for your suggestion. I just tried setting {{mapreduce.input.fileinputformat.split.maxsize}}, but that didn't help with S3 files. A few mapper tasks still load too many small files.

My patch actually limits the number of combined splits and reports it as a counter. This is quite helpful to me for debugging slow mappers.

> New property that controls the number of combined splits
> --------------------------------------------------------
>
>                 Key: PIG-3346
>                 URL: https://issues.apache.org/jira/browse/PIG-3346
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.15.0
>
>         Attachments: PIG-3346-2.patch, PIG-3346-3.patch, PIG-3346.patch
>
>
> Currently, the size of combined splits can be configured by the
> {{pig.maxCombinedSplitSize}} property.
> Although this works fine most of the time, it can lead to an undesired situation
> where a single mapper ends up loading a lot of combined splits. In particular,
> this is bad if Pig loads them from S3.
> So it would be useful if the max number of combined splits could be configured
> via a property, something like {{pig.maxCombinedSplitNum}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
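
To illustrate why a size cap alone can still leave one mapper with many small files, here is a minimal sketch of greedy split combining with both a size limit and a count limit. It is not Pig's implementation and is not taken from the attached patches: the function {{combine_splits}} and its parameters are hypothetical names that only mirror the roles of {{pig.maxCombinedSplitSize}} and the proposed {{pig.maxCombinedSplitNum}}.

```python
from typing import List, Optional


def combine_splits(
    sizes: List[int],
    max_combined_size: int,
    max_combined_num: Optional[int] = None,
) -> List[List[int]]:
    """Greedily pack input splits (given as sizes in bytes) into combined splits.

    Hypothetical helper, not Pig code. The current combined split is closed
    when adding the next input would exceed max_combined_size, or when it
    already holds max_combined_num inputs (if that limit is set).
    """
    if max_combined_num is not None and max_combined_num < 1:
        raise ValueError("max_combined_num must be at least 1")

    combined: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sizes:
        over_size = current_size + size > max_combined_size
        over_count = max_combined_num is not None and len(current) >= max_combined_num
        # Never close an empty split, so an oversized input still gets its own split.
        if current and (over_size or over_count):
            combined.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        combined.append(current)
    return combined


MB = 1024 * 1024
small_files = [1 * MB] * 1000  # assumed workload: 1000 files of 1 MB each

# Size cap only: each mapper may end up with up to 128 small files.
by_size = combine_splits(small_files, max_combined_size=128 * MB)

# Size cap plus count cap: no mapper gets more than 50 files.
by_size_and_num = combine_splits(
    small_files, max_combined_size=128 * MB, max_combined_num=50
)

print(len(by_size), max(len(s) for s in by_size))
print(len(by_size_and_num), max(len(s) for s in by_size_and_num))
```

With small files, the size limit is reached only after many inputs have been packed together, so the per-mapper file count stays high; the count limit bounds it directly, at the cost of launching more mappers.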