[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804712#comment-13804712
 ] 

Sangjin Lee commented on MAPREDUCE-5186:
----------------------------------------

I understand that is the MRv1 behavior. But for CombineFileInputFormat, "global 
splits" are definitely possible, and I'm not sure whether we would end up with 
a correct split if we simply picked the first max_block_location locations 
in that case...

We may need [~tomwhite]'s input on this, but I think it is acceptable to log a 
WARNING if the block location count exceeds the max block locations limit and 
let the job proceed. Thoughts?
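
To make the two options concrete, here is a rough, standalone sketch in plain 
Java. It is not the actual JobSplitWriter code and not either attached patch; 
the class name SplitLocationCheck and both method names are made up for 
illustration, and it uses java.util.logging rather than Hadoop's logging.

```java
import java.util.Arrays;
import java.util.logging.Logger;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class SplitLocationCheck {
    private static final Logger LOG =
        Logger.getLogger(SplitLocationCheck.class.getName());

    // Option discussed in this comment: log a warning when the split has
    // more locations than the limit, but return every location unchanged.
    static String[] warnAndKeep(String[] locations, int maxLocations) {
        if (locations.length > maxLocations) {
            LOG.warning("Split has " + locations.length
                + " locations, which exceeds the limit of " + maxLocations);
        }
        return locations;
    }

    // Alternative: log a warning and keep only the first maxLocations
    // entries, dropping the rest.
    static String[] warnAndTruncate(String[] locations, int maxLocations) {
        if (locations.length <= maxLocations) {
            return locations;
        }
        LOG.warning("Split has " + locations.length
            + " locations, keeping only the first " + maxLocations);
        return Arrays.copyOf(locations, maxLocations);
    }

    public static void main(String[] args) {
        String[] hosts = {"host1", "host2", "host3", "host4"};
        System.out.println(warnAndKeep(hosts, 2).length);
        System.out.println(warnAndTruncate(hosts, 2).length);
    }
}
```

With four hosts and a limit of 2, the first method keeps all four locations 
and the second keeps only host1 and host2. Whether the truncated list still 
describes a "global" split correctly is the open question above.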

> mapreduce.job.max.split.locations causes some splits created by 
> CombineFileInputFormat to fail
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 2.0.4-alpha, 2.2.0
>            Reporter: Sangjin Lee
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE-5186v1.patch, MAPREDUCE-5186v2.patch
>
>
> CombineFileInputFormat can easily create splits that can come from many 
> different locations (during the last pass of creating "global" splits). 
> However, we observe that this often runs afoul of the 
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and on any 
> decent-sized cluster, CombineFileInputFormat creates splits that are well 
> above this limit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
