[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805358#comment-13805358
 ] 

Jason Lowe commented on MAPREDUCE-5186:
---------------------------------------

Maybe I'm misreading the code, but the first patch is essentially the MR1 
behavior: it logs a message and truncates each split's locations to the 
configured maximum.  IMHO we should either replicate the MR1 behavior or just 
allow the splits to be what they are and not log anything.  I'm not sure 
there's much utility in warning the user about a split with many locations if 
we're not going to do anything about it.  In MR1 the warning informs the user 
that some of their split locations were ignored due to the truncation, but if 
we're not going to truncate, is there a point in warning?
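
To make the "log and truncate" behavior described above concrete, here is a 
minimal standalone sketch.  It is not the code from either patch and does not 
use any Hadoop classes; the class name SplitLocationSketch, the helper 
truncateLocations, and the host names are hypothetical, and the warning is 
written to stderr rather than through a logger.

```java
import java.util.Arrays;

public class SplitLocationSketch {

    // Hypothetical helper (not Hadoop API): returns at most maxLocations
    // entries, warning when some locations are dropped.
    static String[] truncateLocations(String[] locations, int maxLocations) {
        if (locations.length <= maxLocations) {
            // Under the limit: nothing is dropped, so nothing is logged.
            return locations;
        }
        System.err.println("WARN: split has " + locations.length
                + " locations; keeping only the first " + maxLocations);
        // Keep the leading maxLocations entries and ignore the rest.
        return Arrays.copyOf(locations, maxLocations);
    }

    public static void main(String[] args) {
        String[] hosts = {"h1", "h2", "h3", "h4"};
        System.out.println(Arrays.toString(truncateLocations(hosts, 2)));
    }
}
```

The alternative discussed here would skip both the warning and the copy and 
pass the locations through unchanged.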

[~sjlee0] points out that truncating the splits could be problematic for 
CombineFileInputFormat, so the MR1 behavior may not be desirable.  In that case 
I think the second patch is more appropriate.  Thoughts?

> mapreduce.job.max.split.locations causes some splits created by 
> CombineFileInputFormat to fail
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 2.0.4-alpha, 2.2.0
>            Reporter: Sangjin Lee
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE-5186v1.patch, MAPREDUCE-5186v2.patch
>
>
> CombineFileInputFormat can easily create splits that can come from many 
> different locations (during the last pass of creating "global" splits). 
> However, we observe that this often runs afoul of the 
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and with any 
> decent size cluster, CombineFileInputFormat creates splits that are well 
> above this limit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
