[jira] [Commented] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable
[ https://issues.apache.org/jira/browse/MAPREDUCE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594387#comment-13594387 ]

Hadoop QA commented on MAPREDUCE-5049:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12572258/MAPREDUCE-5049.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3388//console

This message is automatically generated.

> CombineFileInputFormat counts all compressed files non-splitable
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5049.patch
>
> In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec
> into account and thinks that all compressible input files aren't splittable.
> This is a regression from when handling for non-splitable compression codecs
> was originally added in MAPREDUCE-1597, and seems to have somehow gotten in
> when the code was pulled from 0.22 to branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable
[ https://issues.apache.org/jira/browse/MAPREDUCE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5049:
----------------------------------

    Status: Patch Available  (was: Open)

> CombineFileInputFormat counts all compressed files non-splitable
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
[jira] [Updated] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable
[ https://issues.apache.org/jira/browse/MAPREDUCE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5049:
----------------------------------

    Attachment: MAPREDUCE-5049.patch

> CombineFileInputFormat counts all compressed files non-splitable
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594334#comment-13594334 ]

Hadoop QA commented on MAPREDUCE-5038:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12572246/MAPREDUCE-5038-1.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3387//console

This message is automatically generated.

> old API CombineFileInputFormat missing fixes that are in new API
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch
>
> The following changes patched the CombineFileInputFormat in mapreduce, but
> neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS
> In trunk this is not an issue, as the one in mapred extends the one in mapreduce.
[jira] [Updated] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5038:
----------------------------------

    Status: Patch Available  (was: Open)

> old API CombineFileInputFormat missing fixes that are in new API
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594329#comment-13594329 ]

Sandy Ryza commented on MAPREDUCE-5038:
---------------------------------------

Filed MAPREDUCE-5049 to handle SplittableCompressionCodec. Uploaded a new patch that includes MAPREDUCE-1423.

> old API CombineFileInputFormat missing fixes that are in new API
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
[jira] [Updated] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5038:
----------------------------------

    Attachment: MAPREDUCE-5038-1.patch

> old API CombineFileInputFormat missing fixes that are in new API
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
[jira] [Created] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable
Sandy Ryza created MAPREDUCE-5049:
-------------------------------------

             Summary: CombineFileInputFormat counts all compressed files non-splitable
                 Key: MAPREDUCE-5049
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1.1.1
            Reporter: Sandy Ryza
            Assignee: Sandy Ryza

In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec
into account and thinks that all compressible input files aren't splittable.
This is a regression from when handling for non-splitable compression codecs
was originally added in MAPREDUCE-1597, and seems to have somehow gotten in
when the code was pulled from 0.22 to branch-1.
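The decision the fix restores can be sketched without any Hadoop dependencies. This is a hedged illustration, not the actual patch: the interface and codec names below are stand-ins for Hadoop's `CompressionCodec` hierarchy, and the point is only that a codec should be treated as splittable when it implements the splittable marker interface, rather than treating every compressed file as non-splittable.

```java
// Dependency-free sketch of the splittability check (hypothetical stand-ins
// for Hadoop's codec interfaces; not the actual CombineFileInputFormat code).
interface CompressionCodec {}
interface SplittableCompressionCodec extends CompressionCodec {}

class GzipCodec implements CompressionCodec {}            // not splittable
class Bzip2Codec implements SplittableCompressionCodec {} // splittable

public class SplitCheck {
    /**
     * A file is splittable when it is uncompressed (no codec matched its
     * suffix) or when its codec supports splitting. The branch-1 regression
     * was to treat every non-null codec as non-splittable.
     */
    static boolean isSplitable(CompressionCodec codec) {
        if (codec == null) {
            return true; // plain file: always splittable
        }
        return codec instanceof SplittableCompressionCodec;
    }

    public static void main(String[] args) {
        System.out.println(isSplitable(null));             // true
        System.out.println(isSplitable(new GzipCodec()));  // false
        System.out.println(isSplitable(new Bzip2Codec())); // true
    }
}
```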
[jira] [Commented] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594253#comment-13594253 ]

Hadoop QA commented on MAPREDUCE-5047:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12572230/MAPREDUCE-5047.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3386//console

This message is automatically generated.

> keep.failed.task.files=true causes job failure on secure clusters
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5047
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task, tasktracker
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5047.patch
>
> To support IsolationRunner, split info is written to local directories. This
> occurs inside MapTask#localizeConfiguration, which is called both by the
> tasktracker and by the child JVM. On a secure cluster, the tasktracker's
> attempt to write it fails, because the tasktracker does not have permission
> to write to the user's directory. It is likely that the call to
> localizeConfiguration in the tasktracker can be removed.
[jira] [Commented] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594251#comment-13594251 ]

Sandy Ryza commented on MAPREDUCE-5047:
---------------------------------------

localizeConfiguration is needed in the tasktracker in order to set task-specific configuration options, but split.info does not need to be created at that time. The patch moves the action of writing out split.info into a new writeFilesRequiredForRerun method. This method is called by the Child, but not by the tasktracker.

Tested on a pseudo-distributed cluster and on a secure distributed cluster that the permissions error no longer shows up and that split.info is still written out to the correct location.

> keep.failed.task.files=true causes job failure on secure clusters
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5047
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
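The shape of the refactoring described in the comment above can be sketched as follows. This is a hypothetical, dependency-free illustration (the maps stand in for the job configuration and the local filesystem; only the method split mirrors the patch): configuration localization stays safe for both callers, while the file write that requires the user's directory moves into a method only the child JVM invokes.

```java
// Illustrative sketch of splitting MapTask#localizeConfiguration: the
// config-only part stays callable by tasktracker and child alike, while
// writing split.info moves to writeFilesRequiredForRerun, called only by
// the child JVM (which has permission to write to the user's directory).
import java.util.HashMap;
import java.util.Map;

public class TaskLocalization {
    final Map<String, String> conf = new HashMap<>();   // stand-in for JobConf
    final Map<String, byte[]> localFs = new HashMap<>(); // stand-in for local dirs

    /** Safe for the tasktracker: only mutates configuration. */
    void localizeConfiguration(String taskId) {
        conf.put("mapred.task.id", taskId);
    }

    /** Called only by the child JVM, which owns the user's directory. */
    void writeFilesRequiredForRerun(String taskId, byte[] splitInfo) {
        localFs.put(taskId + "/split.info", splitInfo);
    }

    public static void main(String[] args) {
        TaskLocalization t = new TaskLocalization();
        t.localizeConfiguration("attempt_0");           // tasktracker path: no writes
        t.writeFilesRequiredForRerun("attempt_0", new byte[] {1, 2, 3}); // child path
        System.out.println(t.localFs.keySet());
    }
}
```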
[jira] [Updated] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5047:
----------------------------------

    Status: Patch Available  (was: Open)

> keep.failed.task.files=true causes job failure on secure clusters
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5047
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
[jira] [Updated] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5047:
----------------------------------

    Attachment: MAPREDUCE-5047.patch

> keep.failed.task.files=true causes job failure on secure clusters
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5047
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594248#comment-13594248 ]

Mariappan Asokan commented on MAPREDUCE-4842:
---------------------------------------------

Hi Ravi,
Thanks for the compliment. I will look at the patch for MAPREDUCE-3685 and post my comments there once I understand it completely.

-- Asokan

> Shuffle race can hang reducer
> -----------------------------
>
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Mariappan Asokan
>            Priority: Blocker
>             Fix For: 2.0.3-alpha, 0.23.6
>         Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch,
>                      mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch,
>                      mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch,
>                      MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.
> It looked similar to the problem described in MAPREDUCE-3721, where the
> fetchers were all being told to WAIT by the MergeManager but no merge was
> taking place.
[jira] [Updated] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5042:
----------------------------------

    Attachment: MAPREDUCE-5042.patch

This is complicated by the fact that the job token currently serves a dual role, authenticating both the shuffle *and* the task umbilical. The former should persist across app attempts, while the latter should not: we don't want old task attempts authenticating with the new app attempt, at least not at this point, as it would only serve to confuse the new app attempt. Therefore I propose the following:

* The current job token remains primarily as-is for authenticating the task umbilical, and each AM attempt continues to generate its own job token.
* A new secret key, the shuffle secret, will be generated by the job client when the job is submitted, as part of the job's credentials. Each app attempt will extract the shuffle secret from the job's credentials and use it as the shared secret to authenticate the shuffle.

Attaching a first draft of a patch that implements that proposal. It needs unit tests, but I've manually tested that it can recover map tasks and successfully shuffle their data.

> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, security
>    Affects Versions: 0.23.7, 2.0.4-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-5042.patch
>
> If an application attempt fails and is relaunched, the AM will try to recover
> previously completed tasks. If a reducer needs to fetch the output of a map
> task attempt that was recovered, it will fail with a 401 error like this:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 401 for URL:
> http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156)
> {noformat}
> Looking at the corresponding NM's logs, we see the shuffle failed due to
> "Verification of the hashReply failed".
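The proposal above can be illustrated with a small, self-contained sketch. This is not Hadoop's actual API (class and method names here are hypothetical, and the real code stores the secret in the job's `Credentials`): the point is that an HMAC computed against a secret minted once at job submission still verifies after an AM restart, while per-attempt tokens naturally differ between attempts.

```java
// Illustrative sketch: a shuffle secret generated at job submission outlives
// AM attempts, so shuffle hashes verify across recovery, while each attempt's
// umbilical job token is freshly generated. Hypothetical names, not Hadoop API.
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ShuffleSecretSketch {
    static byte[] newSecret() {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        return key;
    }

    static byte[] hmac(byte[] secret, String msg) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            return mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] shuffleSecret = newSecret(); // minted once, at job submission
        byte[] attempt1Token = newSecret(); // per-attempt umbilical job token
        byte[] attempt2Token = newSecret(); // a new attempt mints a new one

        String url = "/mapOutput?job=j&reduce=0&map=m";
        // Attempt 2 can verify a hash produced under attempt 1, because both
        // attempts extract the same shuffle secret from the job credentials.
        boolean shuffleOk = Arrays.equals(hmac(shuffleSecret, url),
                                          hmac(shuffleSecret, url));
        boolean tokensDiffer = !Arrays.equals(attempt1Token, attempt2Token);
        System.out.println(shuffleOk + " " + tokensDiffer);
    }
}
```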
[jira] [Assigned] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe reassigned MAPREDUCE-5042:
-------------------------------------

    Assignee: Jason Lowe

> Reducer unable to fetch for a map task that was recovered
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042
[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594174#comment-13594174 ]

Alejandro Abdelnur commented on MAPREDUCE-5028:
-----------------------------------------------

Thanks Karthik, I've committed the patch for branch-1. Thanks Chris for reviewing it.

+1 for the trunk patch. I'll wait for a bit to see if there are comments from others before committing it.

> Maps fail when io.sort.mb is set to high value
> ----------------------------------------------
>
>                 Key: MAPREDUCE-5028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch,
>                      mr-5028-branch1.patch, mr-5028-trunk.patch
>
> Verified the problem exists on branch-1 with the following configuration:
> Pseudo-dist mode: 2 maps / 1 reduce, mapred.child.java.opts=-Xmx2048m,
> io.sort.mb=1280, dfs.block.size=2147483648
> Run teragen to generate 4 GB data.
> Maps fail when you run wordcount on this configuration with the following error:
> {noformat}
> java.io.IOException: Spill failed
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
>         at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>         at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
> {noformat}
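A classic hazard when a sort buffer is configured near the `int` range, shown below, is `int` arithmetic on byte counts silently wrapping. This is offered only as an illustration of the failure class large `io.sort.mb` values expose; it is not asserted to be the exact root cause the patch fixes.

```java
// Illustrative only: int arithmetic on buffer byte counts wraps once the
// product exceeds Integer.MAX_VALUE; widening to long before shifting avoids it.
public class SortBufferOverflow {
    public static void main(String[] args) {
        int ioSortMb = 2048;
        int wrapped = ioSortMb * 1024 * 1024;  // overflows: -2147483648
        long correct = (long) ioSortMb << 20;  // widen first: 2147483648
        System.out.println(wrapped);
        System.out.println(correct);
    }
}
```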
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594141#comment-13594141 ]

Hudson commented on MAPREDUCE-5027:
-----------------------------------

Integrated in Hadoop-trunk-Commit #3421 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3421/])
MAPREDUCE-5027. Shuffle does not limit number of outstanding connections (Robert Parker via jeagles) (Revision 1453098)

Result = SUCCESS

jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453098
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java

> Shuffle does not limit number of outstanding connections
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-5027
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Robert Parker
>             Fix For: 3.0.0, 0.23.7, 2.0.4-beta
>         Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch,
>                      MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch,
>                      MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
> The ShuffleHandler does not have any configurable limits to the number of
> outstanding connections allowed. Therefore a node with many map outputs and
> many reducers in the cluster trying to fetch those outputs can exhaust a
> nodemanager's file descriptors.
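The touched files suggest the fix adds a configurable connection limit to the ShuffleHandler (a new setting appears in mapred-default.xml). The idea reduces to admission control; the sketch below is a hedged, generic illustration using a semaphore, not the actual Netty-based ShuffleHandler code, and the class name is hypothetical.

```java
// Generic admission-control sketch for bounding concurrent shuffle fetches;
// illustrative only, not the actual ShuffleHandler implementation.
import java.util.concurrent.Semaphore;

public class ConnectionLimiter {
    private final Semaphore permits;

    public ConnectionLimiter(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Returns true if the connection is admitted, false if it must be refused. */
    public boolean tryAccept() {
        return permits.tryAcquire();
    }

    /** Called when an admitted connection closes, freeing its slot. */
    public void connectionClosed() {
        permits.release();
    }

    public static void main(String[] args) {
        ConnectionLimiter limiter = new ConnectionLimiter(2);
        System.out.println(limiter.tryAccept()); // true
        System.out.println(limiter.tryAccept()); // true
        System.out.println(limiter.tryAccept()); // false: over the limit
        limiter.connectionClosed();
        System.out.println(limiter.tryAccept()); // true: slot freed
    }
}
```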
[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated MAPREDUCE-5027:
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.4-beta
                   0.23.7
                   3.0.0
           Status: Resolved  (was: Patch Available)

> Shuffle does not limit number of outstanding connections
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-5027
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
[jira] [Updated] (MAPREDUCE-3688) Need better Error message if AM is killed/throws exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated MAPREDUCE-3688:
------------------------------------

    Attachment: mapreduce-3688-h0.23-v01.patch

This has been a pain for our users as well. I don't think this patch will fly well with the reviewers, but maybe it'll help move the discussion forward. I didn't see a good way of communicating the error message to the caller, so I decided to sacrifice the stdout that the current MRAppMaster does not use.

After the patch, the webUI would show:
{quote}
Diagnostics: Application application_1362527487477_0005 failed 1 times due to AM Container for appattempt_1362527487477_0005_01 exited with exitCode: 1 due to: Error starting MRAppMaster: org.apache.hadoop.yarn.YarnException: java.io.IOException: Split metadata size exceeded 20. Aborting job job_1362527487477_0005
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1290)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1146)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1118)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:823)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:121)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1094)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:998)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1273)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1269)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1226)
Caused by: java.io.IOException: Split metadata size exceeded 20. Aborting job job_1362527487477_0005
        at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1285)
        ... 16 more
.Failing this attempt.. Failing the application.
{quote}
(This patch is based on 0.23.)

> Need better Error message if AM is killed/throws exception
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-3688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.1
>            Reporter: David Capwell
>            Assignee: Sandy Ryza
>             Fix For: 0.23.2
>         Attachments: mapreduce-3688-h0.23-v01.patch
>
> We need better error messages in the UI if the AM gets killed or throws an
> exception. If the following error gets thrown:
> java.lang.NumberFormatException: For input string: "9223372036854775807l" // last char is an L
> then the UI should say this exception. Instead I get the following:
> Application application_1326504761991_0018 failed 1 times due to AM Container
> for appattempt_1326504761991_0018_01 exited with exitCode: 1 due to:
> Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594127#comment-13594127 ]

Jonathan Eagles commented on MAPREDUCE-5027:
--------------------------------------------

+1. This patch looks good. Thanks, Rob.

> Shuffle does not limit number of outstanding connections
> --------------------------------------------------------
>
> Key: MAPREDUCE-5027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Robert Parker
> Attachments: MAPREDUCE-5027-2.patch, MAPREDUCE-5027-3.patch, MAPREDUCE-5027-4.patch, MAPREDUCE-5027-b023-2.patch, MAPREDUCE-5027-b023.patch, MAPREDUCE-5027.patch, MAPREDUCE-5027.patch
>
>
> The ShuffleHandler does not have any configurable limit on the number of outstanding connections allowed. Therefore a node with many map outputs, and many reducers in the cluster trying to fetch those outputs, can run a nodemanager out of file descriptors.
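The remedy the JIRA aims at is a configurable cap on concurrent shuffle connections. A minimal sketch of such a limiter, using a plain atomic counter; the class and method names here are hypothetical, and this is not the structure of the actual Netty-based ShuffleHandler patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: admit a new shuffle connection only while the number
// of open connections is below a configured maximum, so a burst of fetching
// reducers cannot exhaust the nodemanager's file descriptors.
class ConnectionLimiterSketch {
    private final int maxConnections;                 // configured cap (illustrative)
    private final AtomicInteger open = new AtomicInteger();

    ConnectionLimiterSketch(int maxConnections) {
        this.maxConnections = maxConnections;
    }

    /** Returns true if the connection is admitted, false if it should be refused. */
    boolean tryAccept() {
        while (true) {
            int cur = open.get();
            if (cur >= maxConnections) {
                return false;                         // over the cap: refuse/close
            }
            if (open.compareAndSet(cur, cur + 1)) {
                return true;                          // slot reserved atomically
            }
        }
    }

    /** Called when a connection closes, freeing a slot. */
    void release() {
        open.decrementAndGet();
    }
}
```

The compare-and-set loop keeps the check-then-increment race-free without a lock, which matters since shuffle connections arrive on many I/O threads at once.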
[jira] [Commented] (MAPREDUCE-3685) There are some bugs in implementation of MergeManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594124#comment-13594124 ]

Ravi Prakash commented on MAPREDUCE-3685:
-----------------------------------------

This is probably more for my own reference than anything else. Here's my understanding from reading the code. It is very approximate and may be inaccurate in some cases.

IntermediateMemoryToMemoryMerger
- Can be toggled on/off (it is off by default)
- Merges map outputs *from* memory *to* memory
- When is it triggered? When the number of in-memory map outputs exceeds memToMemMergeOutputsThreshold. I am guessing this was put in on the premise that it might be faster to sort a smaller number of streams, even in memory, and also that we can perhaps sort while waiting to fetch.

InMemoryMerger
- Merges map outputs *from* memory *to* disk
- When is it triggered? When storing more map outputs in memory would push usage over the memory allocated for shuffle.

> There are some bugs in implementation of MergeManager
> -----------------------------------------------------
>
> Key: MAPREDUCE-3685
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.1
> Reporter: anty.rao
> Assignee: anty
> Priority: Critical
> Attachments: MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch
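The two trigger conditions described above can be sketched as follows. This is an illustrative model only: field names like memToMemThreshold and memoryLimit are hypothetical stand-ins for the real MergeManager configuration, and the real mergers actually perform the merge and free memory rather than merely reporting which one fires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of when each merger is triggered as a fetched map
// output is reserved in memory. Not the actual MergeManager code.
class MergeTriggerSketch {
    final int memToMemThreshold;   // mem-to-mem fires above this many in-memory outputs
    final long memoryLimit;        // memory allocated for shuffle
    long usedMemory = 0;
    final List<Long> inMemoryOutputs = new ArrayList<>();

    MergeTriggerSketch(int memToMemThreshold, long memoryLimit) {
        this.memToMemThreshold = memToMemThreshold;
        this.memoryLimit = memoryLimit;
    }

    /** Returns which merger, if any, reserving a map output of this size would trigger. */
    String reserve(long size) {
        if (usedMemory + size > memoryLimit) {
            // Storing this output would exceed the shuffle memory budget:
            // merge the in-memory outputs to disk to make room.
            return "InMemoryMerger";
        }
        inMemoryOutputs.add(size);
        usedMemory += size;
        if (inMemoryOutputs.size() > memToMemThreshold) {
            // Optional (off by default): pre-merge streams while still in memory.
            return "IntermediateMemoryToMemoryMerger";
        }
        return "none";
    }
}
```

Under these assumed rules, the memory-to-memory merge is a count-based optimization while the memory-to-disk merge is a hard capacity backstop.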
[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594110#comment-13594110 ]

Ravi Prakash commented on MAPREDUCE-4842:
-----------------------------------------

Hi Mariappan,

bq. This is a tangent to point 1. The mergeFactor is set to the configured value for IntermediateMemoryToMemoryMerger but to Integer.MAX_VALUE for InMemoryMerger and OnDiskMerger. We have to find out the rationale behind these choices.

Thanks for all your work on the MergeManager. It is so much cleaner now! Thanks much. Anyway, since you have been in this area of the code, I was wondering if you could please review MAPREDUCE-3685? The mergeFactor for the OnDiskMerger was wrong. For the InMemoryMerger it seems to be correct (because io.sort.factor is defined as "The number of streams to merge at once while sorting files. This determines the number of open file handles."). Besides, I wonder if we really want to go into the level of detail of the number of fetched cache lines, rather than just simplifying by assuming constant access time to all memory. Please consider continuing the discussion there. Thanks!

> Shuffle race can hang reducer
> -----------------------------
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.0.2-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Mariappan Asokan
> Priority: Blocker
> Fix For: 2.0.3-alpha, 0.23.6
>
> Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang. It looked similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told to WAIT by the MergeManager but no merge was taking place.
[jira] [Created] (MAPREDUCE-5048) streaming combiner feature breaks when input binary, output text
Antonio Piccolboni created MAPREDUCE-5048:
-----------------------------------------

             Summary: streaming combiner feature breaks when input binary, output text
                 Key: MAPREDUCE-5048
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5048
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 1.0.2
         Environment: centos 6.2
            Reporter: Antonio Piccolboni

When running a hadoop streaming job with binary input and shuffling but text output, with the combiner on, it fails with the error:

java.lang.RuntimeException: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable

Repro:

hadoop jar -D 'stream.map.input=typedbytes' -D 'stream.map.output=typedbytes' -D 'stream.reduce.input=typedbytes' -input -output -mapper cat -combiner cat -reducer cat -inputformat 'org.apache.hadoop.streaming.AutoInputFormat'

If you remove the -combiner option, it works, with only performance implications. If you specify in addition -D 'stream.reduce.output=typedbytes', it succeeds but outputs raw typedbytes (without the sequence-file superstructure). I asked in the discussion of HADOOP-1722 (where typedbytes was first introduced) whether this is a bug or my misunderstanding of that spec, and a committer chipped in saying it seems a bug to him too. Originally reported by a user of the rmr2 package for R and filed by me here: https://github.com/RevolutionAnalytics/rmr2/issues/16
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594002#comment-13594002 ]

Sangjin Lee commented on MAPREDUCE-5038:
----------------------------------------

I filed MAPREDUCE-5046 to backport MAPREDUCE-1423, then found this. I took a look at the patch here, but I'm not sure if it subsumes the changes contained in MAPREDUCE-1423. Specifically, rackToNodes still seems to be static, which is a thread-safety problem. Could you absorb the fix that's in MAPREDUCE-1423? I'd be happy to look at that if you want.

> old API CombineFileInputFormat missing fixes that are in new API
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-5038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 1.1.1
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5038.patch
>
>
> The following changes patched the CombineFileInputFormat in mapreduce, but neglected the one in mapred:
> MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files
> MAPREDUCE-2021 solved returning duplicate hostnames in split locations
> MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default FS
> In trunk this is not an issue, as the one in mapred extends the one in mapreduce.
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593919#comment-13593919 ]

Hadoop QA commented on MAPREDUCE-5027:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12572165/MAPREDUCE-5027-4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3385//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3385//console

This message is automatically generated.
[jira] [Created] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters
Sandy Ryza created MAPREDUCE-5047:
---------------------------------

             Summary: keep.failed.task.files=true causes job failure on secure clusters
                 Key: MAPREDUCE-5047
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task, tasktracker
    Affects Versions: 1.1.1
            Reporter: Sandy Ryza
            Assignee: Sandy Ryza

To support IsolationRunner, split info is written to local directories. This occurs inside MapTask#localizeConfiguration, which is called both by the tasktracker and by the child JVM. On a secure cluster, the tasktracker's attempt to write it fails, because the tasktracker does not have permission to write to the user's directory. It is likely that the call to localizeConfiguration in the tasktracker can be removed.
[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Parker updated MAPREDUCE-5027:
-------------------------------------
    Attachment: (was: MAPREDUCE-5027-4.patch)
[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Parker updated MAPREDUCE-5027:
-------------------------------------
    Attachment: MAPREDUCE-5027-4.patch
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593886#comment-13593886 ]

Hadoop QA commented on MAPREDUCE-5027:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12572161/MAPREDUCE-5027-b023-2.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3384//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Parker updated MAPREDUCE-5027:
-------------------------------------
    Attachment: MAPREDUCE-5027-b023-2.patch
                MAPREDUCE-5027-4.patch
[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593881#comment-13593881 ]

Robert Parker commented on MAPREDUCE-5027:
------------------------------------------

Jon, I have uploaded a new patch for trunk and branch 0.23; I have eliminated the timing issues in the test.
[jira] [Created] (MAPREDUCE-5046) backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat
Sangjin Lee created MAPREDUCE-5046:
----------------------------------

             Summary: backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat
                 Key: MAPREDUCE-5046
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5046
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 1.1.1
            Reporter: Sangjin Lee

The CombineFileInputFormat class in org.apache.hadoop.mapred.lib (the old API) has a couple of issues. These issues were addressed in the new API (MAPREDUCE-1423), but the old class was not fixed. The main issue that JIRA refers to is a performance problem. However, IMO there is a more serious problem: a thread-safety issue (rackToNodes) that was fixed alongside it. What is the policy on addressing issues in the old API? Can we backport this to the old class?
[jira] [Created] (MAPREDUCE-5045) UtilTest#isCygwin method appears to be unused
Chris Nauroth created MAPREDUCE-5045:
------------------------------------

             Summary: UtilTest#isCygwin method appears to be unused
                 Key: MAPREDUCE-5045
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5045
             Project: Hadoop Map/Reduce
          Issue Type: Test
          Components: contrib/streaming, test
    Affects Versions: 3.0.0
            Reporter: Chris Nauroth
            Priority: Trivial

Method {{UtilTest#isCygwin}} in /hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/UtilTest.java appears to be unused. If so, then we should remove it. If anything is calling it, then we should rename it to isWindows, or perhaps just change the call sites to use {{Shell#WINDOWS}}.
[jira] [Commented] (MAPREDUCE-5043) Fetch failure processing can cause AM event queue to backup and eventually OOM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593384#comment-13593384 ]

Hudson commented on MAPREDUCE-5043:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1335 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1335/])
MAPREDUCE-5043. Fetch failure processing can cause AM event queue to backup and eventually OOM (Jason Lowe via bobby) (Revision 1452372)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452372

Files:
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/TaskAttempt.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockJobs.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTaskAttempt.java

> Fetch failure processing can cause AM event queue to backup and eventually OOM
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5043
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5043
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 0.23.7, 2.0.4-beta
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Blocker
> Fix For: 3.0.0, 0.23.7, 2.0.4-beta
>
> Attachments: MAPREDUCE-5043.patch
>
>
> Saw an MRAppMaster with a 3G heap OOM. Upon investigating another instance of it running, we saw the UI in a weird state where the task table and task attempt tables on the job overview page weren't consistent. The AM log showed the AsyncDispatcher had hundreds of thousands of events in the event queue, and jstacks showed it spending a lot of time in fetch failure processing. It turns out fetch failure processing is currently *very* expensive, with a triple {{for}} loop where the inner loop is calling the quite-expensive {{TaskAttempt.getReport}}. That function ends up type-converting the entire task report, counters and all, and performing locale conversions among other things. It does this for every reduce task in the job, for every map task that failed. And when it's done building up the large task report, it pulls out one field, the phase, then throws the report away.
> While the AM is busy processing fetch failures, task attempts continue to send events to the AM, including memory-expensive events like status updates that include the counters. These back up in the AsyncDispatcher event queue, and eventually even an AM with a large heap size will run out of memory and crash, or expire because it thrashes in garbage collection.
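The costly shape described in the report can be sketched as follows (collapsed to two loop levels for brevity; the real code has a triple loop over reduce attempts). All names here are hypothetical; the point is simply that a full, expensive report is built once per (failed map, reduce) pair while only the phase field is ever read:

```java
// Hypothetical sketch of the O(failed maps x reduces) cost pattern behind
// MAPREDUCE-5043. Not the actual JobImpl/TaskAttempt code.
class FetchFailureCostSketch {
    static int reportsBuilt = 0;

    // Stands in for the quite-expensive TaskAttempt.getReport(): in the real
    // code this type-converts all counters and performs locale conversions,
    // even though the caller only wants the phase.
    static String buildFullReportAndGetPhase() {
        reportsBuilt++;
        return "SHUFFLE";
    }

    // Pre-fix shape: one expensive report per (failed map, reduce) pair.
    static void processFetchFailures(int failedMaps, int reduceAttempts) {
        for (int m = 0; m < failedMaps; m++) {
            for (int r = 0; r < reduceAttempts; r++) {
                String phase = buildFullReportAndGetPhase(); // only the phase is used
            }
        }
    }
}
```

With thousands of reduces and a wave of failed maps, this multiplies into enough work that the dispatcher thread falls behind, which is why the event queue backs up as described.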
[jira] [Commented] (MAPREDUCE-5043) Fetch failure processing can cause AM event queue to backup and eventually OOM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593307#comment-13593307 ]

Hudson commented on MAPREDUCE-5043:
-----------------------------------

Integrated in Hadoop-Yarn-trunk #146 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/146/])
MAPREDUCE-5043. Fetch failure processing can cause AM event queue to backup and eventually OOM (Jason Lowe via bobby) (Revision 1452372)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452372