[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540917#comment-13540917 ]

Hudson commented on MAPREDUCE-4813:
---

Integrated in Hadoop-Mapreduce-trunk #1299 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1299/])
MAPREDUCE-4813. AM timing out during job commit (jlowe via bobby) (Revision 1426536)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426536
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/ReduceTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobCommitEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobSetupEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterTaskAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/package-info.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobAbortCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540901#comment-13540901 ]

Hudson commented on MAPREDUCE-4813:
---

Integrated in Hadoop-Hdfs-trunk #1269 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1269/])
MAPREDUCE-4813. AM timing out during job commit (jlowe via bobby) (Revision 1426536)

Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426536
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/ReduceTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobCommitEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobSetupEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterTaskAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/package-info.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobAbortCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540892#comment-13540892 ]

Hudson commented on MAPREDUCE-4813:
---

Integrated in Hadoop-Hdfs-0.23-Build #478 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/478/])
MAPREDUCE-4813. AM timing out during job commit (jlowe via bobby) (Revision 1426540)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426540
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapTaskAttemptImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/ReduceTaskAttemptImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventType.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobAbortEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobCommitEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobSetupEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterTaskAbortEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/package-info.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobAbortCompletedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitCompletedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitFailedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobEventType.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupCompletedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupFailedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapre
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540226#comment-13540226 ]

Hadoop QA commented on MAPREDUCE-4813:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12562531/MAPREDUCE-4813-2-branch-0.23.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3179//console

This message is automatically generated.

> AM timing out during job commit
> ---
>
> Key: MAPREDUCE-4813
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 0.23.3, 2.0.1-alpha
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: JobImplStateMachine.pdf,
> MAPREDUCE-4813-2-branch-0.23.patch, MAPREDUCE-4813-2.patch,
> MAPREDUCE-4813-2.patch, MAPREDUCE-4813-2.patch, MAPREDUCE-4813-2.patch,
> MAPREDUCE-4813.patch, MAPREDUCE-4813.patch, MAPREDUCE-4813.patch
>
>
> The AM calls the output committer's {{commitJob}} method synchronously during
> JobImpl state transitions, which means the JobImpl write lock is held the
> entire time the job is being committed. Holding the write lock prevents the
> RM allocator thread from heartbeating to the RM. Therefore if committing the
> job takes too long (e.g.: the job has tons of files to commit and/or the
> namenode is bogged down) then the AM appears to be unresponsive to the RM and
> the RM kills the AM attempt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
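The locking problem in the issue description can be illustrated with a minimal sketch. This is not the actual JobImpl or MRAppMaster code: the class name `CommitLockSketch`, the `heartbeat` method, and the latch-based stand-in for a slow {{commitJob}} call are all assumptions made for illustration. It only shows why a reader of job state stalls while a commit runs under the write lock, and why handing the commit to another thread avoids that.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; names are illustrative, not Hadoop's. */
final class CommitLockSketch {
    private final ReentrantReadWriteLock jobLock = new ReentrantReadWriteLock();

    /** Stand-in for the heartbeat path, which needs the job's read lock. */
    boolean heartbeat(long timeoutMs) {
        try {
            if (!jobLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // job state is locked; no heartbeat goes out
            }
            jobLock.readLock().unlock();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Runs one scenario; reports whether a heartbeat got through mid-commit. */
    static boolean heartbeatDuringCommit(boolean commitUnderLock) {
        CommitLockSketch job = new CommitLockSketch();
        CountDownLatch commitStarted = new CountDownLatch(1);
        CountDownLatch commitMayFinish = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for a slow commitJob call: blocks until released.
        Runnable slowCommit = () -> {
            commitStarted.countDown();
            try {
                commitMayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            if (commitUnderLock) {
                // The reported behavior: commit runs while the write lock is held.
                pool.execute(() -> {
                    job.jobLock.writeLock().lock();
                    try {
                        slowCommit.run();
                    } finally {
                        job.jobLock.writeLock().unlock();
                    }
                });
            } else {
                // Alternative: the transition only hands the commit off.
                job.jobLock.writeLock().lock();
                try {
                    pool.execute(slowCommit);
                } finally {
                    job.jobLock.writeLock().unlock();
                }
            }
            commitStarted.await();
            return job.heartbeat(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            commitMayFinish.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("commit under lock, heartbeat ok: " + heartbeatDuringCommit(true));
        System.out.println("commit handed off, heartbeat ok: " + heartbeatDuringCommit(false));
    }
}
```

In this sketch the first scenario reports `false` and the second `true`. The file list in the commit notifications suggests the actual patch moves committer work behind a CommitterEventHandler, but the details of that class are not shown in this thread.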
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540108#comment-13540108 ]

Robert Joseph Evans commented on MAPREDUCE-4813:

The new set of changes looks good to me. I am +1.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540088#comment-13540088 ]

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12562484/MAPREDUCE-4813-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3178//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3178//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540066#comment-13540066 ]

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12562484/MAPREDUCE-4813-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3177//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3177//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539976#comment-13539976 ]

Robert Joseph Evans commented on MAPREDUCE-4813:

I have a few minor comments about CommitterEventHandler and one more serious one as well.

When checking for the timeout we call context.getClock().getTime() twice.

{code}
// wait up to configured timeout for commit thread to finish
long timeoutTimestamp = context.getClock().getTime() + commitThreadCancelTimeoutMs;
long now = context.getClock().getTime();
{code}

I personally think it would be cleaner to call it once:

{code}
// wait up to configured timeout for commit thread to finish
long now = context.getClock().getTime();
long timeoutTimestamp = now + commitThreadCancelTimeoutMs;
{code}

Also, I noticed some inconsistencies in the error handling of various functions. In some places, to get a message for an event, we call {code}StringUtils.stringifyException(e){code} but in others we just call {code}e.getMessage(){code} I am not sure if there is a reason for this or not, but I would prefer it to be consistent.

On a more serious note, it looks like if we get any RuntimeException or Error while processing jobSetup, etc., the exception will be eaten. We are not catching it, and the thread pool will likely just eat it. This could result in a deadlock. I would suggest that at a minimum we catch all Exceptions instead of just IOException.
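The concern about exceptions being eaten by the thread pool can be sketched as follows. This is a hypothetical illustration, not the CommitterEventHandler code under review: the class `SwallowedExceptionSketch`, the `Setup` interface, and the event strings `JOB_SETUP_COMPLETED` and `JOB_SETUP_FAILED` are assumed names. It shows that when a pooled task catches only IOException, a RuntimeException ends the task without any event being posted, so anything waiting for a completion or failure event would wait indefinitely.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch; names and event strings are illustrative. */
final class SwallowedExceptionSketch {
    /** Stand-in for a committer call such as job setup. */
    interface Setup {
        void run() throws IOException;
    }

    /** Returns the event the job would receive, or null if none was posted. */
    static String runSetup(Setup setup, boolean catchAllExceptions) {
        BlockingQueue<String> jobEvents = new LinkedBlockingQueue<>();
        Runnable task = () -> {
            try {
                setup.run();
                jobEvents.add("JOB_SETUP_COMPLETED");
            } catch (IOException e) {
                jobEvents.add("JOB_SETUP_FAILED");
            } catch (RuntimeException e) {
                if (!catchAllExceptions) {
                    throw e; // escapes into the pool; the job hears nothing
                }
                jobEvents.add("JOB_SETUP_FAILED");
            }
        };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> done = pool.submit(task);
            try {
                done.get(); // wait for the task to end either way
            } catch (ExecutionException swallowed) {
                // The pool kept the exception in the Future; no event exists.
            }
            return jobEvents.poll();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Setup failing = () -> {
            throw new IllegalStateException("boom");
        };
        System.out.println("IOException only: " + runSetup(failing, false));
        System.out.println("catch all:        " + runSetup(failing, true));
    }
}
```

With only IOException handled, `runSetup` returns `null` for a setup that throws a RuntimeException; with the broader catch it returns `JOB_SETUP_FAILED`, which is the behavior the review comment asks for.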
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539974#comment-13539974 ]

Robert Joseph Evans commented on MAPREDUCE-4813:

I have gone through the state machine for JobImpl, and I think I have found a few issues. I am not done with my review, but I wanted to give you a heads-up on them.

It looks like it may be possible to get a JOB_START event after a JOB_KILL event. This would be a very rare race condition where someone tries to kill their job right after starting it. I am not sure it is even possible in practice, especially after this patch, where the setup is removed from the critical path. So I would suggest that we file a minor JIRA to fix this at some point.

Also, it looks like we need to handle JOB_MAP_TASK_RESCHEDULED and JOB_TASK_ATTEMPT_COMPLETED in the FAIL_ABORT state. In the KILL_ABORT state we don't need to worry about them, because the KILL_WAIT state should make sure all tasks have completed before going on, but there is no corresponding FAIL_WAIT state.
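The FAIL_ABORT point can be sketched with a toy transition table. This is an assumption-laden illustration and not JobImpl's state machine: the class `FailAbortSketch`, the `FAILED` state, and the `JOB_ABORT_COMPLETED` event are hypothetical names, and the table is a plain map rather than Hadoop's own state machine classes. It shows why late task events need a registered (here, self-looping) transition when no wait state guarantees that tasks have finished.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of a transition table; not JobImpl's real one. */
final class FailAbortSketch {
    enum State { FAIL_ABORT, FAILED }

    enum Event {
        JOB_ABORT_COMPLETED,
        JOB_MAP_TASK_RESCHEDULED,
        JOB_TASK_ATTEMPT_COMPLETED
    }

    private final Map<State, Map<Event, State>> table =
        new EnumMap<State, Map<Event, State>>(State.class);
    private State current = State.FAIL_ABORT;

    FailAbortSketch(boolean ignoreLateTaskEvents) {
        Map<Event, State> failAbort = new EnumMap<Event, State>(Event.class);
        failAbort.put(Event.JOB_ABORT_COMPLETED, State.FAILED);
        if (ignoreLateTaskEvents) {
            // Tasks may still be running, so their events are absorbed here.
            failAbort.put(Event.JOB_MAP_TASK_RESCHEDULED, State.FAIL_ABORT);
            failAbort.put(Event.JOB_TASK_ATTEMPT_COMPLETED, State.FAIL_ABORT);
        }
        table.put(State.FAIL_ABORT, failAbort);
        table.put(State.FAILED, new EnumMap<Event, State>(Event.class));
    }

    /** Applies one event, failing loudly if no transition is registered. */
    State handle(Event event) {
        State next = table.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        FailAbortSketch job = new FailAbortSketch(true);
        System.out.println(job.handle(Event.JOB_TASK_ATTEMPT_COMPLETED));
        System.out.println(job.handle(Event.JOB_ABORT_COMPLETED));
    }
}
```

Constructed with `false`, the same machine throws on a late JOB_TASK_ATTEMPT_COMPLETED, which is the kind of unhandled event the comment warns about.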
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539710#comment-13539710 ]

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12562409/MAPREDUCE-4813-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3176//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3176//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539698#comment-13539698 ]

Robert Joseph Evans commented on MAPREDUCE-4813:

A quick look at the patch looks OK to me, but I need to dig into it in more detail. Also, the patch no longer compiles on trunk. Could you please upmerge? The generated state transition charts make looking at the events a lot simpler, but that does not work if the code does not compile.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534119#comment-13534119 ]

Hadoop QA commented on MAPREDUCE-4813:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12561317/MAPREDUCE-4813-2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 6 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3128//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3128//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532815#comment-13532815 ]

Hadoop QA commented on MAPREDUCE-4813:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12561060/MAPREDUCE-4813-2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 6 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3125//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3125//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507040#comment-13507040 ]

Hadoop QA commented on MAPREDUCE-4813:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12555453/MAPREDUCE-4813.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3085//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3085//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506328#comment-13506328 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4813:

Some comments on the patch:
 - Similar to JobCommitFailedEvent, add an event class for JOB_COMMIT_COMPLETED.
 - JobImpl.checkJobCompleteSuccess() and the corresponding return variables should be renamed to mean checkIfJobReadyForCommit(). Similarly, checkJobForCompletion(job).
 - For now, we may just be addressing MAPREDUCE-4815, but the same argument of the committer being arbitrary user code is valid for other calls like abortJob and setupJob too. We will need states capturing those calls, and we will need to put them on separate threads so that the dispatcher isn't blocked. We can do that later, but to be future-proof, let's move the committer thread to a top-level service a la TaskCleaner. We may even re-purpose TaskCleanerImpl for this. Scope the effort and split it as you see fit.
 - Interrupting and joining the commit thread is only meaningful in the case of kill-during-commit, so let's move that code there. Also, earlier we never supported kill-during-commit, but now we do, and the patch is putting a 60-second upper bound on commitJob() before abortJob(). Comparing this with 1.*, we do allow kill-during-commit there, as the commit happens in a separate JVM. So interrupt and join seems fine; let's just put in a config so that we can tweak it if there is ever a need.
 - The test looks good. Can you extend it to include kill-during-commit too? That will also validate that the dispatcher isn't blocked anymore because of a long commit.
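The interrupt-and-join handling for kill-during-commit discussed above might look roughly like the following. This is a sketch under stated assumptions, not the patch itself: `KillDuringCommitSketch`, the nested `JobCommitter` interface, and the `cancelTimeoutMs` parameter are all hypothetical names, with `cancelTimeoutMs` standing in for the configurable bound suggested in the review.

```java
/** Hypothetical sketch of kill-during-commit; not taken from the actual patch. */
final class KillDuringCommitSketch {
    /** Hypothetical stand-in for the job-level calls of an output committer. */
    interface JobCommitter {
        void commitJob() throws Exception;
        void abortJob();
    }

    private final JobCommitter committer;
    private final long cancelTimeoutMs; // assumed to come from configuration
    private volatile Thread commitThread;
    private volatile boolean commitSucceeded;

    KillDuringCommitSketch(JobCommitter committer, long cancelTimeoutMs) {
        this.committer = committer;
        this.cancelTimeoutMs = cancelTimeoutMs;
    }

    /** Runs the (possibly slow, user-supplied) commit on its own thread. */
    void startCommit() {
        Thread t = new Thread(() -> {
            try {
                committer.commitJob();
                commitSucceeded = true;
            } catch (Exception e) {
                // Interrupted or failed: the commit is treated as not completed.
            }
        }, "job-committer");
        commitThread = t;
        t.start();
    }

    /**
     * Handles a kill that arrives while the commit may still be running:
     * interrupt the commit thread, wait up to the configured bound, then
     * abort the job unless the commit had already succeeded.
     *
     * @return true if the commit thread stopped within the bound
     */
    boolean killDuringCommit() {
        Thread t = commitThread;
        boolean stopped = true;
        if (t != null) {
            t.interrupt();
            try {
                t.join(cancelTimeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            stopped = !t.isAlive();
        }
        if (!commitSucceeded) {
            committer.abortJob();
        }
        return stopped;
    }

    boolean commitSucceeded() {
        return commitSucceeded;
    }
}
```

A committer that ignores interrupts would still be running when the bound expires; the sketch proceeds to `abortJob()` in that case, which mirrors the upper bound described in the comment but leaves open how a real implementation should deal with the straggling thread.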
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506299#comment-13506299 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4813:

bq. So I still think we need this

Agreed, also we need a fix before MAPREDUCE-4815 is resolved. So let's get this in. Looking at the patch now.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504635#comment-13504635 ]

Jason Lowe commented on MAPREDUCE-4813:

MAPREDUCE-4815 only addresses FileOutputCommitter and friends, but the committer is arbitrary user code. It could be doing all sorts of things including connecting to databases, etc. So I still think we need this, although the priority of it is reduced given how many things are built from FileOutputCommitter.
[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504304#comment-13504304 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4813:

I started looking at the patch but realized something: once we fix MAPREDUCE-4815, won't commitJob no longer be expensive? We still need to make sure that a hung DFS move doesn't make the AM time out, but I believe that is handled automatically, for example via RPC timeouts.