[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540917#comment-13540917
 ] 

Hudson commented on MAPREDUCE-4813:
---

Integrated in Hadoop-Mapreduce-trunk #1299 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1299/])
MAPREDUCE-4813. AM timing out during job commit (jlowe via bobby) (Revision 
1426536)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426536
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/ReduceTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobCommitEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobSetupEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterTaskAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/package-info.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobAbortCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-

[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540901#comment-13540901
 ] 

Hudson commented on MAPREDUCE-4813:
---

Integrated in Hadoop-Hdfs-trunk #1269 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1269/])
MAPREDUCE-4813. AM timing out during job commit (jlowe via bobby) (Revision 
1426536)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426536
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/ReduceTaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobCommitEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobSetupEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterTaskAbortEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/package-info.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobAbortCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobEventType.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupCompletedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupFailedEvent.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app

[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540892#comment-13540892
 ] 

Hudson commented on MAPREDUCE-4813:
---

Integrated in Hadoop-Hdfs-0.23-Build #478 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/478/])
MAPREDUCE-4813. AM timing out during job commit (jlowe via bobby) (Revision 
1426540)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426540
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapTaskAttemptImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/ReduceTaskAttemptImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventType.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobAbortEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobCommitEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterJobSetupEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterTaskAbortEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/package-info.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobAbortCompletedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitCompletedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobCommitFailedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobEventType.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupCompletedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/JobSetupFailedEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/MapTaskImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/ReduceTaskImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapre

[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540226#comment-13540226
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12562531/MAPREDUCE-4813-2-branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3179//console

This message is automatically generated.

> AM timing out during job commit
> ---
>
> Key: MAPREDUCE-4813
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 0.23.3, 2.0.1-alpha
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: JobImplStateMachine.pdf, 
> MAPREDUCE-4813-2-branch-0.23.patch, MAPREDUCE-4813-2.patch, 
> MAPREDUCE-4813-2.patch, MAPREDUCE-4813-2.patch, MAPREDUCE-4813-2.patch, 
> MAPREDUCE-4813.patch, MAPREDUCE-4813.patch, MAPREDUCE-4813.patch
>
>
> The AM calls the output committer's {{commitJob}} method synchronously during 
> JobImpl state transitions, which means the JobImpl write lock is held the 
> entire time the job is being committed.  Holding the write lock prevents the 
> RM allocator thread from heartbeating to the RM.  Therefore if committing the 
> job takes too long (e.g.: the job has tons of files to commit and/or the 
> namenode is bogged down) then the AM appears to be unresponsive to the RM and 
> the RM kills the AM attempt.
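
The locking problem in the description can be illustrated with a minimal sketch. This is not the MAPREDUCE-4813 patch and not Hadoop code: `JobSketch`, `commitSync`, `commitAsync`, and `heartbeatCanProceed` are hypothetical names, and a plain `ReentrantReadWriteLock` is assumed to stand in for the JobImpl lock. The synchronous variant holds the write lock for the whole commit, so another thread cannot read job state; the asynchronous variant takes the lock only for the state changes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a job object guarded by a read/write lock.
class JobSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ExecutorService committerPool = Executors.newSingleThreadExecutor();
  private String state = "RUNNING";

  // What a heartbeat thread needs: a brief read of job state.
  // Returns false if another thread currently holds the write lock.
  boolean heartbeatCanProceed() {
    if (!lock.readLock().tryLock()) {
      return false;
    }
    try {
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Problematic shape: the commit runs while the write lock is held.
  void commitSync(Runnable commit) {
    lock.writeLock().lock();
    try {
      commit.run();
      state = "COMMITTED";
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Alternative shape: lock only for state changes, commit on another thread.
  Future<?> commitAsync(Runnable commit) {
    setState("COMMITTING");
    return committerPool.submit(() -> {
      commit.run();
      setState("COMMITTED");
    });
  }

  private void setState(String newState) {
    lock.writeLock().lock();
    try {
      state = newState;
    } finally {
      lock.writeLock().unlock();
    }
  }

  String getState() {
    lock.readLock().lock();
    try {
      return state;
    } finally {
      lock.readLock().unlock();
    }
  }

  void shutdown() {
    committerPool.shutdown();
  }
}
```

In the asynchronous shape, a slow commit delays only the final state change, not the threads that need to read job state in the meantime.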

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-27 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540108#comment-13540108
 ] 

Robert Joseph Evans commented on MAPREDUCE-4813:


The new set of changes looks good to me. I am +1.



[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540088#comment-13540088
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12562484/MAPREDUCE-4813-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3178//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3178//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540066#comment-13540066
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12562484/MAPREDUCE-4813-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3177//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3177//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-27 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539976#comment-13539976
 ] 

Robert Joseph Evans commented on MAPREDUCE-4813:


I have a few minor comments about CommitterEventHandler and one more serious 
one as well. When checking for the timeout we call context.getClock().getTime() 
twice.
{code}
// wait up to configured timeout for commit thread to finish
long timeoutTimestamp = context.getClock().getTime()
 + commitThreadCancelTimeoutMs;
long now = context.getClock().getTime();
{code}

I personally think it would be cleaner to call it once:
{code}
// wait up to configured timeout for commit thread to finish
long now = context.getClock().getTime();
long timeoutTimestamp = now + commitThreadCancelTimeoutMs;
{code}

Also I noticed some inconsistencies in the error handling of various functions. 
In some places, to get a message for an event, we call 
{code}StringUtils.stringifyException(e){code} but in others we just call 
{code}e.getMessage(){code}. I am not sure whether there is a reason for this, 
but I would prefer it to be consistent.

On a more serious note, it looks like any RuntimeException or Error thrown 
while processing jobSetup, etc. will be swallowed.  We are not catching it, and 
the thread pool will likely just swallow it as well.  This could result in a 
deadlock.  I would suggest that at a minimum we catch all Exceptions instead of 
just IOException.
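
A minimal sketch of the suggested guard, under stated assumptions: `GuardedCommitter`, `submitSetup`, and the event strings are hypothetical and only imitate the shape of a handler that runs committer work on a thread pool. A task submitted with `ExecutorService.submit` stores any thrown exception in the returned `Future`, so if nobody inspects that `Future`, the failure goes unnoticed; catching `Exception` inside the task lets the waiting side always receive an outcome.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical handler; the queue stands in for an event dispatcher.
class GuardedCommitter {
  final BlockingQueue<String> events = new LinkedBlockingQueue<>();
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  // Runs the setup step on the pool and always reports an outcome.
  void submitSetup(Runnable setup) {
    pool.submit(() -> {
      try {
        setup.run();
        events.add("JOB_SETUP_COMPLETED");
      } catch (Exception e) {
        // Catches RuntimeException as well as checked exceptions,
        // so a failure event is sent instead of silence.
        events.add("JOB_SETUP_FAILED: " + e);
      }
    });
  }

  // Waits for the next outcome; returns null if none arrives in time.
  String awaitEvent(long timeoutMs) throws InterruptedException {
    return events.poll(timeoutMs, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

This sketch catches `Exception` only, matching the minimum suggested above; an `Error` would still escape it.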




[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-27 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539974#comment-13539974
 ] 

Robert Joseph Evans commented on MAPREDUCE-4813:


I have gone through the state machine for JobImpl, and I think I have found a 
few issues.  I am not done with my review, but I wanted to give you a heads up 
on them.

It looks like it may be possible to get a JOB_START event after a JOB_KILL 
event.  This would be a very rare race condition where someone is trying to 
kill their job right after starting it.  I am not sure if it is even possible 
in practice, especially after this patch where the setup is removed from the 
critical path. So I would suggest that we file a minor JIRA to fix this at some 
point.

Also it looks like we need to handle JOB_MAP_TASK_RESCHEDULED and 
JOB_TASK_ATTEMPT_COMPLETED in the FAIL_ABORT state.  In the KILL_ABORT state we 
don't need to worry about them because the KILL_WAIT state should make sure all 
tasks have completed before going on, but there is no corresponding FAIL_WAIT 
state. 
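
The FAIL_ABORT point can be sketched with a toy transition function. This is an illustration only and does not use Hadoop's state machine classes: `AbortStateSketch` and `handle` are hypothetical, and the `FAILED` state and `JOB_ABORT_COMPLETED` event name are assumptions made for the example. The idea is that late task events are accepted and ignored in FAIL_ABORT, while KILL_ABORT does not list them because KILL_WAIT is expected to have drained all tasks first.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Toy model of abort-state event handling; not Hadoop's implementation.
class AbortStateSketch {
  enum State { FAIL_ABORT, KILL_ABORT, FAILED }

  enum Event {
    JOB_ABORT_COMPLETED,
    JOB_MAP_TASK_RESCHEDULED,
    JOB_TASK_ATTEMPT_COMPLETED,
    JOB_KILL
  }

  // Events that may still arrive in a state and should leave it unchanged.
  private static final Map<State, Set<Event>> IGNORED = new EnumMap<>(State.class);

  static {
    IGNORED.put(State.FAIL_ABORT, EnumSet.of(
        Event.JOB_MAP_TASK_RESCHEDULED, Event.JOB_TASK_ATTEMPT_COMPLETED));
    IGNORED.put(State.KILL_ABORT, EnumSet.noneOf(Event.class));
    IGNORED.put(State.FAILED, EnumSet.noneOf(Event.class));
  }

  // Returns the next state, or throws if the event is not handled.
  static State handle(State current, Event event) {
    if (IGNORED.get(current).contains(event)) {
      return current;
    }
    if (current == State.FAIL_ABORT && event == Event.JOB_ABORT_COMPLETED) {
      return State.FAILED;
    }
    throw new IllegalStateException(
        "Invalid event " + event + " in state " + current);
  }
}
```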



[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539710#comment-13539710
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12562409/MAPREDUCE-4813-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3176//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3176//console

This message is automatically generated.

> AM timing out during job commit
> ---
>
> Key: MAPREDUCE-4813
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4813
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 0.23.3, 2.0.1-alpha
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: JobImplStateMachine.pdf, MAPREDUCE-4813-2.patch, 
> MAPREDUCE-4813-2.patch, MAPREDUCE-4813-2.patch, MAPREDUCE-4813.patch, 
> MAPREDUCE-4813.patch, MAPREDUCE-4813.patch
>
>
> The AM calls the output committer's {{commitJob}} method synchronously during 
> JobImpl state transitions, which means the JobImpl write lock is held the 
> entire time the job is being committed.  Holding the write lock prevents the 
> RM allocator thread from heartbeating to the RM.  Therefore if committing the 
> job takes too long (e.g.: the job has tons of files to commit and/or the 
> namenode is bogged down) then the AM appears to be unresponsive to the RM and 
> the RM kills the AM attempt.
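
The locking problem described above can be illustrated with a minimal sketch. It is not Hadoop's JobImpl: the class name, the method names, and the 500 ms heartbeat deadline are all assumptions made for illustration. It contrasts running a slow commit while the write lock is held with handing the commit to a separate thread and releasing the lock right away.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the job state machine; this is not Hadoop's JobImpl.
class CommitLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ExecutorService committerPool = Executors.newSingleThreadExecutor();

    // Problematic shape: the slow commit runs while the write lock is held.
    void commitSynchronously(Runnable commitJob) {
        lock.writeLock().lock();
        try {
            commitJob.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Alternative shape: hold the lock only long enough to hand the commit off.
    void commitAsynchronously(Runnable commitJob) {
        lock.writeLock().lock();
        try {
            committerPool.execute(commitJob);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A heartbeat needs to read job state; it fails if the lock stays busy.
    boolean heartbeatCanReadState(long timeoutMs) {
        try {
            if (lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                lock.readLock().unlock();
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts a commit that blocks until released, then tries one heartbeat.
    static boolean heartbeatSucceedsDuringCommit(boolean async) {
        CommitLockSketch job = new CommitLockSketch();
        CountDownLatch commitStarted = new CountDownLatch(1);
        CountDownLatch releaseCommit = new CountDownLatch(1);
        Runnable slowCommit = () -> {
            commitStarted.countDown();
            awaitQuietly(releaseCommit);
        };
        Thread dispatcher = new Thread(() -> {
            if (async) {
                job.commitAsynchronously(slowCommit);
            } else {
                job.commitSynchronously(slowCommit);
            }
        }, "dispatcher");
        dispatcher.start();
        awaitQuietly(commitStarted);
        boolean heartbeatOk = job.heartbeatCanReadState(500);
        releaseCommit.countDown();      // let the simulated commit finish
        job.committerPool.shutdown();
        return heartbeatOk;
    }

    public static void main(String[] args) {
        System.out.println("sync commit, heartbeat ok:  " + heartbeatSucceedsDuringCommit(false));
        System.out.println("async commit, heartbeat ok: " + heartbeatSucceedsDuringCommit(true));
    }
}
```

The simulated commit blocks on a latch rather than sleeping, so the sketch does not depend on how long a real commit would take.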

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-26 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539698#comment-13539698
 ] 

Robert Joseph Evans commented on MAPREDUCE-4813:


At a quick look the patch seems OK to me, but I need to dig into it in more 
detail.  Also, the patch no longer compiles on trunk.  Could you please upmerge?  
The generated state transition charts make looking at the events a lot simpler, 
but they cannot be generated if the code does not compile.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534119#comment-13534119
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12561317/MAPREDUCE-4813-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3128//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3128//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532815#comment-13532815
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12561060/MAPREDUCE-4813-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3125//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3125//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-11-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507040#comment-13507040
 ] 

Hadoop QA commented on MAPREDUCE-4813:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555453/MAPREDUCE-4813.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3085//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3085//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-11-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506328#comment-13506328
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4813:


Some comments on the patch:
 - Similar to JobCommitFailedEvent, add an event class for JOB_COMMIT_COMPLETED.
 - JobImpl.checkJobCompleteSuccess() and its corresponding return variables 
should be renamed to convey checkIfJobReadyForCommit(). Similarly for 
checkJobForCompletion(job).
 - For now we may just be addressing MAPREDUCE-4815, but the same argument, that 
the committer is arbitrary user code, is valid for other calls such as abortJob 
and setupJob too. We will need states capturing those calls, and we will need 
to put them on separate threads so that the dispatcher isn't blocked. We can do 
that later, but to be future-proof, let's move the committer thread to a 
top-level service a la TaskCleaner. We may even repurpose TaskCleanerImpl for 
this. Scope the effort and split it as you see fit.
 - Interrupting and joining the commit thread is only meaningful in the case of 
kill-during-commit, so let's move that code there. Also, earlier we never 
supported kill-during-commit, but now we do, and the patch puts a 60-second 
upper bound on commitJob() before abortJob(). In comparison, 1.* does allow 
kill-during-commit, as commit happens in a separate JVM. So interrupt and join 
seems fine; let's just put in a config so that we can tweak the timeout if 
there is ever a need.
 - The test looks good. Can you extend it to include kill-during-commit too? 
That will also validate that the dispatcher is no longer blocked by a long 
commit.
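
The interrupt-and-join handling for kill-during-commit discussed above could look roughly like the following sketch. It is not the code in the patch: the Committer interface here is a hypothetical two-method stand-in for the user's output committer, and cancelTimeoutMs is an assumed name for the configurable bound.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class KillDuringCommitSketch {
    // Hypothetical stand-in for the user's committer; only the two calls discussed.
    interface Committer {
        void commitJob() throws InterruptedException;
        void abortJob();
    }

    private final Committer committer;
    private final long cancelTimeoutMs;   // assumed to come from configuration
    private final AtomicBoolean committed = new AtomicBoolean(false);
    private final Thread commitThread;

    KillDuringCommitSketch(Committer committer, long cancelTimeoutMs) {
        this.committer = committer;
        this.cancelTimeoutMs = cancelTimeoutMs;
        this.commitThread = new Thread(() -> {
            try {
                committer.commitJob();
                committed.set(true);
            } catch (InterruptedException e) {
                // Cancelled by killDuringCommit(); committed stays false.
            }
        }, "job-committer");
    }

    // The commit runs on its own thread so the caller is never blocked by it.
    void startCommit() {
        commitThread.start();
    }

    // Kill path: interrupt the commit, wait a bounded time, then abort.
    // Returns true if abortJob() was invoked.
    boolean killDuringCommit() {
        commitThread.interrupt();
        try {
            commitThread.join(cancelTimeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (committed.get()) {
            return false;   // the commit finished first; nothing to abort
        }
        committer.abortJob();
        return true;
    }

    // Demo: a commit that would take a minute is killed shortly after it starts.
    static boolean demoKillAbortsSlowCommit() {
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean aborted = new AtomicBoolean(false);
        Committer slow = new Committer() {
            public void commitJob() throws InterruptedException {
                started.countDown();
                Thread.sleep(60_000);   // simulated long commit
            }
            public void abortJob() {
                aborted.set(true);
            }
        };
        KillDuringCommitSketch job = new KillDuringCommitSketch(slow, 1_000);
        job.startCommit();
        try {
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return job.killDuringCommit() && aborted.get();
    }

    public static void main(String[] args) {
        System.out.println("kill aborted the slow commit: " + demoKillAbortsSlowCommit());
    }
}
```

The bounded join matters because the committer is arbitrary user code and may ignore the interrupt; in that case the sketch proceeds to abortJob() once the timeout expires.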


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-11-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506299#comment-13506299
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4813:


bq. So I still think we need this
Agreed, also we need a fix before MAPREDUCE-4815 is resolved. So let's get this 
in. Looking at the patch now.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-11-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504635#comment-13504635
 ] 

Jason Lowe commented on MAPREDUCE-4813:
---

MAPREDUCE-4815 only addresses FileOutputCommitter and friends, but the 
committer is arbitrary user code.  It could be doing all sorts of things 
including connecting to databases, etc.  So I still think we need this, 
although the priority of it is reduced given how many things are built from 
FileOutputCommitter.


[jira] [Commented] (MAPREDUCE-4813) AM timing out during job commit

2012-11-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504304#comment-13504304
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4813:


I started looking at the patch but realized one thing: once we fix 
MAPREDUCE-4815, commitJob won't be expensive anymore, will it? We still need to 
make sure that a hung DFS move doesn't make the AM time out, but I believe that 
is handled automatically, e.g. via RPC timeouts.
