[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489431#comment-13489431
 ] 

Hudson commented on MAPREDUCE-4729:
---

Integrated in Hadoop-Mapreduce-trunk #1244 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1244/])
MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by 
Vinod Kumar Vavilapalli (Revision 1404817)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java


> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489400#comment-13489400
 ] 

Hudson commented on MAPREDUCE-4729:
---

Integrated in Hadoop-Hdfs-trunk #1214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1214/])
MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by 
Vinod Kumar Vavilapalli (Revision 1404817)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java


> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489395#comment-13489395
 ] 

Hudson commented on MAPREDUCE-4729:
---

Integrated in Hadoop-Hdfs-0.23-Build #423 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/423/])
svn merge -c 1404817 FIXES: MAPREDUCE-4729. job history UI not showing all 
job attempts. Contributed by Vinod Kumar Vavilapalli (Revision 1404825)

 Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404825
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java


> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489354#comment-13489354
 ] 

Hudson commented on MAPREDUCE-4729:
---

Integrated in Hadoop-Yarn-trunk #24 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/24/])
MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by 
Vinod Kumar Vavilapalli (Revision 1404817)

 Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java


> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489128#comment-13489128
 ] 

Hudson commented on MAPREDUCE-4729:
---

Integrated in Hadoop-trunk-Commit #2949 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2949/])
MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by 
Vinod Kumar Vavilapalli (Revision 1404817)

 Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java


> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489098#comment-13489098
 ] 

Jason Lowe commented on MAPREDUCE-4729:
---

+1, will commit shortly.  Filed MAPREDUCE-4767 to track the lack of flushing 
for AMInfo records.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489049#comment-13489049
 ] 

Hadoop QA commented on MAPREDUCE-4729:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12551761/MAPREDUCE-4729-20121101.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2980//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2980//console

This message is automatically generated.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489009#comment-13489009
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:


bq.  However, I am a bit conflicted about where the parsing of the job history 
file is happening. I kind of feel that it should go in the recovery service.
I am not sure either ways but RecoveryService is wired into the AppMaster and 
is selected dynamically depending on whether recovery is enabled or not. We can 
rewire this, but that's a slightly bigger change. I'd like to keep this as is.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488741#comment-13488741
 ] 

Jason Lowe commented on MAPREDUCE-4729:
---

I tried testing the patch with a sleep job using 
-Dyarn.app.mapreduce.am.job.recovery.enable=false and manually killing the 
ApplicationMaster with a kill -9, but it didn't work.  The log showed this 
exception:

{noformat}
2012-11-01 14:37:01,543 WARN [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Could not parse the old history 
file. Will not have old AMinfos 
java.io.IOException: Incompatible event log version: null
at 
org.apache.hadoop.mapreduce.jobhistory.EventReader.(EventReader.java:70)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.readJustAMInfos(MRAppMaster.java:915)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:846)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1143)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1378)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1139)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1098)
{noformat}

It looks like the AM is buffering the history file output, and we didn't flush 
out the AMInfos from previous runs.  When I used a normal kill instead of kill 
-9, it worked.  We will want to flush/sync the job history file after writing 
the AMInfos to help guard against unclean teardowns losing prior AM attempts in 
the history.  This can be fixed in a separate JIRA if we don't want to fix it 
here.

Couple of other comments on the patch:
* Application attempts start from 1 instead of 0, so the first attempt tries to 
recover AMInfos when it shouldn't and leads to a large FileNotFoundException 
stacktrace being logged
* Nit: In RecoveryService.parse there's an extra space logged before a comma.  
{{LOG.info("Got an error parsing job-history file "}} should be {{LOG.info("Got 
an error parsing job-history file"}}
* Nit: The body of the while loop in readJustAMInfos could be a bit cleaner 
with fewer conditionals.  For example:
{code}
  while ((event = jobHistoryEventReader.getNextEvent()) != null) {
if (event.getEventType() == EventType.AM_STARTED) {
  amStartedEventsBegan = true;
  AMStartedEvent amStartedEvent = (AMStartedEvent) event;
  amInfos.add(MRBuilderUtils.newAMInfo(
amStartedEvent.getAppAttemptId(), amStartedEvent.getStartTime(),
amStartedEvent.getContainerId(),
StringInterner.weakIntern(amStartedEvent.getNodeManagerHost()),
amStartedEvent.getNodeManagerPort(),
amStartedEvent.getNodeManagerHttpPort()));
} else if (amStartedEventsBegan) {
  // This means AMStartedEvents began and this event is a
  // non-AMStarted event.
  // No need to continue reading all the other events.
  break;
}
  }
{code}

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-11-01 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488729#comment-13488729
 ] 

Robert Joseph Evans commented on MAPREDUCE-4729:


The patch looks good to me.  I like how we short circuit the reading of the 
history file because we know that what we want to read is in a single chunk. 
However, I am a bit conflicted about where the parsing of the job history file 
is happening.  I kind of feel that it should go in the recovery service. In 
this case we are only doing a partial recovery, not a full recovery.  I can see 
in the future we would want to add in other things to the partial recovery, 
like when/if we add in a summary of the history file for fast reading it would 
be nice to propagate that to the next file no matter what happens so that the 
history server can show a full view of what is happened in the application.  
But, I am fine with it the way it is and if you feel strongly the other way 
feel free to check it in as is +1. 

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488431#comment-13488431
 ] 

Hadoop QA commented on MAPREDUCE-4729:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12551662/MAPREDUCE-4729-20121031.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2974//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2974//console

This message is automatically generated.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Assignee: Vinod Kumar Vavilapalli
> Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479369#comment-13479369
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:


bq. I think we should fix that so that even if recovery isn't supported we 
atleast parse and get the previous AM attempt info.
+1

bq. I'm not a pig expert but perhaps we should also follow up with pig team to 
see if they should support Recovery
This is a general issue with all OutputFormats. We need to educate users that 
they need to start implementing recovery.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-18 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479309#comment-13479309
 ] 

Thomas Graves commented on MAPREDUCE-4729:
--

I'm not a pig expert but perhaps we should also follow up with pig team to see 
if they should support Recovery

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-18 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479306#comment-13479306
 ] 

Thomas Graves commented on MAPREDUCE-4729:
--

Ok, so I figured this out.  The job is using output format 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat, 
which has the OutputCommitter which is set to null.  This caused the 
MRAppMaster recoveryService to not start:

 "org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Not starting RecoveryService: 
recoveryEnabled: true recoverySupportedByCommitter: false ApplicationAttemptID: 
4"

Since the recovery service didn't start it didn't parse the old job history 
files, thus didn't have the list of old AMs. 

I think we should fix that so that even if recovery isn't supported we atleast 
parse and get the previous AM attempt info.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479201#comment-13479201
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:


It looks like we retain the history files from all attempts, did you look at 
them all? Also, please see if you find this log in any of the AM attempts:
{code}
"Got an error parsing job-history file " + historyFile + ", ignoring incomplete 
events."
{code}

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-17 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478393#comment-13478393
 ] 

Thomas Graves commented on MAPREDUCE-4729:
--

The jobhistory file itself only has one AM_STARTED event which is for the 4th 
attempt. So it must be the jhist file is getting lost of corrupt from the other 
attempts somehow.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478250#comment-13478250
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:


IIRC, The AM recovery is tolerant to corrupted records towards the end of file.

Thomas, can you look at the history files directly and see if AMStarted events 
are getting correctly logged in each generation? Each Job Attempt should have 
AMStarted events from all the previous generations.

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

2012-10-17 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478236#comment-13478236
 ] 

Robert Joseph Evans commented on MAPREDUCE-4729:


That is a bit odd.  The only thing I can think of is that the jhist files from 
the previous runs were corrupted some how.  The way we have set things up on 
the AM an OOM will shut things down very hard with no time for cleanup, so the 
end of the file may have had some issues. This is just speculation. 

> job history UI not showing all job attempts
> ---
>
> Key: MAPREDUCE-4729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in 
> the first 3 attempts. The job eventually finishes on the 4th attempt.  When 
> you go to the job history UI for that job, it only shows the last attempt.  
> This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that 
> case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira