[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146537#comment-15146537
 ] 

Naganarasimha G R commented on MAPREDUCE-6579:
----------------------------------------------

Thanks [~jlowe], [~sunilg] & [~ajisakaa] for sharing your views, but we have 
already taken care of what you have suggested. YARN-3946 already resets the 
diagnostic message once the AM is registered, in 
{{RMAppAttemptImpl.AMRegisteredTransition}}:
{code}
@@ -1507,6 +1515,9 @@ public void transition(RMAppAttemptImpl appAttempt,
       appAttempt.originalTrackingUrl =
           sanitizeTrackingUrl(registrationEvent.getTrackingurl());
 
+      // reset AMLaunchDiagnostics once AM Registers with RM
+      appAttempt.updateAMLaunchDiagnostics(null);
+
{code}
[~jlowe], 
As we have ensured that the AM launch diagnostic info is reset once the AM is 
registered, do you see any further problems with the approach taken in 
YARN-3946? IMHO these statistics are very useful, and it was apt to put them 
into the app's diagnostic information instead of creating a new 
*AM Launch diagnostic*. I have also taken care that if a diagnostic is set 
anywhere, *RMAppImpl or RMAppAttemptImpl* will not return the *AM Launch 
diagnostic* but will return the actual message.
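The precedence described above can be sketched roughly as follows. This is only an illustration, not the code in YARN-3946: {{AttemptDiagnosticsSketch}}, {{appendDiagnostics}} and {{onAMRegistered}} are hypothetical names, and only {{updateAMLaunchDiagnostics}} is taken from the patch excerpt.

```java
// Hypothetical stand-in for an app attempt; NOT the real RMAppAttemptImpl.
final class AttemptDiagnosticsSketch {
  // Transient status shown while the AM is waiting to launch (assumed field).
  private String amLaunchDiagnostics;
  // Actual diagnostics recorded by the attempt (assumed field).
  private final StringBuilder diagnostics = new StringBuilder();

  void updateAMLaunchDiagnostics(String msg) {
    amLaunchDiagnostics = msg;
  }

  void appendDiagnostics(String msg) {
    diagnostics.append(msg);
  }

  // Mirrors the reset done once the AM registers with the RM.
  void onAMRegistered() {
    updateAMLaunchDiagnostics(null);
  }

  // Actual diagnostics win; the launch status is only a fallback.
  String getDiagnostics() {
    if (diagnostics.length() > 0) {
      return diagnostics.toString();
    }
    return amLaunchDiagnostics == null ? "" : amLaunchDiagnostics;
  }

  public static void main(String[] args) {
    AttemptDiagnosticsSketch attempt = new AttemptDiagnosticsSketch();
    attempt.updateAMLaunchDiagnostics("waiting for resources");
    System.out.println("[" + attempt.getDiagnostics() + "]");
    attempt.onAMRegistered();
    System.out.println("[" + attempt.getDiagnostics() + "]");
  }
}
```

With this shape, a client polling a running job after AM registration sees an empty string unless a real diagnostic has been recorded.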
Further, MapReduce currently assumes that the diagnostics are always 
*failure diagnostics*, which IMO is wrong. As per *MAPREDUCE-6579.05.patch*, 
{{JobStatus}} is as follows:
{code}
    public synchronized String getFailureInfo() {
-     return this.failureInfo;
+     if (runState == State.FAILED || runState == State.KILLED) {
+       return this.failureInfo;
+     }
+     return "";
    }
{code} 
As you mentioned that this might not be the correct place to do this check, 
would it be better to handle it in {{NotRunningJob.getJobReport}}?
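
To make the behaviour of the guard concrete, here is a minimal standalone sketch. {{JobStatusSketch}} and its {{State}} enum are hypothetical stand-ins written for this comment, not the real {{JobStatus}} class; only the condition inside {{getFailureInfo}} is copied from the patch above.

```java
// Hypothetical stand-in for JobStatus; NOT the real Hadoop class.
final class JobStatusSketch {
  // Assumed set of states, used for illustration only.
  enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

  private final State runState;
  private final String failureInfo;

  JobStatusSketch(State runState, String failureInfo) {
    this.runState = runState;
    this.failureInfo = failureInfo;
  }

  // Same condition as the patch: expose failure info only for
  // terminal failure states, otherwise return an empty string.
  synchronized String getFailureInfo() {
    if (runState == State.FAILED || runState == State.KILLED) {
      return failureInfo;
    }
    return "";
  }

  public static void main(String[] args) {
    String launchMsg = "Application is Activated, waiting for resources";
    System.out.println("[" + new JobStatusSketch(State.RUNNING, launchMsg).getFailureInfo() + "]");
    System.out.println("[" + new JobStatusSketch(State.FAILED, "Task failed").getFailureInfo() + "]");
  }
}
```

Wherever the check ends up, the idea is the same: a running job should report an empty failure string even if the RM still carries a launch status message.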

[~ajisakaa],
As per the modifications in *MAPREDUCE-6579.06*, you have reset the diagnostics 
in *RMAppAttemptImpl.AMRegisteredTransition* just after the location where I 
reset them. I commented out that code and ran the test case, and it passed. So 
IMO the modification in the test case is fine but the RMAppAttemptImpl 
modifications are not required. At the same time, we need to discuss whether 
to add an additional check in {{NotRunningJob.getJobReport}}.

> JobStatus#getFailureInfo should not output diagnostic information when the 
> job is running
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6579
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6579
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>            Reporter: Rohith Sharma K S
>            Assignee: Akira AJISAKA
>            Priority: Blocker
>         Attachments: MAPREDUCE-6579.01.patch, MAPREDUCE-6579.02.patch, 
> MAPREDUCE-6579.03.patch, MAPREDUCE-6579.04.patch, MAPREDUCE-6579.05.patch, 
> MAPREDUCE-6579.06.patch
>
>
> From 
> [https://builds.apache.org/job/PreCommit-YARN-Build/9976/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient-jdk1.8.0_66.txt]
>  TestNetworkedJob are failed intermittently.
> {code}
> Running org.apache.hadoop.mapred.TestNetworkedJob
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 81.131 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob
> testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob)  Time elapsed: 
> 30.55 sec  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[[Tue Dec 15 14:02:45 +0000 2015] 
> Application is Activated, waiting for resources to be assigned for AM.  
> Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = 
> <memory:8192, vCores:16> ; Queue's Absolute capacity = 100.0 % ; Queue's 
> Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; ]> 
> but was:<[]>
>       at org.junit.Assert.assertEquals(Assert.java:115)
>       at org.junit.Assert.assertEquals(Assert.java:144)
>       at 
> org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
