[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588901#comment-13588901
 ] 

Sandy Ryza commented on MAPREDUCE-3688:
---------------------------------------

I didn't have a specific plan yet, so any guidance on the best course would be 
helpful. I was planning to only handle things on the YARN side in this JIRA.

It seems like the question that needs to be answered is: what information is 
percolated up to the user in each of the following (not necessarily mutually 
exclusive) situations:
* AM is killed by NM for going over resource limits
* AM OOMEs
* AM dies before registering with the RM
* AM dies after registering with the RM
* AM is killed by RM (preempted or because of logic error)
* AM localization fails
* AM launch fails

It looks like the issue raised in MAPREDUCE-3949 about a container being killed 
for going over its resource limits is now fixed: a sensible diagnostic shows up 
both on the command line and in the UI.  I haven't yet tested what happens when 
the AM dies unexpectedly after registering, but if log aggregation is on, its 
logs should be accessible through the RM UI.  When the AM dies before 
registering, nothing useful gets reported.  I'm not sure what the best thing to 
do in this case is, but at the very least it would be good to report that it 
died before registering.
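
To make the cases above concrete, here is a rough sketch of how each situation 
could map to a one-line diagnostic.  The AmFailure enum and the describe() 
helper are hypothetical names made up for illustration; they are not existing 
YARN or MapReduce APIs, and the wording of each message is only a suggestion.

```java
public class AmFailureDiagnostics {

    // Hypothetical categories, one per situation listed in this comment.
    // This is NOT an existing YARN type.
    enum AmFailure {
        KILLED_OVER_RESOURCE_LIMITS("AM was killed by the NM for going over resource limits"),
        OUT_OF_MEMORY("AM ran out of memory"),
        DIED_BEFORE_REGISTERING("AM died before registering with the RM"),
        DIED_AFTER_REGISTERING("AM died after registering with the RM"),
        KILLED_BY_RM("AM was killed by the RM"),
        LOCALIZATION_FAILED("AM localization failed"),
        LAUNCH_FAILED("AM launch failed");

        final String summary;

        AmFailure(String summary) {
            this.summary = summary;
        }
    }

    // Builds the message shown to the user. rawDiagnostics may be null or
    // empty, in which case only the summary and exit code are reported.
    static String describe(AmFailure failure, int exitCode, String rawDiagnostics) {
        StringBuilder sb = new StringBuilder(failure.summary)
            .append(" (exitCode: ").append(exitCode).append(")");
        if (rawDiagnostics != null && !rawDiagnostics.isEmpty()) {
            sb.append(": ").append(rawDiagnostics);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Even with no further detail, the user learns the AM never registered.
        System.out.println(describe(AmFailure.DIED_BEFORE_REGISTERING, 1, null));
        // When the container reported something, append it after the summary.
        System.out.println(describe(AmFailure.LAUNCH_FAILED, 1,
            "org.apache.hadoop.util.Shell$ExitCodeException"));
    }
}
```

The point of the sketch is only that the "died before registering" case gets 
its own message even when no other diagnostics are available.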
                
> Need better Error message if AM is killed/throws exception
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-3688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.1
>            Reporter: David Capwell
>            Assignee: Ravi Prakash
>             Fix For: 0.23.2
>
>
> We need better error messages in the UI if the AM gets killed or throws an 
> Exception.
> If the following error gets thrown: 
> java.lang.NumberFormatException: For input string: "9223372036854775807l" // 
> last char is an L
> then the UI should display this exception.  Instead I get the following:
> Application application_1326504761991_0018 failed 1 times due to AM Container 
> for appattempt_1326504761991_0018_000001
> exited with exitCode: 1 due to: Exception from container-launch: 
> org.apache.hadoop.util.Shell$ExitCodeException
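
For reference, the exception quoted in the description can be reproduced with 
a plain long parse.  This is a minimal sketch that assumes the value reaches 
Long.parseLong; the report does not say which call actually failed inside the 
AM.  ParseLongRepro and parseOrExplain are hypothetical names.

```java
public class ParseLongRepro {

    // Returns the parsed value, or the exception class and message that a
    // user-facing diagnostic would ideally carry.
    static String parseOrExplain(String value) {
        try {
            return "parsed: " + Long.parseLong(value);
        } catch (NumberFormatException e) {
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A trailing 'l' is valid in a Java source literal but not in a
        // string handed to Long.parseLong.
        System.out.println(parseOrExplain("9223372036854775807l"));
        // Without the suffix, the same digits parse as Long.MAX_VALUE.
        System.out.println(parseOrExplain("9223372036854775807"));
    }
}
```

The first call yields the same text as the exception in the description, which 
is the message the UI would ideally show in place of the generic 
Shell$ExitCodeException.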

