Chris Riccomini created SAMZA-65:
------------------------------------

             Summary: Samza should use YARN's setDiagnosticMessage command when failures occur
                 Key: SAMZA-65
                 URL: https://issues.apache.org/jira/browse/SAMZA-65
             Project: Samza
          Issue Type: Bug
          Components: yarn
    Affects Versions: 0.6.0
            Reporter: Chris Riccomini


Currently, when an AM container fails, the diagnostic message reads:

{noformat}
Diagnostics:    
Application application_1382474502616_0004 failed 2 times due to AM Container 
for appattempt_1382474502616_0004_000002 exited with exitCode: 1 due to: 
Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
.Failing this attempt.. Failing the application.
{noformat}

Users then generally click through to the AM logs to see the stderr message.

Samza already knows which exception triggered the non-zero exit code. It 
should set a more useful diagnostic message that includes the actual stack 
trace.

This change should definitely be made for the Samza AM.
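
A rough sketch of the idea, not a proposed patch: the AM catches the fatal 
exception, renders its stack trace to a string, and uses that string as the 
diagnostic message. The class name, the toDiagnostics helper, and the 4096 
character cap are all hypothetical and only for illustration. The YARN call 
that actually receives the string is left out, since it depends on the YARN 
client API in use.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class DiagnosticsSketch {
    /**
     * Renders the full stack trace of a throwable as a string, capped at
     * maxChars characters. The cap is an assumption made for this sketch,
     * to keep the diagnostic message from growing without bound.
     */
    public static String toDiagnostics(Throwable t, int maxChars) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        String trace = buffer.toString();
        return trace.length() <= maxChars ? trace : trace.substring(0, maxChars);
    }

    public static void main(String[] args) {
        try {
            // Stand-in for whatever the AM does that can fail.
            throw new IllegalStateException("hypothetical AM startup failure");
        } catch (Exception e) {
            String diagnostics = toDiagnostics(e, 4096);
            // Assumption: this is where the AM would hand the string to YARN
            // as the diagnostic message, then exit with a non-zero code.
            System.err.println(diagnostics);
        }
    }
}
```

With something like this in place, the diagnostics shown for a failed attempt 
would start with the exception class and message rather than the generic 
Shell$ExitCodeException trace above.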

I'm not sure how best to handle this for SamzaContainer, since it is job-type 
agnostic and doesn't know anything about YARN. For now, I think it's best to 
only do the AM.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
