[ https://issues.apache.org/jira/browse/SPARK-24182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao resolved SPARK-24182.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 21243
[https://github.com/apache/spark/pull/21243]

> Improve error message for client mode when AM fails
> ---------------------------------------------------
>
>                 Key: SPARK-24182
>                 URL: https://issues.apache.org/jira/browse/SPARK-24182
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.3.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> Today, when the AM fails in client mode, very little useful information is 
> printed to the output. Depending on the type of failure, the information 
> provided by the YARN AM is not very useful either. For example, you'd see 
> this in the Spark shell:
> {noformat}
> 18/05/04 11:07:38 ERROR spark.SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:86)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
>  [long stack trace]
> {noformat}
> Similarly, on the YARN RM, for certain failures you see a generic error like 
> this:
> {noformat}
> ExitCodeException exitCode=10:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:366)
>         at [blah blah blah]
> {noformat}
> It would be nice if we could provide a more accurate description of what went 
> wrong when possible.



