[ https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972668#comment-16972668 ]

Michal Sankot commented on LIVY-712:
------------------------------------

Searching through the code, I see a change in SparkYarnApp.scala at line 295
between 0.4.0 and 0.5.0 (the same in 0.6.0).

 

0.4.0:

{code:scala}
} catch {
  case e: InterruptedException =>
    yarnDiagnostics = ArrayBuffer("Session stopped by user.")
    changeState(SparkApp.State.KILLED)
  case e: Throwable =>
    error(s"Error whiling refreshing YARN state: $e")
    yarnDiagnostics = ArrayBuffer(e.toString, e.getStackTrace().mkString(" "))
    changeState(SparkApp.State.FAILED)
}
{code}

0.5.0/0.6.0:

{code:scala}
} catch {
  case _: InterruptedException =>
    yarnDiagnostics = ArrayBuffer("Session stopped by user.")
    changeState(SparkApp.State.KILLED)
  case NonFatal(e) =>
    error(s"Error whiling refreshing YARN state", e)
    yarnDiagnostics = ArrayBuffer(e.getMessage)
    changeState(SparkApp.State.FAILED)
}
{code}

So it seems that in 0.5.0+ fatal throwables are no longer caught at all, and
therefore do not move the app into the FAILED state. That looks like a bug. Is it so?
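
For illustration, here is a minimal standalone sketch (not Livy code; the object and method names are made up) of why the change matters: scala.util.control.NonFatal deliberately does not match fatal throwables such as OutOfMemoryError and other VirtualMachineErrors, so with the 0.5.0+ catch block above such an error escapes the handler and changeState(SparkApp.State.FAILED) is never called.

{code:scala}
import scala.util.control.NonFatal

// Hypothetical stand-in for the 0.5.0+ catch block, showing which case each
// kind of throwable would hit.
object NonFatalDemo {
  def classify(t: Throwable): String = t match {
    case _: InterruptedException => "KILLED (session stopped by user)"
    case NonFatal(e)             => s"FAILED (${e.getClass.getSimpleName})"
    case other                   => s"escapes the catch: ${other.getClass.getSimpleName}"
  }

  def main(args: Array[String]): Unit = {
    println(classify(new RuntimeException("YARN client error"))) // FAILED path
    println(classify(new InterruptedException()))                // KILLED path
    println(classify(new OutOfMemoryError("heap")))              // not matched by NonFatal
  }
}
{code}

Whether the failure reported in this ticket actually surfaces as a fatal throwable at that point is a separate question; the sketch only shows what the NonFatal change does and does not catch.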

> EMR 5.23/5.27 - Livy does not recognise that Spark job failed
> -------------------------------------------------------------
>
>                 Key: LIVY-712
>                 URL: https://issues.apache.org/jira/browse/LIVY-712
>             Project: Livy
>          Issue Type: Bug
>          Components: API
>    Affects Versions: 0.5.0, 0.6.0
>         Environment: AWS EMR 5.23/5.27, Scala
>            Reporter: Michal Sankot
>            Priority: Major
>              Labels: EMR, api, spark
>
> We've upgraded from AWS EMR 5.13 to 5.23 (Livy 0.4.0 -> 0.5.0, Spark 2.3.0 ->
> 2.4.0) and an issue appeared: when an exception is thrown during Spark job
> execution, Spark shuts down as if there were no problem and the job appears as
> Completed in EMR, so we're not notified when the system crashes. The same
> problem appears in EMR 5.27 (Livy 0.6.0, Spark 2.4.4).
> Is it something with Spark? Or a known issue with Livy?
> In Livy logs I see that spark-submit exits with error code 1
> {quote}{{05:34:59 WARN BatchSession$: spark-submit exited with code 1}}
> {quote}
>  And then Livy API states that batch state is
> {quote}{{"state": "success"}}
> {quote}
> How can it be made to work again?



