[ 
https://issues.apache.org/jira/browse/SPARK-32197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157070#comment-17157070
 ] 

Jungtaek Lim commented on SPARK-32197:
--------------------------------------

Lowering the priority, as Critical+ requires a committer's judgement.

> 'Spark driver' stays running even though 'spark application' has FAILED
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32197
>                 URL: https://issues.apache.org/jira/browse/SPARK-32197
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.4.6
>            Reporter: t oo
>            Priority: Major
>         Attachments: app_executors.png, applog.txt, driverlog.txt, 
> failed1.png, failed_stages.png, failedapp.png, j1.out, stuckdriver.png
>
>
> The app failed in 6 minutes, but the driver has been stuck for more than 
> 8 hours. I would expect the driver to fail if the app fails.
>  
> Thread dump from jstack (on the driver pid) attached (j1.out)
> Last part of stdout driver log attached (full log is 23MB, stderr log just 
> has launch command)
> Last part of app logs attached
>  
> The "org.apache.spark.util.ShutdownHookManager - Shutdown hook 
> called" line never appears in the driver log after 
> "org.apache.spark.SparkContext - Successfully stopped SparkContext".
>  
> Using Spark 2.4.6 in Spark standalone mode. The app was submitted with 
> spark-submit to the REST API (port 6066) in cluster mode. Other drivers/apps 
> have worked fine with this setup; only this one got stuck. My cluster has 
> 1 EC2 instance dedicated as the Spark master and 1 Spot EC2 instance 
> dedicated as the Spark worker. They can auto heal/spot terminate at any 
> time. From checking the AWS logs: the worker was terminated at 01:53:38
>  
> I think you can replicate this by tearing down the worker machine while the 
> app is running. You might have to try several times.
>  
> Similar to https://issues.apache.org/jira/browse/SPARK-24617, which I raised before.
>  
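
The log symptom described in the report (no "Shutdown hook called" line
after "Successfully stopped SparkContext") could be checked mechanically when
trying to reproduce the hang. Below is a minimal sketch, assuming the driver
log is plain text with one entry per line. The function name
shutdown_hook_missing is hypothetical and not part of Spark; only the two
message strings are taken from the report.

```python
# Message fragments quoted in the issue description.
STOPPED = "Successfully stopped SparkContext"
HOOK = "Shutdown hook called"


def shutdown_hook_missing(lines):
    """Return True if the SparkContext was stopped but no
    'Shutdown hook called' line follows the last stop message.

    `lines` is any iterable of log lines (for example an open file).
    This is a hypothetical helper for illustration only.
    """
    stopped_seen = False
    hook_after_stop = False
    for line in lines:
        if STOPPED in line:
            # A new stop message resets the search for the hook line.
            stopped_seen = True
            hook_after_stop = False
        elif HOOK in line and stopped_seen:
            hook_after_stop = True
    return stopped_seen and not hook_after_stop
```

It might be used against a saved driver log such as the attached
driverlog.txt, for example:
with open("driverlog.txt") as f: print(shutdown_hook_missing(f)).
A True result would match the stuck-driver symptom, though it does not by
itself prove the driver JVM is still running.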



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
