[ https://issues.apache.org/jira/browse/SPARK-32197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157070#comment-17157070 ]
Jungtaek Lim commented on SPARK-32197:
--------------------------------------

Lowering the priority, as Critical+ requires a committer's judgement.

> 'Spark driver' stays running even though 'spark application' has FAILED
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32197
>                 URL: https://issues.apache.org/jira/browse/SPARK-32197
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.4.6
>            Reporter: t oo
>            Priority: Major
>         Attachments: app_executors.png, applog.txt, driverlog.txt, failed1.png, failed_stages.png, failedapp.png, j1.out, stuckdriver.png
>
> The app failed in 6 minutes, but the driver has been stuck for more than 8 hours. I would expect the driver to fail when the app fails.
>
> A thread dump from jstack (on the driver pid) is attached (j1.out). The last part of the driver's stdout log is attached (the full log is 23 MB; the stderr log contains only the launch command). The last part of the app logs is also attached.
>
> Note that the line "org.apache.spark.util.ShutdownHookManager - Shutdown hook called" never appears in the driver log after "org.apache.spark.SparkContext - Successfully stopped SparkContext". (A sketch for probing what keeps the driver JVM alive, and a possible watchdog mitigation, follow the quoted description below.)
>
> Using Spark 2.4.6 in standalone mode; spark-submit to the REST API (port 6066) in cluster mode was used. Other drivers/apps have worked fine with this setup; only this one got stuck. My cluster has one EC2 instance dedicated as the Spark master and one Spot EC2 instance dedicated as the Spark worker. They can auto-heal/spot-terminate at any time. From checking the AWS logs: the worker was terminated at 01:53:38.
>
> I think you can replicate this by tearing down the worker machine while an app is running. You might have to try several times.
>
> Similar to https://issues.apache.org/jira/browse/SPARK-24617, which I raised before!
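If the driver JVM stays alive after "Successfully stopped SparkContext" but its shutdown hooks never run, a lingering non-daemon thread is the usual cause. A minimal Scala sketch for listing such threads from inside the driver (the object name is hypothetical; this just complements the attached jstack dump in j1.out):

{code:scala}
import scala.collection.JavaConverters._

// Hypothetical probe: print the non-daemon threads that can keep a
// driver JVM alive after SparkContext has been stopped.
object StuckDriverProbe {
  def dumpNonDaemonThreads(): Unit = {
    Thread.getAllStackTraces.keySet.asScala
      .filter(t => !t.isDaemon && t.isAlive)
      .foreach { t =>
        println(s"non-daemon thread: ${t.getName} (state=${t.getState})")
        t.getStackTrace.take(5).foreach(f => println(s"    at $f"))
      }
  }
}
{code}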
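As a stopgap until the root cause is found, a driver-side watchdog could force the JVM down once the application has ended. This is an assumption-laden sketch, not a fix: the listener name and the 60-second grace period are made up, and Runtime.halt deliberately skips shutdown hooks (which, per the log above, never ran anyway):

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Hypothetical watchdog: halt the driver JVM if it is still running
// well after the application has ended, so the standalone master can
// reclaim the driver.
class ExitOnAppEndListener(graceMillis: Long = 60000L) extends SparkListener {
  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
    val watchdog = new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(graceMillis)
        Runtime.getRuntime.halt(1) // bypasses shutdown hooks by design
      }
    }, "driver-exit-watchdog")
    watchdog.setDaemon(true) // must not itself keep the JVM alive
    watchdog.start()
  }
}

// Usage (in the driver, after creating the SparkContext):
//   sc.addSparkListener(new ExitOnAppEndListener())
{code}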