Josh Rosen created SPARK-48547:
----------------------------------

             Summary: Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits
                 Key: SPARK-48547
                 URL: https://issues.apache.org/jira/browse/SPARK-48547
             Project: Spark
          Issue Type: Improvement
          Components: Deploy
    Affects Versions: 4.0.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
This issue proposes to add a new flag, {{spark.submit.callSystemExitOnMainExit}} (default false), which when true instructs SparkSubmit to call {{System.exit()}} in the JVM once the user code's main method has exited (for Java / Scala jobs) or once the user's Python or R script has exited. This is intended to address a longstanding issue where SparkSubmit invocations might hang after user code has completed.

[According to Java's java.lang.Runtime docs|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown]:
{quote}The Java Virtual Machine initiates the _shutdown sequence_ in response to one of several events:
# when the number of [live|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()] non-daemon threads drops to zero for the first time (see note below on the JNI Invocation API);
# when the {{Runtime.exit}} or {{System.exit}} method is called for the first time; or
# when some external event occurs, such as an interrupt or a signal is received from the operating system.{quote}
For Python and R programs, SparkSubmit's PythonRunner and RRunner will call {{System.exit()}} if the user program exits with a non-zero exit code (see the [Python|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104] and [R|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111] runner code). But for Java and Scala programs, plus any _successful_ Python or R programs, Spark will _not_ automatically call {{System.exit()}}. In those situations, the JVM will only shut down when, via event (1), all non-[daemon|https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java] threads have exited (unless the job is cancelled and sent an external interrupt or kill signal, corresponding to event (3)).

Thus, *non-daemon* threads can cause logically-completed spark-submit jobs to hang rather than complete. These threads are not always under Spark's own control and are not necessarily cleaned up by {{SparkContext.stop()}}. It is therefore useful to have opt-in functionality for SparkSubmit to automatically call {{System.exit()}} when the main method exits (which usually, but not always, corresponds to job completion): this option lets users and platform operators enforce {{System.exit()}} calls without having to modify individual jobs' code.
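For illustration, here is a minimal, self-contained Scala sketch (not Spark code) of the hang described above: a non-daemon thread started by "user code" outlives the main method, so shutdown event (1) never fires and the JVM stays alive. Uncommenting the {{System.exit(0)}} call forces shutdown via event (2), which is what the proposed flag would do on the user's behalf:
{code:scala}
// Illustrative repro only, not Spark internals: a leaked non-daemon thread
// keeps the JVM running even after main() has returned.
object NonDaemonHangDemo {
  def main(args: Array[String]): Unit = {
    val t = new Thread(() => {
      while (true) Thread.sleep(60000) // e.g. a leaked timer or connection-pool thread
    })
    t.setDaemon(false) // non-daemon (the default): shutdown event (1) never triggers
    t.start()
    println("main() has returned, but the JVM will not exit...")
    // System.exit(0) // forces the shutdown sequence via event (2)
  }
}
{code}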
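A rough sketch of the intended behavior follows. The method name, argument shapes, and config-lookup style below are illustrative assumptions for this ticket, not SparkSubmit's actual internals:
{code:scala}
// Hypothetical sketch of the proposed behavior -- names and signatures are
// assumptions for illustration, not Spark's real SparkSubmit code.
object CallSystemExitSketch {
  def runUserMain(sparkConf: Map[String, String], invokeUserMain: () => Unit): Unit = {
    invokeUserMain() // run the user's main method (or wait for the Python/R script)
    val callExit = sparkConf
      .getOrElse("spark.submit.callSystemExitOnMainExit", "false") // proposed flag, default false
      .toBoolean
    if (callExit) {
      // Force the JVM shutdown sequence (event (2) above) even if user code
      // leaked non-daemon threads that would otherwise block event (1).
      System.exit(0)
    }
  }
}
{code}
With something like this in place, users could opt in per job via {{spark-submit --conf spark.submit.callSystemExitOnMainExit=true ...}} without modifying application code.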