Josh Rosen created SPARK-48547:
----------------------------------

             Summary: Add opt-in flag to have SparkSubmit automatically call 
System.exit after user code main method exits
                 Key: SPARK-48547
                 URL: https://issues.apache.org/jira/browse/SPARK-48547
             Project: Spark
          Issue Type: Improvement
          Components: Deploy
    Affects Versions: 4.0.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen


This issue proposes to add a new flag, {{spark.submit.callSystemExitOnMainExit}} 
(default false), which, when set to true, instructs SparkSubmit to call 
{{System.exit()}} in the JVM once the user code's main method has exited (for 
Java / Scala jobs) or once the user's Python or R script has exited.
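A rough sketch of the intended gating, to make the proposal concrete 
(hypothetical code: only the conf key name comes from this proposal, while 
{{sparkConf}}, {{mainMethod}}, and {{userArgs}} stand in for what SparkSubmit 
already has in scope):
{code:scala}
// Hypothetical sketch of the proposed behavior; names other than the conf
// key are assumed stand-ins, not the actual implementation.
val callSystemExitOnMainExit =
  sparkConf.getBoolean("spark.submit.callSystemExitOnMainExit", defaultValue = false)

// Run the user code's main(String[]) via reflection, as SparkSubmit does.
mainMethod.invoke(null, userArgs)

if (callSystemExitOnMainExit) {
  // Trigger JVM shutdown event (2) instead of waiting for every
  // non-daemon thread to exit (event (1)).
  System.exit(0)
}
{code}
Since the flag would be an ordinary Spark conf, it could be set per-job with 
{{--conf}} or globally in spark-defaults.conf.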

This is intended to address a longstanding issue where SparkSubmit invocations 
might hang after user code has completed:

[According to Java’s java.lang.Runtime docs|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown]:
{quote}The Java Virtual Machine initiates the _shutdown sequence_ in response to one of several events:
 # when the number of [live|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()] non-daemon threads drops to zero for the first time (see note below on the JNI Invocation API);
 # when the {{Runtime.exit}} or {{System.exit}} method is called for the first time; or
 # when some external event occurs, such as an interrupt or a signal is received from the operating system.{quote}
For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call 
{{System.exit()}} if the user program exits with a non-zero exit code (see 
[python|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104]
 and 
[R|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111]
 runner code).
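Paraphrasing the linked runner code (a sketch of the net effect, not the 
verbatim source):
{code:scala}
// Sketch of the net effect of the linked PythonRunner / RRunner lines,
// not the verbatim source: a non-zero exit code from the child process
// ends the JVM immediately, while a zero exit code lets main() return.
val process = new ProcessBuilder("python3", "script.py").inheritIO().start()
val exitCode = process.waitFor()
if (exitCode != 0) {
  System.exit(exitCode) // shutdown event (2): the JVM exits now
}
// On success, main() simply returns and the JVM waits for all non-daemon
// threads to finish (event (1)).
{code}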

But for Java and Scala programs, and for R or Python programs that exit 
_successfully_, Spark will _not_ automatically call {{System.exit()}}.

In those situations, the JVM will only shut down when, via event (1), all 
non-[daemon|https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java]
 threads have exited (unless the job is cancelled and sent an external 
interrupt / kill signal, corresponding to event (3)).

Thus, lingering *non-daemon* threads can cause logically-complete spark-submit 
jobs to hang rather than exit, as the minimal reproduction below demonstrates.
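This is easy to reproduce without Spark at all (the spinning thread below 
stands in for any non-daemon thread left behind by user code or a library):
{code:scala}
// Minimal reproduction: a leftover non-daemon thread keeps the JVM alive
// even though main() has returned.
object HangDemo {
  def main(args: Array[String]): Unit = {
    val t = new Thread(() => while (true) Thread.sleep(1000))
    t.setDaemon(false) // already the default here; shown for emphasis
    t.start()
    println("main() is done, but the JVM keeps running...")
    // With spark.submit.callSystemExitOnMainExit=true, SparkSubmit would
    // call System.exit() at this point and the process would terminate.
  }
}
{code}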

Such non-daemon threads are not always under Spark's own control (they may be 
started by user code or third-party libraries) and are not necessarily cleaned 
up by {{SparkContext.stop()}}.

Therefore, it is useful to offer opt-in functionality that has SparkSubmit 
automatically call {{System.exit()}} upon main method exit (which usually, but 
not always, corresponds to job completion): this option will allow users and 
platform operators to enforce {{System.exit()}} calls without having to modify 
individual jobs' code, e.g. the kind of per-job wrapper sketched below.
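For comparison, the per-job workaround that this flag would make unnecessary 
(illustrative user code, not part of the proposal):
{code:scala}
// Illustrative per-job workaround (user code, not part of this proposal):
// each application wraps its own main() to force JVM shutdown at the end.
object MyApp {
  def main(args: Array[String]): Unit = {
    var exitCode = 0
    try {
      runJob(args) // the application's actual logic (placeholder)
    } catch {
      case e: Throwable =>
        e.printStackTrace()
        exitCode = 1
    } finally {
      // Force shutdown event (2) so stray non-daemon threads cannot
      // keep the spark-submit invocation alive.
      System.exit(exitCode)
    }
  }

  private def runJob(args: Array[String]): Unit = {
    // ... job logic elided ...
  }
}
{code}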


