[ https://issues.apache.org/jira/browse/SPARK-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310204#comment-15310204 ]
Sean Owen commented on SPARK-15685:
-----------------------------------

I don't think we can install a SecurityManager, since it will kill performance. If you can find a clean way to handle these, that could be reasonable, but these are Errors, and generally by definition not something even a framework should handle. If the use case is a long-running app executing third-party code, I'd say that's not what Spark was designed for, which is short-lived execution of fixed code.

> StackOverflowError (VirtualMachineError) or NoClassDefFoundError
> (LinkageError) should not System.exit() in local mode
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15685
>                 URL: https://issues.apache.org/jira/browse/SPARK-15685
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Brett Randall
>            Priority: Critical
>
> Spark, when running in local mode, can encounter certain types of {{Error}}
> in developer code or third-party libraries and call {{System.exit()}},
> potentially killing a long-running JVM/service. The caller should decide on
> the exception handling and whether the error should be deemed fatal.
>
> *Consider this scenario:*
> * Spark is being used in local master mode within a long-running JVM
> microservice, e.g. a Jetty instance.
> * A task is run. The task fails with particular types of unchecked
> throwables:
> ** a) some bad code and/or bad data exposes a bug involving unterminated
> recursion, leading to a {{StackOverflowError}}, or
> ** b) a rarely-used function is called and, due to a packaging error in the
> service where a third-party library is missing some dependencies, a
> {{NoClassDefFoundError}} is thrown.
>
> *Expected behaviour:* Since we are running in local mode, we would expect an
> unchecked exception to be thrown, to be optionally handled by the Spark
> caller.
In the case of Jetty, a request thread or some other background
> worker thread might handle the exception or not; the thread might exit or
> note an error. The caller should decide how the error is handled.
>
> *Actual behaviour:* {{System.exit()}} is called, the JVM exits, and the
> Jetty microservice is down and must be restarted.
>
> *Consequence:* Any local code or third-party library might cause Spark to
> exit the long-running JVM/microservice, so Spark can be a problem in this
> architecture. I have seen this now on three separate occasions, leading to
> service-down bug reports.
>
> *Analysis:*
> The line of code that seems to be the problem is:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L405
> {code}
> // Don't forcibly exit unless the exception was inherently fatal, to avoid
> // stopping other tasks unnecessarily.
> if (Utils.isFatalError(t)) {
>   SparkUncaughtExceptionHandler.uncaughtException(t)
> }
> {code}
> [Utils.isFatalError()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1818]
> first excludes Scala
> [NonFatal|https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/control/NonFatal.scala#L31],
> which excludes everything except {{VirtualMachineError}}, {{ThreadDeath}},
> {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}.
> {{Utils.isFatalError()}} further excludes {{InterruptedException}},
> {{NotImplementedError}} and {{ControlThrowable}}.
> Remaining are {{Error}}s such as {{StackOverflowError extends
> VirtualMachineError}} or {{NoClassDefFoundError extends LinkageError}},
> which occur in the aforementioned scenarios.
> {{SparkUncaughtExceptionHandler.uncaughtException()}} proceeds to call
> {{System.exit()}}.
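The classification chain described above can be sketched in plain Scala. This mirrors, but is not, Spark's {{Utils.isFatalError()}}: Scala's {{NonFatal}} extractor refuses to match {{VirtualMachineError}}, {{ThreadDeath}}, {{InterruptedException}}, {{LinkageError}} and {{ControlThrowable}}, so those fall through as "fatal".

```scala
import scala.util.control.NonFatal

// A minimal sketch (not Spark code) of the fatal/non-fatal classification
// described in the issue.
object NonFatalDemo {
  def isFatal(t: Throwable): Boolean = t match {
    case NonFatal(_) => false // ordinary exceptions: the caller may handle them
    case _           => true  // e.g. StackOverflowError, NoClassDefFoundError
  }

  def main(args: Array[String]): Unit = {
    println(isFatal(new StackOverflowError))   // true: a VirtualMachineError
    println(isFatal(new NoClassDefFoundError)) // true: a LinkageError
    println(isFatal(new RuntimeException))     // false: NonFatal matches it
  }
}
```

Note that {{InterruptedException}} is also "fatal" by this test; Spark's {{Utils.isFatalError()}} carves it back out, as described above.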
> [Further
> up|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L77]
> in {{Executor}} we see that registering {{SparkUncaughtExceptionHandler}} is
> excluded in local mode:
> {code}
> if (!isLocal) {
>   // Setup an uncaught exception handler for non-local mode.
>   // Make any thread terminations due to uncaught exceptions kill the entire
>   // executor process to avoid surprising stalls.
>   Thread.setDefaultUncaughtExceptionHandler(SparkUncaughtExceptionHandler)
> }
> {code}
> The same exclusion must be applied in local mode for "fatal" errors - we
> cannot afford to shut down the enclosing JVM (e.g. Jetty); the caller should
> decide.
>
> A minimal test-case is supplied. It installs a logging {{SecurityManager}}
> to confirm that {{System.exit()}} was called from
> {{SparkUncaughtExceptionHandler.uncaughtException}} via {{Executor}}. It
> also hints at the workaround - install your own {{SecurityManager}} and
> inspect the current stack in {{checkExit()}} to prevent Spark from exiting
> the JVM.
>
> Test-case: https://github.com/javabrett/SPARK-15685

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
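The workaround hinted at in the issue can be sketched as below. The names {{ExitGuard}} and {{exitBlockedFor}} are illustrative, not taken from the linked test-case, and {{SecurityManager}} is deprecated since JDK 17, so this only applies to JVMs of the Spark 1.x era or ones started with {{-Djava.security.manager=allow}}.

```scala
// Sketch: veto System.exit() calls that originate from Spark's
// uncaught-exception handler, so the enclosing JVM (e.g. Jetty) survives.
object ExitGuard {

  // True if the given stack shows the exit originating from Spark's
  // SparkUncaughtExceptionHandler.
  def exitBlockedFor(stack: Array[StackTraceElement]): Boolean =
    stack.exists(_.getClassName.contains("SparkUncaughtExceptionHandler"))

  def install(): Unit =
    System.setSecurityManager(new SecurityManager {
      override def checkExit(status: Int): Unit =
        if (exitBlockedFor(Thread.currentThread.getStackTrace))
          // Veto the exit and surface it as an exception the caller can catch.
          throw new SecurityException(
            s"Blocked System.exit($status) from SparkUncaughtExceptionHandler")

      // Permit everything else so the rest of the JVM behaves normally.
      override def checkPermission(perm: java.security.Permission): Unit = ()
    })
}
```

The {{SecurityException}} thrown from {{checkExit()}} propagates back through {{System.exit()}} to the calling thread, which can log it instead of dying.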