This question concerns
https://issues.apache.org/jira/browse/SPARK-15685 (StackOverflowError
(VirtualMachineError) or NoClassDefFoundError (LinkageError) should not
System.exit() in local mode) and is intended to draw attention to, and
invite discussion of, that issue.

I have a product that is hosted as a microservice, running in a web
container (e.g. Jetty) as a long-running service that publishes a REST
API.  For small computations, to reduce latency, I wish to run Spark in
local mode.  For larger jobs the service might launch a remote job on a
cluster, e.g. Spark-on-YARN.  Either way, custom modules may be deployed
to the service from time to time, bringing third-party libraries with
them.
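
For context, the embedding looks roughly like the sketch below (the app
name and handleSmallComputation are illustrative only, not our actual
code):

    import org.apache.spark.sql.SparkSession

    // Long-lived SparkSession shared by the service; "local[*]" keeps small
    // computations in-process and avoids cluster-launch latency.
    val spark = SparkSession.builder()
      .appName("embedded-analytics")  // illustrative name
      .master("local[*]")
      .getOrCreate()

    // Called from a REST request handler thread (e.g. a Jetty worker).
    def handleSmallComputation(values: Seq[Long]): Long =
      spark.sparkContext.parallelize(values).map(_ * 2).reduce(_ + _)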

My concern is as outlined in SPARK-15685.  If I have a third-party
library whose direct or transitive dependencies are not satisfied, then
when the code is deployed and run I might suffer a NoClassDefFoundError.
Or there may be some broken logic leading to a StackOverflowError
(VirtualMachineError).  Normally, if this occurred in a plain
microservice/web application, the thread handling the request would see
the unchecked Throwable/Error and fail, but the service would otherwise
continue running.
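
To make the contrast concrete, here is a toy illustration (plain JVM
threading, nothing Spark-specific) of the behaviour I would normally
expect: the Error escapes the worker thread, that one request fails, and
the JVM stays up:

    // An Error escaping a worker/request thread kills only that thread;
    // the JVM (and hence the service) keeps running.
    val worker = new Thread(new Runnable {
      def run(): Unit = throw new NoClassDefFoundError("some/Missing/Class")
    })
    worker.start()
    worker.join()
    println("service still alive")  // reached; only the worker thread died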

With Spark in local mode, because of how those particular
Throwable/Error types are categorized and handled (see Utils.isFatalError
and related Scala definitions), the result when one of them is thrown is
that Spark deems the JVM should be forcibly shut down via System.exit(),
killing the microservice along with it.
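
For reference, that classification appears to come down to Scala's
NonFatal extractor (if I read Utils.isFatalError correctly, it is
essentially the negation of NonFatal with a few extra carve-outs), so
both Error types above land on the "fatal" side:

    import scala.util.control.NonFatal

    // NonFatal(t) is false for VirtualMachineError, ThreadDeath,
    // InterruptedException, LinkageError and ControlThrowable.
    println(NonFatal(new StackOverflowError))    // false -> treated as fatal
    println(NonFatal(new NoClassDefFoundError))  // false -> treated as fatal
    println(NonFatal(new RuntimeException))      // true  -> non-fatal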

Is it reasonable to ask that, when the above Errors occur, Spark does
not exit the JVM, and instead allows some exception or error to
propagate?  The System.exit() approach seems aligned with the idea of a
command-line batch job and a quick exit of the entire JVM and any
running threads, but it is poorly suited to running in local mode inside
a microservice.

Thoughts?

Thanks,
Brett
