skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-499837796

@zsxwing @srowen one solution that works is running the logic of `stop` in another thread:

```
def stop(): Unit = {
  @volatile var onStopCalled = false
  // Run the blocking join/onStop logic on a daemon thread so the caller
  // (e.g. the shutdown hook) is not blocked waiting on the event thread.
  val stopThread = new Thread() {
    setDaemon(true)
    override def run(): Unit = {
      try {
        // Call onStop after the event thread exits to make sure onReceive happens before onStop
        eventThread.join()
        onStopCalled = true
        onStop()
      } catch {
        case ie: InterruptedException =>
          Thread.currentThread().interrupt()
          if (!onStopCalled) {
            // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop`
            // since it's already called.
            onStop()
          }
      }
    }
  }
  stopThread.start()
}
```

Assuming that is safe, we let the shutdown hook proceed... any issues with letting that run in the background?

```
19/06/07 10:31:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dag-scheduler-event-loop,5,main]
java.lang.OutOfMemoryError: Java heap space
    at scala.collection.mutable.ResizableArray.ensureSize(ResizableArray.scala:106)
    at scala.collection.mutable.ResizableArray.ensureSize$(ResizableArray.scala:96)
    at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:49)
    at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:85)
    at org.apache.spark.scheduler.TaskSetManager.addPendingTask(TaskSetManager.scala:264)
    at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$2(TaskSetManager.scala:194)
    at org.apache.spark.scheduler.TaskSetManager$$Lambda$1106/850290016.apply$mcVI$sp(Unknown Source)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
    at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$1(TaskSetManager.scala:193)
    at org.apache.spark.scheduler.TaskSetManager$$Lambda$1105/1310826901.apply$mcV$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:534)
    at org.apache.spark.scheduler.TaskSetManager.addPendingTasks(TaskSetManager.scala:192)
    at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:189)
    at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:252)
    at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:210)
    at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1233)
    at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1084)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2126)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2118)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2107)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
19/06/07 10:31:21 INFO SparkContext: Invoking stop() from shutdown hook
19/06/07 10:31:21 INFO SparkUI: Stopped Spark web UI at http://spark-pi2-1559903354898-driver-svc.spark.svc:4040
19/06/07 10:31:21 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-pi2-1559903354898-driver-svc.spark.svc:7079 in memory (size: 1765.0 B, free: 110.0 MiB)
19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/06/07 10:31:21 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/06/07 10:31:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/06/07 10:31:22 INFO MemoryStore: MemoryStore cleared
19/06/07 10:31:22 INFO BlockManager: BlockManager stopped
19/06/07 10:31:22 INFO BlockManagerMaster: BlockManagerMaster stopped
19/06/07 10:31:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/06/07 10:31:22 INFO SparkContext: Successfully stopped SparkContext
19/06/07 10:31:22 INFO ShutdownHookManager: Shutdown hook called
19/06/07 10:31:22 INFO ShutdownHookManager: Deleting directory /tmp/spark-fef9ec63-c71e-4859-9910-12c51a336d75
19/06/07 10:31:22 INFO ShutdownHookManager: Deleting directory /var/data/spark-6a47a0e7-9676-4cf4-96e1-07d710838b8e/spark-657ed153-d11d-478d-94a8-955a80296405
```

```
State:          Terminated
  Reason:       Error
  Exit Code:    52
  Started:      Fri, 07 Jun 2019 13:29:35 +0300
  Finished:     Fri, 07 Jun 2019 13:31:22 +0300
```
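To illustrate the thread-handoff idea outside of Spark: below is a minimal, hedged Java sketch of the same pattern, where the blocking `join()`/cleanup work runs on a daemon thread so the caller (e.g. a shutdown hook) returns immediately, and the JVM remains free to exit even if the background stop never finishes. All class and variable names here (`DaemonStopSketch`, `eventThread`, `stopper`) are hypothetical stand-ins, not Spark's actual classes.

```java
public class DaemonStopSketch {
    // Stand-in for the event-processing thread (hypothetical, not Spark's EventLoop).
    static final Thread eventThread = new Thread(() -> {
        try {
            Thread.sleep(Long.MAX_VALUE); // block until interrupted
        } catch (InterruptedException ie) {
            // interrupted -> event thread exits
        }
    });

    static void stop() {
        eventThread.interrupt();
        Thread stopper = new Thread(() -> {
            try {
                eventThread.join(); // the blocking wait happens OFF the caller's thread
                System.out.println("onStop called");
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        stopper.setDaemon(true); // a daemon thread never blocks JVM exit
        stopper.start();         // stop() returns immediately
    }

    public static void main(String[] args) throws Exception {
        eventThread.start();
        stop();
        Thread.sleep(500); // give the daemon a moment to finish, for this demo only
        System.out.println("main done");
    }
}
```

The key trade-off, as raised above, is that because the stop thread is a daemon, the JVM may exit before `onStop` completes; whether that is acceptable depends on whether the remaining cleanup is safe to skip.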