[ https://issues.apache.org/jira/browse/SPARK-24981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hieu Tri Huynh updated SPARK-24981:
-----------------------------------
    Priority: Minor  (was: Major)

> ShutdownHook timeout causes job to fail when succeeded when SparkContext
> stop() not called by user program
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24981
>                 URL: https://issues.apache.org/jira/browse/SPARK-24981
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Hieu Tri Huynh
>            Priority: Minor
>
> When the user program does not stop the SparkContext at the end,
> ShutdownHookManager stops it instead. However, each shutdown hook is given
> only 10 seconds to run; after that it is interrupted and cancelled. If
> stopping the SparkContext takes longer than 10 seconds, an
> InterruptedException is thrown and the job fails even though it had already
> succeeded. An example of this is shown below.
> I think there are a few ways to fix this; below are the two that I have now:
> 1. After the user program finishes, check whether it stopped the
> SparkContext. If it did not, stop the SparkContext before finishing the user
> thread. That way, SparkContext.stop() can take as much time as it needs.
> 2. Catch the InterruptedException thrown by ShutdownHookManager while the
> SparkContext is being stopped, and ignore whatever inside the SparkContext
> has not yet been stopped. Since we are shutting down anyway, ignoring those
> resources should be acceptable.
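> The per-hook timeout described above comes from Hadoop's ShutdownHookManager,
> which runs each hook on an executor and waits on the resulting Future with a
> fixed timeout, cancelling (interrupting) the hook if it overruns. The
> following is a minimal, Spark-free sketch of that mechanism and of the spirit
> of fix option 2; class and method names, and the timings, are illustrative
> only, not Spark's or Hadoop's actual internals:
> {code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HookTimeoutDemo {

    /** Run a simulated hook needing hookMillis, with a budget of budgetMillis. */
    static String runHookWithTimeout(long hookMillis, long budgetMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> hook = pool.submit(() -> {
            try {
                Thread.sleep(hookMillis); // stands in for a slow SparkContext.stop()
            } catch (InterruptedException e) {
                // Fix option 2 in spirit: during shutdown, treat interruption
                // as best-effort cleanup instead of failing the whole job.
                Thread.currentThread().interrupt();
            }
        });
        try {
            hook.get(budgetMillis, TimeUnit.MILLISECONDS); // bounded wait on the hook
            return "finished";
        } catch (TimeoutException e) {
            hook.cancel(true); // interrupts the hook thread -> InterruptedException inside it
            return "timed out";
        } catch (ExecutionException e) {
            return "failed";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runHookWithTimeout(500, 50));  // hook slower than its budget
        System.out.println(runHookWithTimeout(10, 1000)); // hook within its budget
    }
}
> {code}
> Running this prints "timed out" for the slow hook and "finished" for the
> fast one; in Spark's case the interrupted "hook" is SparkContext.stop()
> itself, which is what surfaces as the failure in the log below.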
>
> {code:java}
> 18/07/31 17:11:49 ERROR Utils: Uncaught exception in thread pool-4-thread-1
> java.lang.InterruptedException
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1249)
>     at java.lang.Thread.join(Thread.java:1323)
>     at org.apache.spark.scheduler.AsyncEventQueue.stop(AsyncEventQueue.scala:136)
>     at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
>     at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>     at org.apache.spark.scheduler.LiveListenerBus.stop(LiveListenerBus.scala:219)
>     at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1922)
>     at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
>     at org.apache.spark.SparkContext.stop(SparkContext.scala:1921)
>     at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:573)
>     at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>     at scala.util.Try$.apply(Try.scala:192)
>     at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>     at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 18/07/31 17:11:49 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout,
> java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
>     at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>     at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)