Those are daemon threads and not the cause of the problem. The main thread is waiting for the "org.apache.hadoop.util.ShutdownHookManager" thread, but I don't see that one in your list.
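To illustrate Marcelo's point for readers hitting the same symptom: the JVM will not exit while any non-daemon thread is alive, and `Executors.newFixedThreadPool(n)` creates non-daemon workers by default. A minimal sketch (object and method names here are illustrative, not from the original post) of building the pool on daemon threads so it can never hold the JVM open:

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object DaemonPoolSketch {
  // ThreadFactory that marks every worker as a daemon thread, so a
  // forgotten or stuck worker cannot keep the JVM alive after main() returns.
  val daemonFactory: ThreadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true)
      t
    }
  }

  def compute(): Int = {
    val pool = Executors.newFixedThreadPool(3, daemonFactory)
    implicit val xc: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(pool)

    val result = Await.result(Future(21 * 2), 1.minute)

    // shutdown() only stops new tasks from being accepted;
    // awaitTermination() actually waits for the workers to finish.
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    result
  }

  def main(args: Array[String]): Unit =
    println(s"result = ${compute()}")
}
```

Daemon threads are a blunt instrument (they are killed abruptly at JVM exit), so this is a fallback; the cleaner fix is still to shut the pool down and wait for termination before exiting.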
On Wed, Jan 16, 2019 at 12:08 PM Pola Yao <pola....@gmail.com> wrote:
>
> Hi Marcelo,
>
> Thanks for your response.
>
> I have dumped the threads on the server where I submitted the spark application:
>
> '''
> ...
> "dispatcher-event-loop-2" #28 daemon prio=5 os_prio=0 tid=0x00007f56cee0e000 nid=0x1cb6 waiting on condition [0x00007f5699811000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000006400161b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> "dispatcher-event-loop-1" #27 daemon prio=5 os_prio=0 tid=0x00007f56cee0c800 nid=0x1cb5 waiting on condition [0x00007f5699912000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000006400161b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> "dispatcher-event-loop-0" #26 daemon prio=5 os_prio=0 tid=0x00007f56cee0c000 nid=0x1cb4 waiting on condition [0x00007f569a120000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000006400161b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> "Service Thread" #20 daemon prio=9 os_prio=0 tid=0x00007f56cc12d800 nid=0x1ca5 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "C1 CompilerThread14" #19 daemon prio=9 os_prio=0 tid=0x00007f56cc12a000 nid=0x1ca4 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> ...
>
> "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f56cc0ce000 nid=0x1c93 in Object.wait() [0x00007f56ab3f2000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
>         - locked <0x00000006400cd498> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
>         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
>
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f56cc0c9800 nid=0x1c92 in Object.wait() [0x00007f55cfffe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
>         - locked <0x00000006400a2660> (a java.lang.ref.Reference$Lock)
>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
>
> "main" #1 prio=5 os_prio=0 tid=0x00007f56cc021000 nid=0x1c74 in Object.wait() [0x00007f56d344c000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1249)
>         - locked <0x000000064056f6a0> (a org.apache.hadoop.util.ShutdownHookManager$1)
>         at java.lang.Thread.join(Thread.java:1323)
>         at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>         at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>         at java.lang.Shutdown.runHooks(Shutdown.java:123)
>         at java.lang.Shutdown.sequence(Shutdown.java:167)
>         at java.lang.Shutdown.exit(Shutdown.java:212)
>         - locked <0x00000006404e65b8> (a java.lang.Class for java.lang.Shutdown)
>         at java.lang.Runtime.exit(Runtime.java:109)
>         at java.lang.System.exit(System.java:971)
>         at scala.sys.package$.exit(package.scala:40)
>         at scala.sys.package$.exit(package.scala:33)
>         at actionmodel.ParallelAdvertiserBeaconModel$.main(ParallelAdvertiserBeaconModel.scala:252)
>         at actionmodel.ParallelAdvertiserBeaconModel.main(ParallelAdvertiserBeaconModel.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> "VM Thread" os_prio=0 tid=0x00007f56cc0c1800 nid=0x1c91 runnable
> ...
> '''
>
> I have no clear idea what went wrong. I did call awaitTermination to terminate the thread pool. Or is there any way to force-close all those 'WAITING' threads associated with my spark application?
>
> On Wed, Jan 16, 2019 at 8:31 AM Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> If System.exit() doesn't work, you may have a bigger problem somewhere. Check your threads (using e.g. jstack) to see what's going on.
>>
>> On Wed, Jan 16, 2019 at 8:09 AM Pola Yao <pola....@gmail.com> wrote:
>> >
>> > Hi Marcelo,
>> >
>> > Thanks for your reply! It made sense to me. However, I've tried many ways to exit Spark (e.g., System.exit()), but failed. Is there an explicit way to shut down all the live threads in the Spark application and then quit afterwards?
>> >
>> > On Tue, Jan 15, 2019 at 2:38 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>
>> >> You should check the active threads in your app. Since your pool uses non-daemon threads, that will prevent the app from exiting.
>> >>
>> >> spark.stop() should have stopped the Spark jobs in other threads, at least. But if something is blocking one of those threads, or if something is creating a non-daemon thread that stays alive somewhere, you'll see that.
>> >>
>> >> Or you can force quit with sys.exit.
>> >>
>> >> On Tue, Jan 15, 2019 at 1:30 PM Pola Yao <pola....@gmail.com> wrote:
>> >> >
>> >> > I submitted a Spark job through the ./spark-submit command; the code executed successfully, but the application got stuck when trying to quit Spark.
>> >> >
>> >> > My code snippet:
>> >> > '''
>> >> > {
>> >> >   val spark = SparkSession.builder.master(...).getOrCreate
>> >> >
>> >> >   val pool = Executors.newFixedThreadPool(3)
>> >> >   implicit val xc = ExecutionContext.fromExecutorService(pool)
>> >> >   val taskList = List(train1, train2, train3) // where each train* is a Future which wraps up some data reading, feature engineering and machine learning steps
>> >> >   val results = Await.result(Future.sequence(taskList), 20 minutes)
>> >> >
>> >> >   println("Shutting down pool and executor service")
>> >> >   pool.shutdown()
>> >> >   xc.shutdown()
>> >> >
>> >> >   println("Exiting spark")
>> >> >   spark.stop()
>> >> > }
>> >> > '''
>> >> >
>> >> > After I submitted the job, from the terminal I could see the code executing and printing "Exiting spark"; however, after printing that line it never exited Spark, it just got stuck.
>> >> >
>> >> > Does anybody know what the reason is? Or how to force quit?
>> >> >
>> >> > Thanks!
>> >>
>> >> --
>> >> Marcelo
>>
>> --
>> Marcelo

--
Marcelo
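The snippet quoted above calls `pool.shutdown()` but never waits for the workers to actually terminate. A hedged sketch of the same structure with that gap closed (this is not the poster's real code: `runTasks` and the trivial `Future(1)` tasks stand in for the undisclosed `train1`/`train2`/`train3`, and the Spark session is omitted):

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object ShutdownSketch {
  def runTasks(): List[Int] = {
    val pool = Executors.newFixedThreadPool(3)
    implicit val xc: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(pool)

    // Stand-ins for the poster's train1/train2/train3 futures.
    val taskList = List(Future(1), Future(2), Future(3))
    val results = Await.result(Future.sequence(taskList), 20.minutes)

    // Stop accepting new tasks, then *wait* for the workers to die;
    // one live non-daemon worker is enough to keep the JVM running.
    pool.shutdown()
    if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
      pool.shutdownNow() // interrupt stragglers rather than hang forever
    }
    results
  }

  def main(args: Array[String]): Unit = {
    val results = runTasks()
    println(s"results = $results")
    // spark.stop() would go here; then let main() return normally
    // rather than calling sys.exit() from inside application code,
    // which (per the thread dump above) can wedge in a shutdown hook.
  }
}
```

With the pool fully terminated and `spark.stop()` called, `main` returning on its own should be enough for the JVM to exit without any explicit `System.exit`.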