Hi all: thanks for your reply. The job has been hanging for 20+ hours, and the history server has already deleted the log. I will keep monitoring and try to use a thread dump to find something.

Best Regards
Kelly Zhang

At 2020-05-11 15:41:29, "ZHANG Wei" <wezh...@outlook.com> wrote:
>Sometimes, the Thread dump result table of the Spark UI can provide some clues
>for finding thread lock issues, such as:
>
>  Thread ID | Thread Name                  | Thread State | Thread Locks
>  13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951)
>  48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951)
>
>And each thread row shows its call stack after being clicked. In this case,
>for thread 48, these are the frames holding the lock:
>
>  org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>  org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>  org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>  org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>  jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>  <snip...>
>
>Cheers,
>-z
>
>________________________________________
>From: zhangliyun <kelly...@126.com>
>Sent: Monday, May 11, 2020 9:44
>To: Russell Spitzer; Spark Dev List
>Subject: Re:Re: Screen Shot 2020-05-11 at 5.28.03 AM
>
>Hi,
>
>  Appreciate your reply. I guess you want me to look at the executor page, so
>  I went to it. If there is a deadlock, will the thread state say "Dead Lock"?
>  Which clue should I use to find the reason why the UI reports running tasks
>  that do not actually exist?
>
>[screenshot: executor thread dump page]
>
>At 2020-05-11 08:55:25, "Russell Spitzer" <russell.spit...@gmail.com> wrote:
>
>Have you checked the executor thread dumps? It may give you some insight if
>there is a deadlock or something else.
>
>They should be available under the executor tab on the UI.
>
>On Sun, May 10, 2020, 4:43 PM zhangliyun <kelly...@126.com> wrote:
>Hi all:
>  I have a Spark 2.3.1 job that has been stuck for 23 hours. When I go to the
>  Spark history server, it shows that 5039 of 5043 tasks have finished, which
>  means 4 are still running. But when I go to the tasks page, there are no
>  running tasks. I downloaded the logs and grepped stdout for "Dropping event
>  from queue"; there were no matches, so the hang does not seem to be caused
>  by spark.scheduler.listenerbus.eventqueue.capacity being too small.
>  I would appreciate any suggestions for finding the reason why the job is
>  stuck.
>
>[screenshot: no running tasks in the running stage]
>
>---------------------------------------------------------------------
>To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
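
[Editor's note] The thread dump table ZHANG Wei describes is built on the JVM's standard java.lang.management API, and the same deadlock check can be run programmatically, for example from a small diagnostic utility attached to the driver or an executor. A minimal sketch (this is plain JVM code, not Spark's implementation):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    // Returns the IDs of threads deadlocked on monitors or ownable
    // synchronizers, or null when there is no deadlock, exactly as
    // ThreadMXBean reports it.
    static long[] check() {
        return ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = check();
        if (ids == null) {
            System.out.println("no deadlocked threads");
            return;
        }
        // Print who waits on what, mirroring the UI's
        // "Blocked by Thread ... Lock(...)" column.
        for (ThreadInfo info : bean.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName()
                    + " [" + info.getThreadState() + "]"
                    + " waiting on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

Note that a job can also hang without a JVM-level deadlock (e.g. a thread blocked on I/O, as in the jline example above), so the full per-thread stacks are still worth reading even when this check returns null.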
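
[Editor's note] For context on why grepping for "Dropping event from queue" is a sensible check: the listener bus posts scheduler events into a bounded queue, and when the queue is full events are dropped rather than blocking the scheduler, so the UI may never learn that some tasks finished. A rough illustrative sketch of that drop-on-full behavior (class and method names are made up for illustration; this is not Spark's actual code):

```java
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedEventQueue {
    private final LinkedBlockingQueue<String> queue;
    private long dropped = 0;

    public BoundedEventQueue(int capacity) {
        // Analogous to spark.scheduler.listenerbus.eventqueue.capacity.
        queue = new LinkedBlockingQueue<>(capacity);
    }

    // Non-blocking post: a full queue drops the event instead of
    // stalling the caller. In Spark, a drop like this is what surfaces
    // as "Dropping event from queue" in the logs.
    public boolean post(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    public long droppedCount() {
        return dropped;
    }
}
```

Since the logs here contain no such message, dropped events are unlikely to explain the stale task counts, which points back toward the thread-dump route discussed above.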