Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-12 Thread Umesh Kacha
Hi Prabhu, thanks for the response. I did the same; the problem is that when I get the process id using jps or ps -ef, I don't get a user in the very first column. I see a number in place of the user name, so I can't run jstack on it because of a permission issue. It gives something like the following: 728852 3553 9833
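
A possible workaround, assuming the number in the first column is a numeric UID that the host cannot resolve to a name (the <uid>, <pid> and <container-user> placeholders below are not from the thread, and sudo rights on the NodeManager are assumed):

    getent passwd <uid>                       # resolve the numeric UID to a user name, if the host knows it
    sudo -u <container-user> jstack -l <pid>  # run jstack as the user that owns the executor process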

Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-11 Thread Prabhu Joseph
Umesh, a running task is a thread within the executor process, so we need to take a stack trace of the executor process. The executor will be running on one of the NodeManager machines as a container; the YARN RM UI's running jobs page will have the host details of where the executor is running. Log in to that NodeManager
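
For example, once logged in to that NodeManager host, the executor JVM can usually be found like this (a rough sketch; CoarseGrainedExecutorBackend is the standard main class of a Spark-on-YARN executor, and <application_id> stands for the id shown in the RM UI):

    jps -lm | grep CoarseGrainedExecutorBackend
    ps -ef | grep <application_id> | grep CoarseGrainedExecutorBackend   # alternative, filtered by application id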

Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-11 Thread Umesh Kacha
Hi Prabhu, thanks for the response. How do I find the pid of a slow-running task? The task is running on a YARN cluster node. When I try to see the pid of a running task using my user, I see some 7-8 digit number instead of the user running the process. Any idea why Spark creates this number instead of displaying the user?
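
One hedged way to locate the executor's host and container without the RM UI, assuming the yarn CLI is available (the ids below are placeholders taken from the output of yarn application -list):

    yarn applicationattempt -list <application_id>
    yarn container -list <application_attempt_id>   # lists container ids together with their NodeManager hosts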

Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-02 Thread Prabhu Joseph
The attached image just has thread states, and WAITING threads need not be the issue. We need to take thread stack traces and identify in which area of the code the threads are spending a lot of time. Use jstack -l <pid> or kill -3 <pid>, where pid is the process id of the executor process. Take the jstack stack trace
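
For example (a minimal sketch; <pid> is the executor process id found on the NodeManager):

    jstack -l <pid> > executor-stack.txt   # dump all thread stacks, with lock information, to a file
    kill -3 <pid>                          # SIGQUIT; the JVM writes the thread dump to its stdout,
                                           # which for a YARN container ends up in the container's stdout log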

Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-01 Thread Umesh Kacha
Hi, thanks, I did that and I have attached thread dump images. That was the intention of my question: asking for help to identify which waiting thread is the culprit. Regards, Umesh

How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-01 Thread unk1102
Hi, I have a Spark job which hangs for around 7 hours or more, until the job is killed by Autosys because of a timeout. The data is not huge. I am sure it gets stuck because of GC, but I can't find the source code which causes the GC. I am reusing almost all variables, trying to minimize creating local objects
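
If GC is the suspect, one way to confirm it is to enable GC logging on the executors before re-running the job; the sketch below assumes spark-submit and a HotSpot JDK 7/8, with the rest of the submit arguments unchanged:

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps" \
      ...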

Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-01 Thread Prabhu Joseph
Take a thread dump of the executor process several times in a short time period and check what each thread is doing at the different times; this will help to identify the expensive sections in the user code. Thanks, Prabhu Joseph
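
A small loop along these lines (a sketch; <pid> is the executor pid, and the count and interval are arbitrary) collects several dumps to compare:

    for i in 1 2 3 4 5; do
      jstack -l <pid> > /tmp/executor-stack.$i.txt   # one dump per iteration
      sleep 10
    done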

Re: How to find cause(waiting threads etc) of hanging job for 7 hours?

2016-01-01 Thread unk1102
Sorry, please see the attached waiting-thread log.