Also, how are you launching the application? Through spark-submit, or by creating a Spark context in your app?
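
If you end up collecting dumps again in local mode, here is a rough sketch of grabbing several dumps a fixed interval apart from inside the JVM. It uses the standard ThreadMXBean API rather than jstack or jcmd, so it only sees the JVM it runs in (the driver in local mode, not remote executors). The class name ThreadDumps and the method capture are made up for illustration and are not part of Spark.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): captures thread dumps of the JVM
// it is running in. This is an in-process stand-in for jstack/jcmd.
public class ThreadDumps {

    // Takes `count` dumps, sleeping `delayMillis` between consecutive dumps.
    public static List<String> capture(int count, long delayMillis)
            throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            StringBuilder dump = new StringBuilder();
            // false, false: skip locked monitor/synchronizer details.
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                dump.append('"').append(info.getThreadName()).append("\" ")
                    .append(info.getThreadState()).append('\n');
                // Print every frame; ThreadInfo.toString() truncates deep stacks.
                for (StackTraceElement frame : info.getStackTrace()) {
                    dump.append("    at ").append(frame).append('\n');
                }
                dump.append('\n');
            }
            dumps.add(dump.toString());
            if (i < count - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return dumps;
    }

    public static void main(String[] args) throws InterruptedException {
        // 5 dumps, 1 second apart, matching the earlier suggestion in this thread.
        List<String> dumps = capture(5, 1000L);
        for (int i = 0; i < dumps.size(); i++) {
            System.out.println("=== dump " + (i + 1) + " ===");
            System.out.println(dumps.get(i));
        }
    }
}
```

Threads that show the same stack in every dump are usually the ones worth looking at first. To catch the pause itself, capture would have to be called from a separate daemon thread started before the job finishes; that part is left out of the sketch.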

Thanks,
Aniket

On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> Thanks for sharing the thread dumps. I had a look at them and couldn't find
> anything unusual. Is there anything in the logs (driver + executor) that
> suggests what's going on? Also, what does the Spark job do, and which
> versions of Spark and Hadoop are you using?
>
> Thanks,
> Aniket
>
> On Wed, Nov 16, 2016 at 2:07 AM Michael Johnson <mjjohnson....@yahoo.com> wrote:
>
> The extremely long hang/pause has started happening again. I've been
> running on a small remote cluster, so I used the UI to grab thread dumps
> rather than doing it from the command line. There seems to be one executor
> still alive, along with the driver; I grabbed 4 thread dumps from each, a
> couple of seconds apart. I'd greatly appreciate any help tracking down
> what's going on! (I've attached them, but I can paste them somewhere if
> that's more convenient.)
>
> Thanks,
> Michael
>
> On Sunday, November 6, 2016 10:49 PM, Michael Johnson <mjjohnson....@yahoo.com.INVALID> wrote:
>
> Hm. Something must have changed, as it was happening quite consistently
> and now I can't get it to reproduce. Thank you for the offer, and if it
> happens again I will try grabbing thread dumps and I will see if I can
> figure out what is going on.
>
> On Sunday, November 6, 2016 10:02 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:
>
> I doubt it's GC as you mentioned that the pause is several minutes. Since
> it's reproducible in local mode, can you run the Spark application locally
> and, once your job is complete (and the application appears paused), take
> 5 thread dumps (using jstack or jcmd on the local Spark JVM process)
> with a 1 second delay between each dump and attach them? I can take a look.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 2:21 PM Michael Johnson <mjjohnson....@yahoo.com> wrote:
>
> Thanks; I tried looking at the thread dumps for the driver and the one
> executor that had that option in the UI, but I'm afraid I don't know how to
> interpret what I saw... I don't think it could be my code directly, since
> at this point my code has all completed? Could GC be taking that long?
>
> (I could also try grabbing the thread dumps and pasting them here, if that
> would help?)
>
> On Sunday, November 6, 2016 8:36 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:
>
> In order to know what's going on, you can study the thread dumps either
> from spark UI or from any other thread dump analysis tool.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 1:31 PM Michael Johnson <mjjohnson....@yahoo.com.invalid> wrote:
>
> I'm doing some processing and then clustering of a small dataset (~150
> MB). Everything seems to work fine, until the end; the last few lines of my
> program are log statements, but after printing those, nothing seems to
> happen for a long time...many minutes; I'm not usually patient enough to
> let it go, but I think one time when I did just wait, it took over an hour
> (and did eventually exit on its own). Any ideas on what's happening, or how
> to troubleshoot?
>
> (This happens both when running locally, using the localhost mode, as well
> as on a small cluster with four 4-processor nodes each with 15GB of RAM; in
> both cases the executors have 2GB+ of RAM, and none of the inputs/outputs
> on any of the stages is more than 75 MB...)
>
> Thanks,
> Michael