Hi,

If your process only finishes after a long lag, check how you are writing the output. Are you converting to Pandas, using coalesce (in which case all of the traffic is directed to a single node), or writing to S3 (in which case there can be lags)?
Regards,
Gourav

On Sun, Nov 6, 2016 at 1:28 PM, Michael Johnson <mjjohnson....@yahoo.com.invalid> wrote:

> I'm doing some processing and then clustering of a small dataset (~150
> MB). Everything seems to work fine, until the end; the last few lines of my
> program are log statements, but after printing those, nothing seems to
> happen for a long time...many minutes; I'm not usually patient enough to
> let it go, but I think one time when I did just wait, it took over an hour
> (and did eventually exit on its own). Any ideas on what's happening, or how
> to troubleshoot?
>
> (This happens both when running locally, using the localhost mode, as well
> as on a small cluster with four 4-processor nodes each with 15GB of RAM; in
> both cases the executors have 2GB+ of RAM, and none of the inputs/outputs
> on any of the stages is more than 75 MB...)
>
> Thanks,
> Michael