Hi,

If your process only exits after a long delay, please check how you are
writing the output: by converting to Pandas, by using coalesce (in which
case all the traffic is directed to a single node), or by writing to S3.
Any of these can cause a lag at the end of the job.
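
One way to narrow this down is to time each step of the driver program
and see which one the delay belongs to. Below is a minimal sketch using
only the Python standard library, assuming your driver is written in
Python. The `timed` helper, the `timings` dict and the step labels are
hypothetical names, and the two step bodies are placeholders for your own
clustering and write calls.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: collects the wall-clock duration of each labeled step.
timings = {}

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the step raises.
        timings[label] = time.perf_counter() - start
        print(f"{label}: {timings[label]:.2f}s")

# Placeholders: replace these bodies with your real job steps.
with timed("clustering"):
    sum(range(1000))      # stand-in for the processing/clustering stage

with timed("write_output"):
    time.sleep(0.05)      # stand-in for the final write (Pandas, coalesce, S3, ...)

slowest = max(timings, key=timings.get)
print("slowest step:", slowest)
```

If every timed step is fast and the process still hangs after the last
log line, the delay is happening after your own code has finished (for
example during shutdown), not in the write itself.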

Regards,
Gourav

On Sun, Nov 6, 2016 at 1:28 PM, Michael Johnson <
mjjohnson....@yahoo.com.invalid> wrote:

> I'm doing some processing and then clustering of a small dataset (~150
> MB). Everything seems to work fine, until the end; the last few lines of my
> program are log statements, but after printing those, nothing seems to
> happen for a long time...many minutes; I'm not usually patient enough to
> let it go, but I think one time when I did just wait, it took over an hour
> (and did eventually exit on its own). Any ideas on what's happening, or how
> to troubleshoot?
>
> (This happens both when running locally, using the localhost mode, as well
> as on a small cluster with four 4-processor nodes each with 15GB of RAM; in
> both cases the executors have 2GB+ of RAM, and none of the inputs/outputs
> on any of the stages is more than 75 MB...)
>
> Thanks,
> Michael
>
