I am working on a problem that will eventually involve many millions of
function calls. I have a small sample with several thousand calls working,
but when I try to scale up the amount of data things stall. I use 120
partitions; 116 of them finish in very little time, while the remaining 4
seem to do all the work, stall after a fixed number of calls (about 1000),
and make no further progress even after hours.
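In case it is useful, this is roughly the kind of check I was planning to run
to see whether the data is badly skewed across those partitions. The RDD here
is just a placeholder built with parallelize; the real one comes from my job:

    import org.apache.spark.sql.SparkSession

    object SkewCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("skew-check").getOrCreate()

        // Placeholder input; in the real job the RDD comes from my data source.
        val rdd = spark.sparkContext.parallelize(1 to 1000000, 120)

        // Record count per partition; a large imbalance here would explain
        // why 4 of the 120 tasks end up doing nearly all the work.
        val counts = rdd
          .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
          .collect()

        counts.sortBy(-_._2).take(10).foreach { case (idx, n) =>
          println(s"partition $idx: $n records")
        }

        spark.stop()
      }
    }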

This is my first large and complex job with Spark, and I would appreciate any
insight into how to debug the issue or, even better, into why it might exist.
The cluster has 15 machines and I am setting executor memory to 16G.

Also, what other details would be relevant to diagnosing the issue?