Hi,

I'm trying to run spark applications on a standalone cluster, running on
top of AWS. Since my slaves are spot instances, in some cases they are
being killed and lost due to bid prices. When apps are running during this
event, sometimes the spark application dies - and the driver process just
hangs, and stays up forever (zombie process), capturing memory / cpu
resources on the master machine. Then we have to manually kill -9 to free
these resources.

Has anyone seen this kind of problem before? Any suggested solution to work
around this problem?

Thanks,
Tomer

Reply via email to