Hi,

I am running a small 6-node Spark cluster for testing purposes. Recently,
the disk on one of the nodes filled up with temporary files and there was
no space left on the device. Because of this my Spark jobs started failing,
even though the Spark Web UI still showed the node as 'Alive'. Once I logged
on to the machine and cleaned up some of the junk, I was able to run the
jobs again.
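For what it's worth, the only related knob I am aware of is the standalone
worker cleanup setting. A minimal sketch is below, assuming a standalone
deployment; the values are just examples I have not actually tested:

    # conf/spark-env.sh on each worker (standalone mode)
    # Periodically delete old application work directories under the worker dir.
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"

As far as I understand (and I may be wrong), this only purges the directories
of finished applications, so it would not have stopped a running job from
filling the disk in the first place.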

My question is: how reliable can my Spark cluster be if issues like this
can bring down my jobs? I would have expected Spark to stop using this node,
or at least to redistribute its work to the other nodes. But since the node
was still reported as alive, Spark kept trying to run tasks on it regardless.

Thanks,
Jatin



-----
Novice Big Data Programmer
