Hi, I am running a small 6-node Spark cluster for testing purposes. Recently, the disk on one of the nodes filled up with temporary files and there was no space left on the device. Because of this, my Spark jobs started failing, even though the worker was still shown as 'Alive' on the Spark Web UI. Once I logged on to the machine and cleaned up the leftover files, I was able to run the jobs again.
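For context, I am considering turning on the standalone worker's cleanup options so that old application directories get purged automatically. Below is a rough sketch of what I have in mind for conf/spark-env.sh on each worker; the interval and TTL are just the documented defaults, the scratch directory path is made up for illustration, and I have not yet verified that this covers my case, since as far as I understand the cleanup only removes work directories of applications that have already stopped:

    # conf/spark-env.sh on each worker (sketch, untested on my cluster)
    # Periodically purge work directories of finished applications
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"

    # Point shuffle/spill scratch space at a larger disk (hypothetical path)
    export SPARK_LOCAL_DIRS=/data/spark/tmp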
My question is: how reliable can my Spark cluster be if issues like this can bring down my jobs? I would have expected Spark to stop using this node, or at least redistribute its work to the other nodes. But since the node was still marked alive, Spark kept trying to run tasks on it regardless.

Thanks,
Jatin
-----
Novice Big Data Programmer