Re: Spark cluster stability
Great! Thanks for the information. I will try it out. - Novice Big Data Programmer -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-cluster-stability-tp17929p17956.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark cluster stability
Hi,

I am running a small 6-node Spark cluster for testing purposes. Recently, the disk on one of the nodes was filled up by temporary files until there was no space left. Because of this, my Spark jobs started failing, even though the Spark Web UI still showed the node as 'Alive'. Once I logged on to the machine and cleaned up some trash, I was able to run the jobs again.

My question is: how reliable can my Spark cluster be if issues like these can bring down my jobs? I would have expected Spark to stop using this node, or at least to redistribute its work to other nodes. But because the node was still alive, Spark kept trying to run tasks on it.

Thanks,
Jatin

- Novice Big Data Programmer
Re: Spark cluster stability
You can enable monitoring (for example, Nagios) with alerts to tackle these kinds of issues.

Thanks
Best Regards

On Mon, Nov 3, 2014 at 10:55 AM, jatinpreet jatinpr...@gmail.com wrote:
> Hi, I am running a small 6 node spark cluster for testing purposes. [...]
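
For anyone wondering what such a monitoring check might look like, here is a minimal sketch in Python of a disk-space check that an alerting system could run on each worker. It is only an illustration, not an official Nagios plugin or anything Spark ships: the function names `disk_is_low` and `nagios_status`, the 10% threshold, and the choice of paths are all assumptions made for this example. The return codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING).

```python
import shutil


def disk_is_low(path, min_free_fraction=0.10):
    """Return True if the filesystem holding `path` has less free space
    than `min_free_fraction` of its total size (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as not low.
        return False
    return usage.free / usage.total < min_free_fraction


def nagios_status(paths, min_free_fraction=0.10):
    """Check several paths and return (exit_code, message).

    exit_code follows the Nagios plugin convention: 0 = OK, 1 = WARNING.
    The threshold and the paths passed in are assumptions for this sketch.
    """
    low = [p for p in paths if disk_is_low(p, min_free_fraction)]
    if low:
        return 1, "WARNING: low disk space on " + ", ".join(low)
    return 0, "OK: disk space above threshold"
```

A wrapper script would call `nagios_status` with the directories that tend to fill up on the workers (for instance the temporary directory the jobs write to), print the message, and exit with the returned code so the monitoring system can raise an alert before jobs start failing.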