Re: Spark cluster stability

2014-11-03 Thread jatinpreet
Great! Thanks for the information. I will try it out.






Spark cluster stability

2014-11-02 Thread jatinpreet
Hi,

I am running a small six-node Spark cluster for testing purposes. Recently,
the disk on one of the nodes filled up with temporary files, leaving no
space left on the device. Because of this my Spark jobs started failing,
even though the node was still shown as 'Alive' on the Spark Web UI. Once I
logged on to the machine and cleaned up some of the files, I was able to
run the jobs again.

My question is: how reliable can my Spark cluster be if issues like these
can bring down my jobs? I would have expected Spark to stop using this node,
or at least to redistribute its work to the other nodes. But since the node
was still marked alive, tasks kept being scheduled on it regardless.

Thanks,
Jatin
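
For reference, if the temporary files are Spark's own per-application work
directories, a standalone worker can be configured to purge them itself. A
minimal sketch for spark-env.sh on each worker; the property names are the
standalone-mode cleanup settings, while the thresholds and the path below
are illustrative assumptions, not recommendations:

  # spark-env.sh on each standalone worker (not in the driver's SparkConf).
  # Periodically sweeps work/ dirs of finished applications; it will not
  # help when a still-running application is producing the files.
  SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
    -Dspark.worker.cleanup.enabled=true \
    -Dspark.worker.cleanup.interval=1800 \
    -Dspark.worker.cleanup.appDataTtl=86400"

  # Optionally, point shuffle/spill files at a larger volume in
  # spark-defaults.conf (the path is a placeholder):
  # spark.local.dir   /data/spark-tmp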



Re: Spark cluster stability

2014-11-02 Thread Akhil Das
You can set up monitoring (e.g. Nagios) with alerts to catch these kinds of
issues early.

Thanks
Best Regards
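
As a concrete sketch of this suggestion: the stock check_disk plugin can
watch the volume holding Spark's scratch space on each worker. The host
name, path, and thresholds below are placeholders, and this assumes NRPE on
the workers plus the default generic-service template:

  # nrpe.cfg on each Spark worker: warn below 20% free, critical below 10%
  command[check_spark_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /data/spark-tmp

  # On the Nagios server, one service definition per worker host:
  define service {
      use                   generic-service
      host_name             spark-worker-01
      service_description   Spark scratch disk
      check_command         check_nrpe!check_spark_disk
  }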
