Re: Shark resilience to unusable slaves

2014-05-23 Thread Praveen R
You might use bin/shark-withdebug to find the exact cause of the failure.
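For reference, a minimal invocation of the debug wrapper, assuming it accepts
the same arguments as the regular bin/shark script (the query below is just a
placeholder):

    $ ./bin/shark-withdebug                     # start the CLI with verbose logging
    $ ./bin/shark-withdebug -e "select 1;"      # run a single statement and exit

The extra log output should show which worker the deployment is failing on.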

That said, the easiest way to get the cluster running is to remove the
dysfunctional machine from the Spark cluster (take it out of the slaves file).
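A minimal sketch of that workaround, assuming a standard standalone layout on
the master node (the host name worker-03 is just a placeholder):

    # 1. Edit conf/slaves and delete the line for the bad host, e.g. worker-03
    # 2. Restart the standalone cluster so the master forgets the removed worker
    $ ./bin/stop-all.sh
    $ ./bin/start-all.sh

On Spark 0.8.x these cluster scripts live under bin/; later releases moved them
to sbin/, so adjust the paths to match your install.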
Hope that helps.


On Thu, May 22, 2014 at 9:04 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote:

 Hi, I am running into a pretty concerning issue with Shark (granted I'm
 running v. 0.8.1).

 I have a Spark slave node that has run out of disk space. When I try to
 start Shark, it attempts to deploy the application to a directory on that
 node, fails, and eventually gives up (I see a "Master removed our
 application" message in the Shark server log).

 Is Spark supposed to be able to ignore a slave if something goes wrong on
 it (I realize that the slave probably still appears alive)? I restarted
 the Spark master in the hope that it would detect that the slave is
 unhealthy, but that doesn't seem to be the case.

 Any thoughts appreciated -- we'll monitor disk space, but I'm a little
 worried that the whole cluster is non-functional on account of a single slave.



Shark resilience to unusable slaves

2014-05-22 Thread Yana Kadiyska
Hi, I am running into a pretty concerning issue with Shark (granted I'm
running v. 0.8.1).

I have a Spark slave node that has run out of disk space. When I try to
start Shark, it attempts to deploy the application to a directory on that
node, fails, and eventually gives up (I see a "Master removed our
application" message in the Shark server log).

Is Spark supposed to be able to ignore a slave if something goes wrong on
it (I realize that the slave probably still appears alive)? I restarted
the Spark master in the hope that it would detect that the slave is
unhealthy, but that doesn't seem to be the case.

Any thoughts appreciated -- we'll monitor disk space, but I'm a little
worried that the whole cluster is non-functional on account of a single slave.