Re: Worker re-spawn and dynamic node joining

2014-05-20 Thread Han JU
Thank you guys for the detailed answers.
Akhil, yes, I would like to try your tool. Is it open-sourced?



-- 
*JU Han*

Data Engineer @ Botify.com

+33 061960


Re: Worker re-spawn and dynamic node joining

2014-05-17 Thread Mayur Rustagi
A better way would be to use Mesos (and quite possibly YARN in 1.0.0).
That will allow you to add nodes on the fly and leverage them for Spark.
Frankly, standalone mode is not meant to handle these issues. That said, we
use our deployment tool, since stopping the cluster to add nodes is not
really an issue for us at the moment.


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 





Re: Worker re-spawn and dynamic node joining

2014-05-17 Thread Nicholas Chammas
Thanks for the info about adding/removing nodes dynamically. That's
valuable.

On Friday, May 16, 2014, Akhil Das wrote:

> Hi Han :)
>
> 1. Is there a way to automatically re-spawn Spark workers? We've had
> situations where an executor OOM causes the worker process to be marked
> DEAD, and it does not come back automatically.
>
> => Yes. You can either add an OOM-killer exception for all of your Spark
> processes, or you can have a cron job that keeps monitoring your worker
> processes and brings them back up if they go down.
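The cron-job approach described above can be sketched as a small watchdog script. This is a minimal, illustrative sketch, not an official Spark mechanism: the install path, master URL, and the `start-slave.sh` script name and arguments (which vary between Spark versions) are assumptions to adapt to your cluster.

```shell
#!/bin/sh
# Hypothetical watchdog, run from cron, e.g. once a minute:
#   * * * * * /opt/spark/respawn-worker.sh
# Assumed paths and URLs -- adjust for your cluster.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
MASTER_URL="${MASTER_URL:-spark://master-host:7077}"

worker_alive() {
  # The standalone worker runs as this JVM main class.
  pgrep -f "org.apache.spark.deploy.worker.Worker" >/dev/null 2>&1
}

if worker_alive; then
  : # worker is up, nothing to do
elif [ -x "$SPARK_HOME/sbin/start-slave.sh" ]; then
  # Script name and arguments vary slightly across Spark versions.
  "$SPARK_HOME/sbin/start-slave.sh" "$MASTER_URL"
else
  echo "worker down and no start script at $SPARK_HOME" >&2
fi
```

Adding an OOM-killer exception instead (or as well) means lowering the worker JVM's `oom_score_adj` so the kernel prefers to kill other processes first.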
>
>   2. How can we dynamically add (or remove) worker machines to (or from)
> the cluster? We'd like to leverage an EC2 auto-scaling group, for example.
>
> => You can add worker nodes on the fly by spawning a new machine, adding
> that machine's IP address to the master's *slaves* file, and then rsyncing
> the Spark directory to all worker machines, including the one you added.
> Then simply run the *start-all.sh* script on the master node to bring the
> new worker into action. Removing a worker machine works the same way:
> remove the worker's IP address from the master's *slaves* file, restart
> your slaves, and the worker will be removed.
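The add-a-worker steps above can be sketched as follows. This is a dry-run illustration: the worker IP and paths are placeholders, and the rsync and start-all.sh commands are printed rather than executed, since running them for real requires passwordless SSH to the new machine.

```shell
#!/bin/sh
# Dry-run sketch of the steps above (all names are placeholders).
# In real use, SLAVES_FILE would be $SPARK_HOME/conf/slaves on the master.
NEW_WORKER="10.0.0.42"
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
SLAVES_FILE="${SLAVES_FILE:-/tmp/slaves}"

touch "$SLAVES_FILE"

# 1. Register the new worker with the master, avoiding duplicate entries.
grep -qx "$NEW_WORKER" "$SLAVES_FILE" || echo "$NEW_WORKER" >> "$SLAVES_FILE"

# 2. Sync the Spark directory to the new machine (assumes passwordless SSH);
#    printed rather than executed in this sketch.
echo "rsync -az $SPARK_HOME/ $NEW_WORKER:$SPARK_HOME/"

# 3. Bring the new worker up; workers that are already running are left alone.
echo "$SPARK_HOME/sbin/start-all.sh"
```

Removal is symmetric: delete the worker's line from the slaves file, then stop the worker process on that machine and restart the slaves as needed.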
>
>
> FYI, we have a deployment tool (a web-based UI) that we use for internal
> purposes. It is built on top of the spark-ec2 script (with some changes)
> and has a module for adding/removing worker nodes on the fly. It looks
> like the attached screenshot. If you want, I can give you access.
>
> Thanks
> Best Regards
>


Worker re-spawn and dynamic node joining

2014-05-14 Thread Han JU
Hi all,

Just 2 questions:

  1. Is there a way to automatically re-spawn Spark workers? We've had
situations where an executor OOM causes the worker process to be marked
DEAD, and it does not come back automatically.

  2. How can we dynamically add (or remove) worker machines to (or from)
the cluster? We'd like to leverage an EC2 auto-scaling group, for example.

We're using spark-standalone.

Thanks a lot.

-- 
*JU Han*

Data Engineer @ Botify.com

+33 061960