Re: Autoscaling of Spark YARN cluster

2015-12-14 Thread Mingyu Kim
Cool. Using Ambari to monitor and scale up/down the cluster sounds
promising. Thanks for the pointer!

Mingyu

From:  Deepak Sharma 
Date:  Monday, December 14, 2015 at 1:53 AM
To:  cs user 
Cc:  Mingyu Kim , "user@spark.apache.org"

Subject:  Re: Autoscaling of Spark YARN cluster

An approach I can think of is using the Ambari Metrics Service (AMS).
Using these metrics, you can decide whether the cluster is low on
resources.
If it is, call the Ambari management API to add a node to the cluster.

Thanks
Deepak

On Mon, Dec 14, 2015 at 2:48 PM, cs user  wrote:
> Hi Mingyu, 
> 
> I'd be interested in hearing about anything else you find which might meet
> your needs for this.
> 
> One way this could perhaps be done would be to use Ambari. Ambari comes with a
> nice API which you can use to add additional nodes to a cluster:
> 
> https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md
> 
> Once the node has been built and the Ambari agent installed, you can then call
> back to the management node via the API, tell it what you want the new node to
> be, and it will connect, configure the new node and add it to the cluster.
> 
> You could create a host group within the cluster blueprint with the minimal
> components you need to install to have it operate as a YARN node.
> 
> As for the decision to scale, that is outside the remit of Ambari. You could
> look into AWS autoscaling, or into a product called Scalr, which has an
> open-source version. We are using Scalr to install an Ambari cluster, with
> Chef configuring the nodes up until the point it hands over to Ambari.
> 
> Scalr allows you to write custom scaling metrics, which you could use to query
> values such as the number of applications queued and the resources available,
> and add nodes when required.
> 
> Cheers!
> 
> On Mon, Dec 14, 2015 at 8:57 AM, Mingyu Kim  wrote:
>> Hi all,
>> 
>> Has anyone tried out autoscaling a Spark YARN cluster on a public cloud (e.g.
>> EC2) based on workload? To be clear, I'm interested in scaling the cluster
>> itself up and down by adding and removing YARN nodes based on the cluster
>> resource utilization (e.g. # of applications queued, # of resources
>> available), as opposed to scaling resources assigned to Spark applications,
>> which is natively supported by Spark's dynamic resource scheduling. I've
>> found that Cloudbreak
>> <http://sequenceiq.com/cloudbreak-docs/latest/periscope/#how-it-works> has
>> a similar feature, but it's in "technical preview", and I didn't find much
>> else from my search.
>> 
>> This might be a general YARN question, but wanted to check if there's a
>> solution popular in the Spark community. Any sharing of experience around
>> autoscaling will be helpful!
>> 
>> Thanks,
>> Mingyu
> 



-- 
Thanks
Deepak
www.bigdatabig.com 
www.keosha.net 






Re: Autoscaling of Spark YARN cluster

2015-12-14 Thread Deepak Sharma
An approach I can think of is using the Ambari Metrics Service (AMS).
Using these metrics, you can decide whether the cluster is low on
resources.
If it is, call the Ambari management API to add a node to the cluster.
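
As a rough, untested sketch of that decision loop in Python (the AMS
collector address, metric name, response shape and threshold below are all
assumptions to adjust for your own cluster):

import requests

# Assumed AMS collector location and queue metric -- adjust for your cluster.
AMS_COLLECTOR = "http://ams-collector.example.com:6188"
METRIC = "yarn.QueueMetrics.Queue=root.AvailableMB"

def cluster_low_on_resources(threshold_mb=8192):
    """Query the Ambari Metrics Service and report if available memory is low."""
    resp = requests.get(AMS_COLLECTOR + "/ws/v1/timeline/metrics",
                        params={"metricNames": METRIC, "appId": "resourcemanager"})
    resp.raise_for_status()
    series = resp.json().get("metrics", [])
    if not series:
        return False
    # Each series carries a {timestamp: value} map; take the latest datapoint.
    latest_value = series[0]["metrics"][max(series[0]["metrics"])]
    return latest_value < threshold_mb

if cluster_low_on_resources():
    # Here you would call the Ambari management API to add a host
    # (see the blueprint-based example later in this thread).
    print("Cluster is low on resources -- trigger a scale-up.")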

Thanks
Deepak

On Mon, Dec 14, 2015 at 2:48 PM, cs user  wrote:

> Hi Mingyu,
>
> I'd be interested in hearing about anything else you find which might meet
> your needs for this.
>
> One way this could perhaps be done would be to use Ambari. Ambari comes
> with a nice API which you can use to add additional nodes to a cluster:
>
>
> https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md
>
> Once the node has been built and the Ambari agent installed, you can then
> call back to the management node via the API, tell it what you want the new
> node to be, and it will connect, configure the new node and add it to the
> cluster.
>
> You could create a host group within the cluster blueprint with the
> minimal components you need to install to have it operate as a YARN node.
>
> As for the decision to scale, that is outside the remit of Ambari. You
> could look into AWS autoscaling, or into a product called Scalr, which has
> an open-source version. We are using Scalr to install an Ambari cluster,
> with Chef configuring the nodes up until the point it hands over to Ambari.
>
> Scalr allows you to write custom scaling metrics, which you could use to
> query values such as the number of applications queued and the resources
> available, and add nodes when required.
>
> Cheers!
>
> On Mon, Dec 14, 2015 at 8:57 AM, Mingyu Kim  wrote:
>
>> Hi all,
>>
>> Has anyone tried out autoscaling a Spark YARN cluster on a public cloud
>> (e.g. EC2) based on workload? To be clear, I’m interested in scaling the
>> cluster itself up and down by adding and removing YARN nodes based on the
>> cluster resource utilization (e.g. # of applications queued, # of resources
>> available), as opposed to scaling resources assigned to Spark applications,
>> which is natively supported by Spark’s dynamic resource scheduling. I’ve
>> found that Cloudbreak
>> <http://sequenceiq.com/cloudbreak-docs/latest/periscope/#how-it-works> has
>> a similar feature, but it’s in “technical preview”, and I didn’t find much
>> else from my search.
>>
>> This might be a general YARN question, but wanted to check if there’s a
>> solution popular in the Spark community. Any sharing of experience around
>> autoscaling will be helpful!
>>
>> Thanks,
>> Mingyu
>>
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: Autoscaling of Spark YARN cluster

2015-12-14 Thread cs user
Hi Mingyu,

I'd be interested in hearing about anything else you find which might meet
your needs for this.

One way this could perhaps be done would be to use Ambari. Ambari comes
with a nice API which you can use to add additional nodes to a cluster:

https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md

Once the node has been built and the Ambari agent installed, you can then call
back to the management node via the API, tell it what you want the new node
to be, and it will connect, configure the new node and add it to the
cluster.
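
Roughly, that call might look like this (untested sketch; the Ambari address,
credentials, blueprint and host group names are placeholders):

import requests

AMBARI = "http://ambari.example.com:8080/api/v1"   # assumed Ambari server
AUTH = ("admin", "admin")                           # assumed credentials
HEADERS = {"X-Requested-By": "ambari"}              # Ambari expects this header on modifying requests

def add_yarn_node(cluster, hostname, host_group="yarn_worker"):
    """Attach an already-provisioned host to a blueprint host group so Ambari
    installs and configures that group's components on it."""
    url = "{0}/clusters/{1}/hosts/{2}".format(AMBARI, cluster, hostname)
    body = {"blueprint": "my_blueprint", "host_group": host_group}
    resp = requests.post(url, json=body, auth=AUTH, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()   # an async request resource you can poll for progress

# e.g. add_yarn_node("spark_cluster", "worker-12.example.com")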

You could create a host group within the cluster blueprint with the minimal
components you need to install to have it operate as a YARN node.
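
For illustration, a minimal host group of that sort inside the blueprint
could look something like the fragment below (the component set is an
assumption; trim or extend it for your stack):

# Blueprint fragment expressed as a Python dict; serialize to JSON when
# posting the blueprint to Ambari.
yarn_worker_host_group = {
    "name": "yarn_worker",
    "cardinality": "1+",
    "components": [
        {"name": "NODEMANAGER"},      # runs the YARN containers
        {"name": "DATANODE"},         # optional, if you also want local HDFS storage
        {"name": "HDFS_CLIENT"},
        {"name": "YARN_CLIENT"},
        {"name": "SPARK_CLIENT"},     # if the Spark service is part of the blueprint
        {"name": "METRICS_MONITOR"},  # AMS agent, if you scale off AMS metrics
    ],
    "configurations": [],
}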

As for the decision to scale, that is outside the remit of Ambari. You could
look into AWS autoscaling, or into a product called Scalr, which has an
open-source version. We are using Scalr to install an Ambari cluster, with
Chef configuring the nodes up until the point it hands over to Ambari.

Scalr allows you to write custom scaling metrics, which you could use to
query values such as the number of applications queued and the resources
available, and add nodes when required.
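
A custom metric along those lines can be read straight from the YARN
ResourceManager REST API; a rough sketch (the RM address and thresholds are
assumptions):

import requests

RM = "http://resourcemanager.example.com:8088"   # assumed ResourceManager address

def should_add_node(max_pending_apps=2, min_available_mb=4096):
    """Decide whether to scale up based on cluster-wide YARN metrics."""
    metrics = requests.get(RM + "/ws/v1/cluster/metrics").json()["clusterMetrics"]
    pending = metrics["appsPending"]      # applications queued
    available = metrics["availableMB"]    # memory not yet allocated
    return pending > max_pending_apps or available < min_available_mb

print(should_add_node())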

Cheers!

On Mon, Dec 14, 2015 at 8:57 AM, Mingyu Kim  wrote:

> Hi all,
>
> Has anyone tried out autoscaling a Spark YARN cluster on a public cloud
> (e.g. EC2) based on workload? To be clear, I’m interested in scaling the
> cluster itself up and down by adding and removing YARN nodes based on the
> cluster resource utilization (e.g. # of applications queued, # of resources
> available), as opposed to scaling resources assigned to Spark applications,
> which is natively supported by Spark’s dynamic resource scheduling. I’ve
> found that Cloudbreak
> <http://sequenceiq.com/cloudbreak-docs/latest/periscope/#how-it-works> has
> a similar feature, but it’s in “technical preview”, and I didn’t find much
> else from my search.
>
> This might be a general YARN question, but wanted to check if there’s a
> solution popular in the Spark community. Any sharing of experience around
> autoscaling will be helpful!
>
> Thanks,
> Mingyu
>


Autoscaling of Spark YARN cluster

2015-12-14 Thread Mingyu Kim
Hi all,

Has anyone tried out autoscaling a Spark YARN cluster on a public cloud (e.g.
EC2) based on workload? To be clear, I'm interested in scaling the cluster
itself up and down by adding and removing YARN nodes based on the cluster
resource utilization (e.g. # of applications queued, # of resources
available), as opposed to scaling resources assigned to Spark applications,
which is natively supported by Spark's dynamic resource scheduling. I've
found that Cloudbreak
<http://sequenceiq.com/cloudbreak-docs/latest/periscope/#how-it-works> has
a similar feature, but it's in "technical preview", and I didn't find much
else from my search.
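
(For clarity, the application-level scaling I'm setting aside is just
configuration on the Spark side; something like the following, with
illustrative executor bounds:)

from pyspark import SparkConf, SparkContext

# Illustrative values only; executor bounds depend on the workload.
conf = (SparkConf()
        .setAppName("dynamic-allocation-example")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")   # external shuffle service is required on YARN
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "20"))
sc = SparkContext(conf=conf)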

This might be a general YARN question, but wanted to check if there's a
solution popular in the Spark community. Any sharing of experience around
autoscaling will be helpful!

Thanks,
Mingyu



