Re: Spark EC2 script on Large clusters

2015-11-05 Thread Sabarish Sasidharan
Qubole uses YARN.

Regards
Sab
On 06-Nov-2015 8:31 am, "Jerry Lam"  wrote:

> Does Qubole use Yarn or Mesos for resource management?
>
> Sent from my iPhone
>
> > On 5 Nov, 2015, at 9:02 pm, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
> >
> > Qubole
>


Re: Spark EC2 script on Large clusters

2015-11-05 Thread Jerry Lam
Does Qubole use YARN or Mesos for resource management?

Sent from my iPhone

> On 5 Nov, 2015, at 9:02 pm, Sabarish Sasidharan 
>  wrote:
> 
> Qubole

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark EC2 script on Large clusters

2015-11-05 Thread Sabarish Sasidharan
Qubole is one option where you can use spot instances and get a couple of
other benefits. We use Qubole at Manthan for our Spark workloads.

To ensure all the nodes are ready, you could use the
spark.scheduler.minRegisteredResourcesRatio config property so that execution
doesn't start until the requisite containers have been allocated.
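
A minimal sketch of how such a setting might be passed to spark-submit. In
Spark's configuration documentation the property is listed as
spark.scheduler.minRegisteredResourcesRatio, with
spark.scheduler.maxRegisteredResourcesWaitingTime as the companion timeout
(scheduling begins once either condition is met). The ratio, the timeout, and
the job file name below are illustrative assumptions, not values from this
thread.

```python
def spark_submit_args(app, min_ratio=1.0, max_wait="300s"):
    """Build a spark-submit argument list that delays scheduling until
    enough executors have registered. Default values are illustrative."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError("min_ratio must be between 0.0 and 1.0")
    return [
        "spark-submit",
        "--master", "yarn",
        # Fraction of requested executors that must register first.
        "--conf", f"spark.scheduler.minRegisteredResourcesRatio={min_ratio}",
        # Upper bound on how long to wait for that fraction.
        "--conf", f"spark.scheduler.maxRegisteredResourcesWaitingTime={max_wait}",
        app,
    ]


if __name__ == "__main__":
    # "my_job.py" is a hypothetical application file.
    print(" ".join(spark_submit_args("my_job.py")))
```

With a ratio of 1.0 the job would wait for every requested executor, so a
generous waiting time is probably needed on large clusters.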

Regards
Sab
On 06-Nov-2015 12:22 am, "Christian"  wrote:

> Let me rephrase. Emr cost is about twice as much as the spot price, making
> it almost 2/3 of the overall cost.
> On Thu, Nov 5, 2015 at 11:50 AM Christian  wrote:
>
>> Hi Jonathan,
>>
>> We are using EMR now and it's costing way too much. We do spot pricing
>> and the emr addon cost is about 2/3 the price of the actual spot instance.
>> On Thu, Nov 5, 2015 at 11:31 AM Jonathan Kelly 
>> wrote:
>>
>>> Christian,
>>>
>>> Is there anything preventing you from using EMR, which will manage your
>>> cluster for you? Creating large clusters would take mins on EMR instead of
>>> hours. Also, EMR supports growing your cluster easily and recently added
>>> support for shrinking your cluster gracefully (even while jobs are running).
>>>
>>> ~ Jonathan
>>>
>>> On Thu, Nov 5, 2015 at 9:48 AM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> Yeah, as Shivaram mentioned, this issue is well-known. It's documented
>>>> in SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to
>>>> resolve this issue in spark-ec2 without rewriting large parts of the
>>>> project. But if you take a crack at it and succeed I'm sure a lot of
>>>> people will be happy.
>>>>
>>>> I've started a separate project -- which Shivaram also mentioned -- which
>>>> aims to solve the problem of long launch times and other issues with
>>>> spark-ec2. It's still very young and lacks several critical features, but
>>>> we are making steady progress.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
>>>> shiva...@eecs.berkeley.edu> wrote:
>>>>
>>>>> It is a known limitation that spark-ec2 is very slow for large
>>>>> clusters and as you mention most of this is due to the use of rsync to
>>>>> transfer things from the master to all the slaves.
>>>>>
>>>>> Nick cc'd has been working on an alternative approach at
>>>>> https://github.com/nchammas/flintrock that is more scalable.
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>> On Thu, Nov 5, 2015 at 8:12 AM, Christian  wrote:
>>>>> > For starters, thanks for the awesome product!
>>>>> >
>>>>> > When creating ec2-clusters of 20-40 nodes, things work great. When we create
>>>>> > a cluster with the provided spark-ec2 script, it takes hours. When creating
>>>>> > a 200 node cluster, it takes 2 1/2 hours and for a 500 node cluster it takes
>>>>> > over 5 hours. One other problem we are having is that some nodes don't come
>>>>> > up when the other ones do, the process seems to just move on, skipping the
>>>>> > rsync and any installs on those ones.
>>>>> >
>>>>> > My guess as to why it takes so long to set up a large cluster is because of
>>>>> > the use of rsync. What if instead of using rsync, you synched to s3 and then
>>>>> > did a pdsh to pull it down on all of the machines. This is a big deal for us
>>>>> > and if we can come up with a good plan, we might be able help out with the
>>>>> > required changes.
>>>>> >
>>>>> > Are there any suggestions on how to deal with some of the nodes not being
>>>>> > ready when the process starts?
>>>>> >
>>>>> > Thanks for your time,
>>>>> > Christian
>>>>> >
>>>>>

>>>


Re: Spark EC2 script on Large clusters

2015-11-05 Thread Christian
Let me rephrase. The EMR cost is about twice as much as the spot price, making
it almost 2/3 of the overall cost.
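
A quick check of that arithmetic, as a sketch: if the EMR fee is twice the
spot price, the fee is 2 parts out of 3 of the total. The unit spot price
below is an assumed placeholder; only the 2:1 ratio comes from the message
above.

```python
from fractions import Fraction


def emr_share(spot_price, emr_fee):
    """Fraction of the total hourly cost that goes to the EMR fee."""
    total = spot_price + emr_fee
    if total <= 0:
        raise ValueError("total cost must be positive")
    return emr_fee / total


# Assumed unit spot price; the EMR fee is twice that, per the message above.
spot = Fraction(1)
share = emr_share(spot, 2 * spot)
print(share)  # 2/3
```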
On Thu, Nov 5, 2015 at 11:50 AM Christian  wrote:

> Hi Jonathan,
>
> We are using EMR now and it's costing way too much. We do spot pricing and
> the emr addon cost is about 2/3 the price of the actual spot instance.
> On Thu, Nov 5, 2015 at 11:31 AM Jonathan Kelly 
> wrote:
>
>> Christian,
>>
>> Is there anything preventing you from using EMR, which will manage your
>> cluster for you? Creating large clusters would take mins on EMR instead of
>> hours. Also, EMR supports growing your cluster easily and recently added
>> support for shrinking your cluster gracefully (even while jobs are running).
>>
>> ~ Jonathan
>>
>> On Thu, Nov 5, 2015 at 9:48 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Yeah, as Shivaram mentioned, this issue is well-known. It's documented
>>> in SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to
>>> resolve this issue in spark-ec2 without rewriting large parts of the
>>> project. But if you take a crack at it and succeed I'm sure a lot of
>>> people will be happy.
>>>
>>> I've started a separate project -- which Shivaram also mentioned -- which
>>> aims to solve the problem of long launch times and other issues with
>>> spark-ec2. It's still very young and lacks several critical features, but
>>> we are making steady progress.
>>>
>>> Nick
>>>
>>> On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> It is a known limitation that spark-ec2 is very slow for large
>>>> clusters and as you mention most of this is due to the use of rsync to
>>>> transfer things from the master to all the slaves.
>>>>
>>>> Nick cc'd has been working on an alternative approach at
>>>> https://github.com/nchammas/flintrock that is more scalable.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Thu, Nov 5, 2015 at 8:12 AM, Christian  wrote:
>>>> > For starters, thanks for the awesome product!
>>>> >
>>>> > When creating ec2-clusters of 20-40 nodes, things work great. When we create
>>>> > a cluster with the provided spark-ec2 script, it takes hours. When creating
>>>> > a 200 node cluster, it takes 2 1/2 hours and for a 500 node cluster it takes
>>>> > over 5 hours. One other problem we are having is that some nodes don't come
>>>> > up when the other ones do, the process seems to just move on, skipping the
>>>> > rsync and any installs on those ones.
>>>> >
>>>> > My guess as to why it takes so long to set up a large cluster is because of
>>>> > the use of rsync. What if instead of using rsync, you synched to s3 and then
>>>> > did a pdsh to pull it down on all of the machines. This is a big deal for us
>>>> > and if we can come up with a good plan, we might be able help out with the
>>>> > required changes.
>>>> >
>>>> > Are there any suggestions on how to deal with some of the nodes not being
>>>> > ready when the process starts?
>>>> >
>>>> > Thanks for your time,
>>>> > Christian
>>>> >

>>>
>>


Re: Spark EC2 script on Large clusters

2015-11-05 Thread Christian
Hi Jonathan,

We are using EMR now and it's costing way too much. We do spot pricing and
the EMR add-on cost is about 2/3 the price of the actual spot instance.
On Thu, Nov 5, 2015 at 11:31 AM Jonathan Kelly 
wrote:

> Christian,
>
> Is there anything preventing you from using EMR, which will manage your
> cluster for you? Creating large clusters would take mins on EMR instead of
> hours. Also, EMR supports growing your cluster easily and recently added
> support for shrinking your cluster gracefully (even while jobs are running).
>
> ~ Jonathan
>
> On Thu, Nov 5, 2015 at 9:48 AM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Yeah, as Shivaram mentioned, this issue is well-known. It's documented in
>> SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to
>> resolve this issue in spark-ec2 without rewriting large parts of the
>> project. But if you take a crack at it and succeed I'm sure a lot of
>> people will be happy.
>>
>> I've started a separate project -- which Shivaram also mentioned -- which
>> aims to solve the problem of long launch times and other issues with
>> spark-ec2. It's still very young and lacks several critical features, but
>> we are making steady progress.
>>
>> Nick
>>
>> On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
>> shiva...@eecs.berkeley.edu> wrote:
>>
>>> It is a known limitation that spark-ec2 is very slow for large
>>> clusters and as you mention most of this is due to the use of rsync to
>>> transfer things from the master to all the slaves.
>>>
>>> Nick cc'd has been working on an alternative approach at
>>> https://github.com/nchammas/flintrock that is more scalable.
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Thu, Nov 5, 2015 at 8:12 AM, Christian  wrote:
>>> > For starters, thanks for the awesome product!
>>> >
>>> > When creating ec2-clusters of 20-40 nodes, things work great. When we
>>> create
>>> > a cluster with the provided spark-ec2 script, it takes hours. When
>>> creating
>>> > a 200 node cluster, it takes 2 1/2 hours and for a 500 node cluster it
>>> takes
>>> > over 5 hours. One other problem we are having is that some nodes don't
>>> come
>>> > up when the other ones do, the process seems to just move on, skipping
>>> the
>>> > rsync and any installs on those ones.
>>> >
>>> > My guess as to why it takes so long to set up a large cluster is
>>> because of
>>> > the use of rsync. What if instead of using rsync, you synched to s3
>>> and then
>>> > did a pdsh to pull it down on all of the machines. This is a big deal
>>> for us
>>> > and if we can come up with a good plan, we might be able help out with
>>> the
>>> > required changes.
>>> >
>>> > Are there any suggestions on how to deal with some of the nodes not
>>> being
>>> > ready when the process starts?
>>> >
>>> > Thanks for your time,
>>> > Christian
>>> >
>>>
>>
>


Re: Spark EC2 script on Large clusters

2015-11-05 Thread Jonathan Kelly
Christian,

Is there anything preventing you from using EMR, which will manage your
cluster for you? Creating large clusters would take minutes on EMR instead of
hours. Also, EMR supports growing your cluster easily and recently added
support for shrinking your cluster gracefully (even while jobs are running).

~ Jonathan

On Thu, Nov 5, 2015 at 9:48 AM, Nicholas Chammas  wrote:

> Yeah, as Shivaram mentioned, this issue is well-known. It's documented in
> SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to
> resolve this issue in spark-ec2 without rewriting large parts of the
> project. But if you take a crack at it and succeed I'm sure a lot of
> people will be happy.
>
> I've started a separate project -- which Shivaram also mentioned -- which
> aims to solve the problem of long launch times and other issues with
> spark-ec2. It's still very young and lacks several critical features, but
> we are making steady progress.
>
> Nick
>
> On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> It is a known limitation that spark-ec2 is very slow for large
>> clusters and as you mention most of this is due to the use of rsync to
>> transfer things from the master to all the slaves.
>>
>> Nick cc'd has been working on an alternative approach at
>> https://github.com/nchammas/flintrock that is more scalable.
>>
>> Thanks
>> Shivaram
>>
>> On Thu, Nov 5, 2015 at 8:12 AM, Christian  wrote:
>> > For starters, thanks for the awesome product!
>> >
>> > When creating ec2-clusters of 20-40 nodes, things work great. When we
>> create
>> > a cluster with the provided spark-ec2 script, it takes hours. When
>> creating
>> > a 200 node cluster, it takes 2 1/2 hours and for a 500 node cluster it
>> takes
>> > over 5 hours. One other problem we are having is that some nodes don't
>> come
>> > up when the other ones do, the process seems to just move on, skipping
>> the
>> > rsync and any installs on those ones.
>> >
>> > My guess as to why it takes so long to set up a large cluster is
>> because of
>> > the use of rsync. What if instead of using rsync, you synched to s3 and
>> then
>> > did a pdsh to pull it down on all of the machines. This is a big deal
>> for us
>> > and if we can come up with a good plan, we might be able help out with
>> the
>> > required changes.
>> >
>> > Are there any suggestions on how to deal with some of the nodes not
>> being
>> > ready when the process starts?
>> >
>> > Thanks for your time,
>> > Christian
>> >
>>
>


Re: Spark EC2 script on Large clusters

2015-11-05 Thread Nicholas Chammas
Yeah, as Shivaram mentioned, this issue is well-known. It's documented in
SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to resolve
this issue in spark-ec2 without rewriting large parts of the project. But if
you take a crack at it and succeed, I'm sure a lot of people will be happy.

I've started a separate project, Flintrock
(https://github.com/nchammas/flintrock) -- which Shivaram also mentioned --
which aims to solve the problem of long launch times and other issues with
spark-ec2. It's still very young and lacks several critical features, but we
are making steady progress.

Nick

On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> It is a known limitation that spark-ec2 is very slow for large
> clusters and as you mention most of this is due to the use of rsync to
> transfer things from the master to all the slaves.
>
> Nick cc'd has been working on an alternative approach at
> https://github.com/nchammas/flintrock that is more scalable.
>
> Thanks
> Shivaram
>
> On Thu, Nov 5, 2015 at 8:12 AM, Christian  wrote:
> > For starters, thanks for the awesome product!
> >
> > When creating ec2-clusters of 20-40 nodes, things work great. When we
> create
> > a cluster with the provided spark-ec2 script, it takes hours. When
> creating
> > a 200 node cluster, it takes 2 1/2 hours and for a 500 node cluster it
> takes
> > over 5 hours. One other problem we are having is that some nodes don't
> come
> > up when the other ones do, the process seems to just move on, skipping
> the
> > rsync and any installs on those ones.
> >
> > My guess as to why it takes so long to set up a large cluster is because
> of
> > the use of rsync. What if instead of using rsync, you synched to s3 and
> then
> > did a pdsh to pull it down on all of the machines. This is a big deal
> for us
> > and if we can come up with a good plan, we might be able help out with
> the
> > required changes.
> >
> > Are there any suggestions on how to deal with some of the nodes not being
> > ready when the process starts?
> >
> > Thanks for your time,
> > Christian
> >
>


Re: Spark EC2 script on Large clusters

2015-11-05 Thread Shivaram Venkataraman
It is a known limitation that spark-ec2 is very slow for large
clusters and, as you mention, most of this is due to the use of rsync to
transfer things from the master to all the slaves.

Nick (cc'd) has been working on an alternative approach at
https://github.com/nchammas/flintrock that is more scalable.

Thanks
Shivaram

On Thu, Nov 5, 2015 at 8:12 AM, Christian  wrote:
> For starters, thanks for the awesome product!
>
> When creating ec2-clusters of 20-40 nodes, things work great. When we create
> a cluster with the provided spark-ec2 script, it takes hours. When creating
> a 200 node cluster, it takes 2 1/2 hours and for a 500 node cluster it takes
> over 5 hours. One other problem we are having is that some nodes don't come
> up when the other ones do, the process seems to just move on, skipping the
> rsync and any installs on those ones.
>
> My guess as to why it takes so long to set up a large cluster is because of
> the use of rsync. What if instead of using rsync, you synched to s3 and then
> did a pdsh to pull it down on all of the machines. This is a big deal for us
> and if we can come up with a good plan, we might be able help out with the
> required changes.
>
> Are there any suggestions on how to deal with some of the nodes not being
> ready when the process starts?
>
> Thanks for your time,
> Christian
>




Spark EC2 script on Large clusters

2015-11-05 Thread Christian
For starters, thanks for the awesome product!

When creating ec2-clusters of 20-40 nodes, things work great. When we
create a larger cluster with the provided spark-ec2 script, it takes hours:
a 200 node cluster takes 2 1/2 hours and a 500 node cluster takes over
5 hours. One other problem we are having is that some nodes don't come up
when the other ones do, and the process seems to just move on, skipping the
rsync and any installs on those nodes.

My guess is that it takes so long to set up a large cluster because of
the use of rsync. What if, instead of using rsync, you synced to S3 and
then did a pdsh to pull it down on all of the machines? This is a big deal
for us, and if we can come up with a good plan, we might be able to help out
with the required changes.
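
A rough sketch of what that S3-then-pdsh idea could look like, only building
the two commands rather than running them. It is not how spark-ec2 works
today. The bucket URI, directories, and host names are hypothetical, and it
assumes the AWS CLI and credentials are available on the master and on every
node.

```python
import shlex


def s3_fanout_commands(local_dir, s3_uri, hosts, remote_dir):
    """Return (upload, pull) command lists for a push-once, pull-in-parallel copy.

    All paths, the bucket URI, and the host names are hypothetical placeholders.
    """
    if not hosts:
        raise ValueError("need at least one host")
    # Step 1: the master uploads the directory to S3 once.
    upload = ["aws", "s3", "sync", local_dir, s3_uri]
    # Step 2: pdsh runs the same download on every node in parallel.
    remote_cmd = " ".join(
        ["aws", "s3", "sync", shlex.quote(s3_uri), shlex.quote(remote_dir)]
    )
    pull = ["pdsh", "-w", ",".join(hosts), remote_cmd]
    return upload, pull


if __name__ == "__main__":
    up, pull = s3_fanout_commands(
        "/root/spark", "s3://example-bucket/spark", ["node1", "node2"], "/root/spark"
    )
    for cmd in (up, pull):
        print(" ".join(shlex.quote(part) for part in cmd))
```

The point of the design is that the master uploads once and each node pulls
for itself, instead of the master pushing to every node.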

Are there any suggestions on how to deal with some of the nodes not being
ready when the process starts?
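
One possible approach to the not-ready nodes, sketched under assumptions:
poll each node's SSH port before starting the copy, and report the stragglers
instead of silently skipping them. The retry count, delay, and the choice of
port 22 as the readiness signal are illustrative, not something spark-ec2 is
known to do in this form.

```python
import socket
import time


def ssh_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_nodes(hosts, probe=ssh_reachable, attempts=30, delay=10.0,
                   sleep=time.sleep):
    """Poll until every host passes the probe or the attempts run out.

    Returns (ready, not_ready) as sorted lists of host names.
    """
    pending = set(hosts)
    for attempt in range(attempts):
        # Keep only the hosts that still fail the probe.
        pending = {h for h in pending if not probe(h)}
        if not pending:
            break
        if attempt < attempts - 1:
            sleep(delay)
    ready = sorted(set(hosts) - pending)
    return ready, sorted(pending)
```

The caller could then retry, replace, or explicitly drop the hosts returned
in the second list before running the setup step.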

Thanks for your time,
Christian