Spark EC2 script on Large clusters

2015-11-05 Thread Christian
For starters, thanks for the awesome product! When creating EC2 clusters of 20-40 nodes, things work great. When we create a cluster with the provided spark-ec2 script, it takes hours. Creating a 200-node cluster takes 2 1/2 hours, and a 500-node cluster takes over 5 hours. One

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Shivaram Venkataraman
It is a known limitation that spark-ec2 is very slow for large clusters and, as you mention, most of this is due to the use of rsync to transfer things from the master to all the slaves. Nick (cc'd) has been working on an alternative approach at https://github.com/nchammas/flintrock that is more

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Nicholas Chammas
Yeah, as Shivaram mentioned, this issue is well-known. It's documented in SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to resolve this issue in spark-ec2 without rewriting large parts of the project. But if you take a crack

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Christian
Let me rephrase. EMR cost is about twice as much as the spot price, making it almost 2/3 of the overall cost.

On Thu, Nov 5, 2015 at 11:50 AM Christian wrote:
> Hi Jonathan,
>
> We are using EMR now and it's costing way too much. We do spot pricing and
> the EMR add-on cost is
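The arithmetic behind the rephrased figure can be checked with a short sketch. It assumes, as the message states, that the EMR fee is about twice the spot price; the dollar amount is a hypothetical placeholder and only the ratio matters.

```python
def emr_share_of_total(spot_price: float, emr_fee: float) -> float:
    """Return the fraction of the total hourly cost that goes to the EMR fee."""
    return emr_fee / (spot_price + emr_fee)


# Hypothetical spot price per instance-hour (not from the thread).
spot = 0.10
# "EMR cost is about twice as much as the spot price"
fee = 2 * spot

share = emr_share_of_total(spot, fee)
print(f"EMR share of total cost: {share:.2f}")
```

With a fee of twice the spot price, the fee is two parts out of three of the total, which matches the "almost 2/3 of the overall cost" in the message.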

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Christian
Hi Jonathan,

We are using EMR now and it's costing way too much. We do spot pricing and the EMR add-on cost is about 2/3 the price of the actual spot instance.

On Thu, Nov 5, 2015 at 11:31 AM Jonathan Kelly wrote:
> Christian,
>
> Is there anything preventing you from

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Jonathan Kelly
Christian, Is there anything preventing you from using EMR, which will manage your cluster for you? Creating large clusters would take minutes on EMR instead of hours. Also, EMR supports growing your cluster easily and recently added support for shrinking your cluster gracefully (even while jobs are

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Jerry Lam
Does Qubole use YARN or Mesos for resource management?

> On 5 Nov, 2015, at 9:02 pm, Sabarish Sasidharan wrote:
>
> Qubole

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Sabarish Sasidharan
Qubole uses YARN.

Regards
Sab

On 06-Nov-2015 8:31 am, "Jerry Lam" wrote:
> Does Qubole use YARN or Mesos for resource management?
>
> > On 5 Nov, 2015, at 9:02 pm, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
> >
> > Qubole

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Sabarish Sasidharan
Qubole is one option where you can use spot instances and get a couple of other benefits. We use Qubole at Manthan for our Spark workloads. For ensuring all the nodes are ready, you could use the minRegisteredResourcesRatio config property (spark.scheduler.minRegisteredResourcesRatio in Spark's configuration) to ensure the execution doesn't start till the requisite containers
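A minimal sketch of how that setting might be passed to a job on YARN. It assumes the property meant here is Spark's spark.scheduler.minRegisteredResourcesRatio, paired with spark.scheduler.maxRegisteredResourcesWaitingTime, which caps how long Spark waits for the ratio to be reached. The helper name, the application file, and the chosen values are hypothetical, and the exact --master value may differ by Spark version.

```python
import shlex


def build_submit_command(app, min_ratio=1.0, max_wait="120s"):
    """Build a spark-submit argument list that delays scheduling until
    the given fraction of requested executors has registered."""
    conf = {
        # Fraction of requested resources that must register before tasks are scheduled.
        "spark.scheduler.minRegisteredResourcesRatio": str(min_ratio),
        # Upper bound on the wait; scheduling starts after this even if the ratio is not met.
        "spark.scheduler.maxRegisteredResourcesWaitingTime": max_wait,
    }
    cmd = ["spark-submit", "--master", "yarn"]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# "my_job.py" is a placeholder application name.
cmd = build_submit_command("my_job.py")
print(" ".join(shlex.quote(part) for part in cmd))
```

Setting the ratio to 1.0 asks Spark to wait for all requested executors, so the maximum waiting time is what keeps a job from stalling if some containers never arrive.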