You are partially correct.

It's not terribly complex, but it's also not easy to accomplish. It sounds like 
you want to manage partially or fully baked AMIs with the core Spark libraries 
and dependencies already on the image. The main issues that crop up are:

1) Image sprawl: as libs, config, defaults, etc. change, images need to be 
rebuilt and references to them updated.
2) Cross-region support (not too big a deal now with the AMI copy 
functionality, just more complex image management).

If you don't want to restrict which instance types/sizes one can use, you also 
get an uptick in image-management complexity from:

3) Instance type (you need both standard/paravirtual and HVM images).
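To make the sprawl concrete, here is a minimal sketch, assuming a hypothetical 
image catalog keyed by (Spark version, region, virtualization type). The 
versions, regions and placeholder AMI IDs are illustrative only, not real 
images. Each added dimension multiplies the number of images that have to be 
built, copied across regions and kept referenced correctly:

```python
from itertools import product

# Hypothetical dimensions; the real lists depend on which Spark versions,
# EC2 regions and virtualization types a project chooses to support.
SPARK_VERSIONS = ["1.0.0", "1.0.1"]
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]
VIRT_TYPES = ["paravirtual", "hvm"]


def image_catalog(versions, regions, virt_types):
    """Return one placeholder AMI entry per (version, region, virt type)."""
    catalog = {}
    for version, region, virt in product(versions, regions, virt_types):
        # Placeholder ID: a real catalog would store the ID produced by the
        # image build (or cross-region copy) for this combination.
        catalog[(version, region, virt)] = f"ami-placeholder-{version}-{region}-{virt}"
    return catalog


def lookup(catalog, version, region, virt):
    """Return the image a launch script would reference for this combination."""
    return catalog[(version, region, virt)]


catalog = image_catalog(SPARK_VERSIONS, REGIONS, VIRT_TYPES)
print(len(catalog))  # 2 versions * 3 regions * 2 virt types = 12 images
print(lookup(catalog, "1.0.1", "eu-west-1", "hvm"))
```

Dropping a dimension (for example, supporting only HVM instance types) shrinks 
the catalog proportionally, which is the trade-off between flexibility and 
image-management overhead.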

I'm starting to work through some automation/config work for a Spark stack on 
EC2 with a project. I'll be focusing that work through the Apache Bigtop effort 
to start, and can then share it with the Spark community directly as things 
progress, if people are interested.

Nate


-----Original Message-----
From: Nicholas Chammas [mailto:nicholas.cham...@gmail.com] 
Sent: Thursday, July 10, 2014 3:06 PM
To: dev
Subject: EC2 clusters ready in launch time + 30 seconds

Hi devs!

Right now it takes a non-trivial amount of time to launch EC2 clusters.
Part of this time is spent starting the EC2 instances, which is out of our 
control. Another part of this time is spent installing software on the 
instances and configuring them. This, we can control.

I’d like to explore approaches to upgrading spark-ec2 so that launching a 
cluster of any size generally takes only 30 seconds on top of the time to 
launch the base EC2 instances. Since Amazon can launch instances concurrently, 
I believe this means we should be able to launch a fully operational Spark 
cluster of any size in constant time. Is that correct?

Do we already have an idea of what it would take to get to that point?

Nick