You are partially correct. It's not terribly complex, but it's also not easy to accomplish. It sounds like you want to manage some partially or fully baked AMIs with the core Spark libs and dependencies already on the image. The main issues that crop up are:

1) Image sprawl: as libs/config/defaults/etc. change, images need to be "rebuilt" and references updated.
2) Cross-region support: not too huge a deal now with the AMI copy functionality, just more complex image management.

If you don't want to restrict which instance types/sizes one can use, image management also becomes more complex with:

3) Instance type: you need both standard (paravirtual) and HVM images.

I'm starting to work through some automation/config work for a Spark stack on EC2 with a project. I'll be focusing that work through the Apache Bigtop effort to start, and can then share it with the Spark community directly as things progress, if people are interested.

Nate

-----Original Message-----
From: Nicholas Chammas [mailto:nicholas.cham...@gmail.com]
Sent: Thursday, July 10, 2014 3:06 PM
To: dev
Subject: EC2 clusters ready in launch time + 30 seconds

Hi devs!

Right now it takes a non-trivial amount of time to launch EC2 clusters. Part of this time is spent starting the EC2 instances, which is out of our control. Another part of this time is spent installing stuff on and configuring the instances. This, we can control.

I'd like to explore approaches to upgrading spark-ec2 so that launching a cluster of any size generally takes only 30 seconds on top of the time to launch the base EC2 instances. Since Amazon can launch instances concurrently, I believe this means we should be able to launch a fully operational Spark cluster of any size in constant time. Is that correct?

Do we already have an idea of what it would take to get to that point?

Nick
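
To make the image-management issues in the reply more concrete (one pre-baked image per region and virtualization type, each rebuilt whenever libs or config change), here is a minimal Python sketch. It is illustrative only. The catalog, the placeholder AMI IDs, the helper names, and the set of HVM-only instance families are all assumptions made for this example; they are not spark-ec2 or AWS APIs.

```python
from itertools import product

# Illustrative assumption: instance families that only support HVM images.
# Check the EC2 documentation for the authoritative, current list.
HVM_ONLY_FAMILIES = {"t2", "r3", "i2"}

# Hypothetical catalog of pre-baked images, one per (region, virtualization type).
# The AMI IDs are placeholders, not real images.
AMI_CATALOG = {
    ("us-east-1", "pvm"): "ami-00000001",
    ("us-east-1", "hvm"): "ami-00000002",
    ("us-west-2", "pvm"): "ami-00000003",
    ("us-west-2", "hvm"): "ami-00000004",
}


def virtualization_type(instance_type: str) -> str:
    """Return 'hvm' or 'pvm' for an instance type such as 'r3.large'."""
    family = instance_type.split(".", 1)[0]
    return "hvm" if family in HVM_ONLY_FAMILIES else "pvm"


def pick_ami(region: str, instance_type: str) -> str:
    """Look up the pre-baked image matching the region and instance type."""
    key = (region, virtualization_type(instance_type))
    try:
        return AMI_CATALOG[key]
    except KeyError:
        # A missing entry means an image still has to be built or copied to this region.
        raise LookupError(f"no pre-baked image for {key}; build or copy one") from None


def images_to_maintain(regions, virt_types, spark_versions) -> int:
    """Count the images to keep current: every combination is a separate image."""
    return len(list(product(regions, virt_types, spark_versions)))


if __name__ == "__main__":
    print(pick_ami("us-west-2", "r3.xlarge"))
    print(images_to_maintain(["us-east-1", "us-west-2"], ["pvm", "hvm"], ["1.0.0", "1.0.1"]))
```

The count grows multiplicatively. Two regions, two virtualization types, and two Spark versions already mean eight images to rebuild and re-reference, which is the "image sprawl" described above.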