Is there perhaps a way to define an AMI programmatically? Like, a
collection of base AMI id + list of required stuff to be installed + list
of required configuration changes. I’m guessing that’s what people use
things like Puppet, Ansible, or maybe also AWS CloudFormation for, right?
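Something like this, maybe (a totally hypothetical sketch using the AWS CLI;
the AMI id, key name, instance type, and package list are all placeholders):

    #!/usr/bin/env bash
    # Hypothetical AMI "definition": base AMI id + stuff to install +
    # config changes, snapshotted into a new AMI at the end.
    set -e
    BASE_AMI="ami-xxxxxxxx"                  # placeholder base AMI id

    # Bring up a builder instance from the base AMI.
    INSTANCE_ID=$(aws ec2 run-instances --image-id "$BASE_AMI" \
        --instance-type m1.large --key-name spark-ami-builder \
        --query 'Instances[0].InstanceId' --output text)
    aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
    HOST=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
        --query 'Reservations[0].Instances[0].PublicDnsName' --output text)

    # Install the required stuff and apply config changes.
    ssh -o StrictHostKeyChecking=no root@"$HOST" \
        'yum install -y java-1.7.0-openjdk-devel rsync git'

    # Snapshot the result as a release-specific AMI.
    aws ec2 create-image --instance-id "$INSTANCE_ID" \
        --name "spark-1.0.1-ami"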

If we could do something like that, then with every new release of Spark we
could quickly and easily create new AMIs that have everything we need.
spark-ec2 would only have to bring up the instances and do a minimal amount
of configuration, and the only thing we’d need to track in the Spark repo
is the code that defines what goes on the AMI, as well as a list of the AMI
ids specific to each release.
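The per-release list could be as simple as a file mapping region to AMI id,
e.g. (layout and ids completely made up):

    # ami-list/1.0.1 -- hypothetical layout; ids are placeholders
    us-east-1  ami-aaaaaaaa
    us-west-1  ami-bbbbbbbb
    eu-west-1  ami-cccccccc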

I’m just thinking out loud here. Does this make sense?

Nate,

Any progress on your end with this work?

Nick

On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> It should be possible to improve cluster launch time if we are careful
> about what commands we run during setup. One way to do this would be to
> walk down the list of things we do for cluster initialization and see if
> there is anything we can do to make things faster. Unfortunately this
> might be pretty time consuming, but I don't know of a better strategy.
> The place to start would be the setup.sh file at
> https://github.com/mesos/spark-ec2/blob/v3/setup.sh
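>
> A rough sketch of how we could find the slow steps: timestamp each phase
> in setup.sh (the phase names and commands below are just placeholders):
>
>     # Hypothetical timing wrapper: report how long each setup phase takes.
>     time_phase () {
>       local name="$1"; shift
>       local start=$(date +%s)
>       "$@"
>       echo "$name took $(( $(date +%s) - start ))s"
>     }
>     time_phase "setup-slave" ./setup-slave.sh
>     time_phase "copy spark"  ./copy-dir.sh /root/spark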
>
> Here are some things that take a lot of time and could be improved:
> 1. Creating swap partitions on all machines. We could check if there is a
> way to get EC2 to always mount a swap partition (see the first sketch
> below).
> 2. Copying / syncing things across slaves. The copy-dir script is called
> too many times right now, and each time it pauses for a few milliseconds
> between slaves [1]. This could be improved by removing unnecessary copies
> (see the second sketch below).
> 3. We could make less frequently used modules like Tachyon and persistent
> HDFS not part of the default setup.
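>
> For #1, one idea (just a sketch; assumes the AMI runs user-data at boot
> via cloud-init): move the swap setup into the instance user-data so every
> machine does it for itself, in parallel, instead of spark-ec2 doing it
> serially over SSH:
>
>     #!/bin/bash
>     # Hypothetical user-data script: create and mount a 4 GB swap file
>     # at boot on each instance.
>     dd if=/dev/zero of=/mnt/swap bs=1M count=4096
>     chmod 600 /mnt/swap
>     mkswap /mnt/swap
>     swapon /mnt/swap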
>
> [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
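>
> For #2, a sketch of a parallel copy-dir (the slaves file path and rsync
> flags are approximations, not the real script):
>
>     # Start one rsync per slave in the background, then wait for all of
>     # them, instead of copying serially with a sleep between slaves.
>     DIR="$1"
>     for slave in $(cat /root/spark-ec2/slaves); do
>       rsync -az -e "ssh -o StrictHostKeyChecking=no" \
>           "$DIR" "$slave:$(dirname "$DIR")" &
>     done
>     wait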
>
> Thanks
> Shivaram
>
> On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
> > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <n...@reactor8.com> wrote:
> >
> > > Starting to work through some automation/config stuff for the Spark
> > > stack on EC2 with a project. Will be focusing the work through the
> > > Apache Bigtop effort to start, and can then share with the Spark
> > > community directly as things progress, if people are interested.
> >
> >
> > Let us know how that goes. I'm definitely interested in hearing more.
> >
> > Nick
> >
>
