I think this is exactly what Packer is for. See e.g.
http://www.packer.io/intro/getting-started/build-image.html
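
To make that concrete, here is a rough sketch of the "base AMI id + packages +
configuration changes" idea as a small Python script that writes out a Packer
template. I have not run this against a real account. The AMI id, package
list, setup commands, instance type, SSH user, and the packer_template helper
itself are all made up for illustration; only the template layout (an
amazon-ebs builder plus a shell provisioner) is Packer's. Credentials are left
out of the template on the assumption that Packer picks them up from the
environment.

```python
import json


def packer_template(base_ami, region, packages, setup_commands, ami_name):
    """Build a Packer template (as a plain dict) for an EBS-backed AMI.

    Hypothetical helper for illustration only; not part of spark-ec2.
    """
    # Assumes a yum-based base image such as Amazon Linux.
    install = "sudo yum install -y " + " ".join(packages)
    return {
        "builders": [{
            "type": "amazon-ebs",
            "region": region,
            "source_ami": base_ami,
            "instance_type": "m3.medium",   # assumed build instance type
            "ssh_username": "ec2-user",     # assumed login user on the base AMI
            "ami_name": ami_name,
        }],
        "provisioners": [{
            "type": "shell",
            # Install the packages first, then apply the configuration changes.
            "inline": [install] + list(setup_commands),
        }],
    }


if __name__ == "__main__":
    template = packer_template(
        base_ami="ami-00000000",             # placeholder, not a real AMI id
        region="us-east-1",
        packages=["httpd", "ganglia"],       # illustrative package list
        setup_commands=["sudo chkconfig httpd on"],
        ami_name="spark-ami-{{timestamp}}",  # Packer fills in the timestamp
    )
    with open("spark-ami.json", "w") as f:
        json.dump(template, f, indent=2)
```

Running `packer build spark-ami.json` on the result would then produce the
AMI, so the only thing to keep in the Spark repo would be a script like this
plus the resulting AMI ids per release.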

On a related note, the current AMI for HVM instance types (e.g. m3.*, r3.*)
has a bad httpd package, which causes Ganglia not to start. For some reason I
can't get access to the raw AMI to fix it.

On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <nicholas.cham...@gmail.com
> wrote:

> Is there perhaps a way to define an AMI programmatically? Like, a
> collection of base AMI id + list of required stuff to be installed + list
> of required configuration changes. I’m guessing that’s what people use
> things like Puppet, Ansible, or maybe also AWS CloudFormation for, right?
>
> If we could do something like that, then with every new release of Spark we
> could quickly and easily create new AMIs that have everything we need.
> spark-ec2 would only have to bring up the instances and do a minimal amount
> of configuration, and the only thing we’d need to track in the Spark repo
> is the code that defines what goes on the AMI, as well as a list of the AMI
> ids specific to each release.
>
> I’m just thinking out loud here. Does this make sense?
>
> Nate,
>
> Any progress on your end with this work?
>
> Nick
>
>
> On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
> > It should be possible to improve cluster launch time if we are careful
> > about what commands we run during setup. One way to do this would be to
> > walk down the list of things we do for cluster initialization and see if
> > there is anything we can do to make things faster. Unfortunately this
> > might be pretty time consuming, but I don't know of a better strategy.
> > The place to start would be the setup.sh file at
> > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
> >
> > Here are some things that take a lot of time and could be improved:
> > 1. Creating swap partitions on all machines. We could check if there is a
> > way to get EC2 to always mount a swap partition.
> > 2. Copying / syncing things across slaves. The copy-dir script is called
> > too many times right now and each time it pauses for a few milliseconds
> > between slaves [1]. This could be improved by removing unnecessary copies.
> > 3. We could make less frequently used modules like Tachyon, persistent
> > hdfs not a part of the default setup.
> >
> > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
> >
> > Thanks
> > Shivaram
> >
> >
> >
> >
> > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas <
> > nicholas.cham...@gmail.com> wrote:
> >
> > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico <n...@reactor8.com> wrote:
> > >
> > > > Starting to work through some automation/config stuff for spark stack
> > > > on EC2 with a project, will be focusing the work through the apache
> > > > bigtop effort to start, can then share with spark community directly
> > > > as things progress if people are interested
> > >
> > >
> > > Let us know how that goes. I'm definitely interested in hearing more.
> > >
> > > Nick
> > >
> >
>
