Is there an important difference between creating a custom AMI and using
an existing AMI with a startup script that populates everything from S3?

Building an AMI takes a few hours and is a total pain in the butt.
My eventual conclusion was that I didn't need to do it at all.

I found that I had roughly three levels of variation in my production
systems:

- the OS
- the infrastructural components like Java, Hadoop and ZooKeeper
- the application that I wanted to run

My initial thought was that the AMI should cover the first two aspects of
variability.  But I found that I wanted to change the versions of the
infrastructure components fairly often while developing the AMI, and not
infrequently in production.

For Mahout customers, I would imagine that there is a reasonable amount of
variability in desired OS (Ubuntu versus Red Hat versus CentOS at least), JDK
and Hadoop versions.  We definitely can't afford the time to build AMIs for
all options.

My final answer for DeepDyve was to use a standard alestic.com AMI.  That
let me change the OS whenever I needed to and would let Mahout customers
pick their preference.  These AMIs allow a 16K startup script, which I used
to handle infrastructure variation.  That worked very well for me and could
be used for Mahout.
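
As a rough sketch of that approach (not the actual DeepDyve script), a small
Python helper can render the startup script from whatever versions you pick
and check it against the 16K limit.  The bucket name, the tarball naming
scheme and the version numbers below are made up for illustration.

```python
# Sketch only: bucket name, tarball naming scheme and versions are hypothetical.
USER_DATA_LIMIT = 16 * 1024  # the 16K startup script limit mentioned above


def build_startup_script(bucket, components):
    """Render a shell startup script that unpacks one tarball per component."""
    lines = ["#!/bin/bash", "set -e"]
    for name, version in components:
        url = f"https://{bucket}.s3.amazonaws.com/{name}-{version}.tar.gz"
        # Each tarball is assumed to be rooted at /, so unpacking at /
        # installs the component in place.
        lines.append(f"curl -fsS {url} | tar -xz -C /")
    script = "\n".join(lines) + "\n"
    if len(script.encode("utf-8")) > USER_DATA_LIMIT:
        raise ValueError("startup script exceeds the 16K limit")
    return script


script = build_startup_script(
    "example-bucket", [("jdk", "1.6"), ("hadoop", "0.20.1")]
)
print(script)
```

Changing a JDK or Hadoop version then means editing one tuple and booting a
fresh instance, rather than rebuilding an AMI.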

The cost was a few tens of seconds at boot time.  The benefit was a vastly
better debug and development cycle.  Somebody else handled the OS and I
could test many variations of the setup script very quickly.  This practice
is very much in line with what RightScale does.

Generally, I would avoid the full-custom AMI in favor of a few S3-hosted
tarballs rooted at / that anybody can rain down on any Linux version they
want.
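
To make "rooted at /" concrete, here is a minimal Python sketch of the
unpacking step.  It assumes the tarball is gzipped, publicly readable over
HTTP, and has member paths relative to / (for example opt/hadoop/...).  The
function names are hypothetical, and in practice a one-line curl and tar
pipeline in the startup script does the same job.

```python
import io
import tarfile
import urllib.request


def unpack(data, root="/"):
    """Unpack gzipped tarball bytes so that member paths land under root."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        archive.extractall(path=root)


def rain_down(url, root="/"):
    """Download a tarball from url (e.g. an S3 object URL) and unpack it."""
    with urllib.request.urlopen(url) as response:
        unpack(response.read(), root)
```

Calling rain_down() with the URL of one of your tarballs as root on a fresh
instance installs that component regardless of which Linux distribution the
base AMI carries.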

On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll <[email protected]> wrote:

> Create an AMI with:
> 1. Java 1.6
> 2. Maven
> 3. svn
> 4. Mahout's exact Hadoop version
> 5. A checkout of Mahout
>



-- 
Ted Dunning, CTO
DeepDyve
