+1 this is a smarter version of what I tried to put together too. A
semi-custom AMI would download components and configure via an /etc/rc
script. Quite nice.
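For illustration, here is a minimal Python sketch of that boot-time step, along the lines of the S3-hosted tarballs Ted describes below: fetch each tarball and unpack it rooted at /. The function name install_tarballs, the URL list and the overwrite-in-order layering are assumptions made for this sketch, not anything agreed on the list. Writing into / also needs root.

```python
import io
import tarfile
import urllib.request


def install_tarballs(urls, root="/"):
    """Fetch each tarball in order and unpack it under `root`.

    Hypothetical helper: member paths inside each tarball are assumed to be
    relative to the filesystem root (e.g. "opt/hadoop/..."). Later tarballs
    overwrite files unpacked by earlier ones, so list them from most general
    (JDK) to most specific (application).
    """
    # Newer Pythons support extraction filters; "tar" strips leading slashes
    # and refuses members that would land outside `root`.
    extra = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
    installed = []
    for url in urls:
        # Plain HTTPS works for public S3 objects; file:// works for testing.
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
        # "r:*" auto-detects compression (e.g. gzip) or plain tar.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
            tar.extractall(path=root, **extra)
            installed.extend(tar.getnames())
    return installed
```

Ordering the list JDK, then Hadoop, then the application mirrors the OS / infrastructure / application layering in Ted's message. Under that scheme, trying a different Hadoop version means changing one URL rather than rebuilding an AMI.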

Point taken about Hadoop and the usefulness amongst ourselves of such
a thing. Based on incomplete experience with running AMIs, and a
Hadoop cluster, it's going to be no small feat to craft a series of
AMIs (or one configurable one) that will reliably come up, find their
workers, accept jobs, etc. It's not terrible, but it's the work of a
week, I'm guessing.

That would be pretty great, for the whole community, should you
succeed. You could probably make a nice paid AMI out of it!

On Mon, Jan 18, 2010 at 8:15 PM, Ted Dunning <[email protected]> wrote:
> Is there an important difference between creating a new AMI and using
> an existing AMI with a startup script that populates everything from S3?
>
> Building an AMI takes a few hours and is a total pain in the butt.
> In the end, I found that I didn't need to do it at all.
>
> I found that I had roughly three levels of variation in my production
> systems:
>
> - the OS
> - the infrastructural components like Java, Hadoop and ZooKeeper
> - the application that I wanted to run
>
> My initial thought was that the AMI should cover the first two aspects of
> variability.  But I also found that I wanted to change the version of the
> infrastructure stuff fairly often in development of the AMI and not
> infrequently in production.
>
> For Mahout customers, I would imagine that there is a reasonable amount of
> variability in desired OS (Ubuntu versus Red Hat versus CentOS at least), JDK
> and Hadoop versions.  We definitely can't afford the time to build AMIs for
> all options.
>
> My final answer for DeepDyve was to use a standard alestic.com AMI.  That
> let me change the OS whenever I needed to and would let Mahout customers
> pick their preference.  These AMIs accept a startup script of up to 16K,
> which I used to handle infrastructure variation.  That worked very well for
> me and could be used for Mahout.
>
> The cost was a few tens of seconds at boot time.  The benefit was a vastly
> better debug and development cycle.  Somebody else handled the OS and I
> could test many variations of the setup script very quickly.  This practice
> is very much in line with what RightScale does.
>
> Generally, I would avoid the full-custom AMI in favor of a few S3-hosted
> tarballs rooted at / that anybody can rain down on any Linux version they
> want.
>
> On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll <[email protected]> wrote:
>
>> Create an AMI with:
>> 1. Java 1.6
>> 2. Maven
>> 3. svn
>> 4. Mahout's exact Hadoop version
>> 5. A checkout of Mahout
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
