On Jan 18, 2010, at 12:15pm, Ted Dunning wrote:

Is there an important difference between creating a custom AMI and using an existing AMI with a startup script that populates everything from S3?

Building an AMI takes a few hours of time and is a total pain in the butt.
My eventual result was that I didn't need to do it at all.

[snip]

Leaving aside the pros/cons of having a pre-installed Hadoop, there were two things that I found non-trivial to handle via the init script:

1. Get LZO support installed.

Though I didn't dig into the various ways to do a scripted install.

2. Turn off atime (that is, mount with noatime).

You can do it via the script, but it feels kind of odd to have to re-mount disks, and either know about the set of volumes or do fancy sed-fu to dynamically generate the list.

Maybe there's an easy way that I missed? Input welcome...
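
One way to avoid hard-coding the volume list is to derive it from /proc/mounts at boot. The following is only a sketch of that idea, not something tested on an EC2 instance: the function name and the set of filesystem types are assumptions, and it only builds the mount commands rather than running them.

```python
# Sketch: build "mount -o remount,noatime" commands from /proc/mounts-style text.
# The filesystem-type filter is an assumption; adjust it for your instances.

def remount_commands(mounts_text, fs_types=("ext2", "ext3", "ext4", "xfs")):
    """Return one mount argv per real filesystem not already mounted noatime."""
    commands = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type not in fs_types:
            continue  # ignore proc, sysfs, tmpfs and similar pseudo-filesystems
        if "noatime" in options.split(","):
            continue  # already mounted the way we want
        commands.append(["mount", "-o", "remount,noatime", mount_point])
    return commands
```

In an init script running as root, the text would come from reading /proc/mounts, and each returned argv could be passed to subprocess.check_call.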

-- Ken


I found that I had roughly three levels of variation in my production
systems:

- the OS
- the infrastructural components like Java, Hadoop and ZooKeeper
- the application that I wanted to run

My initial thought was that the AMI should cover the first two aspects of variability. But I also found that I wanted to change the version of the
infrastructure stuff fairly often in development of the AMI and not
infrequently in production.

For Mahout customers, I would imagine that there is a reasonable amount of variability in desired OS (Ubuntu versus Red Hat versus CentOS at least), JDK and Hadoop versions. We definitely can't afford the time to build AMIs for
all options.

My final answer for DeepDyve was to use a standard alestic.com AMI. That let me change the OS whenever I needed to and would let Mahout customers pick their preference. These AMIs allow a 16K startup script, which I used to handle infrastructure variation. That worked very well for me and could
be used for Mahout.
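
As a rough illustration of handling infrastructure variation in the startup script, the sketch below renders a boot script for one JDK/Hadoop combination and checks it against the 16K limit. The bucket name, the tarball naming scheme and the function name are all hypothetical; this is not the script I actually used.

```python
# Sketch: render a boot script for one infrastructure combination and check
# that it fits within the 16K startup-script limit.
# The bucket name and tarball naming scheme are hypothetical.

MAX_SCRIPT_BYTES = 16 * 1024


def render_startup_script(hadoop_version, jdk_version, bucket="example-bucket"):
    """Return shell script text that unpacks the chosen tarballs at /."""
    lines = ["#!/bin/sh", "set -e"]
    for name in ("jdk-" + jdk_version, "hadoop-" + hadoop_version):
        url = "http://%s.s3.amazonaws.com/%s.tar.gz" % (bucket, name)
        # Stream each tarball straight into tar, rooted at /
        lines.append("curl -s %s | tar xz -C /" % url)
    script = "\n".join(lines) + "\n"
    if len(script.encode("utf-8")) > MAX_SCRIPT_BYTES:
        raise ValueError("startup script exceeds 16K")
    return script
```

Changing a version then means regenerating the script text, not rebuilding an image.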

The cost was a few tens of seconds at boot time. The benefit was a vastly better debug and development cycle. Somebody else handled the OS and I could test many variations of setup script very quickly. This practice is
very much in line with what RightScale does.

Generally, I would avoid the full-custom AMI in favor of a few S3 hosted
tarballs rooted at / that anybody can rain down on any Linux version they
want.
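
A minimal sketch of the tarball approach, assuming the tarballs are readable over plain HTTP and that member paths are relative to /. The function names are made up for illustration, and there is no retry or checksum handling.

```python
import tarfile
import urllib.request


def unpack_tarball(fileobj, root="/"):
    """Extract a (possibly compressed) tar stream beneath root."""
    # "r|*" reads the archive as a forward-only stream with transparent
    # compression, so it works on an HTTP response as well as a local file.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        archive.extractall(path=root)


def fetch_and_unpack(url, root="/"):
    """Download a tarball over HTTP and unpack it beneath root."""
    with urllib.request.urlopen(url) as response:
        unpack_tarball(response, root)
```

Passing a scratch directory as root instead of / makes it easy to try a new tarball without touching the running system.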

On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll wrote:

Create an AMI with:
1. Java 1.6
2. Maven
3. svn
4. Mahout's exact Hadoop version
5. A checkout of Mahout




--
Ted Dunning, CTO
DeepDyve

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



