On Jan 18, 2010, at 12:15pm, Ted Dunning wrote:
Is there an important difference between creating a custom AMI and
using an existing AMI with a startup script that populates everything
from S3?
Building an AMI takes a few hours and is a total pain in the butt.
My eventual conclusion was that I didn't need to do it at all.
[snip]
Leaving aside the pros/cons of having a pre-installed Hadoop, there
were two things that I found non-trivial to handle via the init script:
1. Get LZO support installed.
Though I didn't dig into the various ways to do a scripted install.
2. Turn on noatime (i.e., turn off atime updates).
You can do it via the script, but it feels kind of odd to have to
remount disks, and either know about the set of volumes or do fancy
sed-fu to dynamically generate the list.
Maybe there's an easy way that I missed? Input welcome...
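One possible way to avoid hard-coding the volume list, as a rough sketch
rather than a tested recipe: read /proc/mounts at boot and remount every
disk-backed filesystem with noatime. The function names and the set of
filesystem types below are assumptions made for illustration, and the
remount step has to run as root.

```python
import subprocess

# Filesystem types treated as "real disks". This set is an assumption;
# adjust it for whatever the instance actually mounts.
DISK_FS_TYPES = {"ext2", "ext3", "ext4", "xfs"}


def noatime_remount_commands(mounts_text, fs_types=DISK_FS_TYPES):
    """Build one remount command per disk-backed mount lacking noatime.

    mounts_text is the content of /proc/mounts, whose lines look like:
        device mount_point fs_type options dump pass
    Mount points containing spaces appear escaped (\\040) in that file
    and are not handled by this sketch.
    """
    commands = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type not in fs_types:
            continue  # skip proc, sysfs, tmpfs and friends
        if "noatime" in options.split(","):
            continue  # already mounted the way we want
        commands.append(["mount", "-o", "remount,noatime", mount_point])
    return commands


def remount_all_noatime(mounts_path="/proc/mounts"):
    """Read the live mount table and run the remounts (needs root)."""
    with open(mounts_path) as mounts_file:
        mounts_text = mounts_file.read()
    for command in noatime_remount_commands(mounts_text):
        subprocess.check_call(command)
```

Keeping the parsing separate from the remount call means the generated
command list can be checked against a sample mount table before anything
is run on a live instance.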
-- Ken
I found that I had roughly three levels of variation in my production
systems:
- the OS
- the infrastructural components like Java, Hadoop and ZooKeeper
- the application that I wanted to run
My initial thought was that the AMI should cover the first two aspects
of variability. But I also found that I wanted to change the version of
the infrastructure stuff fairly often in development of the AMI and not
infrequently in production.
For Mahout customers, I would imagine that there is a reasonable amount
of variability in desired OS (Ubuntu versus Red Hat versus CentOS at
least), JDK and Hadoop versions. We definitely can't afford the time to
build AMIs for all options.
My final answer for DeepDyve was to use a standard alestic.com AMI.
That let me change the OS whenever I needed to and would let Mahout
customers pick their preference. These AMIs allow a startup script of
up to 16 KB, which I used to handle infrastructure variation. That
worked very well for me and could be used for Mahout.
The cost was a few tens of seconds at boot time. The benefit was a
vastly better debug and development cycle. Somebody else handled the OS
and I could test many variations of the setup script very quickly. This
practice is very much in line with what RightScale does.
Generally, I would avoid the full-custom AMI in favor of a few
S3-hosted tarballs rooted at / that anybody can rain down on any Linux
version they want.
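A rough sketch of what the tarball step might look like from a boot-time
script, assuming the tarballs are readable over plain HTTP(S) and that
their member paths are relative to /. The function names and the URL in
the usage comment are hypothetical, not what was actually used at
DeepDyve.

```python
import io
import tarfile
import urllib.request


def unpack_tarball(fileobj, root="/"):
    """Extract a tarball whose member paths are relative to root.

    Compression (gzip, bzip2, none) is detected automatically.
    Returns the list of member names, which is handy for logging.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        names = archive.getnames()
        archive.extractall(path=root)
    return names


def rain_down(urls, root="/"):
    """Download each tarball in order and unpack it under root."""
    for url in urls:
        with urllib.request.urlopen(url) as response:
            # Buffer in memory so tarfile gets a seekable file object.
            payload = io.BytesIO(response.read())
        unpack_tarball(payload, root)


# Hypothetical usage from a startup script; the bucket and file names
# are placeholders:
#   rain_down(["https://example-bucket.s3.amazonaws.com/hadoop.tar.gz"])
```

Passing a scratch directory as root while developing makes it easy to
inspect what a tarball would install before pointing it at / on a real
instance.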
On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll
<[email protected]> wrote:
Create an AMI with:
1. Java 1.6
2. Maven
3. svn
4. Mahout's exact Hadoop version
5. A checkout of Mahout
--
Ted Dunning, CTO
DeepDyve
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g