2010/1/18 Grant Ingersoll <[email protected]>:
> OK, thanks for all the advice.  I'm wondering if this makes sense:
>
> Create an AMI with:
> 1. Java 1.6
> 2. Maven
> 3. svn
> 4. Mahout's exact Hadoop version
> 5. A checkout of Mahout

I am currently running CDH2, with Hadoop at version 0.20.1+152-1~j
(using Cloudera's intrepid-testing apt repo on a regular Ubuntu
karmic distro), on my 2 dev boxes (one a 32-bit dual core, the other
a 64-bit quad core) in conf-pseudo mode (single-node cluster). I could
successfully run mahout-0.3-SNAPSHOT jobs (including jobs built
against the hadoop-0.20.2-SNAPSHOT dependency). I guess this would
run exactly the same on a real EC2 cluster set up with
http://archive.cloudera.com/docs/ec2.html .
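
Since the AMI would pin "Mahout's exact Hadoop version" while the CDH
packages report their own version string, a quick sanity check is to
print what the job's classpath is actually linked against. This is
just a sketch using Hadoop's VersionInfo utility (class and method
names are from the 0.20 public API), nothing Mahout-specific:

  // Prints the Hadoop version bundled on the job client's classpath,
  // to compare with what the CDH cluster itself reports.
  import org.apache.hadoop.util.VersionInfo;

  public class PrintHadoopVersion {
    public static void main(String[] args) {
      System.out.println("Hadoop version: " + VersionInfo.getVersion());
      System.out.println("Build revision: " + VersionInfo.getRevision());
      System.out.println("Compiled by " + VersionInfo.getUser()
          + " on " + VersionInfo.getDate());
    }
  }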

> I want to be able to run the trunk version of Mahout with little upgrade 
> pain, both on an individual node and in a cluster.
>
> Is this the shortest path?  I don't have much experience w/ creating AMIs, 
> but I want my work to be reusable by the community (remember, committers can 
> get credits from Amazon for testing Mahout).
>
> After that, I want to convert some of the public datasets to vector format 
> and run some performance benchmarks.

I think we should host sample datasets that are known to be
vectorizable using Mahout utilities either on S3 (using s3:// and not
s3n:// when individual files are larger than 5GB) or using a dedicated
EBS volume with a public snapshot.
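
Once something is up there, pulling a dataset onto a cluster's HDFS
before a run could look roughly like the following. The bucket name
and paths are made up for illustration, and in practice the keys
would live in core-site.xml rather than in code:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  public class FetchSampleDataset {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Credentials for the (hypothetical) public dataset bucket.
      conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
      conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

      FileSystem s3 = FileSystem.get(
          URI.create("s3n://mahout-sample-data/"), conf);
      FileSystem hdfs = FileSystem.get(conf);

      // Copy the pre-vectorized dataset onto the cluster before the job.
      FileUtil.copy(s3, new Path("s3n://mahout-sample-data/reuters-vectors"),
          hdfs, new Path("/user/hadoop/reuters-vectors"),
          false /* don't delete the source */, conf);
    }
  }

(For individual files over the s3n size limit, the URI scheme would
be s3:// block storage instead, as noted above.)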

-- 
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
