I still don't see the point of producing an AMI -- it's like distributing our .jar, which we already do -- plus a gigabyte of operating system.
However, by all means I think we should produce the sort of runnable .jar files that AEMR needs and post them in S3. That is, a .jar with all the Mahout code, plus a proper Main-Class manifest entry, is all you need to start your own instance of the job with AEMR (you supply the .jar location and program arguments). I have this more or less ready to go for collaborative filtering, even as I'm hitting snags farther down the road. AEMR is exactly what we want to support.

On Tue, May 19, 2009 at 11:59 AM, Tim Bass <[email protected]> wrote:
> Dear All,
>
> A few months ago (on the developer's list) we briefly touched on the
> idea of building a Mahout public AMI on EC2.
>
> Subsequently, Amazon released EMR and a number of folks have
> experimented with running sample Mahout jobs on EMR.
>
> What are the pros and cons of creating a public Mahout AMI with Hadoop
> and MapReduce configured with the versions that
> are supported by the developers, in addition to Amazon's EMR implementation?
>
> Should we revisit the AMI idea? Pros and cons?
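P.S. For anyone who wants to try this, here is a rough sketch of the runnable-jar setup described above. The main class, bucket, and input/output paths are placeholders, not the actual Mahout build artifacts:

```shell
# Sketch of the runnable-jar + EMR workflow (all names are hypothetical placeholders).

# 1. Write a manifest naming the entry point via the Main-Class attribute:
printf 'Main-Class: org.apache.mahout.cf.taste.hadoop.RecommenderJob\n' > manifest.txt

# 2. Bundle the compiled Mahout classes into a runnable jar:
#    jar cfm mahout-job.jar manifest.txt -C classes .

# 3. Upload the jar to S3, then start a job flow with it, supplying
#    the jar location and program arguments to Elastic MapReduce, e.g.:
#    elastic-mapreduce --create --jar s3://mahout-builds/mahout-job.jar \
#        --arg s3://mahout-builds/input/ --arg s3://mahout-builds/output/
```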
