Thanks for your quick answers :) I'll rethink this and post as soon as I have something.

--- On Thu 26.2.09, Grant Ingersoll <gsing...@apache.org> wrote:

> From: Grant Ingersoll <gsing...@apache.org>
> Subject: Re: GSoC 2009 proposition
> To: mahout-dev@lucene.apache.org
> Date: Thursday, 26 February 2009, 16:20
>
> You might have a look at
> http://www.lucidimagination.com/search/document/5ab9ddafa19ee04b/thought_offering_ec2_s3_based_services#2d096f39b02ec289
> for some background thoughts.
>
> I think it's a nice idea and I've been meaning to use my Amazon
> credits for just such a thing for a while now, but I'm not sure how
> high a priority it is.
>
> You might consider extending/altering this thought to have more of a
> focus on developing demos (including code) of Mahout with real data
> sets on larger scale systems. Part of this might involve showing
> people how to do this on EC2, but the bigger focus to me should be on
> demoing/documenting Mahout's capabilities, versus showing how to run
> Mahout on any particular platform.
>
>
> On Feb 26, 2009, at 9:58 AM, deneche abdelhakim wrote:
>
> > Hi,
> >
> > I'm planning to participate in GSoC again, and I want to do it,
> > again, with Mahout.
> >
> > This year, let's make Mahout run over Amazon EC2. This means
> > building the proper AMIs, running some Mahout projects (the GA
> > examples) over EC2, giving feedback and writing simple, clear
> > How-Tos about running a Mahout project on EC2.
> >
> > The Mahout.GA examples (TSP and CDGA) should be good real-world
> > scenarios of how one may need to use Mahout.GA on EC2. The TSP
> > example should be modified so that it can run on a console and load
> > TSPLIB benchmarks; that way we can tackle more challenging TSP
> > problems with the help of EC2. The CDGA example should run
> > unmodified given, of course, that Hadoop is configured correctly on
> > EC2 and that the benchmark is on HDFS.
> >
> > These two examples will give us three use cases for Mahout on EC2:
> >
> > 1. TSP can be run on a single, High-CPU, EC2 instance. In this
> >    case, Watchmaker's ConcurrentEvolutionEngine should take care of
> >    the multi-threading part (or at least I hope!) and there will be
> >    no need for Hadoop;
> >
> > 2. TSP can also be run over multiple EC2 instances with the help of
> >    Hadoop;
> >
> > 3. CDGA not only needs Hadoop to run, but its data should be on
> >    HDFS.
> >
> > So what do you think, is the "elephant" ready for a walk on EC2?
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search