Re: GSoC 2009 proposition

deneche abdelhakim Thu, 26 Feb 2009 07:49:05 -0800

Thanks for your fast answers :) I'll rethink this and post as soon as I get 
something



--- On Thu 26.2.09, Grant Ingersoll <gsing...@apache.org> wrote:

> From: Grant Ingersoll <gsing...@apache.org>
> Subject: Re: GSoC 2009 proposition
> To: mahout-dev@lucene.apache.org
> Date: Thursday, 26 February 2009, 16:20
> You might have a look at
> http://www.lucidimagination.com/search/document/5ab9ddafa19ee04b/thought_offering_ec2_s3_based_services#2d096f39b02ec289
> for some background thoughts.
> 
> I think it's a nice idea and I've been meaning to
> use my Amazon credits for just such a thing for a while now,
> but I'm not sure how high a priority it is.
> 
> You might consider extending/altering this thought to have
> more of a focus on developing demos (including code) of
> Mahout with real data sets on larger scale systems.  Part of
> this might involve showing people how to do this on EC2, but
> the bigger focus to me should be on demoing/documenting
> Mahout's capabilities, versus showing how to run Mahout
> on any particular platform.
> 
> 
> On Feb 26, 2009, at 9:58 AM, deneche abdelhakim wrote:
> 
> > 
> > Hi,
> > I'm planning to participate in GSoC again, and I want to do it with
> > Mahout again.
> > This year, let's make Mahout run over Amazon EC2. This means building
> > the proper AMIs, running some Mahout projects (the GA examples) over
> > EC2, giving feedback, and writing simple, clear how-tos about running a
> > Mahout project on EC2.
> > 
> > The Mahout.GA examples (TSP and CDGA) should be good real-world
> > scenarios of how one may need to use Mahout.GA on EC2. The TSP example
> > should be modified so that it can run from a console and load TSPLIB
> > benchmarks, so that we can tackle more challenging TSP problems with
> > the help of EC2. The CDGA example should run unmodified, provided of
> > course that Hadoop is configured correctly on EC2 and that the
> > benchmark is on HDFS.
> > 
> > These two examples will give us three use cases for Mahout on EC2:
> > 
> > 1. TSP can be run on a single High-CPU EC2 instance. In this case,
> > Watchmaker's ConcurrentEvolutionEngine should take care of the
> > multi-threading part (or at least I hope!) and there will be no need
> > for Hadoop;
> > 
> > 2. TSP can also be run over multiple EC2 instances with the help of
> > Hadoop;
> > 
> > 3. CDGA not only needs Hadoop to run, but its data should also be on
> > HDFS.
> > 
> > 
> > So what do you think, is the "elephant" ready for a walk on EC2?
> > 
> > 
> > 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem
> (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
> http://www.lucidimagination.com/search
