Re: GSOC Mahout.GA, next steps ?

2008-06-09 Thread Grant Ingersoll
Cool, thanks!  Sorry I have been so quiet.  Do feel free to ask more
questions if you have them.  I hope to finally have a bunch of
personal things behind me very soon, and then I will be able to focus
some time on Mahout again.


-Grant

On Jun 9, 2008, at 6:14 AM, deneche abdelhakim wrote:

I found a cool introduction to evolutionary algorithms and added it
to the wiki, in case anyone is interested...



--- On Wed, 28.5.08, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

From: Grant Ingersoll <[EMAIL PROTECTED]>
Subject: Re: GSOC Mahout.GA, next steps ?
To: mahout-dev@lucene.apache.org
Date: Wednesday, 28 May 2008, 13:11

This sounds good.  I don't know a lot about GAs, so if others have
insight, that would be great.  It would also be handy if you could put
up a section on the Wiki about GAs and maybe post some links to basic
papers there, so people that aren't familiar can go do some background
reading.

I will try to get to MAHOUT-56 this week, but others can jump in and
review as well.

-Grant

On May 27, 2008, at 4:52 AM, deneche abdelhakim wrote:


In a GA there are many things that can be distributed, and one should
always start with the most compute-intensive task. This is very
problem dependent, but in most cases the fitness evaluation function
(FEF) is the part to distribute.

The FEF evaluates every individual in the population, and it may need
some data (D) to do so. For example, in the Traveling Salesman
Problem, the problem is defined by a set of cities and the distances
between them; the FEF needs those distances to evaluate the
individuals.
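
To make the role of D concrete, here is a minimal Python sketch of a
fitness evaluation function for the Traveling Salesman Problem. It is
only an illustration: the names `DISTANCES` and `tour_length` and the
4-city distance matrix are made up here and do not come from MAHOUT-56
or Watchmaker.

```python
# Hypothetical data D: a symmetric distance matrix for 4 cities.
DISTANCES = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


def tour_length(tour, distances):
    """Fitness of one individual: length of the closed tour (lower is better).

    `tour` is a permutation of city indices; `distances` is the data D.
    """
    total = 0
    for i, city in enumerate(tour):
        next_city = tour[(i + 1) % len(tour)]  # wrap around to the first city
        total += distances[city][next_city]
    return total
```

Every individual (tour) is scored against the same D, which is why D
has to be available wherever the evaluation runs.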

I see 2 ways to distribute the FEF:

A. If the data D is not big and fits on each cluster node, then the
easiest solution is to use each Mapper to evaluate one individual and
to pass D to all the mappers (using a Job parameter or the
DistributedCache). The input of the job is the population of
individuals. For someone used to working with Watchmaker, solution A
is straightforward: they need to change only one line of code.

B. If the data D is really big and spans multiple nodes, then the FEF
should be written as Mappers and Reducers. The population of
individuals is passed to all the mappers (again using the
DistributedCache or a Job parameter), and the data D is now the input
of the Job.
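
The two solutions can be sketched in plain Python as below. This is a
rough stand-in for the map and reduce steps, not Hadoop or Watchmaker
code, and it is not the MAHOUT-56 implementation. It borrows the
binary classification rule idea from the GSoC proposal, but the rule
encoding (a feature index plus a threshold) and all function names are
assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical individual: (feature_index, threshold). The rule predicts
# label 1 when record features[feature_index] > threshold, else label 0.
# Each record of the data D is a pair (features, label).


def matches(rule, record):
    """Return 1 if the rule classifies the record correctly, else 0."""
    feature_index, threshold = rule
    features, label = record
    predicted = 1 if features[feature_index] > threshold else 0
    return 1 if predicted == label else 0


# Solution A: the population is the job input, D is shared with every mapper.
def map_individual(index, rule, data):
    """One map call evaluates one individual against the whole of D."""
    return index, sum(matches(rule, record) for record in data)


def evaluate_a(population, data):
    return dict(map_individual(i, rule, data) for i, rule in enumerate(population))


# Solution B: D is the job input, the population is shared with every mapper.
def map_split(split, population):
    """One map call sees one split of D and emits a partial fitness per individual."""
    for index, rule in enumerate(population):
        yield index, sum(matches(rule, record) for record in split)


def reduce_fitness(pairs):
    """Sum the partial fitness values emitted for each individual."""
    totals = defaultdict(int)
    for index, partial in pairs:
        totals[index] += partial
    return dict(totals)


def evaluate_b(population, splits):
    pairs = (pair for split in splits for pair in map_split(split, population))
    return reduce_fitness(pairs)


# Made-up data: two features per record, binary label.
data = [((1.0, 5.0), 0), ((2.0, 1.0), 0), ((3.0, 4.0), 1), ((4.0, 2.0), 1)]
population = [(0, 2.5), (1, 3.0)]
splits = [data[:2], data[2:]]  # pretend D is stored as two splits

print(evaluate_a(population, data))    # {0: 4, 1: 2}
print(evaluate_b(population, splits))  # {0: 4, 1: 2}
```

Solution B only works this simply when the fitness can be computed as
a sum of per-record contributions, as it can for this accuracy count.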

[MAHOUT-56] contains a possible implementation of solution A. Now I
should start thinking about solution B, and all I need is a problem
that uses very big datasets. I already proposed one in my GSoC
proposal: using a Genetic Algorithm to find a good binary
classification rule for a given dataset. But I am open to any other
suggestions.



--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: GSOC Mahout.GA, next steps ?

2008-06-09 Thread deneche abdelhakim
I found a cool introduction to evolutionary algorithms and added it to the wiki,
in case anyone is interested...


--- On Wed, 28.5.08, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> From: Grant Ingersoll <[EMAIL PROTECTED]>
> Subject: Re: GSOC Mahout.GA, next steps ?
> To: mahout-dev@lucene.apache.org
> Date: Wednesday, 28 May 2008, 13:11
>
> This sounds good.  I don't know a lot about GAs, so if others have
> insight, that would be great.  It would also be handy if you could put
> up a section on the Wiki about GAs and maybe post some links to basic
> papers there, so people that aren't familiar can go do some background
> reading.
>
> I will try to get to MAHOUT-56 this week, but others can jump in and
> review as well.
>
> -Grant
>
> On May 27, 2008, at 4:52 AM, deneche abdelhakim wrote:
>
> > In a GA there are many things that can be distributed, and one
> > should always start with the most compute-intensive task. This is
> > very problem dependent, but in most cases the fitness evaluation
> > function (FEF) is the part to distribute.
> >
> > The FEF evaluates every individual in the population, and it may
> > need some data (D) to do so. For example, in the Traveling
> > Salesman Problem, the problem is defined by a set of cities and the
> > distances between them; the FEF needs those distances to evaluate
> > the individuals.
> >
> > I see 2 ways to distribute the FEF:
> >
> > A. If the data D is not big and fits on each cluster node, then the
> > easiest solution is to use each Mapper to evaluate one individual
> > and to pass D to all the mappers (using a Job parameter or the
> > DistributedCache). The input of the job is the population of
> > individuals. For someone used to working with Watchmaker, solution
> > A is straightforward: they need to change only one line of code.
> >
> > B. If the data D is really big and spans multiple nodes, then the
> > FEF should be written as Mappers and Reducers. The population of
> > individuals is passed to all the mappers (again using the
> > DistributedCache or a Job parameter), and the data D is now the
> > input of the Job.
> >
> > [MAHOUT-56] contains a possible implementation of solution A. Now I
> > should start thinking about solution B, and all I need is a problem
> > that uses very big datasets. I already proposed one in my GSoC
> > proposal: using a Genetic Algorithm to find a good binary
> > classification rule for a given dataset. But I am open to any
> > other suggestions.


  