I found a good introduction to evolutionary algorithms and added it to the wiki, in case anyone is interested.

--- On Wed, 28.5.08, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> From: Grant Ingersoll <[EMAIL PROTECTED]>
> Subject: Re: GSOC Mahout.GA, next steps ?
> To: mahout-dev@lucene.apache.org
> Date: Wednesday, 28 May 2008, 13:11
>
> This sounds good. I don't know a lot about GAs, so if others have
> insight, that would be great. It would also be handy if you could put
> up a section on the Wiki about GAs and maybe post some links to basic
> papers there, so people that aren't familiar can go do some background
> reading.
>
> I will try to get to MAHOUT-56 this week, but others can jump in and
> review as well.
>
> -Grant
>
> On May 27, 2008, at 4:52 AM, deneche abdelhakim wrote:
>
> > In a GA there are many things that can be distributed, and one
> > should always start with the most compute-intensive task. This is
> > very problem dependent, but in most cases the fitness evaluation
> > function (FEF) is the part to distribute.
> >
> > The FEF evaluates each individual in the population, and it may
> > need some data (D) to do so. For example, in the Traveling Salesman
> > Problem, the problem is defined by a set of cities and the distances
> > between them, and the FEF needs those distances to evaluate the
> > individuals.
> >
> > I see 2 ways to distribute the FEF:
> >
> > A. If the data D is not big and can fit on each cluster node, then
> > the easiest solution is to use each Mapper to evaluate one
> > individual and to pass the data D to all the mappers (using some
> > Job parameter or the DistributedCache). The input of the job is the
> > population of individuals. For someone used to working with
> > Watchmaker, solution A is straightforward: only one line of code
> > needs to change.
> >
> > B. If the data D is really big and spans multiple nodes, then the
> > FEF should be written in the form of Mappers-Reducers: the
> > population of individuals is passed to all the mappers (again using
> > the DistributedCache or a Job parameter) and the data D is now the
> > input of the Job.
> >
> > [MAHOUT-56] contains a possible implementation for solution A. Now
> > I should start thinking about solution B, and all I need is a
> > problem that uses very big datasets. I already proposed one in my
> > GSoC proposal: it consists of using a Genetic Algorithm to find a
> > good binary classification rule for a given dataset. But I am open
> > to any other suggestion.
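
For readers new to the idea, here is a rough sketch of how solutions A and B from the quoted message differ in what is the job input and what is shared with every mapper. It is plain Python standing in for the map and reduce steps, not Hadoop or Watchmaker code and not what MAHOUT-56 contains. The distance matrix, the tiny dataset, the threshold-rule encoding of an individual, and all function names are made up for illustration.

```python
# --- Solution A: D is small, the population is the job input ---
# Hypothetical TSP instance: a symmetric distance matrix shared with every mapper.
DISTANCES = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]


def tour_length(tour, distances):
    """Fitness of one individual: total length of the closed tour."""
    return sum(distances[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def evaluate_population_a(population, distances):
    """One 'map' call per individual; D travels with every call."""
    return [tour_length(tour, distances) for tour in population]


# --- Solution B: D is big, the data records are the job input ---
# Hypothetical encoding: an individual is a rule (feature index, threshold)
# that predicts class 1 when row[feature] > threshold.
def map_row(row, label, population):
    """One 'map' call per data record; the population travels with every call."""
    for index, (feature, threshold) in enumerate(population):
        predicted = 1 if row[feature] > threshold else 0
        yield index, int(predicted == label)


def reduce_hits(pairs, population_size):
    """'Reduce' step: sum the correct predictions for each individual."""
    totals = [0] * population_size
    for index, hit in pairs:
        totals[index] += hit
    return totals


def evaluate_population_b(dataset, population):
    """Fitness of each rule = number of records it classifies correctly."""
    pairs = (
        pair
        for row, label in dataset
        for pair in map_row(row, label, population)
    )
    return reduce_hits(pairs, len(population))


if __name__ == "__main__":
    tours = [[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3]]
    print(evaluate_population_a(tours, DISTANCES))

    dataset = [((1.0, 5.0), 0), ((2.0, 1.0), 0), ((3.0, 4.0), 1), ((4.0, 2.0), 1)]
    rules = [(0, 2.5), (1, 3.0), (0, 0.5)]
    print(evaluate_population_b(dataset, rules))
```

In A the loop runs over individuals and the data is a parameter. In B the loop runs over data records and the population is the parameter, so a reduce step is needed to combine the partial scores into one fitness value per individual.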