On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <shara...@hotmail.com> wrote:
> Hi,
>
> We are trying to parallelize the ant colony optimization algorithm for TSP
> over Hadoop and are facing some issues. We are using TSPLIB as input files.
> The input is a text file containing the Euclidean coordinates of the cities -
> the first column is the city number and the next two columns contain the x
> and y coordinates respectively.
>
> What we intend to do is to take input from this single file, send copies
> of it to multiple mappers (each mapper acts like an ant in the algorithm),
> have each mapper work on its input to find its own TSP solution that it
> outputs, and finally have the reducer output the smallest tour found by the
> mappers. Hope we are on the right track. Here are the issues:
>
> 1) Since the input file is small, we need to force Hadoop to fire up
> multiple map tasks by replicating the input. How can we make an InputSplit
> of the whole file and replicate it so that the input can be sent to
> multiple mappers?

Write a custom splitter that sends the same data to all mappers - the only
critical parameter is the number of "ants". There is a sketch at the end of
this mail.

> 2) The algorithm uses a shared pheromone array and each mapper needs to
> read and write data from this. How can we share the pheromone data across
> the mappers?

You cannot share data across mappers and should not attempt to do so. It is
better to use the reducer(s) to combine the first-pass trails and then pass
the combined trails on to another MapReduce job - that job takes the original
problem plus the current pheromone trails as its input. A driver sketch is at
the end of this mail.

> Hope the questions are clear enough. Any help would be greatly
> appreciated.
>
> Thank you
>
> Regards
>
> Sharat

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
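
For (1), here is a minimal, untested sketch of the kind of InputFormat I mean:
it returns N identical splits over the whole (unsplittable) TSPLIB file, so N
map tasks each see the full city list. The class name and the "aco.num.ants"
configuration key are my own placeholders, not anything from your code.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Returns one copy of the whole-file split per ant, so every mapper
// reads the complete coordinate list.
public class ReplicatingInputFormat extends TextInputFormat {

    public static final String NUM_ANTS = "aco.num.ants"; // placeholder config key

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split the (small) input file
    }

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        // With isSplitable() == false the default getSplits() returns
        // one split covering each whole input file.
        List<InputSplit> original = super.getSplits(job);
        int ants = job.getConfiguration().getInt(NUM_ANTS, 10);

        List<InputSplit> replicated = new ArrayList<InputSplit>();
        for (int i = 0; i < ants; i++) {
            for (InputSplit split : original) {
                FileSplit fs = (FileSplit) split;
                // Each copy covers the same byte range of the same file.
                replicated.add(new FileSplit(fs.getPath(), fs.getStart(),
                                             fs.getLength(), fs.getLocations()));
            }
        }
        return replicated;
    }
}

In the driver you would then call
job.setInputFormatClass(ReplicatingInputFormat.class) and set "aco.num.ants"
to however many ants you want per iteration.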
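
For (2), a rough sketch of a driver that chains one job per ACO iteration.
Each iteration's single reducer merges the trails and writes the combined
pheromone matrix; the next iteration ships that file to every mapper through
the DistributedCache. AntMapper, PheromoneReducer and the output paths are
hypothetical placeholders - the point is only the job-chaining pattern.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AcoDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("aco.num.ants", 50);      // placeholder key from the sketch above
        Path cities = new Path(args[0]);      // the TSPLIB coordinate file

        for (int iter = 0; iter < 20; iter++) {
            Job job = new Job(conf, "aco-tsp-iteration-" + iter);
            job.setJarByClass(AcoDriver.class);
            job.setInputFormatClass(ReplicatingInputFormat.class);
            job.setMapperClass(AntMapper.class);          // placeholder: one ant per map task
            job.setReducerClass(PheromoneReducer.class);  // placeholder: merges trails, keeps best tour
            job.setNumReduceTasks(1);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);

            FileInputFormat.addInputPath(job, cities);
            FileOutputFormat.setOutputPath(job, new Path(args[1] + "/iteration-" + iter));

            if (iter > 0) {
                // Ship the previous iteration's merged pheromone matrix to every mapper.
                Path previous = new Path(args[1] + "/iteration-" + (iter - 1) + "/part-r-00000");
                DistributedCache.addCacheFile(previous.toUri(), job.getConfiguration());
            }

            if (!job.waitForCompletion(true)) {
                System.exit(1);
            }
        }
    }
}

Whether you pass the trails via the DistributedCache, a side file on HDFS, or
the job configuration is up to you; the essential idea is that the pheromone
matrix is read-only within a pass and only updated between jobs.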