What is the status of MapReduce? I just finished reading your paper and all of the threads, and I'm drooling over the notion of such a system :)
We have an initial implementation of MapReduce that works, but it has not yet been used heavily and thus probably needs improvement. Next month I plan to start porting all of Nutch's algorithms to run on MapReduce, as outlined in:
http://www.mail-archive.com/[email protected]/msg03754.html
In the first iteration I will probably not implement full link analysis, only inlink counts and text. Nor will I implement continuous fetching: one will still alternately fetch and update the page db, but updating the page db should be much more scalable. Also, no link db will be maintained while fetching. I hope to have this working by June or so and then start using it to build billion-page-scale indexes for the Internet Archive. These plans are subject to change.
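For readers new to the idea, inlink counting is a natural fit for MapReduce: the map step emits a (target URL, 1) pair for every link on a page, and the reduce step sums the pairs for each target. Below is a minimal single-process sketch of that pattern. It is an illustration under stated assumptions, not Nutch's implementation: the input format, the function names (`map_links`, `reduce_counts`, `run_mapreduce`), and the choices to count each source page at most once per target and to ignore self-links are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (page URL, list of outlink URLs). Not Nutch's data format.
Page = Tuple[str, List[str]]


def map_links(page: Page) -> Iterator[Tuple[str, int]]:
    """Map: emit (target, 1) once per distinct outlink of a page."""
    source, outlinks = page
    for target in set(outlinks):
        if target != source:  # assumed policy: self-links do not count
            yield target, 1


def reduce_counts(target: str, ones: Iterable[int]) -> Tuple[str, int]:
    """Reduce: sum the 1s emitted for one target URL."""
    return target, sum(ones)


def run_mapreduce(pages: Iterable[Page]) -> Dict[str, int]:
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for page in pages:
        for key, value in map_links(page):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    pages = [
        ("http://a/", ["http://b/", "http://c/", "http://b/"]),
        ("http://b/", ["http://c/"]),
        ("http://c/", ["http://a/", "http://c/"]),
    ]
    print(run_mapreduce(pages))  # {'http://a/': 1, 'http://b/': 1, 'http://c/': 2}
```

In a real distributed system the grouping step would be done by the framework across many machines, which is what lets the page db update scale; here a dictionary plays that role.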
Doug
