Hi AJ, I may very well be wrong, but as I understand it, Nutch/Hadoop implements MapReduce primarily as a means of efficiently and reliably distributing work among the nodes of a (large) cluster of consumer-grade machines. I suspect there is not much to be gained from running it on a single machine.
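To make that concrete: the model itself is just a map function and a reduce function with a sort/shuffle in between, and the per-job task counts are set on the job configuration. Here's a rough word-count sketch against the classic org.apache.hadoop.mapred API -- the class name is made up, it is not one of Nutch's own jobs, and signatures shifted between early Hadoop releases, so treat it as an illustration of the model and the tuning knobs rather than code guaranteed to compile against 0.9-dev as-is:

// Hypothetical word-count sketch of the MapReduce model (not a Nutch job).
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountSketch {

  // map() runs once per input record, in parallel across map tasks.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      StringTokenizer tok = new StringTokenizer(value.toString());
      while (tok.hasMoreTokens()) {
        word.set(tok.nextToken());
        out.collect(word, ONE);
      }
    }
  }

  // reduce() sees all values for one key after the sort/shuffle phase.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      out.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCountSketch.class);
    conf.setJobName("wordcount-sketch");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    // On a single box there is no distribution to exploit; these knobs
    // mainly control how the sort/merge overhead is split up.
    conf.setNumMapTasks(2);
    conf.setNumReduceTasks(1);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

On one machine, keeping the reduce task count low and (if memory serves) leaving mapred.job.tracker at its default of "local" avoids some of the distributed bookkeeping, but the sorting and merging between the map and reduce phases still has to happen somewhere, which is likely a big part of the >5 hours you're seeing.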
http://labs.google.com/papers/mapreduce.html
http://wiki.apache.org/lucene-hadoop/HadoopMapReduce
http://wiki.apache.org/lucene-hadoop/

happy hunting,
joe

On 10/27/06, AJ Chen <[EMAIL PROTECTED]> wrote:
> I'm using 0.9-dev code to crawl the web on a single machine. Using the
> default configuration, it spends ~5 hours fetching 100,000 pages, but also
> >5 hours doing map-reduce. Is this the expected performance for the
> map-reduce phase relative to the fetch phase? It seems to me map-reduce
> takes too much time. Is there anything to configure in order to reduce the
> time spent in map-reduce? I'll appreciate any suggestion on how to improve
> web search performance on a single machine.
>
> Thanks,
>
> AJ
> http://web2express.org
