Robin Haswell wrote:
> On Fri, 2006-12-08 at 11:01 +0100, Andrzej Bialecki wrote:
>
>> Ad 1.
>>
>> I suspect that it's sorting the reduce output now ... in 0.8.x this
>> operation has poor performance, especially when run on a single server.
>> So, I advise patience, and giving as much CPU and RAM as possible. For
>> the future, it's also much much better to run the fetcher in non-parsing
>> mode and run "nutch parse" afterwards as a separate step.
>>
>
> Okay, I'll give it a while and see what happens. Is it possible to get
> any information on what's going on? I'm running 0.8 pretty much
> out-of-the-box on a single server. I've seen people mentioning phases of
> Hadoop - can it tell me what's going on?
>
This should be shown in the logs: the "map xx%" and "reduce xx%" progress
is printed there. The reduce phase consists of three steps. First the map
outputs are copied (reduce 0-33%), then they are sorted (33-66%), which is
where most of the CPU time, disk I/O and wall-clock time is spent, and
finally the sorted outputs are copied to form the final result (66-100%).

You can also run kill -SIGQUIT <pid> to get a thread dump (the JVM prints
it to its standard output), so you will be able to see what the threads
are really doing.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
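A minimal Python sketch of reading those progress figures and naming the reduce step a job is in. It assumes log lines contain text of the form `map 100% reduce 45%` (the exact wording in your own Hadoop/Nutch logs is an assumption; adjust the regex), and that the reduce percentage splits into equal thirds for copy, sort and final output, as described in the reply. The names `PROGRESS_RE`, `reduce_phase` and `describe` are hypothetical helpers, not part of Nutch or Hadoop.

```python
import re

# Hypothetical progress-line format, e.g. "... map 100% reduce 45%".
# Check your own log files and adjust this pattern if the wording differs.
PROGRESS_RE = re.compile(r"map\s+(\d+)%\s+reduce\s+(\d+)%")


def reduce_phase(reduce_pct: int) -> str:
    """Name the reduce step for a given overall reduce percentage.

    Assumes the three steps each cover about one third of the progress:
    copy (0-33%), sort (33-66%), final output (66-100%).
    """
    if not 0 <= reduce_pct <= 100:
        raise ValueError(f"reduce percentage out of range: {reduce_pct}")
    if reduce_pct < 33:
        return "copy"
    if reduce_pct < 66:
        return "sort"  # the CPU- and disk-heavy step
    return "final"


def describe(line: str):
    """Return (map_pct, reduce_pct, phase), or None if no progress is found."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    map_pct, reduce_pct = int(match.group(1)), int(match.group(2))
    return map_pct, reduce_pct, reduce_phase(reduce_pct)


if __name__ == "__main__":
    print(describe("INFO mapred.JobClient -  map 100% reduce 45%"))
    # → (100, 45, 'sort')
```

If a job sits in the "sort" range for a long time on a single server, that matches the slow sorting described in the original reply; a thread dump taken at that point should confirm what the threads are doing.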
