Please read this too: http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
Interesting build from Ken.

2009/5/26 Raymond Balmès <[email protected]>

> Yes, already reported in multiple threads.
> I noted that if one does a "recrawl" you don't get this behavior... no idea
> why.
>
> -Raymond-
>
> 2009/5/26 Larsson85 <[email protected]>
>
>>
>> When I try to do my crawl it seems like the threads get stuck in some
>> spinwaiting mode. At first the crawl goes as planned, and I couldn't be
>> happier. But after some time, it starts reporting more of these spinwaiting
>> messages.
>>
>> I print a log here to show you what it looks like. As you can see it gets
>> stuck, and the queue decreases by 1 all the time. I've tried doing a
>> smaller crawl, and what happens is that it counts down until the
>> "fetchQueues.totalSize" reaches 0, and then the crawl is done.
>>
>> But the problem is that this countdown is very slow; there's no effective
>> crawling going on, and it uses neither bandwidth nor CPU power. Basically,
>> this costs way too much time; I can't let it go on like this for hours to
>> be done.
>> How can I fix this?
>>
>>
>> After about an hour of crawling this is what the log looks like:
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>> - fetching http://home.swipnet.se/~w-147200/
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>> - fetching http://biphome.spray.se/alarsson/
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>> - fetching http://home.swipnet.se/~w-31853/html/
>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
>>
>> ....
>>
>> --
>> View this message in context:
>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
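
To put numbers on a log like the one quoted above, here is a minimal Python sketch that counts the status lines, how many of them show every thread spin-waiting, and how far fetchQueues.totalSize dropped over the excerpt. The regular expression is inferred only from the lines shown in this thread, and the names `STATUS_RE`, `summarize` and `SAMPLE` are made up for illustration; nothing here is part of Nutch. It only summarizes the symptom, it does not diagnose the cause.

```python
import re

# Assumed format, inferred from the status lines quoted in this thread.
STATUS_RE = re.compile(
    r"-activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)


def summarize(log_text):
    """Summarize fetcher status lines found in a log excerpt (hypothetical helper)."""
    rows = [tuple(int(g) for g in m.groups()) for m in STATUS_RE.finditer(log_text)]
    if not rows:
        return None
    return {
        # Number of status lines seen in the excerpt.
        "status_lines": len(rows),
        # Status lines where every active thread was reported as spin-waiting.
        "all_threads_waiting": sum(1 for active, spin, _ in rows if spin >= active),
        "first_queue_size": rows[0][2],
        "last_queue_size": rows[-1][2],
        # How many URLs left the queue between the first and last status line.
        "urls_drained": rows[0][2] - rows[-1][2],
    }


# The excerpt from the original message, with the quote markers removed.
SAMPLE = """\
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
- fetching http://home.swipnet.se/~w-147200/
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
- fetching http://biphome.spray.se/alarsson/
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
- fetching http://home.swipnet.se/~w-31853/html/
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running this against a longer log excerpt gives a quick sense of how many URLs leave the queue per status line, which is the slow countdown described in the original message.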
