Observing what my crawls do, I believe Ken must be right. Towards the end of the crawl (when fetchQueues.totalSize="xxxx" counts down), in some cases I'm fetching from only about two sites, so politeness does start to play a role there, or at least it should.
-Ray-

2009/5/26 Raymond Balmès <[email protected]>

> Please read this too:
>
> http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
>
> Interesting build from Ken.
>
> 2009/5/26 Raymond Balmès <[email protected]>
>
>> Yes, already reported in multiple threads.
>> I noted that if one does a "recrawl" you don't get this behavior... no
>> idea why.
>>
>> -Raymond-
>>
>> 2009/5/26 Larsson85 <[email protected]>
>>
>>> When I try to do my crawl it seems like the threads get stuck in some
>>> spinwaiting mode. At first the crawl goes as planned, and I couldn't be
>>> happier. But after some time, it starts reporting more of these
>>> spinwaiting messages.
>>>
>>> I print a log here to show you what it looks like. As you can see it gets
>>> stuck, and the queue decreases by 1 all the time. I've tried doing a
>>> smaller crawl, and what happens is that it counts down until the
>>> "fetchQueues.totalSize" reaches 0, and then the crawl is done.
>>>
>>> But the problem is that this countdown is very slow. There's no effective
>>> crawling going on; it is not using either bandwidth or CPU power. Basically,
>>> this costs way too much time. I can't let it go on like this for hours to
>>> be done.
>>> How can I fix this?
>>>
>>> After about an hour of crawling this is what the log looks like:
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>>> - fetching http://home.swipnet.se/~w-147200/
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>>> - fetching http://biphome.spray.se/alarsson/
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>> - fetching http://home.swipnet.se/~w-31853/html/
>>> -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
>>>
>>> ....
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
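
To illustrate why the countdown can be this slow when only a couple of hosts remain in the queue, here is a rough back-of-the-envelope sketch in Python. It is not Nutch code. The function name `min_drain_seconds`, the 5-second per-host delay, the one-request-at-a-time-per-host rule, and the split of the 2526 queued URLs across the two hosts are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse


def min_drain_seconds(urls, delay_per_host=5.0):
    """Lower bound on the time needed to fetch `urls` politely.

    Assumes each host is contacted by one thread at a time, with
    `delay_per_host` seconds between successive requests to that host.
    """
    per_host = Counter(urlparse(u).netloc for u in urls)
    if not per_host:
        return 0.0
    # URLs on the same host are fetched one after another, so the
    # busiest host sets the pace no matter how many threads exist.
    return max(per_host.values()) * delay_per_host


# Hypothetical queue: 2526 URLs left, spread over only two hosts.
queue = (
    [f"http://home.swipnet.se/page{i}" for i in range(1500)]
    + [f"http://biphome.spray.se/page{i}" for i in range(1026)]
)

print(min_drain_seconds(queue))  # 7500.0 seconds, a bit over two hours
```

Under these assumptions, extra fetcher threads do not help: with 1000 threads and two hosts, nearly all of them can only wait, which would match the `spinWaiting=1000` lines in the log.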
