Observing what my crawls do, I believe Ken must be right.
Towards the end of the crawl (when fetchQueues.totalSize="xxxx" counts
down), in some cases I'm fetching from only about two sites, so politeness
does start to play a role there, or at least it should.
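
To make the politeness effect concrete, here is a rough back-of-the-envelope
sketch in Python (not Nutch code). It assumes the fetcher allows a fixed
number of concurrent connections per host and waits a fixed delay between
successive requests to the same host. The function names, the host names, the
5-second delay and the even split of the 2526 remaining URLs over two hosts
are all assumptions for illustration only.

```python
import math


def max_busy_threads(total_threads, host_count, threads_per_host=1):
    """Upper bound on threads that can fetch at once under per-host limits."""
    return min(total_threads, host_count * threads_per_host)


def min_drain_seconds(urls_per_host, delay_s, threads_per_host=1):
    """Lower bound on the time needed to empty the queues.

    Only the politeness waits between successive requests to the same host
    are counted; download time is ignored, so real crawls take longer.
    """
    longest = 0.0
    for pending in urls_per_host.values():
        # Each host is fetched in "rounds" of at most threads_per_host URLs.
        rounds = math.ceil(pending / threads_per_host)
        # There is one politeness wait between consecutive rounds.
        waits = max(rounds - 1, 0)
        longest = max(longest, waits * delay_s)
    return longest


# Hypothetical tail of a crawl: 2526 URLs left, all on two hosts.
remaining = {"host-a.example": 1263, "host-b.example": 1263}

print(max_busy_threads(total_threads=1000, host_count=len(remaining)))
print(min_drain_seconds(remaining, delay_s=5.0))
```

Under those assumptions, at most 2 of the 1000 threads can do useful work, and
the tail needs at least 6310 seconds (about 1.75 hours) no matter how many
threads are configured. The rest can only spin-wait.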

-Ray-

2009/5/26 Raymond Balmès <[email protected]>

> Please read this too:
>
> http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
>
> Interesting write-up from Ken.
>
> 2009/5/26 Raymond Balmès <[email protected]>
>
>> Yes, this has already been reported in multiple threads.
>> I noticed that if one does a "recrawl" you don't get this behavior... no
>> idea why.
>>
>> -Raymond-
>>
>> 2009/5/26 Larsson85 <[email protected]>
>>
>>
>>> When I try to do my crawl, it seems like the threads get stuck in some
>>> spinwaiting mode. At first the crawl goes as planned, and I couldn't be
>>> happier. But after some time, it starts reporting more of these
>>> spinwaiting messages.
>>>
>>> I print a log here to show you what it looks like. As you can see, it gets
>>> stuck, and the queue keeps decreasing by just 1 at a time. I've tried
>>> doing a smaller crawl, and what happens is that it counts down until
>>> "fetchQueues.totalSize" reaches 0, and then the crawl is done.
>>>
>>> But the problem is that this countdown is very slow; there's no effective
>>> crawling going on, and it uses neither bandwidth nor CPU power. Basically,
>>> this costs way too much time; I can't let it go on like this for hours
>>> before it finishes.
>>> How can I fix this?
>>>
>>>
>>> After about an hour of crawling, this is what the log looks like:
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>>>  - fetching http://home.swipnet.se/~w-147200/
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>>>  - fetching http://biphome.spray.se/alarsson/
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>>  - fetching http://home.swipnet.se/~w-31853/html/
>>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
>>>
>>> ....
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>
>>>
>>
>
