Please read this too:
http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/

Interesting build from Ken.

2009/5/26 Raymond Balmès <[email protected]>

> Yes, this has already been reported in multiple threads.
> I noticed that if one does a "recrawl" you don't get this behavior... no idea
> why.
>
> -Raymond-
>
> 2009/5/26 Larsson85 <[email protected]>
>
>
>> When I try to do my crawl, it seems like the threads get stuck in some
>> spinwaiting mode. At first the crawl goes as planned, and I couldn't be
>> happier. But after some time, it starts reporting more and more of these
>> spinwaiting messages.
>>
>> I have pasted a log below to show what it looks like. As you can see, it
>> gets stuck, and the queue decreases by 1 at a time. I've tried doing a
>> smaller crawl, and what happens is that it counts down until
>> "fetchQueues.totalSize" reaches 0, and then the crawl is done.
>>
>> But the problem is that this countdown is very slow. There's no effective
>> crawling going on, and it uses neither bandwidth nor CPU power. Basically,
>> this costs way too much time; I can't let it go on like this for hours
>> before it finishes.
>> How can I fix this?
>>
>>
>> After about an hour of crawling, this is what the log looks like:
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>>  - fetching http://home.swipnet.se/~w-147200/
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>>  - fetching http://biphome.spray.se/alarsson/
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>>  - fetching http://home.swipnet.se/~w-31853/html/
>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
>>
>> ....
>>
>> --
>> View this message in context:
>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
>
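For anyone who wants to check whether their own fetcher log shows the same
pattern, here is a minimal sketch in Python (not part of Nutch). It tallies
the hosts on the "fetching" lines and collects the status lines quoted above.
The function name summarize and the two regular expressions are hypothetical
helpers written only for this log format. The idea that a small number of
hosts dominating the tally points to threads waiting on per-host queues is an
assumption, not something the log itself proves.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Lines copied from the log quoted in this thread.
LOG = """\
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
- fetching http://home.swipnet.se/~w-147200/
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
- fetching http://biphome.spray.se/alarsson/
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
- fetching http://home.swipnet.se/~w-31853/html/
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
"""

# Hypothetical patterns matching only the two line shapes shown above.
STATUS = re.compile(
    r"-activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)
FETCH = re.compile(r"-\s*fetching\s+(\S+)")


def summarize(log_text):
    """Return (status tuples, per-host fetch counts) for a fetcher log."""
    statuses = []      # (activeThreads, spinWaiting, totalSize) per status line
    hosts = Counter()  # how many "fetching" lines were seen per host
    for raw in log_text.splitlines():
        line = raw.strip()
        status = STATUS.match(line)
        if status:
            statuses.append(tuple(int(g) for g in status.groups()))
            continue
        fetch = FETCH.match(line)
        if fetch:
            hosts[urlparse(fetch.group(1)).hostname] += 1
    return statuses, hosts


if __name__ == "__main__":
    statuses, hosts = summarize(LOG)
    drained = statuses[0][2] - statuses[-1][2]
    print("status lines:", len(statuses))
    print("URLs drained from queue:", drained)
    for host, count in hosts.most_common():
        print(host, count)
```

Run against a longer slice of a real log, the per-host tally gives a rough
picture of how many distinct hosts are left in the queue while every thread
reports spinWaiting.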