@otis
Or did you mean nutch/host? I have only one server for my tests.

@larsson
My spinwaiting phase is usually less than 30 minutes.

Something else I noticed: at the beginning the output scrolls so fast that I
can't read the screen. I'm not sure whether the standard output line is
printed at start_fetch or at fetch_complete.
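
To put rough numbers on the politeness point (see Ken's post quoted below),
here is a minimal sketch of a lower bound on the tail of a crawl. It assumes
one request at a time per host and a fixed delay between requests to the same
host. The function name and the 5 second default are my own assumptions for
illustration; plug in whatever delay your fetcher is actually configured with.

```python
from collections import Counter
from urllib.parse import urlsplit


def estimate_tail_seconds(urls, delay_seconds=5.0):
    """Lower bound on the time needed to drain the remaining fetch queue.

    Assumes (hypothetically) one in-flight request per host and a fixed
    politeness delay between requests to the same host. Under those
    assumptions the busiest host sets the pace, no matter how many
    fetcher threads are running.
    """
    # Count how many queued URLs belong to each host.
    per_host = Counter(urlsplit(u).hostname for u in urls)
    if not per_host:
        return 0.0
    # The host with the longest queue determines the minimum drain time.
    return max(per_host.values()) * delay_seconds


if __name__ == "__main__":
    remaining = [
        "http://home.swipnet.se/~w-147200/",
        "http://biphome.spray.se/alarsson/",
        "http://home.swipnet.se/~w-31853/html/",
    ]
    print(estimate_tail_seconds(remaining, delay_seconds=5.0))
```

As a hypothetical example, if the 2526 remaining URLs were split evenly over
two hosts with a 5 second delay, that gives 1263 * 5 = 6315 seconds, roughly
1 hour 45 minutes, however many threads are spinwaiting.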

-Raymond-
2009/5/27 Raymond Balmès <[email protected]>

> I have many URLs per host, of course. I need to get all the pages of the
> sites, so I don't understand the question.
>
> -Raymond
>
> 2009/5/26 Otis Gospodnetic <[email protected]>
>
>
>> But how, Ray, if you have only 1 URL per host?
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> ----- Original Message ----
>> > From: Raymond Balmès <[email protected]>
>> > To: [email protected]
>> > Sent: Tuesday, May 26, 2009 4:11:27 PM
>> > Subject: Re: threads get stuck in spinwaiting
>> >
>> > Observing what my crawls do, I believe Ken must be right.
>> > Towards the end of the crawl (when the fetchqueues.totalSize="xxxx"
>> > counts down), in some cases I'm fetching from only about two sites, so
>> > politeness does start to play a role there, or at least it should.
>> >
>> > -Ray-
>> >
>> > 2009/5/26 Raymond Balmès
>> >
>> > > Please read this too:
>> > >
>> > > http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
>> > >
>> > > Interesting build from Ken.
>> > >
>> > > 2009/5/26 Raymond Balmès
>> > >
>> > >> Yes, already reported in multiple threads.
>> > >> I noted that if one does a "recrawl" you don't get this behavior... no
>> > >> idea why.
>> > >>
>> > >> -Raymond-
>> > >>
>> > >> 2009/5/26 Larsson85
>> > >>
>> > >>
>> > >>> When I try to do my crawl, it seems like the threads get stuck in some
>> > >>> spinwaiting mode. At first the crawl goes as planned, and I couldn't be
>> > >>> happier. But after some time, it starts reporting more of these
>> > >>> spinwaiting messages.
>> > >>>
>> > >>> I print a log here to show you what it looks like. As you can see, it
>> > >>> gets stuck, and the queue decreases by 1 all the time. I've tried doing
>> > >>> a smaller crawl, and what happens is that it counts down until
>> > >>> "fetchQueues.totalSize" reaches 0, and then the crawl is done.
>> > >>>
>> > >>> But the problem is that this countdown is very slow; there's no
>> > >>> effective crawling going on, and it uses neither bandwidth nor CPU
>> > >>> power. Basically, this costs way too much time; I can't let it go on
>> > >>> like this for hours to be done.
>> > >>> How can I fix this?
>> > >>>
>> > >>>
>> > >>> After about an hour of crawling, this is what the log looks like:
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
>> > >>>  - fetching http://home.swipnet.se/~w-147200/
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
>> > >>>  - fetching http://biphome.spray.se/alarsson/
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
>> > >>>  - fetching http://home.swipnet.se/~w-31853/html/
>> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
>> > >>>
>> > >>> ....
>> > >>>
>> > >>> --
>> > >>> View this message in context:
>> > >>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
>> > >>> Sent from the Nutch - User mailing list archive at Nabble.com.
>> > >>>
>> > >>>
>> > >>
>> > >
>>
>>
>
