On Wed, Apr 1, 2009 at 22:47, consultas <[email protected]> wrote:

> Hi,
>
> I have been using Nutch for some years now.  I am using it under Cygwin,
> with Windows XP, with 2GB memory, nominal bandwidth 6 Megs, using a single
> server, with pages in the range of 300,000 for a vertical semi-production
> engine.  I use 60 threads, using the crawl method for the initial crawl and
> then switching to the whole-web method.  Until the last release, in the
> fetching phase, I had on my screen a steady rolling list of the pages being
> indexed.  Everything worked quite smoothly, almost 100% of the time.
>
> Then I tried the new version and, on the screen, I got some weird
> output, like the below, and, unfortunately, at a turtle-like speed:
>
> fetch of
> http://www.greenpeace.org/brasil/transgenicos/noticias/text/javascript failed
> with: java.net.SocketTimeoutException: Read timed out
> -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
> fetch of
> http://www.greenpeace.org/international/press/reports/nuclear-waste-crisis-france failed
> with: java.net.SocketTimeoutException: Read timed out
> -activeThreads=60, spinWaiting=58, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=58, fetchQueues.totalSize=0
> Unable to resolve: www.fishunlimited.org, skipping.
> fetching
> http://www.forests.org/archived_site/today/recent/1997/forfadef_files/filelist.xml
> fetching http://www.rpi.edu/news/podcasts.html
> fetching http://www.news24.com/Beeld/Gallery/Home/0,,,00.html
> fetching http://www.epo.org/
> -activeThreads=60, spinWaiting=55, fetchQueues.totalSize=0
> fetching http://vcforum.eagle.org/banning.cfm
> fetching http://cdn.socialtwist.com/2009022511095/script.js
> fetching http://www.lrqa.com.br/treinamento/
> -activeThreads=60, spinWaiting=54, fetchQueues.totalSize=0
> fetching http://www.processingtalk.com/news/eme/eme416.html
> fetching http://www.sciencedaily.com/releases/2009/03/090324111600.htm
> fetching http://www.asnt-glas.org/meetings.htm
> -activeThreads=60, spinWaiting=53, fetchQueues.totalSize=0
> fetching http://www.embrapa.gov.br/destaques_imagem/brasil-visto-do-espaco
> -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
> -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
> fetching http://www.uscg.mil/comdt/blog/2009/01
> fetching
> http://www1.eere.energy.gov/inventions/energytechnet/includes/opera/5
>
> More than this, very often the fetch is aborted with 60 hung threads and,
> when I succeed, it seems (I am not absolutely sure about this, but I have a
> very strong feeling, considering the size of the resulting segment) that
> sometimes the option `topN` is not respected, with fewer pages fetched than
> intended.
>
> So, I am relating my own experience, as a simple user of Nutch, hoping that
> the problems that I faced can be corrected, so that I can use Nutch-1.0,
> which is not feasible now.
>

This log:

-activeThreads=60, spinWaiting=53, fetchQueues.totalSize=0

is nothing to worry about. It is a status line from the fetcher,
showing you information you probably don't need :)
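
If you do want to read something out of those lines, here is a rough
sketch in Python (not part of Nutch). It assumes that spinWaiting counts
threads that are idle waiting for work, so activeThreads minus
spinWaiting approximates the threads actually fetching. The function
names are made up for illustration.

```python
import re

# Matches the fetcher status lines quoted in the log above.
STATUS_RE = re.compile(
    r"-activeThreads=(\d+),\s*spinWaiting=(\d+),\s*fetchQueues\.totalSize=(\d+)"
)

def parse_status(line):
    """Return (active, spin_waiting, queued), or None for any other line."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

def busy_threads(line):
    """Assumption: threads not spin-waiting are the ones doing a fetch."""
    status = parse_status(line)
    if status is None:
        return None
    active, waiting, _queued = status
    return active - waiting

print(busy_threads("-activeThreads=60, spinWaiting=53, fetchQueues.totalSize=0"))
```

Under that assumption, most of the 60 threads in your log are sitting
idle rather than fetching.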

During Nutch 1.0 development, a new fetcher was written and it
replaced the old one, because the new fetcher has a better, more
flexible code base. However, you are not the first person to report
problems with it. You may find it useful to track this issue while
this is sorted out:

https://issues.apache.org/jira/browse/NUTCH-721


>
> Thank you
>
>


-- 
Doğacan Güney
