Hi,

I have been using Nutch for some years now.  I run it under Cygwin, on 
Windows XP, with 2 GB of memory and a nominal bandwidth of 6 Megs, on a single 
server, with around 300,000 pages, for a vertical semi-production engine.  
I use 60 threads, starting with the crawl method for the initial crawl and 
then moving to the whole-web method.  Until the last release, in the fetching 
phase, I had on my screen a steadily rolling list of the pages being fetched.  
Everything worked quite smoothly, almost 100% of the time.

Then I tried the new version and, on the screen, I got some weird output, 
like the lines below, and, unfortunately, at a turtle-like speed:

fetch of http://www.greenpeace.org/brasil/transgenicos/noticias/text/javascript failed with: java.net.SocketTimeoutException: Read timed out
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
fetch of http://www.greenpeace.org/international/press/reports/nuclear-waste-crisis-france failed with: java.net.SocketTimeoutException: Read timed out
-activeThreads=60, spinWaiting=58, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=58, fetchQueues.totalSize=0
Unable to resolve: www.fishunlimited.org, skipping.
fetching http://www.forests.org/archived_site/today/recent/1997/forfadef_files/filelist.xml
fetching http://www.rpi.edu/news/podcasts.html
fetching http://www.news24.com/Beeld/Gallery/Home/0,,,00.html
fetching http://www.epo.org/
-activeThreads=60, spinWaiting=55, fetchQueues.totalSize=0
fetching http://vcforum.eagle.org/banning.cfm
fetching http://cdn.socialtwist.com/2009022511095/script.js
fetching http://www.lrqa.com.br/treinamento/
-activeThreads=60, spinWaiting=54, fetchQueues.totalSize=0
fetching http://www.processingtalk.com/news/eme/eme416.html
fetching http://www.sciencedaily.com/releases/2009/03/090324111600.htm
fetching http://www.asnt-glas.org/meetings.htm
-activeThreads=60, spinWaiting=53, fetchQueues.totalSize=0
fetching http://www.embrapa.gov.br/destaques_imagem/brasil-visto-do-espaco
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
fetching http://www.uscg.mil/comdt/blog/2009/01
fetching http://www1.eere.energy.gov/inventions/energytechnet/includes/opera/5
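
One way to put numbers on output like the above is to count the status lines 
and see how many threads are idle on average.  The following is only a rough 
sketch in Python and is not part of Nutch: the function name 
`summarize_fetch_log` and the keys of the returned dictionary are made up for 
illustration, and the regular expression assumes status lines in exactly the 
format shown above.

```python
import re

# Matches status lines such as:
#   -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
# (format assumed from the console output quoted above)
STATUS_RE = re.compile(
    r"-activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)


def summarize_fetch_log(lines):
    """Summarize fetcher console output (hypothetical helper, not a Nutch API)."""
    samples = 0        # number of status lines seen
    active_total = 0   # sum of activeThreads over all status lines
    spin_total = 0     # sum of spinWaiting over all status lines
    empty_queue = 0    # status lines reporting fetchQueues.totalSize=0
    fetch_starts = 0   # lines beginning with "fetching"
    timeouts = 0       # lines mentioning SocketTimeoutException

    for line in lines:
        line = line.strip()
        match = STATUS_RE.search(line)
        if match:
            active, spinning, queued = (int(g) for g in match.groups())
            samples += 1
            active_total += active
            spin_total += spinning
            if queued == 0:
                empty_queue += 1
        elif line.startswith("fetching"):
            fetch_starts += 1
        elif "SocketTimeoutException" in line:
            timeouts += 1

    # Fraction of threads that were spin-waiting, averaged over status lines.
    spin_fraction = spin_total / active_total if active_total else 0.0
    return {
        "status_samples": samples,
        "mean_spin_fraction": spin_fraction,
        "empty_queue_samples": empty_queue,
        "fetch_starts": fetch_starts,
        "timeouts": timeouts,
    }
```

Passing the lines of a saved console log to `summarize_fetch_log` gives a 
quick comparison between runs, for example before and after changing the 
number of threads.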

More than this, very often the fetch is aborted with 60 hung threads and, when 
it does succeed, it seems (I am not absolutely sure about this, but I have a 
very strong feeling, considering the size of the resulting segment) that 
sometimes the `topN` option is not respected, with fewer pages fetched than 
intended.

So, I am relating my own experience, as a simple user of Nutch, hoping that the 
problems I faced can be corrected, so that I can use Nutch-1.0, which is not 
feasible now.

Thank you
