Hi, I have been using Nutch for some years now. I run it under Cygwin on Windows XP, with 2 GB of memory and a nominal bandwidth of 6 Megs, on a single server, with around 300,000 pages, for a vertical semi-production engine. I use 60 threads, with the crawl method for the initial crawl and the whole-web method after that. Until the last release, during the fetching phase I had on my screen a steadily rolling list of the pages being indexed. Everything worked quite smoothly, almost 100% of the time.
Then I tried the new version and, on the screen, I got some weird output like the log below, and, unfortunately, at a turtle-like speed:

fetch of http://www.greenpeace.org/brasil/transgenicos/noticias/text/javascript failed with: java.net.SocketTimeoutException: Read timed out
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
fetch of http://www.greenpeace.org/international/press/reports/nuclear-waste-crisis-france failed with: java.net.SocketTimeoutException: Read timed out
-activeThreads=60, spinWaiting=58, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=58, fetchQueues.totalSize=0
Unable to resolve: www.fishunlimited.org, skipping.
fetching http://www.forests.org/archived_site/today/recent/1997/forfadef_files/filelist.xml
fetching http://www.rpi.edu/news/podcasts.html
fetching http://www.news24.com/Beeld/Gallery/Home/0,,,00.html
fetching http://www.epo.org/
-activeThreads=60, spinWaiting=55, fetchQueues.totalSize=0
fetching http://vcforum.eagle.org/banning.cfm
fetching http://cdn.socialtwist.com/2009022511095/script.js
fetching http://www.lrqa.com.br/treinamento/
-activeThreads=60, spinWaiting=54, fetchQueues.totalSize=0
fetching http://www.processingtalk.com/news/eme/eme416.html
fetching http://www.sciencedaily.com/releases/2009/03/090324111600.htm
fetching http://www.asnt-glas.org/meetings.htm
-activeThreads=60, spinWaiting=53, fetchQueues.totalSize=0
fetching http://www.embrapa.gov.br/destaques_imagem/brasil-visto-do-espaco
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
fetching http://www.uscg.mil/comdt/blog/2009/01
fetching http://www1.eere.energy.gov/inventions/energytechnet/includes/opera/5

More than this, very often the fetch is aborted with 60 hung threads. When it does succeed, it seems that sometimes the `topN` option is not respected, with fewer pages fetched than intended. I am not absolutely sure about this, but I have a very strong feeling, considering the size of the resulting segment.

So I am relating my own experience, as a simple user of Nutch, hoping that the problems I faced can be corrected so that I can use Nutch-1.0, which is not feasible for me now.

Thank you
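
P.S. In case it helps anyone compare runs, here is a minimal sketch in Python (not part of Nutch) that reads the status lines in the format shown in the log above and averages the ratio of spinWaiting to activeThreads. The function name `summarize` and the short sample text are only for illustration, and I am assuming that spinWaiting counts threads that are sitting idle, which I have not checked against the Nutch source.

```python
import re

# Matches fetcher status lines in the format seen in the log, e.g.
# -activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0
STATUS = re.compile(
    r"-activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)

def summarize(log_text):
    """Return (number of status lines, average spinWaiting/activeThreads)."""
    fractions = []
    for match in STATUS.finditer(log_text):
        active = int(match.group(1))
        waiting = int(match.group(2))
        if active > 0:  # avoid dividing by zero on an idle fetcher
            fractions.append(waiting / active)
    if not fractions:
        return 0, 0.0
    return len(fractions), sum(fractions) / len(fractions)

# Hypothetical two-line excerpt, copied from the format above.
sample = (
    "-activeThreads=60, spinWaiting=57, fetchQueues.totalSize=0\n"
    "fetching http://www.epo.org/\n"
    "-activeThreads=60, spinWaiting=59, fetchQueues.totalSize=0\n"
)

count, average = summarize(sample)
print(count, round(average, 3))
```

Because it uses `finditer` instead of splitting on newlines, it should also work on a log that was pasted as one long line. A ratio close to 1 would suggest that almost all threads are waiting, which matches the slow speed I am seeing.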
