No one answered because this question has already been answered a couple of times. If you search this mailing list for "fetcher slowness" or "fetcher hung threads" you will find the answers. You can also take a look at NUTCH-344. The problem has come up before and there are patches that fix it. It has to do with the crawl delay being set to a large value by the robots.txt of the sites being fetched. Depending on the version of Nutch you are using, the configuration below, placed in nutch-site.xml, should fix this.
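To make the rule concrete before the property itself: a site's robots.txt can request a Crawl-Delay, and the fetcher compares that delay with fetcher.max.crawl.delay. The Python sketch below only illustrates that comparison as the property description states it; it is not Nutch's actual code. The names `crawl_delay_from_robots` and `should_skip` are hypothetical, and the robots.txt parsing relies on Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Assumed limit in seconds, mirroring the fetcher.max.crawl.delay value of 30.
MAX_CRAWL_DELAY = 30


def crawl_delay_from_robots(robots_txt, agent="*"):
    """Return the Crawl-Delay (in seconds) that robots.txt requests, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


def should_skip(crawl_delay, max_crawl_delay=MAX_CRAWL_DELAY):
    """Hypothetical helper: True if the page would be skipped.

    Follows the property description: a limit of -1 means never skip,
    otherwise skip when the requested delay exceeds the limit.
    """
    if crawl_delay is None or max_crawl_delay == -1:
        return False
    return crawl_delay > max_crawl_delay


if __name__ == "__main__":
    robots = "User-agent: *\nCrawl-delay: 600\n"
    delay = crawl_delay_from_robots(robots)
    print("requested delay:", delay)
    print("skip with limit 30:", should_skip(delay, 30))
    print("skip with limit -1:", should_skip(delay, -1))
```

With a robots.txt that asks for a Crawl-Delay of 600 seconds and a limit of 30, `should_skip` returns True, so the page is skipped rather than waited on; with a limit of -1 the fetcher would wait the full ten minutes.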
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>30</value>
  <description>
  If the Crawl-Delay in robots.txt is set to greater than this value (in
  seconds) then the fetcher will skip this page, generating an error report.
  If set to -1 the fetcher will never skip such pages and will wait the
  amount of time retrieved from robots.txt Crawl-Delay, however long that
  might be.
  </description>
</property>

Dennis

Aïcha wrote:
> Hi,
>
> I don't know why, but I have had no answer on the 3 forums where I posted
> my problem.
> The Fetcher freeze occurs every time I try to fetch my file system, so I
> can't imagine that I am the only one who has this problem. As I said in my
> last e-mail, I found many mails about it, but no solution seems to have
> been found.
> It is a big problem, so I don't understand why nobody seems interested in
> it.
>
> Can anyone tell me whether they have encountered the problem and what to do?
> Thanks in advance.
> Aïcha
>
>
> ----- Original Message ----
> From: Aïcha <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, 30 October 2006, 18:16:26
> Subject: Urgent : Fetcher aborts with hung threads
>
>
> Hi,
>
> I am trying to crawl my file system, but the crawl never finishes; it
> aborts with the message "Aborting with 3 hung threads".
>
> The number of hung threads is not the same when I retry.
>
> I see that the problem has been posted many times, most recently by
> Bruno Thiel on 2006/10/11, but I don't think it is linked to xls files,
> as the problem occurs after files of different formats.
>
> I modified the configuration, increasing the number of threads, but it
> didn't solve the problem.
>
> Could somebody please help me?
> I can't crawl my file system.
>
> Best Regards,
> Aïcha
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
