[ http://issues.apache.org/jira/browse/NUTCH-277?page=comments#action_12413603 ]
Andrzej Bialecki commented on NUTCH-277:
-----------------------------------------

Could you provide the URL that is causing this problem? I'd like to know if it's happening during the page fetching, or when the plugin first tries to fetch /robots.txt.

> Fetcher dies because of "max. redirects" (avoiding infinite loop)
> -----------------------------------------------------------------
>
>          Key: NUTCH-277
>          URL: http://issues.apache.org/jira/browse/NUTCH-277
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.8-dev
>  Environment: nightly-2006-05-20
>     Reporter: Stefan Neufeind
>     Priority: Critical
>
> Error in the logs is:
> 060521 213401 SEVERE Narrowly avoided an infinite loop in execute
> org.apache.commons.httpclient.RedirectException: Maximum redirects (100) exceeded
>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:183)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
>     at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:87)
>     at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:97)
>     at org.apache.nutch.protocol.http.api.RobotRulesParser.isAllowed(RobotRulesParser.java:394)
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:173)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:135)
>
> This happens during normal crawling. Unfortunately I don't know how to
> further track this down. But it's problematic, since it actually makes the
> fetcher die.
>
> Workaround (for the symptom) is in NUTCH-258 (avoid dying on SEVERE
> log entry). That works for me, crawling works fine and it does not hang/crash.
> However, this is working around the problems, not solving them - I know. But
> it helps for the moment ...
>
> Hope somebody can help - this looks quite important to track down to me.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
