Hi,

I just had to solve the same problem. Luckily, I found a patch for it here:
http://issues.apache.org/jira/browse/NUTCH-273

I applied it to Nutch 0.8.1, and it works. No more infinite loops.
Just be sure to put the line from the patch into the right block (the redirect one).

greetings from Berlin (CET),

Rüdiger


Carl Cerecke-3 wrote:
> 
> This is the behaviour I am noticing with pages that have a server 
> redirect (300-range code):
> 
> Say page A redirects to page B. A is in the fetchlist created by 
> generate. When A is fetched, the redirect is followed and B is fetched. 
> At the next updatedb, both A and B go into the crawldb. For some reason, 
> at the next generate, page B is listed to be fetched. And again at the 
> next generate, and so on.
> 
> 
> An example is:
> 
> http://www.selecthotels.com
> 
> which redirects to http://203.210.113.143/ ('page B').
> This page always seems to be in the fetchlist no matter how many times 
> it gets fetched. (To make matters more complicated, it also redirects to 
> yet another URL.)
> 
> How do I fix this behaviour?
> 
> Also, other URLs whose fetch fails for some reason stay in the crawldb 
> and are tried again and again. For a 'deep' search using topN=1000, each 
> fetchlist generated after a number of runs has many hundreds of these 
> failed URLs that it tries to refetch.
> 
> How do I fix this behaviour too?
> 
> 
> 
> End of the day for me (NZST). I'll try again tomorrow....
> 
> Cheers,
> Carl.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Redirected-to-pages-and-not-there-pages-are-fetched-multiple-times-tf4149250.html#a11811984
Sent from the Nutch - User mailing list archive at Nabble.com.

