Set the http.redirect.max property in nutch-site.xml to a value greater than 0,
usually around 3. The default is 0, so the fetcher won't follow redirects
immediately; it only records them for later fetching.
Dennis
<property>
<name>http.redirect.max</name>
<value>0</value>
<description>The maximum number of redirects the fetcher will follow when
trying to fetch a page. If set to negative or 0, fetcher won't immediately
follow redirected URLs, instead it will record them for later fetching.
</description>
</property>
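The block above is the default definition. As a rough sketch, the override in nutch-site.xml could look like the following. The value 3 is just the "around 3" suggested above, the description text is my own wording rather than anything shipped with Nutch, and the snippet assumes the file has no other overrides inside the configuration element.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/nutch-site.xml; settings here override nutch-default.xml. -->
<configuration>
  <property>
    <name>http.redirect.max</name>
    <!-- Assumed value: follow up to 3 redirects during the fetch. -->
    <value>3</value>
    <description>Illustrative override: follow redirects immediately instead
    of only recording them for later fetching.
    </description>
  </property>
</configuration>
```

If nutch-site.xml already contains other properties, only the property element needs to be added inside the existing configuration element.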
Larsson85 wrote:
When I do a dump of my segments I often find entries that look like the
following:
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
I suppose this means that the page wants to redirect. How can I make
Nutch follow that redirection and crawl the target page instead?
It's not just one or two pages that look like this; it happens very frequently.