Set the http.redirect.max property in nutch-site.xml to a value greater than 0, usually around 3. The default is 0, so the fetcher won't follow redirects immediately.

Dennis

<property>
  <name>http.redirect.max</name>
  <value>0</value>
  <description>The maximum number of redirects the fetcher will follow when
  trying to fetch a page. If set to negative or 0, fetcher won't immediately
  follow redirected URLs, instead it will record them for later fetching.
  </description>
</property>
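
As a rough sketch of the override, assuming a nutch-site.xml that holds no other overrides yet, the entry might look like the following. The value 3 is only the typical figure mentioned above, not a required setting, and the description text is illustrative.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>http.redirect.max</name>
    <!-- Assumed value: follow up to 3 redirects during the fetch itself -->
    <value>3</value>
    <description>Follow up to 3 redirects immediately instead of
    recording them for later fetching.
    </description>
  </property>
</configuration>
```

With the default of 0 left in place, redirect targets are recorded for later fetching rather than followed in the same fetch.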

Larsson85 wrote:
When I do a dump of my segments I often find entries that look like the
following:

<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>

I suppose this means that the page wants to redirect. How can I make
Nutch follow that redirection and crawl the target page instead?

It's not just one or two pages that look like this; it happens very frequently.
