Richard Braman wrote:
When you get an error while fetching and org.apache.nutch.protocol.RetryLater is raised because the max retries have been reached, Nutch says it has given up and will retry later. When does that retry occur? How would you make a fetchlist of all URLs that have failed? Is this information maintained somewhere?

Each URL in the crawldb has a retry count, the number of times it has been tried without a conclusive result. When the maximum (db.fetch.retry.max) is reached, the page is considered gone. Until then it will be generated for fetch along with other pages. There is no command that generates a fetchlist containing only pages whose retry count is greater than zero.
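
If you need that list anyway, one option is to dump the crawldb (bin/nutch readdb <crawldb> -dump <out>) and grep the output for non-zero "Retries since fetch" values. Another is to scan the crawldb directly. Below is a minimal sketch, not anything shipped with Nutch: it assumes the 0.8-style crawldb layout (current/part-*/data SequenceFiles keyed by Text with CrawlDatum values) and uses CrawlDatum.getRetriesSinceFetch() to pick out URLs that have failed at least once.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.nutch.crawl.CrawlDatum;

    public class ListRetries {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // crawldb/current holds one part-* map file per reduce task
        Path current = new Path(args[0], "current");
        for (FileStatus part : fs.listStatus(current)) {
          Path data = new Path(part.getPath(), "data");
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, data, conf);
          Text url = new Text();
          CrawlDatum datum = new CrawlDatum();
          while (reader.next(url, datum)) {
            // a non-zero retry count means this URL has failed
            // at least once without a conclusive result
            if (datum.getRetriesSinceFetch() > 0) {
              System.out.println(url);
            }
          }
          reader.close();
        }
      }
    }

Run it with the crawldb directory as the only argument; the printed URLs could then be injected into a fresh crawldb to build a retry-only fetchlist.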

Doug
