Two days ago I posted the message below to the nutch-user list.
Since nobody has answered yet, I think this is more of a developer issue
than a user issue (to me it looks like a bug).
I would like to discuss it with a Nutch developer.
Thanks!

----------------------------------------------
Hello,

Just a few days ago we started using Nutch (0.7.1).
It's really nice and I would like to see it evolve.

Here's my issue/question:

While fetching our URLs, we got some errors like this:
60202 154316 fetch of http://www.test-domain.de/crawl_html/page_2.html
failed with: java.lang.Exception:
org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry
later.
That seems to be OK and indicates some network problems.

The problem is that the entry in the Webdb shows the following:

Page 4: Version: 4
URL: http://www.test-domain.de/crawl_html/page_2.html
ID: b360ec931855b0420776909bd96557c0
Next fetch: Sun Aug 17 07:12:55 CET 292278994
Retries since fetch: 0
Retry interval: 0 days

The 'Next fetch' date is set to the year '292278994'.
I probably won't live to see the refetch. ;)
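The odd year is a hint in itself: 292278994 is the year that java.util's Gregorian calendar produces for Long.MAX_VALUE milliseconds since the epoch, i.e. the largest timestamp a Java long can hold. My guess (an assumption, not verified against the Nutch source) is that the update step stores Long.MAX_VALUE as the next fetch time, meaning "never refetch", for pages that failed with RetryLater. The small standalone check below shows only the date arithmetic; the class and method names are made up for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper class, only to show which year Long.MAX_VALUE maps to.
public class MaxDateCheck {

    // Returns the Gregorian calendar year (in UTC) for a timestamp given in
    // milliseconds since 1970-01-01T00:00:00Z.
    static int yearOf(long millis) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(millis);
        return cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        // The epoch itself, as a sanity check.
        System.out.println("year of 0L:             " + yearOf(0L));
        // The largest possible long timestamp: this is the year seen in the WebDB dump.
        System.out.println("year of Long.MAX_VALUE: " + yearOf(Long.MAX_VALUE));
    }
}
```

If the next fetch time of a page comes out as exactly Long.MAX_VALUE, the page was most likely marked as "never fetch again" rather than scheduled for a retry.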

What's wrong here? I hope it's not my lifespan. ;)
A page that couldn't be fetched because of network problems
should be refetched during the next crawl (i.e. its next fetch date
should be set to the next day).

I'm just using the standard API of Nutch 0.7.1, like this:

WebDBWriter webdb = new WebDBWriter(fileSystem, new File(dbPath));
UpdateDatabaseTool tool = new UpdateDatabaseTool(webdb, true, -1);
tool.updateForSegment(fileSystem, lseg);
tool.close();

Thanks
mos


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
