[
https://issues.apache.org/jira/browse/NUTCH-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney closed NUTCH-516.
-------------------------------
Resolved and committed.
> Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE
> --------------------------------------------------------------------
>
> Key: NUTCH-516
> URL: https://issues.apache.org/jira/browse/NUTCH-516
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Environment: Java 1.6, Linux 2.6
> Reporter: Emmanuel Joke
> Fix For: 1.0.0
>
> Attachments: NUTCH-516.patch
>
>
> Some pages cannot be crawled because of a robots restriction. In this case we
> update the db with the metadata _pst_:robots_denied(18), set status
> code 3, and change the fetch interval to 67.5 days.
> Unfortunately the fetch time is never changed, so the page keeps being
> generated and fetched on every run.
> We should update the scheduled fetch time in the crawldb to reflect the fetch
> interval.
> We should add the following to CrawlDbReducer:
> case CrawlDatum.STATUS_FETCH_GONE:            // permanent failure
>   if (old != null)
>     result.setSignature(old.getSignature());  // use old signature
>   result.setStatus(CrawlDatum.STATUS_DB_GONE);
>   result = schedule.setPageGoneSchedule((Text)key, result, prevFetchTime,
>       prevModifiedTime, fetch.getFetchTime());
>   // set the schedule
>   result = schedule.setFetchSchedule((Text)key, result, prevFetchTime,
>       prevModifiedTime, fetch.getFetchTime(), fetch.getModifiedTime(),
>       modified);
>   break;
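
The effect described in the report can be illustrated with a small standalone
sketch. It is not Nutch code: the Entry class, the isDue rule (an entry is
selected once its fetch time has passed), the 30-day starting interval, and both
markGone* helpers are hypothetical names and assumptions chosen only to show why
changing the interval without moving the fetch time leaves the page due on
every run.

```java
class FetchScheduleSketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    /** Hypothetical stand-in for a crawldb entry; not the real CrawlDatum. */
    static final class Entry {
        long fetchTimeMs;     // next time the page is considered due
        double intervalDays;  // re-fetch interval

        Entry(long fetchTimeMs, double intervalDays) {
            this.fetchTimeMs = fetchTimeMs;
            this.intervalDays = intervalDays;
        }
    }

    /** Assumed selection rule: a page is due once its fetch time has passed. */
    static boolean isDue(Entry e, long nowMs) {
        return e.fetchTimeMs <= nowMs;
    }

    /** Behaviour described in the report: interval changes, fetch time does not. */
    static void markGoneIntervalOnly(Entry e, double newIntervalDays) {
        e.intervalDays = newIntervalDays;
    }

    /** Proposed behaviour: also move the fetch time out by the new interval. */
    static void markGoneAndReschedule(Entry e, double newIntervalDays, long fetchedAtMs) {
        e.intervalDays = newIntervalDays;
        e.fetchTimeMs = fetchedAtMs + (long) (newIntervalDays * DAY_MS);
    }

    public static void main(String[] args) {
        long now = 1_000L * DAY_MS;   // arbitrary "current time" for the sketch
        long nextRun = now + DAY_MS;  // a generate run one day later

        Entry stuck = new Entry(now, 30.0);  // 30.0 is an assumed starting interval
        markGoneIntervalOnly(stuck, 67.5);

        Entry fixed = new Entry(now, 30.0);
        markGoneAndReschedule(fixed, 67.5, now);

        System.out.println("interval only, due on next run: " + isDue(stuck, nextRun));
        System.out.println("rescheduled, due on next run:   " + isDue(fixed, nextRun));
    }
}
```

In this model the first entry is still due one day later, while the second is
not due again until 67.5 days after the failed fetch.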
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.