Thamme Gowda N created NUTCH-2164:
-------------------------------------

             Summary: Inconsistent 'Modified Time' in crawl db
                 Key: NUTCH-2164
                 URL: https://issues.apache.org/jira/browse/NUTCH-2164
             Project: Nutch
          Issue Type: Improvement
          Components: crawldb, fetcher
    Affects Versions: 1.11
            Reporter: Thamme Gowda N
            Priority: Minor

The 'Modified time' in the crawldb is invalid. It is set to (0 - timezone difference), i.e. the Unix epoch (0) offset by the local timezone.

*How to verify/reproduce:*
  Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the contents of 'yy'.

The following improvements can be made:
 1. Have DefaultFetchSchedule set the modified time.
 2. Set ProtocolStatus.lastModified when a modified time is available in the protocol response headers.

This issue is also discussed on the dev mailing list: http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
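
To illustrate the "(0 - timezone difference)" symptom described above, here is a minimal standalone sketch. It assumes the stored modified time is a millisecond epoch value that was never set (0) and that the dump renders it in the local timezone. The class ModifiedTimeDemo and the helpers render and parseLastModified are hypothetical names for illustration only and are not part of the Nutch API. The second helper sketches how a 'Last-Modified' response header could be turned into an epoch value for improvement 2.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ModifiedTimeDemo {

    // Hypothetical helper: renders epoch milliseconds in the given time zone
    // with an explicit numeric UTC offset, so the shift is easy to see.
    static String render(long epochMillis, String zoneId) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss xxx");
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).format(fmt);
    }

    // Hypothetical helper: parses an HTTP 'Last-Modified' header value
    // (RFC 1123 date format) into epoch milliseconds.
    static long parseLastModified(String headerValue) {
        return ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long unsetModifiedTime = 0L; // assumption: the field was never populated

        // The same stored value looks different depending on the local zone.
        System.out.println(render(unsetModifiedTime, "UTC"));                 // 1970-01-01 00:00:00 +00:00
        System.out.println(render(unsetModifiedTime, "America/Los_Angeles")); // 1969-12-31 16:00:00 -08:00
        System.out.println(render(unsetModifiedTime, "Asia/Kolkata"));        // 1970-01-01 05:30:00 +05:30

        // A real header value yields a meaningful modified time instead of 0.
        System.out.println(parseLastModified("Wed, 21 Oct 2015 07:28:00 GMT")); // 1445412480000
    }
}
```

If the assumption holds, a value of 0 in the dump means "modified time unknown" rather than a real timestamp, which is why populating it from the fetch schedule or from the response headers would make the field consistent.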