Thamme Gowda N created NUTCH-2164:
-------------------------------------

             Summary: Inconsistent 'Modified Time' in crawl db
                 Key: NUTCH-2164
                 URL: https://issues.apache.org/jira/browse/NUTCH-2164
             Project: Nutch
          Issue Type: Improvement
          Components: crawldb, fetcher
    Affects Versions: 1.11
            Reporter: Thamme Gowda N
            Priority: Minor


The 'Modified time' stored in the crawldb is invalid: it is set to (0 - timezone difference).
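
One possible reading of that value, offered as an assumption rather than a confirmed diagnosis, is that the stored modified time is 0 (the Unix epoch) and the dump renders it in the local timezone. The sketch below uses only java.time, not Nutch code, to show how epoch 0 appears shifted by the zone offset. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch: formats epoch 0 in a given zone.
class EpochZeroSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Renders the instant 0 ms since the epoch as wall-clock time in 'zone'.
    static String formatEpochZero(ZoneId zone) {
        return FORMAT.format(Instant.EPOCH.atZone(zone));
    }

    public static void main(String[] args) {
        // In UTC an unset (zero) timestamp shows as the epoch itself.
        System.out.println(formatEpochZero(ZoneOffset.UTC));
        // In a zone 8 hours behind UTC it shows 8 hours before the epoch.
        System.out.println(formatEpochZero(ZoneOffset.ofHours(-8)));
    }
}
```

If this reading is right, a date at or within a day of 1970-01-01 in the dump would indicate a modified time that was never set.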

*How to verify/reproduce:*
  Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the contents of 'yy'.

The following improvements could be made:
1. Set the modified time in DefaultFetchSchedule.
2. Set ProtocolStatus.lastModified when a modified time is available in the
protocol response headers.
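
As a rough illustration of the second item, the sketch below turns an HTTP Last-Modified header value into epoch milliseconds, which could then be recorded as the modified time. It uses only java.time and does not call any Nutch API. The class and method names are hypothetical, and the assumption that the header arrives in the RFC 1123 date format is mine, not the issue's.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.OptionalLong;

// Hypothetical helper, not part of Nutch: parses a Last-Modified header value.
class LastModifiedSketch {

    // Returns epoch milliseconds for an RFC 1123 date such as
    // "Wed, 21 Oct 2015 07:28:00 GMT", or empty if absent or unparsable.
    static OptionalLong parseLastModified(String headerValue) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            ZonedDateTime parsed = ZonedDateTime.parse(
                    headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return OptionalLong.of(parsed.toInstant().toEpochMilli());
        } catch (DateTimeParseException e) {
            // Leave the modified time unset rather than guessing a value.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLastModified("Wed, 21 Oct 2015 07:28:00 GMT"));
        System.out.println(parseLastModified("not a date"));
    }
}
```

Returning an empty value for a missing or malformed header keeps the caller from storing a misleading timestamp. How the result would be passed to ProtocolStatus is left open here.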


This issue is also discussed on the dev mailing list:
http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)