Jurian Broertjes created NUTCH-2565:
---------------------------------------

             Summary: MergeDB incorrectly handles unfetched CrawlDatums
                 Key: NUTCH-2565
                 URL: https://issues.apache.org/jira/browse/NUTCH-2565
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.14
            Reporter: Jurian Broertjes

I ran into this issue when merging a crawlDB originating from sitemaps into our normal crawlDB. CrawlDatums are merged based on the output of AbstractFetchSchedule::calculateLastFetchTime(). When a CrawlDatum is unfetched, that value does not describe a real fetch, so the merge can overwrite fetchTime or other fields of the existing entry. I assume this is a bug and have a simple fix for it that checks whether the CrawlDatum has status db_unfetched.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
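The behaviour described in this issue can be illustrated with a small, self-contained sketch. It does not use the Nutch classes: Datum, merge(), lastFetchTimeNaive() and lastFetchTimeGuarded() are hypothetical stand-ins, and the formula "fetchTime minus fetch interval" is only an assumption about how a last fetch time might be estimated. Returning 0 for unfetched entries is one possible form of the status check mentioned above, not necessarily the reporter's patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

public class MergeSketch {
    // Hypothetical status codes, standing in for db_unfetched / db_fetched.
    static final int STATUS_DB_UNFETCHED = 1;
    static final int STATUS_DB_FETCHED = 2;

    // Hypothetical, minimal stand-in for a CrawlDatum.
    static final class Datum {
        final int status;
        final long fetchTime;        // next scheduled fetch, in ms
        final int fetchIntervalSec;  // fetch interval, in seconds

        Datum(int status, long fetchTime, int fetchIntervalSec) {
            this.status = status;
            this.fetchTime = fetchTime;
            this.fetchIntervalSec = fetchIntervalSec;
        }
    }

    // Assumed estimate: next fetch time minus one interval, ignoring status.
    static long lastFetchTimeNaive(Datum d) {
        return d.fetchTime - d.fetchIntervalSec * 1000L;
    }

    // Same estimate, but an unfetched entry never counts as "more recent".
    static long lastFetchTimeGuarded(Datum d) {
        if (d.status == STATUS_DB_UNFETCHED) {
            return 0L;
        }
        return lastFetchTimeNaive(d);
    }

    // Keep the candidate with the largest estimated last fetch time.
    static Datum merge(List<Datum> candidates, ToLongFunction<Datum> lastFetchTime) {
        Datum winner = null;
        long winnerTime = Long.MIN_VALUE;
        for (Datum d : candidates) {
            long t = lastFetchTime.applyAsLong(d);
            if (winner == null || t > winnerTime) {
                winner = d;
                winnerTime = t;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long now = 100 * day;
        // Fetched 5 days ago with a 30-day interval: next fetch is in 25 days.
        Datum fetched = new Datum(STATUS_DB_FETCHED, now + 25 * day, 30 * 86_400);
        // Freshly injected (e.g. from a sitemap), never fetched, 1-day interval.
        Datum unfetched = new Datum(STATUS_DB_UNFETCHED, now, 86_400);
        List<Datum> both = Arrays.asList(fetched, unfetched);

        Datum naive = merge(both, MergeSketch::lastFetchTimeNaive);
        Datum guarded = merge(both, MergeSketch::lastFetchTimeGuarded);
        System.out.println("naive keeps fetched entry:   " + (naive == fetched));
        System.out.println("guarded keeps fetched entry: " + (guarded == fetched));
    }
}
```

In this sketch the unfetched entry wins the naive merge only because its short interval makes the estimate look recent. With the status check, the entry that was actually fetched is kept.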