[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432083#comment-16432083 ]
Jurian Broertjes commented on NUTCH-2565: ----------------------------------------- PR: https://github.com/apache/nutch/pull/311 > MergeDB incorrectly handles unfetched CrawlDatums > ------------------------------------------------- > > Key: NUTCH-2565 > URL: https://issues.apache.org/jira/browse/NUTCH-2565 > Project: Nutch > Issue Type: Bug > Affects Versions: 1.14 > Reporter: Jurian Broertjes > Priority: Minor > > I ran into this issue when merging a crawlDB originating from sitemaps into > our normal crawlDB. CrawlDatums are merged based on output of > AbstractFetchSchedule::calculateLastFetchTime(). When CrawlDatums are > unfetched, this can overwrite fetchTime or other stuff. > I assume this is a bug and have a simple fix for it that checks if CrawlDatum > has status db_unfetched. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)