[ 
https://issues.apache.org/jira/browse/NUTCH-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699034#comment-13699034
 ] 

lufeng commented on NUTCH-1600:
-------------------------------

Test works fine.
+1
                
> Injector overwrite does not always work properly
> ------------------------------------------------
>
>                 Key: NUTCH-1600
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1600
>             Project: Nutch
>          Issue Type: Bug
>          Components: injector
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.8
>
>         Attachments: NUTCH-1600-1.8.patch
>
>
> db.injector.update works as it should, but db.injector.overwrite doesn't 
> always seem to properly overwrite the record. This issue has existed for 
> some time and we've already fixed it in our distribution of Nutch.
> This record has just been updated (interval).
> {code}
> Injector: starting at 2013-07-03 10:34:15
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: seeds
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 9
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2013-07-03 10:34:21, elapsed: 00:00:05
> URL: url
> Version: 7
> Status: 2 (db_fetched)
> Fetch time: Fri Jul 05 12:11:44 CEST 2013
> Modified time: Fri Jun 28 12:11:44 CEST 2013
> Retries since fetch: 0
> Retry interval: 604800 seconds (7 days)
> Score: 0.0
> Signature: ba29ef3e680323a6d0da74c156800e03
> Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
> {code}
> If we now overwrite the record, nothing happens. With this patch installed it 
> overwrites the record as it should and also logs the update and overwrite 
> switches to the console:
> {code}
> Injector: starting at 2013-07-03 10:36:30
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: seeds
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 9
> Injector: Merging injected urls into crawl db.
> Injector: overwrite: true
> Injector: update: false
> Injector: finished at 2013-07-03 10:36:36, elapsed: 00:00:05
> URL: url
> Version: 7
> Status: 1 (db_unfetched)
> Fetch time: Wed Jul 03 10:36:30 CEST 2013
> Modified time: Thu Jan 01 01:00:00 CET 1970
> Retries since fetch: 0
> Retry interval: 14000 seconds (0 days)
> Score: 1.0
> Signature: null
> Metadata: fixedInterval: 14000.0
> {code}
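
For readers following along, the difference between the two switches can be sketched roughly as below. This is only an illustration of the behaviour described in the issue, not the Injector's actual code: the function name merge_injected, the dict-based records, and the exact set of fields that update touches (only the interval here) are assumptions, as is the choice to let overwrite take precedence when both switches are set. The sample values are taken from the two record dumps above.

```python
def merge_injected(existing, injected, overwrite=False, update=False):
    """Hypothetical merge of one injected record into the crawl db.

    Not Nutch's real implementation; records are plain dicts for illustration.
    """
    if existing is None:
        # URL not yet in the crawl db: the injected record is simply added.
        return dict(injected)
    if overwrite:
        # Overwrite: the injected record fully replaces the existing one.
        return dict(injected)
    if update:
        # Update: keep the existing record, refresh selected fields.
        # Assumption: only the fetch interval is refreshed in this sketch.
        merged = dict(existing)
        merged["interval"] = injected["interval"]
        return merged
    # Neither switch set: the existing record wins unchanged.
    return dict(existing)


# Values taken from the record dumps in the issue description.
existing = {"status": "db_fetched", "interval": 604800, "score": 0.0}
injected = {"status": "db_unfetched", "interval": 14000, "score": 1.0}

overwritten = merge_injected(existing, injected, overwrite=True)
updated = merge_injected(existing, injected, update=True)
```

Under these assumptions, overwritten matches the second dump (db_unfetched, interval 14000, score 1.0), while updated keeps the db_fetched status and only picks up the new interval.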

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
