[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adnane B. updated NUTCH-2222:
-----------------------------
    Description:
This problem happens the second time a page is crawled.

First time:
bin/nutch inject urls/
bin/nutch generate -topN 1000
bin/nutch fetch -all
bin/nutch parse -force -all
bin/nutch updatedb -all

Second time:
bin/nutch generate -topN 1000   --> batchId changes for all existing pages
bin/nutch fetch -all            --> *** metadata is deleted for all pages already crawled ***
bin/nutch parse -force -all
bin/nutch updatedb -all

I'm using MongoDB.

  was:
This problem happens the second time a page is crawled.

First time:
bin/nutch inject urls/
bin/nutch generate -topN 1000
bin/nutch fetch -all
bin/nutch parse -force -all
bin/nutch updatedb -all

Second time:
bin/nutch generate -topN 1000
bin/nutch fetch -all            --> *** metadata is deleted for all pages already crawled ***
bin/nutch parse -force -all
bin/nutch updatedb -all

I'm using MongoDB.


> fetch deletes all metadata except _csh_ and _rs_
> -------------------------------------------------
>
>                 Key: NUTCH-2222
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2222
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 2.3.1
>         Environment: CentOS 6, MongoDB 2.6 and MongoDB 3.0
>            Reporter: Adnane B.
>
> This problem happens the second time a page is crawled.
>
> First time:
> bin/nutch inject urls/
> bin/nutch generate -topN 1000
> bin/nutch fetch -all
> bin/nutch parse -force -all
> bin/nutch updatedb -all
>
> Second time:
> bin/nutch generate -topN 1000   --> batchId changes for all existing pages
> bin/nutch fetch -all            --> *** metadata is deleted for all pages already crawled ***
> bin/nutch parse -force -all
> bin/nutch updatedb -all
>
> I'm using MongoDB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)