[ https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910109#action_12910109 ]
Andrzej Bialecki commented on NUTCH-907:
-----------------------------------------

That's very good news - in that case I'm fine with the Gora API as it is now, we should change Nutch to make use of this functionality.

> DataStore API doesn't support multiple storage areas for multiple disjoint crawls
> ---------------------------------------------------------------------------------
>
>                 Key: NUTCH-907
>                 URL: https://issues.apache.org/jira/browse/NUTCH-907
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Andrzej Bialecki
>             Fix For: 2.0
>
>
> In Nutch 1.x it was possible to easily select a set of crawl data (crawldb,
> page data, linkdb, etc) by specifying a path where the data was stored. This
> enabled users to run several disjoint crawls with different configs, but
> still using the same storage medium, just under different paths.
> This is not possible now because there is a 1:1 mapping between a specific
> DataStore instance and a set of crawl data.
> In order to support this functionality the Gora API should be extended so
> that it can create stores (and data tables in the underlying storage) that
> use arbitrary prefixes to identify the particular crawl dataset. Then the
> Nutch API should be extended to allow passing this "crawlId" value to select
> one of possibly many existing crawl datasets.
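
The prefix scheme proposed in the issue description for NUTCH-907 can be pictured with a small, language-neutral sketch. It is written in Python only for brevity and is not Gora or Nutch code: the `table_name` helper, the `InMemoryBackend` class, the `webpage` base name, and the `_` separator are all assumptions made for illustration. The idea it tries to show is that one shared storage medium can hold several disjoint crawl datasets if each store is opened under a table name derived from a "crawlId".

```python
from typing import Dict, Optional

DEFAULT_TABLE = "webpage"  # hypothetical base table name, chosen for illustration


def table_name(base: str, crawl_id: Optional[str] = None) -> str:
    """Return the table that holds one crawl dataset.

    A missing or empty crawl_id falls back to the unprefixed table, which
    mirrors the current 1:1 mapping between a store and a set of crawl data.
    """
    if not crawl_id:
        return base
    return f"{crawl_id}_{base}"


class InMemoryBackend:
    """Stand-in for a shared storage medium that holds many tables."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {}

    def open_store(self, base: str, crawl_id: Optional[str] = None) -> Dict[str, str]:
        # Create the table on first use, then reuse it for the same crawl_id.
        return self.tables.setdefault(table_name(base, crawl_id), {})


# Two disjoint crawls sharing one backend, selected by crawl_id.
backend = InMemoryBackend()
news = backend.open_store(DEFAULT_TABLE, crawl_id="news")
blogs = backend.open_store(DEFAULT_TABLE, crawl_id="blogs")
news["http://example.com/"] = "fetched"
```

In this sketch a record written to the `news` dataset is not visible in `blogs`, and reopening a store with the same crawl_id returns the same table. How the real stores derive and validate table names is up to the Gora and Nutch implementations.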