[ https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-844:
------------------------------------

    Attachment: NUTCH-844.patch

Updated patch. This also addresses an issue in PluginRepository, which uses Configuration as a key in its internal cache. The problem is that Configuration doesn't implement hashCode, so two distinct Configuration objects never match as keys, and the cache would have been ineffective in situations like this:

{code}
Configuration conf = NutchConfiguration.create();
PluginRepository repo1 = PluginRepository.get(conf);
JobConf job = new NutchJob(conf);
PluginRepository repo2 = PluginRepository.get(job);
// repo2 is a new instance, but should be the same instance!
{code}

The new code sets a UUID property, so the cache knows it's still the same instance. There's a new unit test that ensures this works properly when using NutchConfiguration.create(), and illustrates that it fails without the UUID.

> Improve NutchConfiguration
> --------------------------
>
>                 Key: NUTCH-844
>                 URL: https://issues.apache.org/jira/browse/NUTCH-844
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 2.0
>
>         Attachments: conf.patch, NUTCH-844.patch
>
>
> This patch removes the servlet dependency from NutchConfiguration, and modifies
> the API to allow bootstrapping via API from Properties. This is important for
> use cases where Nutch is embedded in a larger application.
> Also, while I'm at it, remove the support for alternative "crawl"
> configuration when running Crawl tool, which has always been a source of
> confusion.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
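
The caching problem described in the comment above can be reproduced without Nutch or Hadoop on the classpath. What follows is only a minimal sketch, not the code from NUTCH-844.patch: `Conf`, `Repo`, and the property name `sketch.conf.uuid` are hypothetical stand-ins for Configuration, PluginRepository, and whatever property the patch actually sets. It shows why a cache keyed on an object without its own hashCode()/equals() misses for a copied configuration, and how a UUID property that survives the copy lets the cache recognise the copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

public class UuidKeyedCacheSketch {
    // Hypothetical property name; the name used by the real patch is not given in the message.
    static final String UUID_KEY = "sketch.conf.uuid";

    /** Stand-in for a configuration class that does not override hashCode()/equals(). */
    static class Conf {
        final Properties props = new Properties();

        Conf() {}

        /** Copy constructor, analogous to wrapping an existing configuration in a job configuration. */
        Conf(Conf other) {
            props.putAll(other.props);
        }
    }

    /** Stand-in for the cached, expensive-to-build object. */
    static class Repo {}

    static final Map<Conf, Repo> objectKeyedCache = new HashMap<>();
    static final Map<String, Repo> uuidKeyedCache = new HashMap<>();

    /** Creates a configuration tagged with a fresh UUID. */
    static Conf create() {
        Conf conf = new Conf();
        conf.props.setProperty(UUID_KEY, UUID.randomUUID().toString());
        return conf;
    }

    /** Keyed on the object itself: falls back to identity, so a copy never hits. */
    static Repo getByObject(Conf conf) {
        return objectKeyedCache.computeIfAbsent(conf, k -> new Repo());
    }

    /** Keyed on the UUID property: a copy carries the same UUID and hits the cache. */
    static Repo getByUuid(Conf conf) {
        String uuid = conf.props.getProperty(UUID_KEY);
        if (uuid == null) {
            return getByObject(conf); // untagged configuration: no better key available
        }
        return uuidKeyedCache.computeIfAbsent(uuid, k -> new Repo());
    }

    public static void main(String[] args) {
        Conf conf = create();
        Conf job = new Conf(conf); // copy of conf, same UUID property

        System.out.println(getByObject(conf) == getByObject(job)); // prints "false"
        System.out.println(getByUuid(conf) == getByUuid(job));     // prints "true"
    }
}
```

In this sketch a configuration that lacks the UUID property falls back to the object-keyed path, which matches the message's remark that the lookup fails without the UUID.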