[ https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-844:
------------------------------------

    Attachment: NUTCH-844.patch

Updated patch. This also addresses an issue in PluginRepository, which uses 
Configuration as a key in its internal cache. The problem is that 
Configuration doesn't override hashCode(), so keys are matched by object 
identity and the cache is ineffective in situations like this:
{code}
Configuration conf = NutchConfiguration.create();
PluginRepository repo1 = PluginRepository.get(conf);
JobConf job = new NutchJob(conf);
PluginRepository repo2 = PluginRepository.get(job);
// repo2 is a new instance, but should be the same instance!
{code}

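The new code sets a UUID property on the Configuration. The property is 
carried over when the configuration is copied (e.g. into a NutchJob), so the 
cache can recognize the copy as the same configuration. There's a new unit 
test that checks this works properly when using NutchConfiguration.create(), 
and that illustrates the failure without the UUID.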
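
To make the difference concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the patch: Conf and Repository are hypothetical stand-ins for Configuration and PluginRepository, and the property name "example.conf.uuid" is made up for illustration. It only shows why a cache keyed on the object misses for a copy, while a cache keyed on a UUID property that travels with the copy hits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

class UuidCacheSketch {
    // Hypothetical property name, used only in this sketch.
    static final String UUID_KEY = "example.conf.uuid";

    /** Stand-in for a configuration class that does not override hashCode()/equals(). */
    static class Conf {
        final Properties props = new Properties();

        Conf() {}

        /** Copy constructor: copies every property, as wrapping a conf in a job conf would. */
        Conf(Conf other) {
            props.putAll(other.props);
        }
    }

    /** Stand-in for the expensive object we want to share per configuration. */
    static class Repository {}

    static final Map<Conf, Repository> byIdentity = new HashMap<>();
    static final Map<String, Repository> byUuid = new HashMap<>();

    /** Factory that stamps each new configuration with a random UUID. */
    static Conf create() {
        Conf conf = new Conf();
        conf.props.setProperty(UUID_KEY, UUID.randomUUID().toString());
        return conf;
    }

    /** Cache keyed on the object itself: a copy is a different key. */
    static Repository getByIdentity(Conf conf) {
        return byIdentity.computeIfAbsent(conf, k -> new Repository());
    }

    /** Cache keyed on the UUID property: a copy carries the same key. */
    static Repository getByUuid(Conf conf) {
        String uuid = conf.props.getProperty(UUID_KEY);
        if (uuid == null) {
            // No UUID (conf not built via create()): do not cache in this sketch.
            return new Repository();
        }
        return byUuid.computeIfAbsent(uuid, k -> new Repository());
    }

    public static void main(String[] args) {
        Conf conf = create();
        Conf copy = new Conf(conf); // analogous to new NutchJob(conf)

        System.out.println(getByIdentity(conf) == getByIdentity(copy)); // false
        System.out.println(getByUuid(conf) == getByUuid(copy));         // true
    }
}
```

Keying on a property rather than on the object means any copy that preserves the properties shares the cache entry, at the cost that two configurations with the same UUID but later-diverging settings would also share it.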

> Improve NutchConfiguration
> --------------------------
>
>                 Key: NUTCH-844
>                 URL: https://issues.apache.org/jira/browse/NUTCH-844
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 2.0
>
>         Attachments: conf.patch, NUTCH-844.patch
>
>
> This patch removes the servlet dependency from NutchConfiguration, and 
> modifies the API to allow bootstrapping programmatically from Properties. 
> This is important for use cases where Nutch is embedded in a larger 
> application.
> Also, while I'm at it, remove support for the alternative "crawl" 
> configuration when running the Crawl tool, which has always been a source 
> of confusion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
