Andrzej Bialecki wrote:
Example: what happens now if you try to run more than one fetcher at the same time, where the fetcher parameters differ (or a set of activated plugins differs)? You can't - the local tasks on each tasktracker will use whatever local config is there.

That's true when mapred.job.tracker=local, but when things are distributed the config can vary, since each task is spawned in a separate JVM with a separate classpath. What can never be overridden are the values set in nutch-site.xml on each node. For example, so long as plugin.includes is not specified in nutch-site.xml on a node, each task on that node can override plugin.includes to use different plugins.
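
To make that precedence concrete, here is a rough sketch. It is only an illustration: LayeredConfigSketch and its effective() method are hypothetical, plain maps stand in for the XML files, and the plugin names are made up. It is not how NutchConf is actually implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the override order described above.
// These are NOT Nutch classes; plain maps stand in for the XML config files.
final class LayeredConfigSketch {

    // Later layers win: shipped defaults < job-supplied values < node-local site file.
    static Map<String, String> effective(Map<String, String> defaults,
                                         Map<String, String> job,
                                         Map<String, String> site) {
        Map<String, String> result = new HashMap<>(defaults);
        result.putAll(job);   // a job may override anything the site file leaves alone
        result.putAll(site);  // node-local site values always have the last word
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values only; "plugin-a" etc. are made-up plugin names.
        Map<String, String> defaults =
                Collections.singletonMap("plugin.includes", "plugin-a|plugin-b");
        Map<String, String> job =
                Collections.singletonMap("plugin.includes", "plugin-a|plugin-c");

        // Node 1: the site file does not mention plugin.includes, so the job's value is used.
        Map<String, String> emptySite = Collections.emptyMap();
        System.out.println(effective(defaults, job, emptySite).get("plugin.includes"));

        // Node 2: the site file pins plugin.includes, so the job cannot change it.
        Map<String, String> pinnedSite =
                Collections.singletonMap("plugin.includes", "plugin-a");
        System.out.println(effective(defaults, job, pinnedSite).get("plugin.includes"));
    }
}
```

The point of the ordering is that a property stays overridable per job only as long as the node's site file leaves it out.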

Also note that plugin implementations can be submitted in a jar file with the job, and plugin.folders can be overridden in the job to find the new plugins. So a job jar might include a folder named "my.plugins" and set plugin.folders to "my.plugins, plugins", then alter plugin.includes to include job-specific plugins.
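
A minimal sketch of the two per-job overrides just described follows. JobPluginOverrideSketch and its methods are hypothetical helpers, not part of Nutch, and the plugin.includes value is a made-up placeholder. Only the property names and the "my.plugins, plugins" value come from the text above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: builds the per-job overrides described above.
final class JobPluginOverrideSketch {

    // Splits a comma-separated plugin.folders value into folder names, in listed order.
    static List<String> parseFolders(String value) {
        List<String> folders = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                folders.add(trimmed);
            }
        }
        return folders;
    }

    // Returns the properties a job would set: its own folder first, then the existing ones.
    static Map<String, String> jobOverrides(String jobFolder,
                                            String existingFolders,
                                            String includes) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("plugin.folders", jobFolder + ", " + existingFolders);
        overrides.put("plugin.includes", includes);
        return overrides;
    }

    public static void main(String[] args) {
        // "plugin-a|my-job-plugin" is a made-up value for illustration only.
        Map<String, String> overrides =
                jobOverrides("my.plugins", "plugins", "plugin-a|my-job-plugin");
        System.out.println(overrides);
        System.out.println(parseFolders(overrides.get("plugin.folders")));
    }
}
```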

What happens if you change the config on a node that submits the job? The changes won't be propagated to the tasktracker nodes, because tasktrackers use local configuration (through a singleton NutchConf.get()), instead of supplying a serialized/deserialized instance of the config from the originating node... etc.

Again, I'm not sure this is a problem. Properties which tasks should be able to override should not be specified in nutch-site.xml, but rather in mapred-default.xml. Lots of job-specific properties are currently passed this way.

Another use case for eliminating the static uses of NutchConf is to simplify the construction of a configuration GUI. It would be nice to have a web-based interface which permits one to configure parameters and then run the system. Such an interface should be able to run multiple Nutch instances in a single JVM. For example, a single Nutch-based "search appliance" daemon should be able to crawl and search both your intranet and your public websites, each configured separately.
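
Roughly, the idea is that each instance is handed its own configuration object instead of reaching for a JVM-wide singleton. The sketch below assumes hypothetical Conf and Instance classes and made-up plugin names. It shows the shape of the design, not Nutch's actual API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes for illustration; none of these exist in Nutch.
final class InstanceConfigSketch {

    // Per-instance configuration, passed in explicitly rather than fetched statically.
    static final class Conf {
        private final Map<String, String> props;

        Conf(Map<String, String> props) {
            this.props = new HashMap<>(props); // defensive copy keeps instances independent
        }

        String get(String key, String fallback) {
            String value = props.get(key);
            return value != null ? value : fallback;
        }
    }

    // Stand-in for one crawl/search instance living alongside others in the same JVM.
    static final class Instance {
        private final String name;
        private final Conf conf;

        Instance(String name, Conf conf) {
            this.name = name;
            this.conf = conf;
        }

        String describe() {
            return name + " uses " + conf.get("plugin.includes", "(default)");
        }
    }

    public static void main(String[] args) {
        // Two differently configured instances in one JVM; plugin names are made up.
        Instance intranet = new Instance("intranet",
                new Conf(Collections.singletonMap("plugin.includes", "plugin-a|plugin-b")));
        Instance publicSite = new Instance("public",
                new Conf(Collections.singletonMap("plugin.includes", "plugin-a|plugin-c")));
        System.out.println(intranet.describe());
        System.out.println(publicSite.describe());
    }
}
```

With a static singleton, both instances would necessarily see the same values. Passing the configuration through the constructor is what lets them differ.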

Doug


