Jérôme Charron wrote:
Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?
Running many different tasks in parallel, each using different config,
inside the same JVM.
Ok, I understand this Andrzej, but it is not really what I call a use case.
It is more a feature that you describe here.
In fact, what I mean is that I don't understand in which cases it will be
useful. And I don't understand how a particular
NutchConfig will be selected for a particular task...
Use case: executing multiple tasks on a single tasktracker node, but
with drastically different configurations for each task.
Example: what happens now if you try to run more than one fetcher at the
same time, where the fetcher parameters differ (or a set of activated
plugins differs)? You can't - the local tasks on each tasktracker will
use whatever local config is there. What happens if you change the
config on a node that submits the job? The changes won't be propagated
to the tasktracker nodes, because tasktrackers use their local
configuration (through the singleton NutchConf.get()) instead of being
supplied with a serialized/deserialized instance of the config from the
originating node... etc.
NutchConf instances will be created when you create a JobConf. Then they
will have to be serialized/deserialized when job descriptors are sent by
jobtracker to tasktrackers on mapred nodes, and used locally by
tasktrackers to instantiate local tasks using copies of the original
NutchConf instance.
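To make the idea concrete, here is a rough sketch of what I mean. It is not actual Nutch code: the Conf class is a hypothetical stand-in for a non-singleton NutchConf, describeFetch() is a made-up task, and the property keys are only illustrative. It uses plain java.util.Properties to show two jobs with different settings being serialized on the submitting side and rebuilt as independent copies inside one JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class PerTaskConfSketch {

    /** Hypothetical stand-in for a non-singleton NutchConf: a bag of properties. */
    static final class Conf {
        private final Properties props = new Properties();

        void set(String key, String value) {
            props.setProperty(key, value);
        }

        String get(String key, String defaultValue) {
            return props.getProperty(key, defaultValue);
        }

        /** Serialize the config, as the submitting node would before sending a job. */
        byte[] toBytes() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        }

        /** Rebuild an independent copy, as a tasktracker would on receipt. */
        static Conf fromBytes(byte[] data) throws IOException {
            Conf copy = new Conf();
            copy.props.load(new ByteArrayInputStream(data));
            return copy;
        }
    }

    /** Hypothetical task: it reads only the Conf it was handed, never a global one. */
    static String describeFetch(Conf conf) {
        return "threads=" + conf.get("fetcher.threads.fetch", "10")
             + " plugins=" + conf.get("plugin.includes", "");
    }

    public static void main(String[] args) throws IOException {
        // Two jobs with different fetcher settings (illustrative keys and values).
        Conf jobA = new Conf();
        jobA.set("fetcher.threads.fetch", "5");
        jobA.set("plugin.includes", "protocol-http");

        Conf jobB = new Conf();
        jobB.set("fetcher.threads.fetch", "50");
        jobB.set("plugin.includes", "protocol-ftp");

        // Each task gets its own deserialized copy, so both can run side by side.
        Conf taskA = Conf.fromBytes(jobA.toBytes());
        Conf taskB = Conf.fromBytes(jobB.toBytes());

        System.out.println(describeFetch(taskA));
        System.out.println(describeFetch(taskB));
    }
}
```

The point of the sketch is only that nothing in the task reaches for a static get(): whatever the submitting node put into the config is what the task sees, and two tasks in the same JVM do not interfere with each other.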
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com