Andrzej,

How do you choose which NutchConf to use? Here is a short discussion I had with Doug about a kind of dynamic NutchConf inside the same JVM:
"... Looking through the mailing list archives, it seems that making some behavior depend on a document's URL is a recurring problem (for instance, boosting documents that match a URL pattern: the NUTCH-16 issue, and many other threads). So our idea is to provide a "dynamic" Nutch configuration, selected by matching document URLs against patterns, that overrides the default one (as nutch-site.xml does). The idea is as follows:

1. The default configuration is, as usual, the nutch-default.xml file.
2. An XML file can map URL regexps to other configurations (which override nutch-default):

    <nutch:conf>
      <url regexp="http://www.mydomain1.com/*">
        <!-- A set of nutch properties that override the nutch-default for this domain -->
        <property>
          <name>property1</name>
          <value>value1</value>
        </property>
        ....
      </url>
      ....
    </nutch:conf>"

What do you think about this?

> Looking deeper, this is messier than I thought... Some changes would
> be required to the plugin instantiation mechanisms, e.g.:
>
> Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
> ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
> PluginRepository.getExtensionPoint(String) ->
>   getExtensionPoint(String, NutchConf)
>
> etc, etc...
>
> The way this would work would be similar to the mechanism described
> above: if plugin instances are not created yet, they would be created
> once (based on the current NutchConf argument), and then cached in this
> NutchConf instance.
>
> And also the plugin implementations would have to extend
> NutchConfigured, taking NutchConf as the argument to their constructors
> - because now the Extension.getExtensionInstance would pass the current
> NutchConf instance to their constructors.

That's exactly what I had in mind when I spoke with Doug about a dynamic NutchConf. For me it's a +1. The only thing I don't really like is having to extend NutchConfigured, but it is the safest way to implement it.
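A minimal Java sketch of the two ideas discussed here, under loudly stated assumptions: `SketchConf`, `UrlConfResolver`, `BoostFilter` and the `url.boost` property are hypothetical names, not Nutch's API, and `SketchConf` only stands in for NutchConf. It picks a layered configuration from the first URL regexp that fully matches (falling back to the defaults, the way nutch-site overrides nutch-default), and caches plugin instances inside each configuration, with every plugin receiving its conf through its constructor as a NutchConfigured subclass would. Because Java's `Matcher.matches()` requires a full match, the example pattern is written as `http://www\.mydomain1\.com/.*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical stand-in for NutchConf: layered properties plus a per-instance plugin cache. */
class SketchConf {
    private final Properties props;
    // Plugin instances are created at most once per configuration and cached here.
    private final Map<String, Object> pluginCache = new ConcurrentHashMap<>();

    SketchConf(Properties props) { this.props = props; }

    String get(String name) { return props.getProperty(name); }

    /** New conf whose lookups fall back to this one, like nutch-site over nutch-default. */
    SketchConf overlay(Properties overrides) {
        Properties layered = new Properties(props); // this conf's props act as defaults
        for (String name : overrides.stringPropertyNames()) {
            layered.setProperty(name, overrides.getProperty(name));
        }
        return new SketchConf(layered);
    }

    /** Hypothetical getExtensionInstance(NutchConf): create once from this conf, then reuse. */
    @SuppressWarnings("unchecked")
    <T> T getExtensionInstance(String id, Function<SketchConf, T> factory) {
        // The factory must not update this cache (ConcurrentHashMap forbids recursive updates).
        return (T) pluginCache.computeIfAbsent(id, k -> factory.apply(this));
    }
}

/** Hypothetical resolver: URL regexps mapped to overrides; the first full match wins. */
class UrlConfResolver {
    private final SketchConf defaults;
    private final List<Map.Entry<Pattern, SketchConf>> rules = new ArrayList<>();

    UrlConfResolver(SketchConf defaults) { this.defaults = defaults; }

    void addOverride(String regexp, Properties overrides) {
        // Build the layered conf once so every matching URL shares it (and its plugin cache).
        rules.add(Map.entry(Pattern.compile(regexp), defaults.overlay(overrides)));
    }

    SketchConf confFor(String url) {
        for (Map.Entry<Pattern, SketchConf> rule : rules) {
            if (rule.getKey().matcher(url).matches()) {
                return rule.getValue();
            }
        }
        return defaults; // no pattern matched: plain default behavior
    }
}

/** Hypothetical plugin taking its conf in the constructor, as a NutchConfigured subclass would. */
class BoostFilter {
    final float boost;

    BoostFilter(SketchConf conf) {
        String value = conf.get("url.boost"); // hypothetical property name
        this.boost = (value == null) ? 1.0f : Float.parseFloat(value);
    }
}

public class DynamicConfSketch {
    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("url.boost", "1.0");
        UrlConfResolver resolver = new UrlConfResolver(new SketchConf(base));

        Properties domain1 = new Properties();
        domain1.setProperty("url.boost", "2.5");
        resolver.addOverride("http://www\\.mydomain1\\.com/.*", domain1);

        SketchConf conf = resolver.confFor("http://www.mydomain1.com/page.html");
        BoostFilter filter = conf.getExtensionInstance("boost", BoostFilter::new);
        System.out.println(filter.boost); // prints 2.5
    }
}
```

Building each layered conf once in `addOverride` is what keeps the per-conf cache useful in this sketch: every URL matching the same pattern gets the same conf instance, so its plugins are instantiated only once.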
Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/