Hi,
While writing the tests that make the generation bug reproducible, I discovered another strange behavior.
The following test fails:

        public void testConf() throws Exception {
                NutchConf conf = NutchConf.get();
                conf.setInt("mapred.reduce.tasks", 2);
                JobConf jobConf = new JobConf(conf);
                assertEquals(2, jobConf.getInt("mapred.reduce.tasks", 25));
        }

What happens is that JobConf calls addConfResource("mapred-default.xml"). That makes sense, but the way the new resource is loaded is really strange, from my point of view. Instead of reading the configuration file and adding or overwriting properties, the properties object is set to null and the file is added to a list of files that need to be loaded.
On the next get call, all configuration files are reloaded.
That means any time a new configuration resource is added, all configuration files are reloaded and, more importantly, values that were set programmatically are deleted or overwritten by default values.
This happens, for example, in JobConf.
So on the one hand all classes that implement NutchConfigurable should be configurable, but in reality, since all of these classes create their own new JobConf, they are not configurable at all.

My suggestion is that we change NutchConf in the following way:

Change the body of private synchronized void addConfResourceInternal(Object name)
from

resourceNames.add(resourceNames.size()-1, name); // add second to last
properties = null;                            // trigger reload

to:

resourceNames.add(resourceNames.size()-1, name); // add second to last
loadResource(properties, name, false);
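
To illustrate the difference, here is a minimal sketch, not the real NutchConf. ToyConf and ToyConfDemo are hypothetical classes, resources are in-memory maps rather than XML files, the keys example.a and example.b are made up, and the "add second to last" ordering is left out. With eager = false the class mimics the current behavior (drop the properties and reload on the next get); with eager = true it mimics the proposed one (merge only the newly added resource).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchConf; resources are maps, not XML files.
class ToyConf {
    private final List<Map<String, String>> resources = new ArrayList<>();
    private Map<String, String> properties; // null means "reload on next access"
    private final boolean eager;            // true = proposed behavior

    ToyConf(boolean eager) {
        this.eager = eager;
    }

    void addResource(Map<String, String> resource) {
        resources.add(resource);
        if (eager && properties != null) {
            properties.putAll(resource); // proposed: merge only the new resource
        } else {
            properties = null;           // current: discard everything, reload later
        }
    }

    void set(String name, String value) {
        getProps().put(name, value);
    }

    String get(String name, String defaultValue) {
        String value = getProps().get(name);
        return value == null ? defaultValue : value;
    }

    // Rebuilds the properties from all resources if they were discarded.
    private Map<String, String> getProps() {
        if (properties == null) {
            properties = new HashMap<>();
            for (Map<String, String> resource : resources) {
                properties.putAll(resource);
            }
        }
        return properties;
    }
}

class ToyConfDemo {
    // Sets a value, then adds another resource, as JobConf does in the test above.
    static String run(boolean eager) {
        ToyConf conf = new ToyConf(eager);
        conf.addResource(Collections.singletonMap("example.a", "1"));
        conf.set("mapred.reduce.tasks", "2");
        conf.addResource(Collections.singletonMap("example.b", "x"));
        return conf.get("mapred.reduce.tasks", "25");
    }

    public static void main(String[] args) {
        System.out.println("reload on next get: " + run(false)); // prints 25
        System.out.println("merge new resource: " + run(true));  // prints 2
    }
}
```

In this sketch a key that the newly added resource itself defines would still replace a value set earlier, because putAll overwrites existing entries.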


Any comments?
Should I contribute a patch for this one-line edit?

Stefan

P.S. BTW, this bug makes unit testing distributed map reduce impossible.
