[ http://issues.apache.org/jira/browse/HADOOP-127?page=comments#action_12432153 ]

Frédéric Bertin commented on HADOOP-127:
----------------------------------------

<quote>Folks should only define things in the -site files if they want to force them for all code.</quote>

I should have read this earlier; it would have saved me some time. The fact that properties defined in hadoop-final.xml override EVERYTHING, including properties defined in job config files, is very important and should be well documented, because it is not the intuitively expected behaviour, which, to me, was:
- hadoop-default.xml and mapred-default.xml, overridden by
- hadoop-final.xml, overridden by
- job config files

I searched the wiki (afterwards, unfortunately) and it is very well documented there. However, the comments in hadoop-default.xml and the other delivered config files are not clear about this. Maybe they should be more detailed, or just link to the wiki page.

> Unclear precedence of config files and property definitions
> -----------------------------------------------------------
>
>                 Key: HADOOP-127
>                 URL: http://issues.apache.org/jira/browse/HADOOP-127
>             Project: Hadoop
>          Issue Type: Bug
>          Components: conf
>         Environment: Hadoop 0.1.1, Nutch 0.8-dev
>            Reporter: Andrzej Bialecki
>
> The order in which configuration resources are read is not sufficiently
> documented, and also there are no mechanisms preventing harmful re-definition
> of certain properties, if they are put in wrong config files.
> From reading the code in Hadoop Configuration.java, JobConf.java and Nutch
> NutchConfiguration.java I _think_ this is what's happening.
> There are two groups of resources: default resources, loaded first, and final
> resources, loaded at the end. All properties (re)-defined in files loaded
> later will override any previous definitions:
> * default resources: loaded in the order in which they are added. The following
> files are added here, in order:
>   1. hadoop-default.xml (Configuration)
>   2. nutch-default.xml (NutchConfiguration)
>   3. mapred-default.xml (JobConf)
>   4. job_xx_xxx.xml (JobConf, in JobConf(File config))
> * final resources: these always come after default resources, i.e. if any
> value is defined here it will always override those set in default resources
> (NOTE: including per job settings!!!). The following files are added here, in
> reversed order:
>   2. hadoop-site.xml (Configuration)
>   1. nutch-site.xml (NutchConfiguration)
> (i.e. hadoop-site.xml will take precedence over anything else defined in any
> other config file).
> I would appreciate checking that this is indeed the case, and suggestions how
> to ensure that you cannot so easily shoot yourself in the foot if you define
> wrong properties in hadoop-site or nutch-site ...
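
To make the load order described in the issue concrete, here is a minimal sketch that models it with plain maps. It is not Hadoop's Configuration class: the class name PrecedenceSketch, the method resolve, and the property values are all hypothetical, and it assumes the reporter's reading of the code (default resources first, final resources last, later definitions win) is accurate.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the load order described in the issue; not Hadoop's Configuration. */
class PrecedenceSketch {

    /** Applies each resource in list order; a later definition replaces an earlier one. */
    static Map<String, String> resolve(List<Map<String, String>> defaultResources,
                                       List<Map<String, String>> finalResources) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> resource : defaultResources) {
            effective.putAll(resource);
        }
        // Final resources are applied after every default resource,
        // so they win even over per-job settings.
        for (Map<String, String> resource : finalResources) {
            effective.putAll(resource);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only; each map stands in for one config file.
        Map<String, String> hadoopDefault = Map.of("mapred.reduce.tasks", "1");
        Map<String, String> jobXml = Map.of("mapred.reduce.tasks", "8");
        Map<String, String> hadoopSite = Map.of("mapred.reduce.tasks", "2");

        Map<String, String> effective = resolve(
                List.of(hadoopDefault, jobXml),   // default resources, in load order
                List.of(hadoopSite));             // final resources, loaded last

        System.out.println(effective.get("mapred.reduce.tasks")); // prints 2
    }
}
```

In this model the job file's value of 8 is silently replaced by the site file's value of 2, which is the surprise described above. The expectation in the comment (job config files win) would correspond to moving the job file to the end of the final list.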