[ http://issues.apache.org/jira/browse/HADOOP-127?page=comments#action_12432153 ]

Frédéric Bertin commented on HADOOP-127:
----------------------------------------

<quote>Folks should only define things in the -site files if they want to force 
them for all code. </quote>

I should have read this earlier; it would have saved me some time.

Actually, the fact that properties defined in hadoop-site.xml override
EVERYTHING, including properties defined in job config files, is something very
important that should be well documented, because it is not the intuitively
expected behaviour (which, to me, was:
 - hadoop-default.xml and mapred-default.xml, overridden by
 - hadoop-site.xml, overridden by
 - job config files).

I searched the wiki (afterwards, unfortunately), and it is very well
documented there. However, the comments in hadoop-default.xml and the
other shipped config files are not clear about this. Maybe they should be
expanded, or simply link to the wiki page.



> Unclear precedence of config files and property definitions
> -----------------------------------------------------------
>
>                 Key: HADOOP-127
>                 URL: http://issues.apache.org/jira/browse/HADOOP-127
>             Project: Hadoop
>          Issue Type: Bug
>          Components: conf
>         Environment: Hadoop 0.1.1, Nutch 0.8-dev
>            Reporter: Andrzej Bialecki 
>
> The order in which configuration resources are read is not sufficiently
> documented, and there are also no mechanisms to prevent harmful redefinition
> of certain properties if they are put in the wrong config files.
> From reading the code in Hadoop's Configuration.java and JobConf.java and
> Nutch's NutchConfiguration.java, I _think_ this is what's happening.
> There are two groups of resources: default resources, loaded first, and final
> resources, loaded at the end. All properties (re)defined in files loaded
> later override any previous definitions:
> * default resources: loaded in the order in which they are added. The
> following files are added here, in order:
>     1. hadoop-default.xml (Configuration)
>     2. nutch-default.xml  (NutchConfiguration)
>     3. mapred-default.xml (JobConf)
>     4. job_xx_xxx.xml       (JobConf, in JobConf(File config))
> * final resources: these always come after the default resources, i.e. any
> value defined here will always override those set in default resources
> (NOTE: including per-job settings!!!). The following files are added here, in
> reverse order:
>     2. hadoop-site.xml (Configuration)
>     1. nutch-site.xml    (NutchConfiguration)
> (i.e. hadoop-site.xml will take precedence over anything else defined in any 
> other config file).
> I would appreciate confirmation that this is indeed the case, and suggestions
> on how to ensure that you cannot so easily shoot yourself in the foot by
> defining the wrong properties in hadoop-site or nutch-site ...
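
The load order described in the issue can be modelled in a few lines. The Java sketch below is a hypothetical illustration, not Hadoop's Configuration or JobConf API. The class and method names are invented, the property values are made up, and the "reversed order" of final resources is assumed to come from prepending each newly added final resource. Under those assumptions it reproduces the reported outcome: the per-job value is overridden by hadoop-site.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the resource precedence reported in HADOOP-127.
 * NOT Hadoop's Configuration API; it only mimics the described load order.
 */
public class ConfigPrecedenceSketch {

    /** One config file and the properties it defines. */
    record Resource(String name, Map<String, String> props) {}

    /** A resolved value together with the file that supplied it. */
    record Setting(String value, String source) {}

    private final List<Resource> defaults = new ArrayList<>();
    private final List<Resource> finals = new ArrayList<>();

    /** Default resources are loaded in the order in which they are added. */
    void addDefault(Resource r) {
        defaults.add(r);
    }

    /** Assumption: "reverse order" modelled by prepending, so the first final resource added loads last. */
    void addFinal(Resource r) {
        finals.add(0, r);
    }

    /** All default resources first, then all final resources. */
    List<String> loadOrder() {
        List<String> names = new ArrayList<>();
        for (Resource r : defaults) names.add(r.name());
        for (Resource r : finals) names.add(r.name());
        return names;
    }

    /** Loads every resource in order; a later definition replaces an earlier one. */
    Map<String, Setting> resolve() {
        List<Resource> order = new ArrayList<>(defaults);
        order.addAll(finals);
        Map<String, Setting> out = new LinkedHashMap<>();
        for (Resource r : order) {
            for (Map.Entry<String, String> e : r.props().entrySet()) {
                out.put(e.getKey(), new Setting(e.getValue(), r.name()));
            }
        }
        return out;
    }

    /** Files added in the sequence the issue attributes to each class; values are made up. */
    static ConfigPrecedenceSketch reportedSetup() {
        ConfigPrecedenceSketch conf = new ConfigPrecedenceSketch();
        // Configuration
        conf.addDefault(new Resource("hadoop-default.xml", Map.of("mapred.reduce.tasks", "1")));
        conf.addFinal(new Resource("hadoop-site.xml", Map.of("mapred.reduce.tasks", "2")));
        // NutchConfiguration
        conf.addDefault(new Resource("nutch-default.xml", Map.of()));
        conf.addFinal(new Resource("nutch-site.xml", Map.of("mapred.reduce.tasks", "4")));
        // JobConf
        conf.addDefault(new Resource("mapred-default.xml", Map.of()));
        conf.addDefault(new Resource("job_xx_xxx.xml", Map.of("mapred.reduce.tasks", "8")));
        return conf;
    }

    public static void main(String[] args) {
        ConfigPrecedenceSketch conf = reportedSetup();
        System.out.println("load order: " + conf.loadOrder());
        Setting s = conf.resolve().get("mapred.reduce.tasks");
        // The per-job value "8" is lost: hadoop-site.xml is loaded last and wins.
        System.out.println("mapred.reduce.tasks = " + s.value() + " (from " + s.source() + ")");
    }
}
```

Running main prints a load order ending in nutch-site.xml and then hadoop-site.xml, and shows mapred.reduce.tasks resolving to the hadoop-site.xml value instead of the job's. Keeping the two groups in separate lists is what makes call order irrelevant across groups: a final resource added early in the constructor chain still loads after a job file added later.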

-- 
This message is automatically generated by JIRA.

