[ 
https://issues.apache.org/jira/browse/HADOOP-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570982#action_12570982
 ] 

Aaron Kimball commented on HADOOP-2866:
---------------------------------------

Joydeep,

You're definitely correct.

But in general, there are several problems with the JobConf system from a 
software engineering point of view:

1) No naming convention exists: foo.bar.camelBaz, foo.bar.noncamelbaz, and 
foo.bar.dots.between.each.word are all in use.
2) The hierarchy imposed by the keys in the JobConfs has nothing to do with 
which modules actually use them. Two isolated modules can both depend on the 
same key for arbitrarily different functionality, coupling them to each other -- 
and no system exists to prevent this.
3) The hierarchy is arbitrarily ignored: why does "map.input.file" exist, when 
there is already an established "mapred.map" hierarchy? What is the difference 
between "hadoop.job.*" and "job.*"? Shouldn't everything in the entire 
system technically be under hadoop.*?
4) Most config options are hardcoded throughout the source as raw strings; they 
are not placed in public static final Strings at the head of the dependent 
class, nor are they "registered" in any way with JobConf.
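
To illustrate point 4, here is a minimal sketch of what declaring a key as a 
constant in the dependent class, and registering it centrally, could look 
like. ConfKeyRegistry, MapOutputCompressor, and their methods are hypothetical 
names invented for this example; nothing like them exists in Hadoop today, and 
a plain Map stands in for JobConf.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical central registry of known config keys (not a Hadoop class). */
final class ConfKeyRegistry {
    private static final Set<String> KNOWN = new TreeSet<>();

    private ConfKeyRegistry() {}

    /** Records the key and returns it, so a constant can be declared and registered in one step. */
    static synchronized String register(String key) {
        KNOWN.add(key);
        return key;
    }

    static synchronized boolean isRegistered(String key) {
        return KNOWN.contains(key);
    }
}

/** Hypothetical module that owns the key it reads, instead of using a raw string inline. */
final class MapOutputCompressor {
    static final String COMPRESSION_TYPE_KEY =
        ConfKeyRegistry.register("mapred.map.output.compression.type");

    private MapOutputCompressor() {}

    /** Reads the setting through the constant; the Map is a stand-in for JobConf. */
    static String compressionType(Map<String, String> conf, String fallback) {
        String value = conf.get(COMPRESSION_TYPE_KEY);
        return value != null ? value : fallback;
    }
}
```

One caveat of this design: a key is only registered once its owning class has 
been initialized, so a real implementation would need some way to load all 
key-owning classes before validating a configuration.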

I think that a major refactoring of JobConf & friends is probably necessary to 
address all these issues.  Furthermore, coding standards need to specify the 
formatting and hierarchy of config strings, so that the problem is also 
addressed from the human side.

So for starters we can:
1) Add this mechanism, which at the very least will catch typos in user 
configurations.
2) Encourage people who commit user patches to develop and enforce guidelines 
for naming conventions.
3) Encourage people who commit user patches to require that patches update the 
JobConfValidator if they deprecate key names.
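
As a rough sketch of the mechanism in step 1, the class below collects 
warnings for deprecated keys and for unrecognized keys in reserved namespaces, 
while leaving user-defined namespaces alone. It is not the actual 
JobConfValidator: the class name, method names, and warning texts are 
assumptions made for illustration, and the reserved prefixes are just the 
three named in the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator sketch; the real JobConfValidator may look quite different. */
final class JobConfValidatorSketch {
    // Namespaces assumed to be owned by Hadoop; taken from the issue description.
    private static final List<String> RESERVED_PREFIXES =
        Arrays.asList("mapred.", "fs.", "dfs.");

    private final Set<String> known = new HashSet<>();
    private final Map<String, String> deprecated = new HashMap<>();

    void addKnown(String key) {
        known.add(key);
    }

    /** Marks oldKey as deprecated in favor of newKey; newKey becomes a known key. */
    void addDeprecated(String oldKey, String newKey) {
        deprecated.put(oldKey, newKey);
        known.add(newKey);
    }

    /** Returns one warning per deprecated or unrecognized reserved-namespace key. */
    List<String> validate(Iterable<String> keys) {
        List<String> warnings = new ArrayList<>();
        for (String key : keys) {
            String replacement = deprecated.get(key);
            if (replacement != null) {
                warnings.add("Deprecated key '" + key + "'; use '" + replacement + "'");
            } else if (isReserved(key) && !known.contains(key)) {
                warnings.add("Unrecognized key '" + key + "' in a reserved namespace");
            }
            // Keys outside the reserved namespaces are user-defined: no warning.
        }
        return warnings;
    }

    private static boolean isReserved(String key) {
        for (String prefix : RESERVED_PREFIXES) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With "mapred.output.compression.type" registered as deprecated, validating 
that key yields a deprecation warning, a misspelled mapred.* key yields an 
unrecognized-key warning, and "myprogram.mytask.myvalue" yields nothing.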

And longer-term, I may file another JIRA to address the rest of this.

> JobConf should validate key names in well-defined namespaces and warn on 
> misspelling
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2866
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2866
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Aaron Kimball
>            Priority: Minor
>             Fix For: 0.16.1, 0.17.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> A discussion on the mailing list reveals that some configuration strings in 
> the JobConf are deprecated over time and new configuration names replace them:
> e.g., "mapred.output.compression.type" is now replaced with 
> "mapred.map.output.compression.type"
> Programmers who have been manually specifying the former string, however, 
> receive no diagnostic output during testing to suggest that their compression 
> type is being silently ignored.
> It would be desirable to notify developers of this change by printing a 
> warning message when deprecated configuration names are used in a newer 
> version of Hadoop. More generally, when any configuration string in the 
> mapred.*, fs.*, dfs.*, etc. namespaces is provided by a user and is not 
> recognized by Hadoop, it is desirable to print a warning, to indicate 
> malformed configurations. No warnings should be printed when configuration 
> keys are in user-defined namespaces (e.g., "myprogram.mytask.myvalue").

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
