Provide a unified way to pass jobconf options from bin/hadoop
-------------------------------------------------------------

                 Key: HADOOP-3722
                 URL: https://issues.apache.org/jira/browse/HADOOP-3722
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: Matei Zaharia
            Priority: Minor


Often when running a job it is useful to override some jobconf parameters from 
hadoop-site.xml for that particular job - for example, setting the job priority, 
setting the number of reduce tasks, setting the HDFS replication level, etc. 
Currently the Hadoop examples, streaming, pipes, etc. take these extra jobconf 
parameters in different ways: the examples in hadoop-examples.jar use 
-Dkey=value, streaming uses -jobconf key=value, and pipes uses -jobconf 
key1=value1,key2=value2,etc. Things would be simpler if bin/hadoop could take 
the jobconf parameters itself, so that you could run, for example, bin/hadoop 
-Dkey=value jar [whatever] as well as bin/hadoop -Dkey=value pipes [whatever]. 
This is especially useful when an organization needs to require users to use a 
particular property, e.g. the name of a queue to use for scheduling in 
HADOOP-3445. Otherwise, users may confuse one way of passing parameters with 
another and may not notice that they forgot to include certain properties.
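
For illustration, the same overrides might be passed today in three different 
forms, along these lines (jar paths and property values are illustrative):

    bin/hadoop jar hadoop-examples.jar <example> -Dmapred.job.priority=HIGH ...
    bin/hadoop jar hadoop-streaming.jar -jobconf mapred.job.priority=HIGH ...
    bin/hadoop pipes -jobconf mapred.job.priority=HIGH,mapred.reduce.tasks=5 ...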

I propose adding support in bin/hadoop for jobconf options to be specified with 
-C key=value. This would have the effect of setting hadoop.jobconf.key=value in 
Java's system properties. The Configuration class would then be modified to 
read any system properties that begin with hadoop.jobconf and override the 
values in hadoop-site.xml.
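
As a rough sketch of the Configuration side, assuming a generic map of 
properties rather than the real Configuration internals (the class and method 
names below are hypothetical, not a proposed API):

    import java.util.Map;
    import java.util.Properties;

    public class JobConfOverrides {
      // Prefix under which bin/hadoop -C key=value would publish overrides,
      // i.e. -C key=value becomes -Dhadoop.jobconf.key=value on the JVM.
      private static final String PREFIX = "hadoop.jobconf.";

      // Copy any hadoop.jobconf.* system properties into the given
      // configuration map, overriding values loaded from hadoop-site.xml.
      public static void applyOverrides(Map<String, String> conf) {
        Properties props = System.getProperties();
        for (String name : props.stringPropertyNames()) {
          if (name.startsWith(PREFIX)) {
            conf.put(name.substring(PREFIX.length()),
                     props.getProperty(name));
          }
        }
      }
    }

With something like this in place, bin/hadoop -C mapred.reduce.tasks=5 jar 
[whatever] would translate to -Dhadoop.jobconf.mapred.reduce.tasks=5, and the 
override would apply uniformly whether the job is a jar, streaming, or pipes job.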

I can write a patch for this pretty quickly if the design is sound. If there's 
a better way of specifying jobconf parameters uniformly across Hadoop commands, 
let me know.
