Re: Passing Command-line Parameters to the Job Submit Command
Thanks Hemanth. Yes, the Java variables are passed as -Dkey=value. But for the arguments passed to the main method (i.e. String[] args), I cannot find any way to pass them other than hadoop jar CLASSNAME arguments. So if I have a job file, I will necessarily have to use the Java system properties, and not the command-line arguments.

Thanks,
Varad
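The distinction the thread keeps circling (system properties via -Dkey=value versus positional main-method arguments) is the convention Hadoop's GenericOptionsParser implements. A minimal, Hadoop-free sketch of that splitting logic, with made-up property names for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the -Dkey=value convention: such tokens become configuration
// entries, while everything else is left as positional arguments for main().
// This mimics what Hadoop's GenericOptionsParser does; it is not Hadoop code.
public class ArgSplitter {
    public static final Map<String, String> props = new HashMap<>();
    public static final List<String> remaining = new ArrayList<>();

    public static void split(String[] args) {
        props.clear();
        remaining.clear();
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                String kv = arg.substring(2);           // drop the "-D" prefix
                int eq = kv.indexOf('=');
                props.put(kv.substring(0, eq), kv.substring(eq + 1));
            } else {
                remaining.add(arg);                     // positional argument
            }
        }
    }

    public static void main(String[] args) {
        // "pi.maps" and "pi.samples" are invented names for illustration.
        split(new String[] {"-Dpi.maps=5", "-Dpi.samples=10", "pi", "5", "10"});
        System.out.println(props + " | " + remaining);
    }
}
```

The point of the sketch: only the -D tokens can be carried by a configuration file, which is why arguments consumed as String[] args have no obvious counterpart in job -submit.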
Re: Passing Command-line Parameters to the Job Submit Command
By Java environment variables, do you mean the ones passed as -Dkey=value? That's one way of passing them. I suppose another way is to have a client-side site configuration (like mapred-site.xml) that is on the classpath of the client app.

Thanks,
Hemanth
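A client-side site configuration of the kind Hemanth mentions is an ordinary Hadoop XML file placed on the client's classpath; a minimal sketch (the value is purely illustrative):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
</configuration>
```

Properties set this way are picked up when the client builds its Configuration, without touching the command line.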
Re: Passing Command-line Parameters to the Job Submit Command
Building on Hemanth's answer: in the end your variables should be in the job.xml (the second file, along with the jar, needed to run a job). This job.xml can be built in various ways. It inherits from your local configuration and you can change it using the Java API, but in the end it is only an XML file, so your hands are not tied. I know there is a job file that you can provide with the shell command: http://hadoop.apache.org/docs/r1.0.3/commands_manual.html#job But I haven't used it yet, so I can't tell you more about this option.

Regards,
Bertrand
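Bertrand's observation that job.xml "is only an XML file" can be illustrated without Hadoop at all: the file is just name/value pairs inside property elements, so any code that emits that shape can produce one. A plain-Java sketch (in real use, Hadoop's Configuration.writeXml does this, and values would need XML escaping):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Emits a Hadoop-style configuration file from a map of name/value pairs.
// This is a hand-rolled stand-in to show the file shape; real code would use
// Hadoop's Configuration class and should XML-escape the values.
public class JobXmlWriter {
    public static String toJobXml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapred.job.name", "PiEstimator");
        conf.put("mapred.reduce.tasks", "2");
        System.out.print(toJobXml(conf));
    }
}
```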
Re: Passing Command-line Parameters to the Job Submit Command
You could always write your own properties file and read it as a resource.
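The "own properties file" suggestion means keeping job parameters in a standard java.util.Properties file that the client reads before configuring the job. A classpath resource would normally be opened with getClass().getResourceAsStream("/job.properties"); in this self-contained sketch a StringReader stands in for the file, and the key names are invented:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Loads job parameters from a java.util.Properties source. In a real client
// the Reader would come from a classpath resource or a file on disk.
public class JobProps {
    public static Properties load(Reader source) throws Exception {
        Properties p = new Properties();
        p.load(source);
        return p;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a job.properties resource; "pi.maps"/"pi.samples"
        // are hypothetical keys for illustration.
        Properties p = load(new StringReader("pi.maps=5\npi.samples=10\n"));
        System.out.println(p.getProperty("pi.maps") + " " + p.getProperty("pi.samples"));
    }
}
```

The client can then copy these values into the job configuration before submission, keeping parameters out of both the command line and the code.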
Re: Passing Command-line Parameters to the Job Submit Command
Thanks Hemanth. But in general, if we want to pass arguments to any job (not only PiEstimator from the examples jar) and submit the job to the job queue scheduler, by the looks of it we might always need to use the Java environment variables only. Is my assumption correct?

Thanks,
Varad
Passing Command-line Parameters to the Job Submit Command
Hi, I want to run the PiEstimator example using the following command:

$ hadoop job -submit pieestimatorconf.xml

where pieestimatorconf.xml contains all the info required by Hadoop to run the job, e.g. the input file location, the output file location and other details:

<property><name>mapred.jar</name><value>file:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar</value></property>
<property><name>mapred.map.tasks</name><value>20</value></property>
<property><name>mapred.reduce.tasks</name><value>2</value></property>
...
<property><name>mapred.job.name</name><value>PiEstimator</value></property>
<property><name>mapred.output.dir</name><value>file:Users/varadmeru/Work/out</value></property>

Now, as we know, to run the PiEstimator we can also use the following command:

$ hadoop jar hadoop-examples-1.0.3.jar pi 5 10

where 5 and 10 are the arguments to the main class of PiEstimator. How can I pass the same arguments (5 and 10) using the job -submit command, through the conf file or any other way, without changing the code of the examples to reflect the use of environment variables?

Thanks in advance,
Varad

-
Varad Meru
Software Engineer, Business Intelligence and Analytics,
Persistent Systems and Solutions Ltd.,
Pune, India.
Re: Passing Command-line Parameters to the Job Submit Command
Varad,

Looking at the code for the PiEstimator class, which implements the 'pi' example, the two arguments are mandatory and are used *before* the job is submitted for execution, i.e. on the client side. In particular, one of them (nSamples) is used not by the MapReduce job but by the client code (i.e. PiEstimator) to generate some input. Hence, I believe all of this additional work done by the PiEstimator class will be bypassed if we directly use the job -submit command. In other words, I don't think these two ways of running the job:

- using hadoop jar with the examples pi program
- using hadoop job -submit

are equivalent. As a general answer to your question, though: if additional parameters are used by the mappers or reducers, they will generally be set as additional job-specific configuration items. So, one way of using them with the job -submit command would be to find out the specific names of the configuration items (from the code, or some other documentation) and include them in the job.xml used when submitting the job.

Thanks,
Hemanth
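For instance, if a mapper read its sample count from a configuration item as Hemanth describes, that item could be added to the job.xml passed to job -submit. The property name below is invented for illustration; the real name would have to come from the job's code or documentation:

```xml
<property>
  <name>myjob.num.samples</name>
  <value>10</value>
</property>
```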