Re: execute hadoop job from remote web application

2011-10-20 Thread Steve Loughran

On 18/10/11 17:56, Harsh J wrote:

Oleg,

It will pack up the jar that contains the class specified by
"setJarByClass" into its submission jar and send it up. Thats the
function of that particular API method. So, your deduction is almost
right there :)

On Tue, Oct 18, 2011 at 10:20 PM, Oleg Ruchovets  wrote:

So you mean that if I am going to submit a job remotely, and
my_hadoop_job.jar is on the classpath of my web application, it will
submit the job, with my_hadoop_job.jar, to the remote hadoop machine
(cluster)?




There's also the problem of waiting for your work to finish. If you want
to see something complicated that does everything but the JAR upload, I have
some code here that listens for events coming out of the job and so
builds up a history of what is happening. It also does better preflight
checking of the source and destination data directories:


http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/mapreduce/submitter/SubmitterImpl.java
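
For a much smaller starting point, here is a minimal sketch (not taken from
the SmartFrog code) of non-blocking submission plus polling with the old
org.apache.hadoop.mapred API; MyJob and the sleep interval are illustrative:

    JobConf conf = new JobConf(MyJob.class);     // MyJob is hypothetical
    // ... set mapper/reducer/input/output as in the samples below ...

    JobClient client = new JobClient(conf);
    RunningJob running = client.submitJob(conf); // returns immediately

    while (!running.isComplete()) {              // poll rather than block
        System.out.printf("map %.0f%%, reduce %.0f%%%n",
                running.mapProgress() * 100, running.reduceProgress() * 100);
        Thread.sleep(5000);
    }
    System.out.println("succeeded: " + running.isSuccessful());
    System.out.println(running.getCounters());   // per-job counters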


Re: execute hadoop job from remote web application

2011-10-18 Thread Harsh J
> [...]



-- 
Harsh J


Re: execute hadoop job from remote web application

2011-10-18 Thread Oleg Ruchovets
> [...]


Re: execute hadoop job from remote web application

2011-10-18 Thread Harsh J
Oleg,

Steve already covered this.

The "hadoop jar" subcommand merely runs the jar program for you, as a
utility - it has nothing to do with submissions really.

Have you tried submitting your program by running your jar as a
regular java program (java -jar ...), with the proper classpath?
(You may use "hadoop classpath" to get the classpath string.)

It should go through fine, and submit the job jar, classes included,
over to the JobTracker.
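
For example, a sketch along those lines, reusing the jar path and the
HadoopJobExecutor class from earlier in this thread (adjust both to your
layout; "hadoop classpath" supplies the Hadoop jars and config directory):

    java -cp /opt/hadoop/hadoop-jobs/my_hadoop_job.jar:$(hadoop classpath) \
        HadoopJobExecutor -inputPath /opt/inputs/ -outputPath /data/output_jobs/output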

On Tue, Oct 18, 2011 at 9:13 PM, Oleg Ruchovets  wrote:
> [...]

Re: execute hadoop job from remote web application

2011-10-18 Thread Oleg Ruchovets
I'll try to be more specific. It is not a dependent jar; it is a jar which
contains the map/reduce/combine classes and some business logic. When
executing our job from the command line, the class which parses the
parameters and submits the job has this line of code:
    job.setJarByClass(HadoopJobExecutor.class);

We execute it locally on the hadoop master machine using a command such as:
/opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
-inputPath /opt/inputs/  -outputPath /data/output_jobs/output

and of course my_hadoop_job.jar is found, because it is located on the same
machine.

Now, suppose I am going to submit the job remotely (from a web application),
and I have the same line of code:
    job.setJarByClass(HadoopJobExecutor.class);

If my_hadoop_job.jar is located on the remote hadoop machine (on its
classpath), my JobClient will fail, because there is no job jar on the local
classpath (it is located on the remote hadoop machine). Am I right? I simply
don't know how to submit a job remotely (in my case the job is not just
map/combine/reduce classes; it is a jar which contains other classes too).

Regarding remotely invoking the shell script that contains the hadoop jar
command with any required input arguments: it is possible to do it with
    Runtime.getRuntime().exec(submitCommand.toString().split(" "));
but I prefer to use JobClient, because then I can monitor my job (get
counters and other useful information).

Thanks in advance
Oleg.
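
One way around the classpath problem described above, assuming the job jar
sits on the submitting machine's local disk: point the submission code at
the jar file explicitly, so it gets shipped regardless of what
setJarByClass() can find. A sketch (the path is illustrative):

    // Old (mapred) API: name the local jar to upload directly.
    JobConf conf = new JobConf();
    conf.setJar("/some/local/path/my_hadoop_job.jar");

    // New (mapreduce) API equivalent -- "mapred.jar" is the property
    // that JobConf.setJar() writes:
    Job job = new Job(new Configuration());
    job.getConfiguration().set("mapred.jar",
            "/some/local/path/my_hadoop_job.jar");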

On Tue, Oct 18, 2011 at 4:34 PM, Bejoy KS  wrote:

> [...]

Re: execute hadoop job from remote web application

2011-10-18 Thread Bejoy KS
Hi Oleg
I haven't tried out a scenario like the one you mention, but I think
there shouldn't be any issue in submitting a job that has some dependent
classes that hold the business logic referred to from the mapper, reducer,
or combiner. You should be able to do the job submission remotely the same
way we were discussing in this thread. If you need to distribute any
dependent jars/files along with the application jar, you can use the
-libjars option on the CLI or use the DistributedCache methods like
addArchiveToClassPath()/addFileToClassPath() in your java code. If it is a
dependent jar, it is better to deploy it in the cluster environment itself,
so that you don't have to transfer the jar over the network again every
time you submit your job.
Just a suggestion: if you can execute the job from within your hadoop
cluster, you don't have to do a remote job submission. You just need to
remotely invoke the shell script that contains the hadoop jar command with
any required input arguments. Sorry if I'm not getting your requirement
exactly.

Regards
Bejoy.K.S
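
A minimal sketch of the two distribution options Bejoy mentions; the jar
names and HDFS paths are hypothetical, and the CLI form assumes the main
class parses generic options via ToolRunner/GenericOptionsParser:

    // Option 1, on the command line (local jar paths, comma-separated):
    //   hadoop jar my_hadoop_job.jar HadoopJobExecutor \
    //       -libjars /local/libs/dep1.jar,/local/libs/dep2.jar <args>

    // Option 2, in code: add jars already stored in HDFS to the tasks'
    // classpath through the DistributedCache.
    Configuration conf = new Configuration();
    DistributedCache.addFileToClassPath(new Path("/apps/libs/dep1.jar"), conf);
    DistributedCache.addArchiveToClassPath(new Path("/apps/libs/deps.zip"), conf);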

On Tue, Oct 18, 2011 at 6:29 PM, Oleg Ruchovets wrote:

> [...]

Re: execute hadoop job from remote web application

2011-10-18 Thread Oleg Ruchovets
Thank you all for your answers, but I still have questions:
Currently we run our jobs using shell scripts located on the hadoop
master machine.

Here is an example of the command line:
/opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
-inputPath /opt/inputs/  -outputPath /data/output_jobs/output

my_hadoop_job.jar has a class which parses the input parameters and submits
the job. Our code is very similar to what you wrote:
..

        job.setJarByClass(HadoopJobExecutor.class);
        job.setMapperClass(MultipleOutputMap.class);
        job.setCombinerClass(BaseCombine.class);
        job.setReducerClass(HBaseReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(MapWritable.class);

        FileOutputFormat.setOutputPath(job, new Path(finalOutPutPath));

        jobCompleteStatus = job.waitForCompletion(true);
...

My questions are:

1) my_hadoop_job.jar contains other classes (business logic), not only the
Map, Combine, and Reduce classes, and I still don't understand how I can
submit a job which needs all the classes from my_hadoop_job.jar.
2) Do I need to submit my_hadoop_job.jar too? If yes, what is the way to do
it?

Thanks in advance,
Oleg.

On Tue, Oct 18, 2011 at 2:11 PM, Uma Maheswara Rao G 72686 <
mahesw...@huawei.com> wrote:

> [...]


Re: execute hadoop job from remote web application

2011-10-18 Thread Uma Maheswara Rao G 72686
- Original Message -
From: Bejoy KS 
Date: Tuesday, October 18, 2011 5:25 pm
Subject: Re: execute hadoop job from remote web application
To: common-user@hadoop.apache.org

> [...]
Good sample by Bejoy; I hope you have access to this site.
Also, please go through this doc, which contains the WordCount example:
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v2.0

> [...]
> 
Regards
Uma


Re: execute hadoop job from remote web application

2011-10-18 Thread Bejoy KS
Oleg,
If you are looking at how to submit your jobs using JobClient, then the
sample below can give you a start.

    // get the configuration parameters and assign a job name
    JobConf conf = new JobConf(getConf(), MyClass.class);
    conf.setJobName("SMS Reports");

    // set key/value types for the mapper and reducer outputs
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    // specify the custom reducer class
    conf.setReducerClass(SmsReducer.class);

    // specify the input directories (at runtime) and the Mappers
    // independently, for inputs from multiple sources
    FileInputFormat.addInputPath(conf, new Path(args[0]));

    // specify the output directory at runtime
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);

Along with the hadoop jars, you may need to have the config files on
your client as well.

The sample uses the old map reduce API. You can use the new one as well; in
that case we use Job instead of JobClient.

Hope it helps!

Regards
Bejoy.K.S
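
Note that the getConf() call in the sample above implies the code lives
inside a Tool implementation; a minimal sketch of the surrounding class,
reusing the sample's hypothetical MyClass name, might look like:

    public class MyClass extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            JobConf conf = new JobConf(getConf(), MyClass.class);
            // ... job setup exactly as in the sample above ...
            JobClient.runJob(conf);
            return 0;
        }
        public static void main(String[] args) throws Exception {
            // ToolRunner parses generic options (-libjars, -D, ...) first
            System.exit(ToolRunner.run(new MyClass(), args));
        }
    }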


On Tue, Oct 18, 2011 at 5:00 PM, Oleg Ruchovets wrote:

> Excellent. Can you give a small example of code?
> [...]


RE: execute hadoop job from remote web application

2011-10-18 Thread Devaraj K
The job submission code can be written this way:

    // Create a new Job
    Job job = new Job(new Configuration());
    job.setJarByClass(MyJob.class);

    // Specify various job-specific parameters
    job.setJobName("myjob");

    // Input/output paths are set via the new-API lib classes
    // (org.apache.hadoop.mapreduce.lib.input/output):
    FileInputFormat.addInputPath(job, new Path("in"));
    FileOutputFormat.setOutputPath(job, new Path("out"));

    job.setMapperClass(MyJob.MyMapper.class);
    job.setReducerClass(MyJob.MyReducer.class);

    // Submit the job (returns without waiting for completion)
    job.submit();

To submit this, you need to add the hadoop jar files and configuration
files to the classpath of the application from which you want to
submit the job.

You can refer to these docs for more info on the Job APIs:

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html


Devaraj K 
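
Since submit() above returns immediately, a web application will usually
poll the Job handle afterwards; a small sketch (the sleep interval is
illustrative):

    while (!job.isComplete()) {        // true once the job has finished
        System.out.printf("map %.0f%%, reduce %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
        Thread.sleep(5000);
    }
    if (!job.isSuccessful()) {
        // surface job.getTrackingURL() to the user for diagnosis
    }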

-Original Message-
From: Oleg Ruchovets [mailto:oruchov...@gmail.com] 
Sent: Tuesday, October 18, 2011 5:01 PM
To: common-user@hadoop.apache.org
Subject: Re: execute hadoop job from remote web application

Excellent. Can you give a small example of code?


On Tue, Oct 18, 2011 at 1:13 PM, Uma Maheswara Rao G 72686 <
mahesw...@huawei.com> wrote:

> [...]



Re: execute hadoop job from remote web application

2011-10-18 Thread Oleg Ruchovets
Excellent. Can you give a small example of code?


On Tue, Oct 18, 2011 at 1:13 PM, Uma Maheswara Rao G 72686 <
mahesw...@huawei.com> wrote:

> [...]


Re: execute hadoop job from remote web application

2011-10-18 Thread Uma Maheswara Rao G 72686

- Original Message -
From: Oleg Ruchovets 
Date: Tuesday, October 18, 2011 4:11 pm
Subject: execute hadoop job from remote web application
To: common-user@hadoop.apache.org

> Hi, what is the way to execute a hadoop job on a remote cluster? I want
> to execute my hadoop job from a remote web application, but I didn't
> find any hadoop client (remote API) to do it.
>
> Please advise.
> Oleg
>
You can put the Hadoop jars on your web application's classpath, find the
JobClient class, and submit the jobs using it.

Regards,
Uma


Re: execute hadoop job from remote web application

2011-10-18 Thread Steve Loughran

On 18/10/11 11:40, Oleg Ruchovets wrote:

Hi, what is the way to execute a hadoop job on a remote cluster? I want to
execute my hadoop job from a remote web application, but I didn't find any
hadoop client (remote API) to do it.

Please advise.
Oleg



The Job class lets you build up and submit jobs from any Java process
that has RPC access to the JobTracker.
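
In practice, "RPC access" means the client-side Configuration must point at
the cluster's NameNode and JobTracker; a sketch with 0.20-era property names
(host names and ports are placeholders for your cluster's values):

    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode-host:9000");   // HDFS
    conf.set("mapred.job.tracker", "jobtracker-host:9001");     // JobTracker
    Job job = new Job(conf, "myjob");
    // ... the rest of the setup as in the samples above ...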