Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Hi Manoj,

As I said before, Hadoop will automatically find the jar on your runtime
classpath and use that. All you need to do is point it at the right
(driver) class via JobConf.setJarByClass(…).
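
For illustration, a minimal driver sketch using the old mapred API (the
class name, job name, and paths below are hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // Hadoop scans the runtime classpath for the jar that contains
    // MyDriver and ships that jar with the job.
    conf.setJarByClass(MyDriver.class);
    conf.setJobName("my-job");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}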



-- 
Harsh J


Re: doubt on Hadoop job submission process

2012-08-13 Thread Manoj Babu
Then I need to submit the jar containing the non-Hadoop activity classes and
their supporting libraries to all the nodes, since I can't create two jars.
Is there any way to optimize this?


Cheers!
Manoj.



Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Sure, you may separate the logic however you want; just ensure that a
proper setJar or setJarByClass has been done on the configuration object
before you submit the job.
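
For example, the non-Hadoop work can live in plain methods around the
submission call (a sketch; the helper method names are hypothetical):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Workflow {
  public static void main(String[] args) throws Exception {
    doPreProcessing();   // non-Hadoop reads/writes/updates

    JobConf conf = new JobConf();
    conf.setJarByClass(Workflow.class);  // set before submission
    JobClient.runJob(conf);              // blocks until the job finishes

    doPostProcessing();  // more non-Hadoop work after the job
  }

  private static void doPreProcessing()  { /* ... */ }
  private static void doPostProcessing() { /* ... */ }
}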


-- 
Harsh J


Re: doubt on Hadoop job submission process

2012-08-13 Thread Manoj Babu
Hi Harsh,

Thanks for your reply.

Consider that my main program performs many non-Hadoop activities
(reading/writing/updating) before invoking JobClient.runJob(conf).
Is there any way to separate the process flow programmatically, instead of
going for a workflow engine?

Cheers!
Manoj.





Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Hi Manoj,

Reply inline.

On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu  wrote:
> Hi All,
>
> The normal Hadoop job submission process involves:
>
> 1. Checking the input and output specifications of the job.
> 2. Computing the InputSplits for the job.
> 3. Setting up the requisite accounting information for the DistributedCache
> of the job, if necessary.
> 4. Copying the job's jar and configuration to the map-reduce system
> directory on the distributed file-system.
> 5. Submitting the job to the JobTracker and optionally monitoring its status.
>
> I have a doubt about the 4th point of the job execution flow; could any of
> you explain it?
>
> What is the job's jar?

The job.jar is the jar you supply via "hadoop jar <jar>". Technically,
though, it is the jar pointed to by JobConf.getJar() (set via the setJar or
setJarByClass calls).
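
For instance (a sketch; the driver class and jar path are illustrative):

JobConf conf = new JobConf();
conf.setJarByClass(MyDriver.class);  // resolves the jar containing MyDriver on the classpath
// or point at a jar file directly:
// conf.setJar("/path/to/myjob.jar");
System.out.println(conf.getJar());   // the jar that will be shipped as job.jar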

> Is the job's jar the one we submitted to Hadoop, or will Hadoop build one
> based on the job configuration object?

It is the former, as explained above.

-- 
Harsh J