Re: Cleanup after a Job

2012-05-01 Thread kasi subrahmanyam
Hi Arun,

I can see that the output committer is present in the reducer.
How do I make sure that this committer runs at the end of the job, or does
it run by default at the end of the job?
I can have more than one reducer task.




On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com wrote:

 Use OutputCommitter.(abortJob, commitJob):

 http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

 Arun

 On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

 Hi

 I have a few jobs added to a JobControl.
 I need an afterJob() to be executed after the completion of a Job.
 For example:

 Here I am actually overriding the Job of JobControl.
 I have Job2 depending on the output of Job1. The input for Job2 is obtained
 after doing some file system operations on the output of Job1. This
 operation should happen in an afterJob() method which is available for each
 Job. How do I make sure that the afterJob() method is called for each Job
 added to the controller before running the jobs that depend on it?
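 A minimal sketch of the setup described above, assuming the older
 org.apache.hadoop.mapred.jobcontrol API (the class name and the empty
 JobConfs are illustrative placeholders):

   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.jobcontrol.Job;
   import org.apache.hadoop.mapred.jobcontrol.JobControl;

   public class ChainRunner {
     public static void main(String[] args) throws Exception {
       JobConf conf1 = new JobConf();  // configure Job1 here
       JobConf conf2 = new JobConf();  // configure Job2 here

       Job job1 = new Job(conf1);
       Job job2 = new Job(conf2);
       job2.addDependingJob(job1);     // job2 will not start until job1 completes

       JobControl control = new JobControl("chain");
       control.addJob(job1);
       control.addJob(job2);

       new Thread(control).start();    // JobControl implements Runnable
       while (!control.allFinished()) {
         Thread.sleep(1000);           // poll until every job has finished
       }
       control.stop();
     }
   }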


 Thanks


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Re: Cleanup after a Job

2012-05-01 Thread Robert Evans
Either abortJob or commitJob will be called for all jobs.  AbortJob will be
called if the job has failed.  CommitJob will be called if it succeeded.  The
purpose of these is to commit the output of the map/reduce job and clean up any
temporary files/data that might be lying around.

CommitTask/abortTask are similar, and are called for each individual task.
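A minimal sketch of where these hooks fire, assuming the older
org.apache.hadoop.mapred API (the class name is illustrative; each override
just delegates to FileOutputCommitter's default behavior):

  import java.io.IOException;
  import org.apache.hadoop.mapred.FileOutputCommitter;
  import org.apache.hadoop.mapred.JobContext;
  import org.apache.hadoop.mapred.TaskAttemptContext;

  public class HookedCommitter extends FileOutputCommitter {
    @Override
    public void commitJob(JobContext context) throws IOException {
      super.commitJob(context);           // once per job, on success
    }

    @Override
    public void abortJob(JobContext context, int runState) throws IOException {
      super.abortJob(context, runState);  // once per job, on failure or kill
    }

    @Override
    public void commitTask(TaskAttemptContext context) throws IOException {
      super.commitTask(context);          // once per task attempt that succeeds
    }

    @Override
    public void abortTask(TaskAttemptContext context) throws IOException {
      super.abortTask(context);           // once per task attempt that fails
    }
  }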

--Bobby Evans


On 5/1/12 8:32 AM, kasi subrahmanyam kasisubbu...@gmail.com wrote:

Hi Arun,

I can see that the output committer is present in the reducer.
How do I make sure that this committer runs at the end of the job, or does it
run by default at the end of the job?
I can have more than one reducer task.




On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com wrote:
Use OutputCommitter.(abortJob, commitJob):
http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

Arun

On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

Hi

I have a few jobs added to a JobControl.
I need an afterJob() to be executed after the completion of a Job.
For example:

Here I am actually overriding the Job of JobControl.
I have Job2 depending on the output of Job1. The input for Job2 is obtained
after doing some file system operations on the output of Job1. This
operation should happen in an afterJob() method which is available for each
Job. How do I make sure that the afterJob() method is called for each Job
added to the controller before running the jobs that depend on it?


Thanks

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/






Re: Cleanup after a Job

2012-05-01 Thread kasi subrahmanyam
Hi Robert,
Could you provide me the exact method of the JobControl Job or JobConf
which calls the commitJob method?
Thanks

On Tue, May 1, 2012 at 7:36 PM, Robert Evans ev...@yahoo-inc.com wrote:

  Either abortJob or commitJob will be called for all jobs.  AbortJob will
 be called if the job has failed.  CommitJob will be called if it succeeded.
  The purpose of these is to commit the output of the map/reduce job and
 clean up any temporary files/data that might be lying around.

 CommitTask/abortTask are similar, and are called for each individual task.

 --Bobby Evans



 On 5/1/12 8:32 AM, kasi subrahmanyam kasisubbu...@gmail.com wrote:

 Hi Arun,

 I can see that the output committer is present in the reducer.
 How do I make sure that this committer runs at the end of the job, or does
 it run by default at the end of the job?
 I can have more than one reducer task.




 On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com
 wrote:

 Use OutputCommitter.(abortJob, commitJob):

 http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

 Arun

 On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

 Hi

 I have a few jobs added to a JobControl.
 I need an afterJob() to be executed after the completion of a Job.
 For example:

 Here I am actually overriding the Job of JobControl.
 I have Job2 depending on the output of Job1. The input for Job2 is obtained
 after doing some file system operations on the output of Job1. This
 operation should happen in an afterJob() method which is available for each
 Job. How do I make sure that the afterJob() method is called for each Job
 added to the controller before running the jobs that depend on it?


 Thanks


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/







Re: Cleanup after a Job

2012-05-01 Thread Robert Evans
That really depends on the API that you are using.  In the newer API
o.a.h.mapreduce.OutputFormat.getOutputCommitter returns the output committer to
use.  In the older API, which is the one that I expect you are using,
JobConf.getOutputCommitter returns the output committer to use.  Be careful,
because by default you are probably using the FileOutputCommitter to put the
files in the proper place when your map/reduce job is done.  If you replace the
FileOutputCommitter with something else that does not do the same things, your
map/reduce jobs will stop working properly.  Typically what you would want to
do is to have your class inherit from FileOutputCommitter and then in
commitJob/abortJob call super.commitJob() or super.abortJob() respectively.
Then do whatever else you want to do.
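A minimal sketch of that pattern with the older API (AfterJobCommitter and
the extra file-system step are illustrative names, not part of Hadoop):

  import java.io.IOException;
  import org.apache.hadoop.mapred.FileOutputCommitter;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobContext;

  public class AfterJobCommitter extends FileOutputCommitter {
    @Override
    public void commitJob(JobContext context) throws IOException {
      // Keep FileOutputCommitter's behavior: promote the output files and
      // remove the temporary directories.
      super.commitJob(context);
      // ...then do the extra file-system work the next job depends on...
    }

    @Override
    public void abortJob(JobContext context, int runState) throws IOException {
      super.abortJob(context, runState);  // default cleanup on failure
      // ...then remove anything partially produced outside the output dir...
    }
  }

Registered during job setup, for example:

  JobConf conf = new JobConf();
  conf.setOutputCommitter(AfterJobCommitter.class);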

--Bobby Evans

On 5/1/12 9:17 AM, kasi subrahmanyam kasisubbu...@gmail.com wrote:

Hi Robert,
Could you provide me the exact method of the JobControl Job or JobConf which
calls the commitJob method?
Thanks

On Tue, May 1, 2012 at 7:36 PM, Robert Evans ev...@yahoo-inc.com wrote:
Either abortJob or commitJob will be called for all jobs.  AbortJob will be
called if the job has failed.  CommitJob will be called if it succeeded.  The
purpose of these is to commit the output of the map/reduce job and clean up any
temporary files/data that might be lying around.

CommitTask/abortTask are similar, and are called for each individual task.

--Bobby Evans



On 5/1/12 8:32 AM, kasi subrahmanyam kasisubbu...@gmail.com wrote:

Hi Arun,

I can see that the output committer is present in the reducer.
How do I make sure that this committer runs at the end of the job, or does it
run by default at the end of the job?
I can have more than one reducer task.




On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com wrote:
Use OutputCommitter.(abortJob, commitJob):
http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

Arun

On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

Hi

I have a few jobs added to a JobControl.
I need an afterJob() to be executed after the completion of a Job.
For example:

Here I am actually overriding the Job of JobControl.
I have Job2 depending on the output of Job1. The input for Job2 is obtained
after doing some file system operations on the output of Job1. This
operation should happen in an afterJob() method which is available for each
Job. How do I make sure that the afterJob() method is called for each Job
added to the controller before running the jobs that depend on it?


Thanks

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/