Re: Making sure I understand HADOOP_CLASSPATH

2011-08-22 Thread W.P. McNeill
I meant tasks running on the Task Trackers.

Harsh J.'s answer is what I needed. This makes sense now.

On Mon, Aug 22, 2011 at 11:06 AM, John Armstrong wrote:

> On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill" 
> wrote:
> > If it is, what is the proper way to make MyJar.jar available to both the
> > Job Client and the Task Trackers?
>
> Do you mean the task trackers, or the tasks themselves?  What process do
> you want to be able to run the code in MyJar.jar?
>


RE: Making sure I understand HADOOP_CLASSPATH

2011-08-22 Thread GOEKE, MATTHEW (AG/1000)
If you are asking how to make those classes available at run time, you can 
either use the -libjars option, which ships the jars through the distributed 
cache, or you can shade those classes into your job jar using Maven. I have had 
enough issues in the past with the classpath being flaky that I prefer the 
shading method, although it is not the officially preferred route.
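
For reference, a minimal sketch of the -libjars route. It assumes the driver class is run through ToolRunner, since -libjars is a generic option and is only honored when the generic options are parsed. MyJob.jar, MyDriver, the input/output arguments, and the /path/to location are hypothetical placeholders, not names from this thread.

```shell
# Placeholder location of the extra jar; adjust to your setup.
MYJAR=/path/to/MyJar.jar

# Client side: make MyJar.jar visible to the JVM started by `hadoop jar`
# (the Job Client), e.g. for code called from run().
export HADOOP_CLASSPATH="$MYJAR${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

# Task side: -libjars ships the jar through the distributed cache and puts it
# on the classpath of the map and reduce tasks. Generic options go after the
# main class and before the job's own arguments.
CMD="hadoop jar MyJob.jar MyDriver -libjars $MYJAR input output"

# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```

With the shading approach neither setting is needed, because the classes travel inside the job jar itself.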

Matt

-----Original Message-----
From: W.P. McNeill [mailto:bill...@gmail.com] 
Sent: Monday, August 22, 2011 1:01 PM
To: common-user@hadoop.apache.org
Subject: Making sure I understand HADOOP_CLASSPATH

What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?

This isn't clear to me from documentation and books, so I did some
experimenting. Here's the conclusion I came to: the paths in
HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
not added to the class path of the Task Trackers. Therefore if you put a JAR
called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
available to the Task Trackers as well, calls to MyJar.jar code from the
run() method of your job work, but calls from your Mapper or Reducer will
fail at runtime. Is this correct?

If it is, what is the proper way to make MyJar.jar available to both the Job
Client and the Task Trackers?


Re: Making sure I understand HADOOP_CLASSPATH

2011-08-22 Thread Harsh J
On Mon, Aug 22, 2011 at 11:31 PM, W.P. McNeill wrote:
> What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?
>
> This isn't clear to me from documentation and books, so I did some
> experimenting. Here's the conclusion I came to: the paths in
> HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
> not added to the class path of the Task Trackers. Therefore if you put a JAR
> called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
> available to the Task Trackers as well, calls to MyJar.jar code from the
> run() method of your job work, but calls from your Mapper or Reducer will
> fail at runtime. Is this correct?

Yes, this is right.

> If it is, what is the proper way to make MyJar.jar available to both the Job
> Client and the Task Trackers?

You'll need to use the Distributed Cache. Or you'd need to start the
TaskTrackers with the library on their classpath (which copies over to
launched task JVMs). The latter way is rigid/inflexible when it comes
to jar versioning.
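
A rough sketch of the second option, assuming the jar has been copied to the same local path on every TaskTracker node (the /opt/hadoop-extra directory is a hypothetical placeholder) and that the TaskTrackers are restarted afterwards so they pick up the new classpath:

```shell
# Fragment for conf/hadoop-env.sh on each TaskTracker node.
# Prepend the extra jar to whatever HADOOP_CLASSPATH already holds.
EXTRA_JAR=/opt/hadoop-extra/MyJar.jar
export HADOOP_CLASSPATH="$EXTRA_JAR${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

# Then restart the daemon on each node, for example:
#   bin/hadoop-daemon.sh stop tasktracker
#   bin/hadoop-daemon.sh start tasktracker
```

Every job on the cluster then sees this one copy of the jar, which is why upgrading it means touching all nodes and restarting the daemons again.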

-- 
Harsh J


Re: Making sure I understand HADOOP_CLASSPATH

2011-08-22 Thread John Armstrong
On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill" 
wrote:
> If it is, what is the proper way to make MyJar.jar available to both the
> Job Client and the Task Trackers?

Do you mean the task trackers, or the tasks themselves?  What process do
you want to be able to run the code in MyJar.jar?