-libjars with multiple jars broken when client and cluster reside on different
OSs
----------------------------------------------------------------------------------
Key: HADOOP-4864
URL: https://issues.apache.org/jira/browse/HADOOP-4864
Project: Hadoop Core
Issue Type: Bug
Components: filecache
Affects Versions: 0.19.0
Environment: When your hadoop job spans OSs.
Reporter: Stuart White
Priority: Minor
When submitting a hadoop job from Windows (Cygwin) to a Linux hadoop cluster
(or vice versa), if you specify multiple additional jar files via the
-libjars flag, hadoop throws a ClassNotFoundException for any class located
in those additional jars.
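For reference, a typical submission that triggers the problem looks like
this (the jar and class names here are hypothetical, and the driver is
assumed to use ToolRunner so that -libjars is parsed):

hadoop jar MyJob.jar my.pkg.MyMain -libjars Foo.jar,Bar.jar input output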
This is caused by the fact that hadoop uses
System.getProperty("path.separator") as the delimiter in the list of jar files
passed via -libjars.
If your job spans platforms, System.getProperty("path.separator") returns a
different delimiter on each side: ";" on Windows and ":" on Linux.
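Here's a minimal standalone sketch of the mismatch (illustrative only, not
the actual hadoop source):

import java.util.Arrays;

public class SeparatorMismatch {
    public static void main(String[] args) {
        // On a Windows/Cygwin client, path.separator is ";".
        String clientSep = System.getProperty("path.separator");
        String joined = "Foo.jar" + clientSep + "Bar.jar";

        // A Linux cluster splits on its own path.separator, ":". If the
        // string was joined with ";", nothing splits, and both jars
        // collapse into the single bogus entry "Foo.jar;Bar.jar".
        String[] parsed = joined.split(":");
        System.out.println(Arrays.toString(parsed));
    }
}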
My suggested solution is to use a comma as the delimiter, rather than the
path.separator.
I realize comma is, perhaps, a poor choice for a delimiter because it is valid
in filenames on both Windows and Linux, but the -libjars flag already uses it
as the delimiter when listing the additional required jars. So I figured that
if it's already being used as a delimiter on the command line, it's reasonable
to use it internally as well.
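In code, the suggested change amounts to something like this (again
illustrative, not the actual patch):

public class CommaDelimiter {
    public static void main(String[] args) {
        // Joining with "," instead of path.separator produces the same
        // string on every OS, so the cluster can always split it back.
        String joined = "Foo.jar" + "," + "Bar.jar";
        String[] parsed = joined.split(",");
        System.out.println(parsed.length);  // 2 on Windows and Linux alike
    }
}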
I have a patch that applies my suggested change, but I don't see anywhere to
upload it. So, I'll go ahead and create this JIRA and hope that I will have
the opportunity to attach the patch later.
Now, with this change, I can submit hadoop jobs (requiring multiple
supporting jars) from my Windows laptop (via Cygwin) to my 10-node
Linux hadoop cluster.
Any chance this change could be applied to the hadoop codebase?
To recreate the problem I'm seeing, do the following:
- Set up a hadoop cluster on Linux.
- Perform the remaining steps on Cygwin, with a hadoop installation
configured to point to the Linux cluster (set fs.default.name and
mapred.job.tracker; see the example below).
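For example, the relevant entries in the client's conf/hadoop-site.xml
would look something like this (hostnames and ports are placeholders for
your cluster's namenode and jobtracker):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
</configuration>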
- Extract the tarball, then change into the created directory.
tar xvfz Example.tar.gz
cd Example
- Edit build.properties, setting your hadoop.home appropriately (see the
example below), then build the example.
ant
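For example, build.properties might contain a single line like this (the
path is a placeholder; point it at your own hadoop installation):

hadoop.home=/home/you/hadoop-0.19.0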
- Load the file Example.in into your dfs
hadoop dfs -copyFromLocal Example.in Example.in
- Execute the provided shell script, passing it testID 1.
./Example.sh 1
This test does not use -libjars, and it completes successfully.
- Next, execute testID 2.
./Example.sh 2
This test uses -libjars with 1 jarfile (Foo.jar), and it completes
successfully.
- Next, execute testID 3.
./Example.sh 3
This test uses -libjars with 1 jarfile (Bar.jar), and it completes
successfully.
- Next, execute testID 4.
./Example.sh 4
This test uses -libjars with 2 jarfiles (Foo.jar and Bar.jar), and
it fails with a ClassNotFoundException.
This behavior only occurs when calling from Cygwin to Linux or vice
versa. If both the cluster and the client reside on either Linux or
Cygwin, the problem does not occur.
I'm continuing to dig to see what I can figure out, but since I'm very
new to hadoop (started using it this week), I thought I'd go ahead and
throw this out there to see if anyone can help.
Thanks!