[
https://issues.apache.org/jira/browse/HADOOP-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stuart White updated HADOOP-4864:
---------------------------------
Attachment: patch.txt
Patch that changes Hadoop's internal delimiter for the list of jars specified
via -libjars from System.getProperty("path.separator") to a comma.
path.separator is platform-specific and therefore does not serve as an
appropriate delimiter across platforms.
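For reference, a minimal before/after sketch of the kind of change the patch
makes (variable names are simplified and hypothetical; see patch.txt for the
actual diff):

  // Before: platform-specific, so a list joined on Windows (";") cannot
  // be split correctly on Linux (":"), and vice versa:
  //   String[] jars = jarList.split(System.getProperty("path.separator"));

  // After: a comma is identical on every platform and is already the
  // delimiter users type after -libjars:
  String[] jars = jarList.split(",");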
> -libjars with multiple jars broken when client and cluster reside on
> different OSs
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-4864
> URL: https://issues.apache.org/jira/browse/HADOOP-4864
> Project: Hadoop Core
> Issue Type: Bug
> Components: filecache
> Affects Versions: 0.19.0
> Environment: When your hadoop job spans OSs.
> Reporter: Stuart White
> Priority: Minor
> Attachments: patch.txt
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When submitting a Hadoop job from Windows (Cygwin) to a Linux Hadoop cluster
> (or vice versa), and when you specify multiple additional jar files via the
> -libjars flag, Hadoop throws a ClassNotFoundException for any classes located
> in those additional jars.
> This is caused by the fact that Hadoop uses
> System.getProperty("path.separator") as the delimiter in the list of jar
> files passed via -libjars.
> If your job spans platforms, System.getProperty("path.separator") returns a
> different delimiter on each platform.
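> To make the mismatch concrete (a simplified sketch, not the actual Hadoop
> code path; jar names taken from the example below):
>   // On the Windows (Cygwin) client, path.separator is ";":
>   String list = "Foo.jar" + System.getProperty("path.separator") + "Bar.jar";
>   // list is now "Foo.jar;Bar.jar"
>   // On the Linux cluster, path.separator is ":", so the split matches
>   // nothing and the whole string becomes a single bogus entry:
>   String[] jars = list.split(System.getProperty("path.separator"));
>   // jars is { "Foo.jar;Bar.jar" } -- neither jar resolves to a real
>   // file, hence the ClassNotFoundException.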
> My suggested solution is to use a comma as the delimiter, rather than the
> path.separator.
> I realize a comma is perhaps a poor choice for a delimiter because it is
> valid in filenames on both Windows and Linux, but the -libjars flag already
> uses it as the delimiter when listing the additional required jars. So I
> figured that if it's already being used as a delimiter externally, it's
> reasonable to use it internally as well.
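> For example, the list is already comma-delimited on the command line:
>   hadoop jar ... -libjars Foo.jar,Bar.jar ...
> so using a comma internally simply keeps the same convention end-to-end.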
> I have a patch that applies my suggested change, but I don't see anywhere to
> upload it. So, I'll go ahead and create this JIRA and hope that I'll have
> the opportunity to attach the patch later.
> Now, with this change, I can submit Hadoop jobs (requiring multiple
> supporting jars) from my Windows laptop (via Cygwin) to my 10-node
> Linux Hadoop cluster.
> Any chance this change could be applied to the Hadoop codebase?
> To recreate the problem I'm seeing, do the following:
> - Set up a Hadoop cluster on Linux.
> - Perform the remaining steps on Cygwin, with a Hadoop installation
> configured to point to the Linux cluster (set fs.default.name and
> mapred.job.tracker).
> - Extract the tarball. Change into the created directory.
> tar xvfz Example.tar.gz
> cd Example
> - Edit build.properties, set your hadoop.home appropriately, then
> build the example.
> ant
> - Load the file Example.in into your DFS.
> hadoop dfs -copyFromLocal Example.in Example.in
> - Execute the provided shell script, passing it testID 1.
> ./Example.sh 1
> This test does not use -libjars, and it completes successfully.
> - Next, execute testID 2.
> ./Example.sh 2
> This test uses -libjars with 1 jarfile (Foo.jar), and it completes
> successfully.
> - Next, execute testID 3.
> ./Example.sh 3
> This test uses -libjars with 1 jarfile (Bar.jar), and it completes
> successfully.
> - Next, execute testID 4.
> ./Example.sh 4
> This test uses -libjars with 2 jarfiles (Foo.jar and Bar.jar), and
> it fails with a ClassNotFoundException.
> This behavior only occurs when calling from Cygwin to Linux or vice
> versa. If the cluster and the client both reside on the same platform
> (either Linux or Cygwin), the problem does not occur.
> I'm continuing to dig to see what I can figure out, but since I'm very
> new to Hadoop (I started using it this week), I thought I'd go ahead and
> throw this out there to see if anyone can help.
> Thanks!
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.