[
https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156886#comment-16156886
]
Sergey Zhemzhitsky edited comment on OOZIE-2547 at 9/7/17 12:35 PM:
--------------------------------------------------------------------
Hello [~rkanter], [~rohini], [~satishsaley]
I've noticed that the patch from this issue removes the
*determineSparkJarsAndClasspath* method introduced in OOZIE-2277 by [~rkanter].
Currently we are migrating our jobs from [CDH
5.7|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.7.0.releasenotes.html]
without this patch to CDH 5.12 that has this patch applied starting from [CDH
5.10|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.10.0.releasenotes.html]
and it seems there is a regression: all of our jobs that use the HDFS API
internally have started to fail with the following error in the Oozie
launcher logs:
{code}
Log Type: stderr
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 938
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Log Type: stdout
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 0
{code}
So it seems that this patch prevents Oozie from populating the Spark classpath
correctly with the Hadoop libraries.
Could you please suggest how to provide the Spark job with
hadoop-configuration.jar? Should it and all the necessary dependencies be
placed within the lib directory of the workflow?
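For context, here is a minimal sketch of how we would stage things if the workflow's lib directory is indeed the expected place (the action name, class name, and paths below are illustrative, not our actual ones):
{code}
<!-- workflow.xml (sketch): Spark action whose application jar lives under
     the workflow's lib/ directory on HDFS -->
<action name="spark-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>my-spark-job</name>
        <class>com.example.MySparkJob</class>
        <jar>${nameNode}/user/${wf:user()}/apps/my-spark-wf/lib/my-spark-job.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
<!-- Would we then also need to place the Hadoop jars themselves
     (hadoop-common, hadoop-hdfs, ...) under apps/my-spark-wf/lib/ ? -->
{code}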
> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
> Key: OOZIE-2547
> URL: https://issues.apache.org/jira/browse/OOZIE-2547
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Fix For: 4.3.0
>
> Attachments: OOZIE-2547-1.patch, OOZIE-2547-4.patch,
> OOZIE-2547-5.patch, yarn-cluster_launcher.txt
>
>
> Currently, we pass jars using the --jars option while submitting a Spark job.
> We also add the spark.yarn.dist.files option in the yarn-client mode.
> Instead of that, we can use only the --files option and pass on the files
> present in mapreduce.job.cache.files. While doing so, we make sure that
> Spark won't make another copy of the files if they already exist on HDFS. We
> saw issues where files were copied multiple times, causing exceptions such
> as:
> {code}
> Diagnostics: Resource
> hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar
> changed on src filesystem
> {code}