[ https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566019#comment-15566019 ]
Jo Desmet commented on SPARK-15343:
-----------------------------------

By design we apparently have a very tight coupling between the scheduling and execution environments(?). Unless this gets better decoupling through a remote-call API, the classpath is as much YARN's as it is Spark's. Isn't that what Jersey is about? What I advocate is a more subdued approach in which new versions of Spark (2.0 and beyond) keep supporting, and remain compatible with, the versions of Hadoop YARN in mainstream use. We should do so until YARN environments with the more recent libraries are more widely adopted, or until other fixes or features are implemented on either the Spark or the YARN side. To me, the benefit of a newer Jersey library does not seem worth that much compared to this. I am no YARN fan, but it feels like we are breaking the bond with YARN just because we feel that side is not moving fast enough.

> NoClassDefFoundError when initializing Spark with YARN
> ------------------------------------------------------
>
>                 Key: SPARK-15343
>                 URL: https://issues.apache.org/jira/browse/SPARK-15343
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.0
>            Reporter: Maciej Bryński
>            Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting the following error:
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has been deprecated as of Spark 2.0 and may be removed in the future. Please use the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in <module>
>     sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
>     conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
>     self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in _initialize_context
>     return self._jvm.JavaSparkContext(jconf)
>   File "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__
>   File "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
>         at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
>         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:502)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:236)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>         at py4j.GatewayConnection.run(GatewayConnection.java:211)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 19 more
> {code}
> On 1.6 everything works fine. I'm using HDP 2.2 (Hadoop 2.6.0).
> I have HADOOP_CONF_DIR and SPARK_HOME env variables.
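
For anyone hitting this stack trace: the error comes from Hadoop's YarnClientImpl trying to create its Jersey-1-based TimelineClient while Spark 2.0 no longer ships the Jersey 1.x classes on its classpath. Below is a minimal workaround sketch, not an official fix and not something proposed in this thread; it assumes the job does not actually need the YARN Application Timeline Service, and the jar paths in the second option are placeholders for wherever Jersey 1.x jars (e.g. the ones bundled with Hadoop 2.6) live on the client host.

{code}
# Option 1 (assumes the YARN Application Timeline Service is not needed):
# pass the Hadoop property through Spark's spark.hadoop.* passthrough so
# YarnClientImpl never instantiates the Jersey-1-based TimelineClient.
bin/pyspark --master yarn \
  --conf spark.hadoop.yarn.timeline-service.enabled=false

# Option 2 (assumes Jersey 1.x client jars are available locally; the paths
# below are placeholders): put the missing classes back on the driver classpath.
bin/pyspark --master yarn \
  --conf spark.driver.extraClassPath=/path/to/jersey-client-1.9.jar:/path/to/jersey-core-1.9.jar
{code}

Option 1 only disables the client-side timeline integration; a job that actually relies on the timeline server would need the classpath route instead.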