[ https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335878#comment-15335878 ]
Steve Loughran commented on SPARK-15343:
----------------------------------------

Ooh, this is a pain. FWIW, my current stance on upgrading bits of Hadoop: http://steveloughran.blogspot.co.uk/2016/05/fear-of-dependencies.html

Should Hadoop make the leap to Jersey 2? For trunk/Hadoop 3, I think yes. For Hadoop 2? We'd be hated by too many people downstream.

What the Hadoop timeline client could do is be reworked so that it doesn't try to instantiate the Jersey client if you are using the filesystem timeline writer (ATS 1.5+). It's not needed there, even though the current class hierarchy does store it in the base class. The {{TimelineClient}} class creates that {{com.sun.jersey.api.client.Client}} instance to pass in... it's where the config is used (and it's the *only* place in the Hadoop codebase which uses it).

I've created YARN-5271: essentially, the client code could be reworked to create the Jersey client lower down, even without upgrading Hadoop's Jersey. You'd still see a stack trace when trying to talk to an ATS 1.0 server, but against a 1.5 endpoint all would be well. (A rough sketch of that lazy-creation idea is included below, after the quoted report.)

Too bad Restlet's LGPL license stops ASF code from using it: it's a better API. Though it probably doesn't handle Kerberos anyway.

> NoClassDefFoundError when initializing Spark with YARN
> ------------------------------------------------------
>
>                 Key: SPARK-15343
>                 URL: https://issues.apache.org/jira/browse/SPARK-15343
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.0
>            Reporter: Maciej Bryński
>            Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting the following error:
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has been deprecated as of Spark 2.0 and may be removed in the future. Please use the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in <module>
>     sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
>     conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
>     self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in _initialize_context
>     return self._jvm.JavaSparkContext(jconf)
>   File "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__
>   File "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
>         at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
>         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:502)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:236)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>         at py4j.GatewayConnection.run(GatewayConnection.java:211)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 19 more
> {code}
> On 1.6 everything works fine. I'm using HDP 2.2 (Hadoop 2.6.0).
> I have HADOOP_CONF_DIR and SPARK_HOME env variables set.
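To make the "create the Jersey client lower down" idea concrete: this is a minimal sketch, not the YARN-5271 patch, and all names here ({{LazyTimelineClientSketch}}, {{JerseyHolder}}, {{putEntity}}, the server URL) are hypothetical rather than the real {{TimelineClient}} API. The point is a JVM one: a {{NoClassDefFoundError}} fires when a class that references Jersey types is initialized, so confining those references to a nested holder class means the filesystem-writer path never loads Jersey at all.

{code}
import javax.ws.rs.core.MediaType;

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.config.ClientConfig;
import com.sun.jersey.api.client.config.DefaultClientConfig;

public class LazyTimelineClientSketch {

  private final boolean useFilesystemWriter;   // true on ATS 1.5+ deployments

  public LazyTimelineClientSketch(boolean useFilesystemWriter) {
    // No Jersey classes are touched during construction/serviceInit();
    // in Hadoop 2.6 the Client instance was created eagerly at this point.
    this.useFilesystemWriter = useFilesystemWriter;
  }

  public void putEntity(String entityJson) {
    if (useFilesystemWriter) {
      writeToTimelineDir(entityJson);          // no REST call, no Jersey needed
    } else {
      // Only this branch initializes JerseyHolder, so a missing jersey-client
      // jar now fails here (talking to an ATS 1.0 REST endpoint) instead of
      // failing every client on startup, as in the stack trace above.
      JerseyHolder.CLIENT
          .resource("http://timelineserver:8188/ws/v1/timeline")  // hypothetical host
          .type(MediaType.APPLICATION_JSON_TYPE)
          .post(entityJson);
    }
  }

  private void writeToTimelineDir(String entityJson) {
    // Placeholder for the ATS 1.5 filesystem writer path.
  }

  /** Holder class: the JVM loads Jersey only when this class is first used. */
  private static final class JerseyHolder {
    static final Client CLIENT;
    static {
      ClientConfig cc = new DefaultClientConfig();
      CLIENT = Client.create(cc);
    }
  }
}
{code}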
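Separately, and not from the comment above: since {{YarnClientImpl.serviceInit}} only calls {{createTimelineClient()}} when {{yarn.timeline-service.enabled}} is true, a possible client-side workaround is to switch that flag off via Spark's {{spark.hadoop.*}} prefix (which Spark copies into the Hadoop {{Configuration}} it hands to YARN). This assumes you don't need Spark to publish to the ATS; a sketch in Java:

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TimelineDisabledExample {
  public static void main(String[] args) {
    // Disable the YARN timeline client so Jersey is never instantiated.
    // Master ("yarn") is expected to come from spark-submit.
    SparkConf conf = new SparkConf()
        .setAppName("timeline-disabled-example")
        .set("spark.hadoop.yarn.timeline-service.enabled", "false");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    // ... job code ...
    jsc.stop();
  }
}
{code}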