[ https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284428#comment-15284428 ]
Sean Owen commented on SPARK-15343:
-----------------------------------

@vanzin did you think this was an HDP-specific thing? If so, why? Is it something in the YARN traceback? I forget.

> NoClassDefFoundError when initializing Spark with YARN
> ------------------------------------------------------
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.0.0
> Reporter: Maciej Bryński
> Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark was compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting the following error:
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has been deprecated as of Spark 2.0 and may be removed in the future. Please use the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in <module>
>     sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
>     conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
>     self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in _initialize_context
>     return self._jvm.JavaSparkContext(jconf)
>   File "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__
>   File "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
>     at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:502)
>     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:236)
>     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>     at py4j.GatewayConnection.run(GatewayConnection.java:211)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 19 more
> {code}
> On 1.6 everything works fine. I'm using HDP 2.2 (Hadoop 2.6.0).
> I have the HADOOP_CONF_DIR and SPARK_HOME environment variables set.
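On the question of what the YARN traceback shows: the failure happens in `TimelineClient.createTimelineClient`, called from `YarnClientImpl.serviceInit`, i.e. while the YARN client is setting up a timeline-service client that needs the Jersey 1.x class `com.sun.jersey.api.client.config.ClientConfig`. The Python sketch below is a minimal diagnostic that checks both sides of that. It rests on two assumptions: that a Spark 2.0 distribution keeps its jars under `$SPARK_HOME/jars`, and that the Hadoop 2.6 YARN client only creates a timeline client when `yarn.timeline-service.enabled` is true in `$HADOOP_CONF_DIR/yarn-site.xml`. The helper names `jars_containing` and `hadoop_property` are hypothetical, not part of Spark or Hadoop.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# The class from the NoClassDefFoundError in the report,
# written as the entry path it would have inside a jar.
MISSING_CLASS = "com/sun/jersey/api/client/config/ClientConfig.class"


def jars_containing(jars_dir, class_entry=MISSING_CLASS):
    """Hypothetical helper: sorted names of .jar files in jars_dir that contain class_entry."""
    hits = []
    for name in sorted(os.listdir(jars_dir)):
        if not name.endswith(".jar"):
            continue
        try:
            with zipfile.ZipFile(os.path.join(jars_dir, name)) as jar:
                if class_entry in jar.namelist():
                    hits.append(name)
        except (zipfile.BadZipFile, OSError):
            # Skip entries that are not readable zip archives.
            continue
    return hits


def hadoop_property(conf_file, key):
    """Hypothetical helper: stripped <value> of property `key` in a Hadoop *-site.xml.

    Returns None when the property is not set in this file.
    """
    root = ET.parse(conf_file).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == key:
            return (prop.findtext("value") or "").strip()
    return None


def main():
    # Assumption: a Spark 2.x distribution keeps its jars in $SPARK_HOME/jars.
    spark_home = os.environ.get("SPARK_HOME", "")
    jars_dir = os.path.join(spark_home, "jars")
    if spark_home and os.path.isdir(jars_dir):
        print("Spark jars containing", MISSING_CLASS, "->",
              jars_containing(jars_dir) or "none")

    # Assumption: the Hadoop 2.6 YarnClientImpl only builds a TimelineClient
    # when yarn.timeline-service.enabled is true (None means "not set here").
    conf_dir = os.environ.get("HADOOP_CONF_DIR", "")
    yarn_site = os.path.join(conf_dir, "yarn-site.xml")
    if conf_dir and os.path.isfile(yarn_site):
        print("yarn.timeline-service.enabled =",
              hadoop_property(yarn_site, "yarn.timeline-service.enabled"))


if __name__ == "__main__":
    main()
```

Run it in the same shell used for `bin/pyspark`. An empty jar list together with a `true` value would be consistent with the traceback, where the error surfaces only because the YARN client tries to build a timeline client; it would not by itself explain why the same setup works with 1.6.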