Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException
Hi, I posted a question regarding Phoenix and Spark Streaming on StackOverflow [1]. Please find a copy of the question below the first stack trace. I also already contacted the Phoenix mailing list and tried the suggestion of setting spark.driver.userClassPathFirst. Unfortunately that only pushed me further into dependency hell, which I tried to resolve until I hit a wall with an UnsatisfiedLinkError on Snappy.

What I am trying to achieve: to save a stream from Kafka into Phoenix/HBase via Spark Streaming. I'm using MapR as a platform, and the original exception happens both on a 3-node cluster and on the MapR Sandbox (a VM for experimentation), in YARN and standalone mode. Further experimentation (like the saveAsNewHadoopApiFile below) was done only on the sandbox in standalone mode.

Phoenix only supports Spark from 4.4.0 onwards, but I thought I could use a naive implementation in 4.3.1 that creates a new connection for every RDD from the DStream. This resulted in the ClassNotFoundException described in [1], so I switched to 4.4.0. Unfortunately the saveToPhoenix method is only available in Scala. I did find the suggestion to try it via the saveAsNewHadoopApiFile method [2] and an example implementation [3], which I adapted to my own needs. However, 4.4.0 + saveAsNewHadoopApiFile raises the same ClassNotFoundException, just with a slightly different stack trace:

java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection. at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386) at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145) at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288) at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171) at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881) at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860) at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77) at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860) at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162) at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131) at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:187) at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92) at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80) at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68) at org.apache.phoenix.mapreduce.PhoenixRecordWriter.init(PhoenixRecordWriter.java:49) at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55) ... 8 more Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457) at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:350) at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47) at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286) ... 23 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java
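For reference, the shape of the approach described above looks roughly like the sketch below (the Spark method is saveAsNewAPIHadoopFile). MyEventWritable, toWritable, and kafkaStream are hypothetical stand-ins for the adapted example [3], and the table name and ZooKeeper quorum are placeholders, so treat this as an illustration of the wiring rather than the actual failing code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.apache.phoenix.mapreduce.PhoenixOutputFormat
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil

// MyEventWritable (hypothetical) implements DBWritable so that
// PhoenixRecordWriter can bind each record to the generated UPSERT.
val conf = new Configuration()
conf.set("hbase.zookeeper.quorum", "sandbox:2181")          // placeholder
PhoenixConfigurationUtil.setOutputTableName(conf, "EVENTS") // placeholder
// (upsert column configuration omitted; see the example implementation [3])

// kafkaStream: a DStream obtained from KafkaUtils elsewhere (hypothetical)
kafkaStream.foreachRDD { rdd =>
  rdd.map(event => (NullWritable.get(), toWritable(event)))
     .saveAsNewAPIHadoopFile(
       "/tmp/unused", // the path argument is not used by PhoenixOutputFormat
       classOf[NullWritable],
       classOf[MyEventWritable],
       classOf[PhoenixOutputFormat[MyEventWritable]],
       conf)
}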
Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException
This may or may not be helpful for your classpath issues, but I wanted to verify that basic functionality worked, so I made a sample app here: https://github.com/jmahonin/spark-streaming-phoenix This consumes events off a Kafka topic using Spark Streaming, and writes out event counts to Phoenix using the new phoenix-spark functionality: http://phoenix.apache.org/phoenix_spark.html It's definitely overkill, and it would probably be more efficient to use the JDBC driver directly, but it serves as a proof of concept. I've only tested this in local mode. To convert it to a full jobs JAR, I suspect that keeping all of the Spark and Phoenix dependencies marked as 'provided', and including the Phoenix client JAR in the Spark classpath, would work as well. Good luck, Josh On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek j.v...@anchormen.nl wrote: ...skipping...
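For readers who can use Scala, the phoenix-spark call Josh mentions looks roughly like this minimal sketch (the table, its columns, and the ZooKeeper URL are placeholders, and the target table is assumed to already exist in Phoenix):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.phoenix.spark._ // brings saveToPhoenix into scope for RDDs of tuples

object SaveCountsToPhoenix {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-spark-sketch"))
    // Placeholder data standing in for per-batch event counts from the stream.
    val counts = sc.parallelize(Seq(("kafka", 10L), ("phoenix", 3L)))
    // Assumes: CREATE TABLE OUTPUT_TABLE (WORD VARCHAR PRIMARY KEY, TOTAL BIGINT)
    counts.saveToPhoenix("OUTPUT_TABLE", Seq("WORD", "TOTAL"),
      zkUrl = Some("sandbox:2181"))
    sc.stop()
  }
}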
Re: ClassNotFoundException for Kryo serialization
Now I am running up against some other problem while trying to schedule tasks: 15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.IllegalStateException: unread block data at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2419) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) I verified that the same configuration works without using Kryo serialization. On Fri, May 1, 2015 at 9:44 AM, Akshat Aranya aara...@gmail.com wrote: ...skipping...
ClassNotFoundException for Kryo serialization
Hi, I'm getting a ClassNotFoundException at the executor when trying to register a class for Kryo serialization: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243) at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257) at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182) at org.apache.spark.executor.Executor.init(Executor.scala:87) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53) at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Failed to load class to register with Kryo at org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:66) at org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:61) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:61) ... 
28 more Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:190) at org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:63) I have verified that when the executor process is launched, my jar is in the classpath of the command line of the executor. I expect the class to be found by the default classloader being used at KryoSerializer.scala:63 Any ideas?
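For context, registration of this kind is typically wired up as in the sketch below; the class name is taken from the stack trace above, and the rest is illustrative rather than the poster's actual code:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kryo-registration-sketch")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Each registered name is resolved with Class.forName when KryoSerializer
  // is instantiated on an executor (KryoSerializer.scala:63), which is where
  // the ClassNotFoundException above is being thrown.
  .set("spark.kryo.classesToRegister", "com.example.Schema$MyRow")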
Re: ClassNotFoundException for Kryo serialization
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow So the above class is in the jar which was in the classpath? Can you tell us a bit more about Schema$MyRow? On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote: ...skipping...
Re: ClassNotFoundException for Kryo serialization
Yes, this class is present in the jar that was loaded in the classpath of the executor Java process -- it wasn't even lazily added as a part of the task execution. Schema$MyRow is a protobuf-generated class. After doing some digging around, I think I might be hitting up against SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I can tell. On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote: ...skipping...
Re: ClassNotFoundException for Kryo serialization
I cherry-picked the fix for SPARK-5470 and the problem has gone away. On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote: ...skipping...
Re: Spark 1.3 UDF ClassNotFoundException
My apologies. I was running this locally and the JAR I was building using IntelliJ had some issues. This was not related to UDFs. All works fine now. On Thu, Apr 2, 2015 at 2:58 PM, Ted Yu yuzhih...@gmail.com wrote: Can you show more code in CreateMasterData? How do you run your code? Thanks On Thu, Apr 2, 2015 at 11:06 AM, ganterm gant...@gmail.com wrote: ...skipping...
Spark 1.3 UDF ClassNotFoundException
Hello, I started to use the DataFrame API in Spark 1.3 with Scala. I am trying to implement a UDF and am following the sample here: https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.sql.UserDefinedFunction meaning:

val predict = udf((score: Double) => if (score > 0.5) true else false)
df.select(predict(df("score")))

All compiles just fine, but when I run it, I get a ClassNotFoundException (see more details below). I am sure that I load the data correctly and that I have a field called score with the correct data type. Do I need to do anything else, like registering the function? Thanks! Markus

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 11, BillSmithPC): java.lang.ClassNotFoundException: test.CreateMasterData$$anonfun$1 at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65) ...
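For anyone reproducing this, a self-contained version of the snippet above for Spark 1.3 might look like the following sketch (the object name and sample data are made up; as the reply above shows, the failure was a broken application jar, and no explicit registration is needed for DataFrame-API UDFs):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

object UdfExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("udf-example"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(Tuple1(0.2), Tuple1(0.7))).toDF("score")
    // udf() alone is enough for the DataFrame API; no registration call needed.
    val predict = udf((score: Double) => score > 0.5)
    df.select(predict(df("score"))).show()
    sc.stop()
  }
}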
Re: ClassNotFoundException
Hi Kevin, yes I can test it. Does that mean I have to build Spark from the git repository? Ralph On 17.03.15 at 02:59, Kevin (Sangwoo) Kim wrote: ...skipping...
Re: ClassNotFoundException
Hi Ralph, It seems like the https://issues.apache.org/jira/browse/SPARK-6299 issue, which I'm working on. I submitted a PR for it; would you test it? Regards, Kevin On Tue, Mar 17, 2015 at 1:11 AM Ralph Bergmann ra...@dasralph.de wrote: ...skipping...
ClassNotFoundException
Hi, I want to try the JavaSparkPi example [1] on a remote Spark server, but I get a ClassNotFoundException. When I run it locally it works, but not remotely. I added the spark-core lib as a dependency. Do I need more? Any ideas? Thanks, Ralph [1] https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/03/16 17:02:45 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 2015-03-16 17:02:45.624 java[5730:1133038] Unable to load realm info from SCDynamicStore 15/03/16 17:02:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/03/16 17:02:45 INFO SecurityManager: Changing view acls to: dasralph 15/03/16 17:02:45 INFO SecurityManager: Changing modify acls to: dasralph 15/03/16 17:02:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dasralph); users with modify permissions: Set(dasralph) 15/03/16 17:02:46 INFO Slf4jLogger: Slf4jLogger started 15/03/16 17:02:46 INFO Remoting: Starting remoting 15/03/16 17:02:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.0.0.10:54973] 15/03/16 17:02:46 INFO Utils: Successfully started service 'driverPropsFetcher' on port 54973. 15/03/16 17:02:46 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 15/03/16 17:02:46 INFO SecurityManager: Changing view acls to: dasralph 15/03/16 17:02:46 INFO SecurityManager: Changing modify acls to: dasralph 15/03/16 17:02:46 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 15/03/16 17:02:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dasralph); users with modify permissions: Set(dasralph) 15/03/16 17:02:46 INFO Slf4jLogger: Slf4jLogger started 15/03/16 17:02:46 INFO Remoting: Starting remoting 15/03/16 17:02:46 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 15/03/16 17:02:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@10.0.0.10:54977] 15/03/16 17:02:46 INFO Utils: Successfully started service 'sparkExecutor' on port 54977.
15/03/16 17:02:46 INFO AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@10.0.0.10:54945/user/MapOutputTracker 15/03/16 17:02:46 INFO AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@10.0.0.10:54945/user/BlockManagerMaster 15/03/16 17:02:46 INFO DiskBlockManager: Created local directory at /var/folders/5p/s1k2jrqx38ncxkm4wlflgfvwgn/T/spark-185c8652-0244-42ff-90b4-fd9c7dbde7b3/spark-8a9ba955-de6f-4ab1-8995-1474ac1ba3a9/spark-2533211f-c399-467f-851d-6b9ec89defdc/blockmgr-82e98f66-d592-42a7-a4fa-1ab78780814b 15/03/16 17:02:46 INFO MemoryStore: MemoryStore started with capacity 265.4 MB 15/03/16 17:02:46 INFO AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@10.0.0.10:54945/user/OutputCommitCoordinator 15/03/16 17:02:46 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@10.0.0.10:54945/user/CoarseGrainedScheduler 15/03/16 17:02:46 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@10.0.0.10:58715/user/Worker 15/03/16 17:02:46 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@10.0.0.10:58715/user/Worker 15/03/16 17:02:46 INFO CoarseGrainedExecutorBackend: Successfully registered with driver 15/03/16 17:02:46 INFO Executor: Starting executor ID 0 on host 10.0.0.10 15/03/16 17:02:47 INFO NettyBlockTransferService: Server created on 54983 15/03/16 17:02:47 INFO BlockManagerMaster: Trying to register BlockManager 15/03/16 17:02:47 INFO BlockManagerMaster: Registered BlockManager 15/03/16 17:02:47 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@10.0.0.10:54945/user/HeartbeatReceiver 15/03/16 17:02:47 INFO CoarseGrainedExecutorBackend: Got assigned task 0 15/03/16 17:02:47 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 15/03/16 17:02:47 INFO CoarseGrainedExecutorBackend: Got assigned task 1 15/03/16 17:02:47 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 15/03/16 17:02:47 INFO TorrentBroadcast: Started reading broadcast variable 0 15/03/16 17:02:47 INFO MemoryStore: ensureFreeSpace(1679) called with curMem=0, maxMem=278302556 15/03/16 17:02:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1679.0 B, free 265.4 MB) 15/03/16 17:02:47 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0 15/03/16 17:02:47 INFO TorrentBroadcast: Reading broadcast variable 0 took 188 ms 15/03/16 17:02:47 INFO MemoryStore: ensureFreeSpace(2312) called with curMem=1679, maxMem=278302556 15/03/16 17:02:47 INFO MemoryStore: Block broadcast_0 stored as values
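Independent of the SPARK-6299 issue mentioned above, the usual cause of this pattern (runs locally, ClassNotFoundException against a remote master) is that the application jar is never shipped to the executors. spark-submit handles that automatically; when constructing the context by hand it can be done with setJars, as in this sketch (the jar path and master URL are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("JavaSparkPi")
  .setMaster("spark://remote-master:7077")    // placeholder master URL
  // Ship the jar containing the job's classes to every executor.
  .setJars(Seq("target/my-examples-1.0.jar")) // placeholder path
val sc = new SparkContext(conf)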
Spark 1.2.1: ClassNotFoundException when running hello world example in scala 2.11
I'm having an issue with spark 1.2.1 and scala 2.11. I detailed the symptoms in this stackoverflow question. http://stackoverflow.com/questions/28612837/spark-classnotfoundexception-when-running-hello-world-example-in-scala-2-11 Has anyone experienced anything similar? Thank you!
Re: Spark 1.2.1: ClassNotFoundException when running hello world example in scala 2.11
Can you downgrade your scala dependency to 2.10 and give it a try? Thanks Best Regards On Fri, Feb 20, 2015 at 12:40 AM, Luis Solano l...@pixable.com wrote: ...skipping...
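If the project is built with sbt, the suggested downgrade amounts to something like this build.sbt sketch (prebuilt Spark 1.2.1 binaries are compiled against Scala 2.10, so the application must match):

// build.sbt (sketch)
scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"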
Re: Spark Master Build Failing to run on cluster in standalone ClassNotFoundException: javax.servlet.FilterRegistration
Already come up several times today: https://issues.apache.org/jira/browse/SPARK-5557 On Tue, Feb 3, 2015 at 8:04 AM, Night Wolf nightwolf...@gmail.com wrote: ...skipping...
Spark Master Build Failing to run on cluster in standalone ClassNotFoundException: javax.servlet.FilterRegistration
Hi, I just built Spark 1.3 master using Maven via make-distribution.sh: ./make-distribution.sh --name mapr3 --skip-java-test --tgz -Pmapr3 -Phive -Phive-thriftserver -Phive-0.12.0 When trying to start the standalone Spark master on a cluster I get the following stack trace: 15/02/04 08:53:56 INFO slf4j.Slf4jLogger: Slf4jLogger started 15/02/04 08:53:56 INFO Remoting: Starting remoting 15/02/04 08:53:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-009:7077] 15/02/04 08:53:56 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkMaster@hadoop-009:7077] ...skipping... at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at akka.util.Reflect$.instantiate(Reflect.scala:66) at akka.actor.ArgsReflectConstructor.produce(Props.scala:352) at akka.actor.Props.newActor(Props.scala:252) at akka.actor.ActorCell.newActor(ActorCell.scala:552) at akka.actor.ActorCell.create(ActorCell.scala:578) ... 9 more Caused by: java.lang.NoClassDefFoundError: javax/servlet/FilterRegistration at org.spark-project.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:136) at org.spark-project.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:129) at org.spark-project.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:98) at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:96) at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:87) at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:67) at org.apache.spark.deploy.master.ui.MasterWebUI.initialize(MasterWebUI.scala:40) at org.apache.spark.deploy.master.ui.MasterWebUI.init(MasterWebUI.scala:36) at org.apache.spark.deploy.master.Master.init(Master.scala:95) ... 18 more Caused by: java.lang.ClassNotFoundException: javax.servlet.FilterRegistration at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 27 more The distro seems about the right size (260MB), so I don't imagine any of the libraries are missing. The above command worked on 1.2... Any ideas what's going wrong? Cheers, N
ClassNotFoundException when registering classes with Kryo
Here is the relevant snippet of code in my main program:

===
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrationRequired", "true")
val summaryDataClass = classOf[SummaryData]
val summaryViewClass = classOf[SummaryView]
sparkConf.registerKryoClasses(Array(summaryDataClass, summaryViewClass))
===

I get the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException ... Caused by: org.apache.spark.SparkException: Failed to load class to register with Kryo ... Caused by: java.lang.ClassNotFoundException: com.dtex.analysis.transform.SummaryData

Note that the class in question, SummaryData, is in the same package as the main program and hence in the same jar. What do I need to do to make this work? Thanks, arun
Re: ClassNotFoundException when registering classes with Kryo
Thanks for the notification! For now, I'll use the Kryo serializer without registering classes until the bug fix has been merged into the next version of Spark (I guess that will be 1.3, right?). arun On Sun, Feb 1, 2015 at 10:58 PM, Shixiong Zhu zsxw...@gmail.com wrote: ...skipping...
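The workaround Arun describes amounts to keeping Kryo but dropping mandatory registration, roughly as in this sketch:

import org.apache.spark.SparkConf

// Keep Kryo as the serializer but skip registration: unregistered classes
// are then written with their fully-qualified names, which costs some space
// but sidesteps the registration bug until the fix is released.
val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// spark.kryo.registrationRequired is left at its default (false), and
// registerKryoClasses is not called.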
Re: ClassNotFoundException when registering classes with Kryo
It's a bug that has been fixed in https://github.com/apache/spark/pull/4258 but not yet merged. Best Regards, Shixiong Zhu 2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com: ...skipping...
Spark SQL - Unable to use Hive UDF because of ClassNotFoundException
) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at scala.collection.immutable.$colon$colon.readObject(List.scala:362) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDF at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) First I investigated all the jar and classpath options with the spark-submit command: no luck. Then I tried to instantiate the UDF and UDFFromUnixTime classes using the environment that is set up for the executor. 
To do that, after letting the Spark app fail, I went to one of the containers' directories, /mnt/sda/yarn/nm/usercache/altaia/appcache/application_1422464005963_0047/container_1422464005963_0047_01_04/ (for example), and from there I can see these files: container_tokens GeneralTest20140929-1.0.0-SNAPSHOT.jar (this is my uber-jar) launch_container.sh __spark__.jar tmp Opening launch_container.sh, I checked all the classpath setup. To test whether that classpath was correct when launching the JVM, I replaced the org.apache.spark.executor.CoarseGrainedExecutorBackend class with a class of mine whose job is to print the classpath and instantiate, by reflection, the UDF and UDFFromUnixTime classes, and all went well. I already tested having all the dependencies' jars in one directory on all hosts and adding that to spark.executor.extraClassPath and spark.driver.extraClassPath: no luck either. At this stage I just think that the uber-jar and classpath are OK. I have no more clues about what can be happening. Maybe some classloader issue with Spark SQL? The ClassNotFoundException occurs when returning data back to the driver (because of the ResultTask seen in the stack trace). Has anyone had a similar issue? Regards
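For reference, the extraClassPath attempt described above is normally configured as in this sketch (the directory is a placeholder and must exist on every node); in this case it did not help, which points away from a plain classpath problem:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Prepended to the executor/driver JVM classpath on each node; the
  // directory below is a placeholder for where the dependency jars live.
  .set("spark.executor.extraClassPath", "/opt/app-libs/*")
  .set("spark.driver.extraClassPath", "/opt/app-libs/*")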
RE: ClassNotFoundException in standalone mode
I finally managed to get the example working; here are the details, which may help other users. I have two Windows nodes for the test system, PN01 and PN02. Both have the same shared drive S: (it is mapped to C:\source on PN02). If I run the worker and master from S:\spark-1.1.0-bin-hadoop2.4, then running the simple test fails with the ClassNotFoundException (even if there is only one node hosting both the master and the worker). If I run the workers and masters from the local drive (c:\source\spark-1.1.0-bin-hadoop2.4), then the simple test runs OK (with one or two nodes). I haven't found why the class fails to load from the shared drive (I checked the permissions and they look OK), but at least the cluster is working now. If anyone has experience running Spark from a Windows shared drive, any advice is welcome! Thanks, Benoit.

PS: Yes, thanks Angel, I did check that:

s:\spark\simple> %JAVA_HOME%\bin\jar tvf s:\spark\simple\target\scala-2.10\simple-project_2.10-1.0.jar
  299 Thu Nov 20 17:29:40 GMT 2014 META-INF/MANIFEST.MF
 1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$2.class
 1350 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$main$1.class
 2581 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$.class
 1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$1.class
  710 Thu Nov 20 17:29:40 GMT 2014 SimpleApp.class

From: angel2014 Sent: Friday, November 21, 2014 3:16 AM Subject: Re: ClassNotFoundException in standalone mode: Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?
ClassNotFoundException in standalone mode
Hi Guys, I'm having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 2008). A very simple program runs fine in local mode but fails in standalone mode. Here is the error:

14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at SimpleApp.scala:22
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu): java.lang.ClassNotFoundException: SimpleApp$$anonfun$1 java.net.URLClassLoader$1.run(URLClassLoader.java:202)

I have added the jar to the SparkConf() to be on the safe side and it appears in standard output (copied after the code):

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
    val conf = new SparkConf()
      //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      //.setMaster("local[4]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)

    // Print the classpath (note: this runs on the driver, despite the label)
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

Simple-project is in the executor classpath list:

14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Executor classpath is:/S:/spark/simple/
Executor classpath is:/S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar
Executor classpath is:/S:/spark/simple/
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar
Executor classpath is:/S:/spark/simple/

Would you have any idea how I could investigate further? Thanks! Benoit.
PS: I could attach a debugger to the Worker where the ClassNotFoundException happens, but it is a bit painful.
Re: ClassNotFoundException in standalone mode
Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?
Re: ClassNotFoundException in standalone mode
It looks like the class or jar cannot be found on your driver machine. Are you sure the corresponding jar file exists on the driver machine, rather than only on your development machine?
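As a side note on the commented-out setJars call in the program above, a minimal sketch of shipping the application jar explicitly through SparkConf (paths are the ones from the post; a later Mesos thread in this digest reports that exactly this resolved the same anonfun ClassNotFoundException there, though whether it also bypasses the shared-drive problem is not established):

import org.apache.spark.{SparkConf, SparkContext}

object SimpleAppWithJars {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      .setAppName("Simple Application")
      // Ship the app jar so executors can load closure classes
      // such as SimpleApp$$anonfun$1.
      .setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
    val sc = new SparkContext(conf)
    val numAs = sc.textFile("S:\\spark-1.1.0-bin-hadoop2.4\\README.md", 2)
      .filter(line => line.contains("a")).count()
    println("Lines with a: " + numAs)
    sc.stop()
  }
}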
Spark 1.1.0 ClassNotFoundException when submitting with multiple jars using CLUSTER mode
Hi, I am using Spark 1.1.0 configured with the STANDALONE cluster manager and CLUSTER deploy mode. I want to submit multiple jars with spark-submit using the --jars option, but I get a ClassNotFoundException; note that in my code I also use the thread context classloader to load a custom class. The strange thing is that when I use CLIENT deploy mode, the exception is not thrown. Can anyone explain the classloader logic of Spark, or what the issue is when using cluster mode?

14/10/24 14:18:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.RuntimeException: Cannot load class: cn.cekasp.al.demo.SimpleInputFormat3 at cn.cekasp.algorithm.util.ReflectUtil.findClass(ReflectUtil.java:12) at cn.cekasp.algorithm.util.ReflectUtil.newInstance(ReflectUtil.java:18) at cn.cekasp.algorithm.reader.JdbcSourceReader$1.call(JdbcSourceReader.java:95) at cn.cekasp.algorithm.reader.JdbcSourceReader$1.call(JdbcSourceReader.java:90) at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:923) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1167) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: cn.cekasp.al.demo.SimpleInputFormat3 at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at cn.cekasp.algorithm.util.ReflectUtil.findClass(ReflectUtil.java:10) ... 16 more
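The trace shows the failure inside ReflectUtil.findClass, whose Class.forName frames resolve against the caller's defining classloader rather than the thread context classloader the poster mentions. A hedged sketch (ReflectUtil is reconstructed from the stack trace, not the poster's actual source) of preferring the context classloader, which on executors includes jars shipped via --jars:

object ReflectUtil {
  // Try the task thread's context classloader first; Spark executors set it
  // to a loader that includes jars distributed with --jars. Fall back to the
  // default loader for client mode and plain JVM use.
  def findClass(name: String): Class[_] = {
    val ctx = Thread.currentThread().getContextClassLoader
    try {
      if (ctx != null) Class.forName(name, true, ctx)
      else Class.forName(name)
    } catch {
      case _: ClassNotFoundException => Class.forName(name)
    }
  }

  def newInstance[T](name: String): T =
    findClass(name).newInstance().asInstanceOf[T]
}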
Spark-submit ClassNotFoundException with JAR!
Hi, I'm having problems with a ClassNotFoundException using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader
import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {
  def test(id: Int): ClassToRoundTrip = {
    // Get the current classpath and output. Can we see the simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of the object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes)  // <-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x => RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs generates a ClassNotFound exception on line 31, where the ClassToRoundTrip object is deserialized. Strangely, the earlier use on line 28 is okay:

spark-submit --class SimpleApp \
  --master local[4] \
  target/scala-2.10/simpleapp_2.10-1.0.jar

However, if I add extra parameters for --driver-class-path and --jars, it works fine on local:

spark-submit --class SimpleApp \
  --master local[4] \
  --driver-class-path /home/xxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
  --jars /home/xxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
  target/scala-2.10/simpleapp_2.10-1.0.jar

However, submitting to a local dev master still generates the same issue:

spark-submit --class SimpleApp \
  --master spark://localhost.localdomain:7077 \
  --driver-class-path /home/xxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
  --jars /home/xxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
  target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor. Logs for one of the executors are here: stdout: http://pastebin.com/raw.php?i=DQvvGhKm stderr: http://pastebin.com/raw.php?i=MPZZVa0Q I'm using Spark 1.0.2. ClassToRoundTrip is included in the JAR. I have a workaround of copying the JAR to each of the machines and setting the spark.executor.extraClassPath parameter, but I would rather not have to do that. This is such a simple case, I must be doing something obviously wrong. Can anyone help? Thanks Peter
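One plausible reading of the failure point, not confirmed anywhere in this thread: scala.util.Marshal deserializes with a plain ObjectInputStream, whose default class resolution does not consult the executor's URL classloader holding the fetched application jar. A minimal sketch of a loader-aware round trip under that assumption (all names illustrative):

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Resolves classes against an explicit loader (e.g. the task thread's
// context classloader) instead of ObjectInputStream's default choice.
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    Class.forName(desc.getName, false, loader)
}

object RoundTrip {
  def dump(obj: AnyRef): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    buf.toByteArray
  }

  def load[T](bytes: Array[Byte]): T = {
    val loader = Thread.currentThread().getContextClassLoader
    val in = new LoaderAwareObjectInputStream(new ByteArrayInputStream(bytes), loader)
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}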
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hi, Yes, the error still occurs when we replace the lambdas with named functions: (same error traces as in previous posts)
ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hi all, I just installed a Mesos 0.19 cluster. I am failing to execute basic SparkQL operations on text files with Spark 1.0.1 in the spark-shell. I have one Mesos master without ZooKeeper and 4 Mesos slaves. All nodes are running JDK 1.7.51 and Scala 2.10.4. The Spark package is uploaded to HDFS and the user running the Mesos slave has permission to access it. I am running HDFS from the latest CDH5. I tried both with the pre-built CDH5 Spark package available from http://spark.apache.org/downloads.html and by packaging Spark with sbt 0.13.2, JDK 1.7.51 and Scala 2.10.4 as explained here: http://mesosphere.io/learn/run-spark-on-mesos/ No matter what I try, when I execute the following code in the spark-shell: The job fails with the following error reported by the Mesos slave nodes: Note that running a simple map+reduce job on the same HDFS files with the same installation works fine: The HDFS files contain just plain CSV files: spark-env.sh looks like this: Any help, comment or pointer would be greatly appreciated! Thanks in advance Svend
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Note that running a simple map+reduce job on the same hdfs files with the same installation works fine: Did you call collect() on the totalLength? Otherwise nothing has actually executed.
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Oh, I'm sorry... reduce is also an operation On Wed, Jul 16, 2014 at 3:37 PM, Michael Armbrust mich...@databricks.com wrote: Note that runnning a simple map+reduce job on the same hdfs files with the same installation works fine: Did you call collect() on the totalLength? Otherwise nothing has actually executed.
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hi Michael, Thanks for your reply. Yes, the reduce triggered the actual execution, I got a total length (totalLength: 95068762, for the record).
Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell
Hmm, it could be some weirdness with classloaders / Mesos / Spark SQL? I'm curious whether you would hit an error if there were no lambda functions involved, perhaps if you load the data using jsonFile or parquetFile. Either way, I'd file a JIRA. Thanks!
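To make Michael's suggestion concrete, a lambda-free load one could paste into the spark-shell (the path is hypothetical; API names as of Spark 1.0.x, where the result is a SchemaRDD and registration uses registerAsTable):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// parquetFile involves no user-defined closures, so a ClassNotFoundException
// here would point at the classloader setup rather than at the lambdas.
val rows = sqlContext.parquetFile("hdfs:///user/svend/data.parquet")
rows.registerAsTable("mydata")
sqlContext.sql("SELECT COUNT(*) FROM mydata").collect().foreach(println)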
Failing to run standalone streaming app: IOException; classNotFoundException; and more
Hi, I'm attempting to run the following simple standalone app on Mac OS and Spark 1.0 using sbt:

val sparkConf = new SparkConf()
  .setAppName("ProcessEvents")
  .setMaster("local[*]")
  .setSparkHome("/Users/me/Downloads/spark")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val lines = ssc.textFileStream("/Users/me/Downloads/test/")
lines.foreachRDD(rdd => rdd.foreach(println(_)))
ssc.start()
ssc.awaitTermination()

However, when running it with sbt run, I get quite a few errors:

23:27:42.182 [run-main] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config() at org.apache.hadoop.conf.Configuration.(Configuration.java:227) at org.apache.hadoop.conf.Configuration.(Configuration.java:214)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 1 times, most recent failure: Exception failure in TID 0 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355)

Any ideas? Let me know what other info you need to figure this out. Thanks!
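A side note on running Spark apps via sbt run: a ClassNotFoundException for core classes like scala.None$ is often a symptom of sbt's layered classloaders, and a commonly suggested mitigation, an assumption here rather than something this thread confirms, is to fork a fresh JVM in build.sbt:

// build.sbt (sketch): fork the run task into a separate JVM so Spark does
// not inherit sbt's layered classloaders, which can hide even scala-library
// classes from executor threads.
fork := true

// Optionally keep stdin attached and pass JVM options to the forked process:
connectInput := true
javaOptions ++= Seq("-Xmx2g")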
Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
) at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1791) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) What might be the problem? Can someone help me solve this issue? Regards, Gaurav
Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
Gaurav, I am not sure that the * expands to what you expect it to. Normally bash expands * to a space-separated list of file names, not a colon-separated one, so the -cp argument may not contain what you think it does. Try specifying all the jars manually, maybe? Tobias
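Following up on Tobias's point about shell expansion, a small sketch (the two lib directories are taken from Gaurav's command; the object name is illustrative) that joins every jar in a directory with ':' so a single -cp argument can be built without relying on * at all:

import java.io.File

object ClasspathBuilder {
  // Join every jar in the given directories with ':' so the result can be
  // passed as one -cp argument, avoiding shell glob expansion entirely.
  def fromDirs(dirs: String*): String =
    dirs.flatMap { d =>
      Option(new File(d).listFiles()).getOrElse(Array.empty[File])
        .filter(_.getName.endsWith(".jar"))
        .map(_.getAbsolutePath)
    }.mkString(":")

  def main(args: Array[String]): Unit =
    println(fromDirs("/usr/local/kafka/kafka_2.10-0.8.1.1/libs", "/usr/lib/hbase/lib"))
}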
Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
Hi, I have written my own custom Spark streaming code which connects to a Kafka server and fetches data. I have tested the code in local mode and it is working fine. But when I execute the same code in YARN mode, I get a KafkaReceiver class-not-found exception. I am providing the Spark Kafka jar in the classpath and have ensured that the path is correct for all the nodes in my cluster. I am using the Spark 0.9.1 Hadoop pre-built package, deployed on all the nodes (10-node cluster) in the YARN cluster. I am using the following command to run my code in YARN mode:

SPARK_YARN_MODE=true SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar SPARK_YARN_APP_JAR=/usr/local/SparkStreamExample.jar java -cp /usr/local/SparkStreamExample.jar:assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar:external/kafka/target/spark-streaming-kafka_2.10-0.9.1.jar:/usr/local/kafka/kafka_2.10-0.8.1.1/libs/*:/usr/lib/hbase/lib/*:/etc/hadoop/conf/:/etc/hbase/conf/ SparkStreamExample yarn-client 10.10.5.32 myFirstGroup testTopic NewTestTable 1

Below is the error message I am getting:

14/06/05 04:29:12 INFO cluster.YarnClientClusterScheduler: Adding task set 2.0 with 1 tasks
14/06/05 04:29:12 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 70 on executor 2: manny6.musigma.com (PROCESS_LOCAL)
14/06/05 04:29:12 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 2971 bytes in 2 ms
14/06/05 04:29:12 WARN scheduler.TaskSetManager: Lost TID 70 (task 2.0:0)
14/06/05 04:29:12 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1666) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1322) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:479) at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1791) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41) at
ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi, I have set up a cluster with Mesos (backed by Zookeeper) with three master and three slave instances. I set up Spark (git HEAD) for use with Mesos according to this manual: http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html Using the spark-shell, I can connect to this cluster and do simple RDD operations, but the same code in a Scala class and executed via sbt run-main works only partially. (That is, count() works, count() after flatMap() does not.) Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91 The file SparkExamplesScript.scala, when pasted into spark-shell, outputs the correct count() for the parallelized list comprehension, as well as for the flatMapped RDD. The file SparkExamplesMinimal.scala contains exactly the same code, and also the MASTER configuration and the Spark Executor are the same. However, while the count() for the parallelized list is displayed correctly, I receive the following error when asking for the count() of the flatMapped RDD: - 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1 (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which has no missing parents 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 1 (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34) 14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 8 tasks 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1 (PROCESS_LOCAL) 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 1779147 bytes in 37 ms 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0) 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2 at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63) at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61) at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) - Can anyone explain to me where this comes from or how I might further track the problem down? Thanks, Tobias
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi Tobias, Regarding my comment on closure serialization: I was discussing it with my fellow Sparkers here and I totally overlooked the fact that you need the class files to de-serialize the closures (or whatever) on the workers, so you always need the jar file delivered to the workers in order for it to work. The Spark REPL works differently: it uses some dark magic to send the working session to the workers. -kr, Gerard.

On Wed, May 21, 2014 at 2:47 PM, Gerard Maas gerard.m...@gmail.com wrote: Hi Tobias, I was curious about this issue and tried to run your example on my local Mesos. I was able to reproduce your issue using your current config:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Creating a simple jar from the job and providing it through the configuration seems to solve it:

val conf = new SparkConf()
  .setMaster("mesos://my_ip:5050/")
  .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
  .setAppName("SparkExamplesMinimal")

Resulting in:

14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at SparkExamplesMinimal.scala:50) finished in 1.120 s
14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at SparkExamplesMinimal.scala:50, took 1.177091435 s
count: 100

Why the closure serialization does not work with Mesos is beyond my current knowledge. Would be great to hear from the experts (cross-posting to dev for that) -kr, Gerard.
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi Tobias, On Wed, May 21, 2014 at 5:45 PM, Tobias Pfeiffer t...@preferred.jp wrote: first, thanks for your explanations regarding the jar files! No prob :-) On Thu, May 22, 2014 at 12:32 AM, Gerard Maas gerard.m...@gmail.com wrote: I was discussing it with my fellow Sparkers here and I totally overlooked the fact that you need the class files to de-serialize the closures (or whatever) on the workers, so you always need the jar file delivered to the workers in order for it to work. So the closure as a function is serialized, sent across the wire, deserialized there, and *still* you need the class files? (I am not sure I understand what is actually sent over the network then. Does that serialization only contain the values that I close over?) I also had that mental lapse. Serialization converts object (not class) state, i.e. current values, into a byte stream, and de-serialization restores the bytes from the wire into a seemingly identical object on the receiving side (except for transient variables). For that, the receiver requires the class definition of the object to know what it needs to instantiate. So yes, the compiled classes need to be given to the Spark driver, and it will take care of dispatching them to the workers (much better than in the old RMI days ;-) If I understand correctly what you are saying, then the documentation at https://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html (list item 8) needs to be extended quite a bit, right? The mesos docs have been recently updated here: https://github.com/apache/spark/pull/756/files Don't know where the latest version from master is built/available. -kr, Gerard.
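To make the serialization point concrete, a small self-contained sketch (names are illustrative): the byte stream carries field values plus class names, so a receiver whose classloader lacks the class definition fails exactly like a worker missing the application jar:

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}
import java.net.{URL, URLClassLoader}

case class Payload(values: Seq[Int]) extends Serializable

object ClosureShippingDemo {
  def main(args: Array[String]): Unit = {
    // Serialize: only the object's field values go into the byte stream,
    // plus class *names*; the class definitions themselves are not included.
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(Payload(1 to 3))
    out.close()

    // Deserialize against a classloader that cannot see Payload: this fails
    // with ClassNotFoundException, just like a worker without the app jar.
    val emptyLoader = new URLClassLoader(Array.empty[URL], null)
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, emptyLoader)
    }
    try in.readObject()
    catch { case e: ClassNotFoundException => println("As expected: " + e) }
  }
}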
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Here's the 1.0.0rc9 version of the docs: https://people.apache.org/~pwendell/spark-1.0.0-rc9-docs/running-on-mesos.html I refreshed them with the goal of steering users more towards prebuilt packages than relying on compiling from source, plus improving overall formatting and clarity, but not otherwise modifying the content. I don't expect any changes for rc10. It does seem like an issue, though, that classpath problems are preventing that from running. Just to check, have you given the exact same jar a shot when running against a standalone cluster? If it works in standalone, I think that's good evidence that there's an issue with the Mesos classloaders in master. I'm running into a similar issue with classpaths failing on Mesos but working in standalone, but I haven't coherently written up my observations yet, so I haven't sent them to this list. I'd almost gotten to the point where I thought that my custom code needed to be included in the SPARK_EXECUTOR_URI, but that can't possibly be correct. The Spark workers launched on Mesos slaves should start with the Spark core jars and then transparently get classes from custom code over the network, or at least that's how I thought it should work. For those who have been using Mesos in previous releases, you've never had to do that before, have you?
Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Hi Andrew, Thanks for the current doc. Regarding the delivery of the custom job code to Mesos, we have been using ADD_JARS (on the command line) or SparkConf.setJars(Seq[String]) with a fat jar packing all dependencies. That works as well on the Spark 'standalone' cluster, but we deploy mostly on Mesos, so I couldn't say about classloading differences between the two. -greetz, Gerard.
Re: ClassNotFoundException
I just ran into the same problem. I will respond if I find out how to fix it.
Re: spark 0.9.1: ClassNotFoundException
Check whether the jar file that includes your example code is under examples/target/scala-2.10/.

On Sat, May 3, 2014 at 5:58 AM, SK skrishna...@gmail.com wrote:

I am using Spark 0.9.1 in standalone mode. In the SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created a directory called mycode, in which I have placed some standalone Scala code. I was able to compile it. I ran the code using:

./bin/run-example org.apache.spark.mycode.MyClass local

However, I get a ClassNotFoundException, although I do see the compiled classes in examples/target/scala-2.10/classes/org/apache/spark/mycode. When I place the same code in the same folder structure in the Spark 0.9.0 version, I am able to run it. Where should I place my standalone code relative to SPARK_HOME in Spark 0.9.1 so that the classes can be found?

thanks
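As an alternative to dropping code into SPARK_HOME/examples (run-example is really aimed at the bundled examples), a standalone application built in its own project avoids the class-placement question entirely. A minimal sketch, with hypothetical project and jar names:

import org.apache.spark.{SparkConf, SparkContext}

// Build this in its own sbt project (sbt package), producing e.g.
// target/scala-2.10/my-app_2.10-0.1.jar, and run it against the Spark
// assembly instead of placing sources under SPARK_HOME/examples.
object MyClass {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MyClass")
      .setMaster("local") // same meaning as the "local" argument to run-example
      // Listing the jar lets workers load these classes on a real cluster;
      // it is harmless (unused) in local mode.
      .setJars(Seq("target/scala-2.10/my-app_2.10-0.1.jar"))
    val sc = new SparkContext(conf)
    println(sc.parallelize(Seq("hello", "spark")).count())
    sc.stop()
  }
}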
ClassNotFoundException
Hi,

I am getting the following error. How could I fix this problem?

Joe

14/05/02 03:51:48 WARN TaskSetManager: Lost TID 12 (task 2.0:1)
14/05/02 03:51:48 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4 [duplicate 6]
14/05/02 03:51:48 ERROR TaskSetManager: Task 2.0:1 failed 4 times; aborting job
14/05/02 03:51:48 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
14/05/02 03:51:48 INFO TaskSetManager: Loss was due to java.lang.ClassNotFoundException: org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4 [duplicate 7]
14/05/02 03:51:48 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
14/05/02 03:51:48 INFO DAGScheduler: Failed to run count at reasoner.scala:70
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 2.0:1 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4)
org.apache.spark.SparkException: Job aborted: Task 2.0:1 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
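One way to narrow down an error like this is to check whether the offending class loads on the driver but not on the executors, since task-side ClassNotFoundExceptions usually mean the workers are missing a jar. A sketch for the Spark shell, where com.example.MyJob is a placeholder for one of your own classes:

val probe = "com.example.MyJob" // placeholder class name

// Driver side: succeeds only if the class is on the driver's classpath.
Class.forName(probe)

// Executor side: the same lookup runs inside a task, so a jar missing on the
// workers surfaces as the ClassNotFoundException seen in the log above.
sc.parallelize(1 to 4).foreach(_ => Class.forName(probe))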
Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Just out of curiosity, since you are using Cloudera Manager Hadoop and Spark: how did you build Shark for it? And are you able to read any file from HDFS? Did you try that out?

Regards, Arpit Tak

On Thu, Apr 17, 2014 at 7:07 PM, ge ko koenig@gmail.com wrote:

Hi,

the error java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder. Now the Hive metastore can be read successfully (including the Parquet-based table). But if I want to select from that table, I receive:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in parquet-hive-bundle-1.4.1.jar?! I'm getting more and more confused ;) Any help?

regards, Gerd

On 17 April 2014 11:55, ge ko koenig@gmail.com wrote:

Hi,

I want to select from a Parquet-based table in Shark, but receive this error:

shark> select * from wl_parquet;
14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
14/04/17 11:33:49 INFO ql.Driver: PERFLOG method=Driver.run
14/04/17 11:33:49 INFO ql.Driver: PERFLOG method=TimeToSubmit
14/04/17 11:33:49 INFO ql.Driver: PERFLOG method=compile
14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from wl_parquet
14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
at shark.SharkDriver.compile(SharkDriver.scala:215)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
... 14 more

I can successfully select from that table with Hive and Impala, but Shark doesn't work. I am using CDH5 incl. the Spark parcel, and Shark 0.9.1. In which jar is this class hidden, and how can I get rid of this exception?

The lib folder of Shark contains:

[root@hadoop-pg-9 shark-0.9.1]# ll lib
total 180
lrwxrwxrwx 1 root root 67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar -> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
-rwxrwxr-x 1 root root 23086 9. Apr 10:57 JavaEWAH-0.4.2.jar
lrwxrwxrwx 1 root root 53 14. Apr 21:46 parquet-avro.jar -
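One way to check whether parquet-hive-bundle-1.4.1.jar actually made it onto Shark's classpath is to ask the classloaders directly from the Shark/Spark shell. A sketch, assuming the Java 7-era JVMs used here, where most loaders in the chain are URLClassLoaders:

import java.net.URLClassLoader

// Walk the classloader chain and print every jar and directory each loader
// sees; the parquet bundle should show up if the lib folder was picked up.
var cl: ClassLoader = Thread.currentThread.getContextClassLoader
while (cl != null) {
  cl match {
    case u: URLClassLoader => u.getURLs.foreach(println)
    case other => println("(non-URL classloader: " + other.getClass.getName + ")")
  }
  cl = cl.getParent
}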