Hi all, I’m submitting a simple task from the spark shell against a CassandraRDD (DataStax Enterprise environment). I’m getting the following exception from one of the workers:
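For context, a minimal sketch of the kind of session I’m running (the keyspace and table names below are placeholders, not my real schema):

```scala
// Run inside the DSE spark shell, where sc already exposes the
// Cassandra integration, so cassandraTable returns a CassandraRDD.
val rdd = sc.cassandraTable("test_ks", "test_table")

// count() is the action that ships the task to the workers,
// which is where the ClassNotFoundException below is thrown.
rdd.count()
```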
INFO 2014-10-27 14:08:03 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
INFO 2014-10-27 14:08:03 Remoting: Starting remoting
INFO 2014-10-27 14:08:03 Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@10.105.111.130:50234]
INFO 2014-10-27 14:08:03 Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@10.105.111.130:50234]
INFO 2014-10-27 14:08:03 org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/CoarseGrainedScheduler
INFO 2014-10-27 14:08:03 org.apache.spark.deploy.worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
INFO 2014-10-27 14:08:04 org.apache.spark.deploy.worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
INFO 2014-10-27 14:08:04 org.apache.spark.executor.CoarseGrainedExecutorBackend: Successfully registered with driver
INFO 2014-10-27 14:08:04 org.apache.spark.executor.Executor: Using REPL class URI: http://159.8.18.11:51705
INFO 2014-10-27 14:08:04 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
INFO 2014-10-27 14:08:04 Remoting: Starting remoting
INFO 2014-10-27 14:08:04 Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.105.111.130:49243]
INFO 2014-10-27 14:08:04 Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.105.111.130:49243]
INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to BlockManagerMaster: akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/BlockManagerMaster
INFO 2014-10-27 14:08:04 org.apache.spark.storage.DiskBlockManager: Created local directory at /usr/share/dse/spark/tmp/executor/spark-local-20141027140804-4d84
INFO 2014-10-27 14:08:04 org.apache.spark.storage.MemoryStore: MemoryStore started with capacity 23.0 GB.
INFO 2014-10-27 14:08:04 org.apache.spark.network.ConnectionManager: Bound socket to port 50542 with id = ConnectionManagerId(10.105.111.130,50542)
INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Trying to register BlockManager
INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Registered BlockManager
INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to MapOutputTracker: akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/MapOutputTracker
INFO 2014-10-27 14:08:04 org.apache.spark.HttpFileServer: HTTP File server directory is /usr/share/dse/spark/tmp/executor/spark-a23656dc-efce-494b-875a-a1cf092c3230
INFO 2014-10-27 14:08:04 org.apache.spark.HttpServer: Starting HTTP Server
INFO 2014-10-27 14:08:27 org.apache.spark.executor.CoarseGrainedExecutorBackend: Got assigned task 0
INFO 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Running task ID 0
ERROR 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Exception in task ID 0
java.lang.ClassNotFoundException: com.datastax.bdp.spark.CassandraRDD
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:49)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
    at java.io.ObjectInputStream.readClassDesc(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: http://159.8.18.11:51705/com/datastax/bdp/spark/CassandraRDD.class
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.URL.openStream(Unknown Source)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:55)
    ... 25 more

I don’t understand why a worker (private address 10.105.111.130, srv02.pocbgsia.ats-online.it) is looking for a .class file at a public URL on the master node (http://159.8.18.11:51705/com/datastax/bdp/spark/CassandraRDD.class).

What am I missing?

Thanks in advance,
Paolo