Hi, everybody! I'm trying to deploy a simple app on a Spark standalone cluster with a single node (localhost). Unfortunately, something goes wrong while processing the JAR file and a NullPointerException is thrown. I'm running everything on a single machine with Windows 8. Details below; please help with suggestions on what is missing to make this work. I'm really looking forward to working with Spark in a cluster. The problem shows up both with my own little programs and with the Spark examples (e.g. WordCount), and it appears both when running with my custom driver and when using spark-submit or run-example (which calls spark-submit).
(Hadoop I also compiled from source for Windows, but it is not really being used.)

Driver code:

    SparkConf conf = new SparkConf().setAppName("SimpleTests")
            .setJars(new String[]{"file:///myworkspace/spark-tests.jar"})
            .setMaster("spark://mymachine:7077")
            .setSparkHome("/mysparkhome/spark-1.1.0-bin-hadoop2.4");
    JavaSparkContext sc = new JavaSparkContext(conf);

The remaining code is trivial and the usual. I get this output and error:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/C:/Users/JorgePaulo/tmp/hadoop/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/C:/Users/JorgePaulo/tmp/spark/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    14/10/12 20:15:00 INFO SecurityManager: Changing view acls to: JorgePaulo,
    14/10/12 20:15:00 INFO SecurityManager: Changing modify acls to: JorgePaulo,
    14/10/12 20:15:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(JorgePaulo, ); users with modify permissions: Set(JorgePaulo, )
    14/10/12 20:15:01 INFO Slf4jLogger: Slf4jLogger started
    14/10/12 20:15:02 INFO Remoting: Starting remoting
    14/10/12 20:15:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jsimao71-acer:4279]
    14/10/12 20:15:02 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@jsimao71-acer:4279]
    14/10/12 20:15:02 INFO Utils: Successfully started service 'sparkDriver' on port 4279.
    14/10/12 20:15:02 INFO SparkEnv: Registering MapOutputTracker
    14/10/12 20:15:02 INFO SparkEnv: Registering BlockManagerMaster
    14/10/12 20:15:02 INFO DiskBlockManager: Created local directory at C:\Users\JORGEP~1\AppData\Local\Temp\spark-local-20141012201502-723f
    14/10/12 20:15:02 INFO Utils: Successfully started service 'Connection manager for block manager' on port 4282.
    14/10/12 20:15:02 INFO ConnectionManager: Bound socket to port 4282 with id = ConnectionManagerId(jsimao71-acer,4282)
    14/10/12 20:15:02 INFO MemoryStore: MemoryStore started with capacity 669.3 MB
    14/10/12 20:15:02 INFO BlockManagerMaster: Trying to register BlockManager
    14/10/12 20:15:02 INFO BlockManagerMasterActor: Registering block manager jsimao71-acer:4282 with 669.3 MB RAM
    14/10/12 20:15:02 INFO BlockManagerMaster: Registered BlockManager
    14/10/12 20:15:02 INFO HttpFileServer: HTTP File server directory is C:\Users\JORGEP~1\AppData\Local\Temp\spark-4771bfb8-e4f4-43d2-a437-6d55ee7c88b4
    14/10/12 20:15:02 INFO HttpServer: Starting HTTP Server
    14/10/12 20:15:03 INFO Utils: Successfully started service 'HTTP file server' on port 4283.
    14/10/12 20:15:03 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    14/10/12 20:15:03 INFO SparkUI: Started SparkUI at http://jsimao71-acer:4040
    14/10/12 20:15:10 INFO SparkContext: Added JAR file:///Users/JorgePaulo/workspace/spark-tests.jar at http://192.168.179.1:4283/jars/spark-tests.jar with timestamp 1413141310617
    14/10/12 20:15:10 INFO AppClient$ClientActor: Connecting to master spark://jsimao71-acer:7077...
    14/10/12 20:15:10 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
    14/10/12 20:15:11 INFO MemoryStore: ensureFreeSpace(159118) called with curMem=0, maxMem=701843374
    14/10/12 20:15:11 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 155.4 KB, free 669.2 MB)
    14/10/12 20:15:11 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141012201511-0014
    14/10/12 20:15:11 INFO AppClient$ClientActor: Executor added: app-20141012201511-0014/0 on worker-20141012171633-jsimao71-acer-1970 (jsimao71-acer:1970) with 4 cores
    14/10/12 20:15:11 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141012201511-0014/0 on hostPort jsimao71-acer:1970 with 4 cores, 512.0 MB RAM
    14/10/12 20:15:11 INFO AppClient$ClientActor: Executor updated: app-20141012201511-0014/0 is now RUNNING
    14/10/12 20:15:11 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=159118, maxMem=701843374
    14/10/12 20:15:11 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 669.2 MB)
    14/10/12 20:15:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on jsimao71-acer:4282 (size: 12.3 KB, free: 669.3 MB)
    14/10/12 20:15:11 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
    14/10/12 20:15:12 INFO FileInputFormat: Total input paths to process : 1
    14/10/12 20:15:12 INFO SparkContext: Starting job: count at SparkTests.java:48
    14/10/12 20:15:12 INFO DAGScheduler: Got job 0 (count at SparkTests.java:48) with 2 output partitions (allowLocal=false)
    14/10/12 20:15:12 INFO DAGScheduler: Final stage: Stage 0(count at SparkTests.java:48)
    14/10/12 20:15:12 INFO DAGScheduler: Parents of final stage: List()
    14/10/12 20:15:12 INFO DAGScheduler: Missing parents: List()
    14/10/12 20:15:12 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at SparkTests.java:42), which has no missing parents
    14/10/12 20:15:12 INFO MemoryStore: ensureFreeSpace(2944) called with curMem=171751, maxMem=701843374
    14/10/12 20:15:12 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.9 KB, free 669.2 MB)
    14/10/12 20:15:12 INFO MemoryStore: ensureFreeSpace(1877) called with curMem=174695, maxMem=701843374
    14/10/12 20:15:12 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1877.0 B, free 669.2 MB)
    14/10/12 20:15:12 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on jsimao71-acer:4282 (size: 1877.0 B, free: 669.3 MB)
    14/10/12 20:15:12 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
    14/10/12 20:15:12 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (FilteredRDD[2] at filter at SparkTests.java:42)
    14/10/12 20:15:12 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
    14/10/12 20:15:25 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@jsimao71-acer:4316/user/Executor#-1003079982] with ID 0
    14/10/12 20:15:25 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:25 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:26 INFO BlockManagerMasterActor: Registering block manager jsimao71-acer:4335 with 265.1 MB RAM
    14/10/12 20:15:27 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, jsimao71-acer): java.lang.NullPointerException:
        java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
        org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
        org.apache.hadoop.util.Shell.run(Shell.java:418)
        org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
        org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
        org.apache.spark.util.Utils$.fetchFile(Utils.scala:448)
        org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:325)
        org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:323)
        scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:323)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
    14/10/12 20:15:27 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 2, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:27 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor jsimao71-acer: java.lang.NullPointerException (null) [duplicate 1]
    14/10/12 20:15:27 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:27 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on executor jsimao71-acer: java.lang.NullPointerException (null) [duplicate 2]
    14/10/12 20:15:27 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:27 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor jsimao71-acer: java.lang.NullPointerException (null) [duplicate 3]
    14/10/12 20:15:27 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 5, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:27 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on executor jsimao71-acer: java.lang.NullPointerException (null) [duplicate 4]
    14/10/12 20:15:27 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 6, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:27 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5) on executor jsimao71-acer: java.lang.NullPointerException (null) [duplicate 5]
    14/10/12 20:15:27 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 7, jsimao71-acer, PROCESS_LOCAL, 1303 bytes)
    14/10/12 20:15:27 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6) on executor jsimao71-acer: java.lang.NullPointerException (null) [duplicate 6]
    14/10/12 20:15:27 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
    14/10/12 20:15:27 INFO TaskSchedulerImpl: Cancelling stage 0
    14/10/12 20:15:27 INFO TaskSchedulerImpl: Stage 0 was cancelled
    14/10/12 20:15:27 INFO DAGScheduler: Failed to run count at SparkTests.java:48
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, jsimao71-acer): java.lang.NullPointerException:
        java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
        org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
        org.apache.hadoop.util.Shell.run(Shell.java:418)
        org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
        org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
        org.apache.spark.util.Utils$.fetchFile(Utils.scala:448)
        org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:325)
        org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:323)
        scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:323)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Looking at the Spark source code (the .scala and .java files), it seems that it could be a targetDir that is set to null, but I'm not sure. Please help! Thanks a lot, Jorge.
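For what it's worth, the top frame of the executor trace (ProcessBuilder.java:1012) is where ProcessBuilder.start() rejects a command list that contains a null element. Since the caller is FileUtil.chmod via Hadoop's Shell, my guess is that the null is not targetDir itself but one of the command strings Hadoop builds for chmod on Windows (possibly the winutils.exe path, which I believe Hadoop leaves null when HADOOP_HOME is not set). The snippet below only reproduces the JDK behavior with a hypothetical command list; it is not Hadoop's actual code:

```java
import java.util.Arrays;

public class NpeDemo {
    public static void main(String[] args) throws Exception {
        // A command list with a null element, standing in for a chmod
        // command whose binary path was never resolved (hypothetical).
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList("chmod", null, "some-file"));
        try {
            // ProcessBuilder.start() checks every element of the command
            // list and throws NullPointerException on the first null --
            // the same frame that tops the executor trace above.
            pb.start();
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from ProcessBuilder.start()");
        }
    }
}
```

If that guess is right, it would explain why the failure happens inside Utils.fetchFile (chmod-ing the fetched JAR) rather than in my own code.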