Hi All, I am trying to run a sample Spark program built with Scala and SBT.
Below is the program:

    def main(args: Array[String]) {
      val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system
      val sc = new SparkContext("local", "Simple App", "E:/ApacheSpark/usb/usb/spark/bin",
        List("target/scala-2.10/sbt2_2.10-1.0.jar"))
      val logData = sc.textFile(logFile, 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    }

Below is the error log:

    14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:0+673
    14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2032) called with curMem=34047, maxMem=280248975
    14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 2032.0 B, free 267.2 MB)
    14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on zealot:61452 (size: 2032.0 B, free: 267.3 MB)
    14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_0
    14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2300 bytes result sent to driver
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1264 bytes)
    14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
    14/12/30 23:20:21 INFO spark.CacheManager: Partition rdd_1_1 not found, computing it
    14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:673+673
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3507 ms on localhost (1/2)
    14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(1912) called with curMem=36079, maxMem=280248975
    14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 1912.0 B, free 267.2 MB)
    14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on zealot:61452 (size: 1912.0 B, free: 267.3 MB)
    14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_1
    14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 2300 bytes result sent to driver
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 261 ms on localhost (2/2)
    14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 0 (count at Test1.scala:19) finished in 3.811 s
    14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:19, took 3.997365232 s
    14/12/30 23:20:21 INFO spark.SparkContext: Starting job: count at Test1.scala:20
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Got job 1 (count at Test1.scala:20) with 2 output partitions (allowLocal=false)
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Final stage: Stage 1(count at Test1.scala:20)
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Parents of final stage: List()
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Missing parents: List()
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting Stage 1 (FilteredRDD[3] at filter at Test1.scala:20), which has no missing parents
    14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2600) called with curMem=37991, maxMem=280248975
    14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 267.2 MB)
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (FilteredRDD[3] at filter at Test1.scala:20)
    14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1264 bytes)
    14/12/30 23:20:21 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
    14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_0 locally
    14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1731 bytes result sent to driver
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1264 bytes)
    14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
    14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_1 locally
    14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1731 bytes result sent to driver
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 7 ms on localhost (1/2)
    14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 16 ms on localhost (2/2)
    14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 1 (count at Test1.scala:20) finished in 0.016 s
    14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
    14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:20, took 0.041709824 s
    Lines with a: 24, Lines with b: 15
    14/12/30 23:20:21 ERROR util.Utils: Uncaught exception in thread SparkListenerBus
    java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:48)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
    [success] Total time: 12 s, completed Dec 30, 2014 11:20:21 PM
    14/12/30 23:20:21 ERROR spark.ContextCleaner: Error in cleaning thread
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:136)
        at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
        at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
        at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
    14/12/30 23:20:21 INFO network.ConnectionManager: Selector thread was interrupted!
    14/12/30 23:20:21 INFO storage.BlockManager: Removing broadcast 2
    14/12/30 23:20:21 INFO storage.BlockManager: Removing block broadcast_2
    14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 of size 2600 dropped from memory (free 280210984)
    14/12/30 23:20:21 INFO spark.ContextCleaner: Cleaned broadcast 2

(The sbt `[success]` line above was interleaved with the stack trace in the console output; I have put it on its own line.) The counts are printed correctly and sbt reports success, but the two InterruptedException errors appear right at the end of the run. Please let me know any pointers to debug this error. Thanks a lot.
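One thing I notice: I never stop the SparkContext before main returns, so my guess (not verified) is that the SparkListenerBus and ContextCleaner threads are being interrupted by the JVM shutting down while they are still running. This is the variant I am planning to try, with an explicit sc.stop() added; everything else is the same program as above:

```scala
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system
    val sc = new SparkContext("local", "Simple App", "E:/ApacheSpark/usb/usb/spark/bin",
      List("target/scala-2.10/sbt2_2.10-1.0.jar"))
    try {
      val logData = sc.textFile(logFile, 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    } finally {
      // My guess at the fix: shut down the context (and its listener/cleaner
      // threads) cleanly before the JVM exits, instead of letting shutdown
      // interrupt them.
      sc.stop()
    }
  }
}
```

Is calling sc.stop() at the end of the job the right way to avoid these shutdown-time exceptions, or is something else wrong with how I construct the SparkContext?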