Sorry, I was getting those errors because my workload was not sustainable. However, I noticed that just running the spark-streaming-benchmark ( https://github.com/tdas/spark-streaming-benchmark/blob/master/Benchmark.scala ), I see no difference in execution time, number of processed records, or delay whether I use 1 machine or 2 machines with the setup described before (Spark standalone). Is that normal?
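One possible explanation worth ruling out (this is my assumption, not something the benchmark output confirms): each call to `rawSocketStream` creates exactly one receiver, and a receiver is a long-running task pinned to a single executor. So if `numStreams` is 1, adding a second machine adds no ingestion capacity at all, and the map/reduce work per batch may be too small to benefit from the extra cores. A minimal sketch of spreading the load, assuming hypothetical `host`/`port` values pointing at the benchmark's data generator:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, StreamingContext}

val sparkConf = new SparkConf().setAppName("Benchmark")
val ssc = new StreamingContext(sparkConf, Duration(1000))

// One receiver per stream; with 2 streams the scheduler can place one
// receiver on each worker, so both machines ingest data.
val numStreams = 2
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String]("host", 9999, StorageLevel.MEMORY_ONLY_SER))
val union = ssc.union(rawStreams)

// Optionally redistribute received blocks across all executors before the
// CPU-bound stages, so processing is not confined to the receiving nodes.
val spread = union.repartition(ssc.sparkContext.defaultParallelism)
```

If throughput still does not change with a second machine after raising `numStreams`, the bottleneck is likely the data generator or the per-batch work being too light to parallelize.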
On Fri, Mar 27, 2015 at 5:32 PM, Tathagata Das <t...@databricks.com> wrote:

> If it is deterministically reproducible, could you generate full DEBUG
> level logs from the driver and the workers and give them to me? Basically I
> want to trace through what is happening to the block that is not being
> found.
> And can you tell me which cluster manager you are using? Spark Standalone,
> Mesos or YARN?
>
>
> On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am just running this simple example with
>> machineA: 1 master + 1 worker
>> machineB: 1 worker
>> «
>> val ssc = new StreamingContext(sparkConf, Duration(1000))
>>
>> val rawStreams = (1 to numStreams).map(_ =>
>>   ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
>> val union = ssc.union(rawStreams)
>>
>> union.filter(line => Random.nextInt(1) == 0).map(line => {
>>   var sum = BigInt(0)
>>   line.toCharArray.foreach(chr => sum += chr.toInt)
>>   fib2(sum)
>>   sum
>> }).reduceByWindow(_ + _, Seconds(1), Seconds(1)).map(s => s"### result: $s").print()
>> »
>>
>> And I'm getting the following exceptions:
>>
>> Log from machineB
>> «
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 132
>> 15/03/27 16:21:35 INFO Executor: Running task 0.0 in stage 27.0 (TID 132)
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 134
>> 15/03/27 16:21:35 INFO Executor: Running task 2.0 in stage 27.0 (TID 134)
>> 15/03/27 16:21:35 INFO TorrentBroadcast: Started reading broadcast variable 24
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 136
>> 15/03/27 16:21:35 INFO Executor: Running task 4.0 in stage 27.0 (TID 136)
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 138
>> 15/03/27 16:21:35 INFO Executor: Running task 6.0 in stage 27.0 (TID 138)
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 140
>> 15/03/27 16:21:35 INFO Executor: Running task 8.0 in stage 27.0 (TID 140)
>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(1886) called with curMem=47117, maxMem=280248975
>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 1886.0 B, free 267.2 MB)
>> 15/03/27 16:21:35 INFO BlockManagerMaster: Updated info of block broadcast_24_piece0
>> 15/03/27 16:21:35 INFO TorrentBroadcast: Reading broadcast variable 24 took 19 ms
>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(3104) called with curMem=49003, maxMem=280248975
>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 3.0 KB, free 267.2 MB)
>> 15/03/27 16:21:35 ERROR Executor: Exception in task 8.0 in stage 27.0 (TID 140)
>> java.lang.Exception: Could not compute split, block input-0-1427473262420 not found
>>     at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:701)
>> 15/03/27 16:21:35 ERROR Executor: Exception in task 6.0 in stage 27.0 (TID 138)
>> java.lang.Exception: Could not compute split, block input-0-1427473262418 not found
>>     at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:701)
>> »
>>
>> Log from machineA
>> «
>> 15/03/27 16:21:35 INFO TorrentBroadcast: Reading broadcast variable 24 took 15 ms
>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(3104) called with curMem=269989249, maxMem=280248975
>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 3.0 KB, free 9.8 MB)
>> 15/03/27 16:21:35 ERROR Executor: Exception in task 3.0 in stage 27.0 (TID 135)
>> java.lang.Exception: Could not compute split, block input-0-1427473262415 not found
>>     at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> »
>>
>> It seems that blocks are not being shared between machines. Am I doing
>> something wrong?
>>
>> Thanks,
>> Sergio
>>
>
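A detail from the logs worth noting, with a hedged suggestion rather than a confirmed fix: `StorageLevel.MEMORY_ONLY_SER` keeps each received block on a single executor with no replica and no disk fallback, so a block evicted under memory pressure, or cleaned up before a lagging batch gets to it, is simply gone, which produces exactly this "Could not compute split, block input-... not found" exception. The machineA log above shows only 9.8 MB free, which is at least consistent with eviction. A minimal sketch of storing a second replica, reusing the `ssc`, `numStreams`, `host`, and `port` from the quoted code:

```scala
import org.apache.spark.storage.StorageLevel

// MEMORY_ONLY_SER_2 asks the receiver to replicate every received block to a
// second executor, so a task scheduled on the other machine (or a block lost
// on one node) still has a copy to read.
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER_2)).toArray
```

Independently of replication, it would be worth checking the scheduling/processing delay in the streaming UI: if batches take longer than the batch interval, old input blocks can be dropped before the batches that need them run.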