Sorry, I was getting those errors because my workload was not sustainable:
the cluster could not keep up with the input rate.

However, I noticed that, just by running the spark-streaming-benchmark (
https://github.com/tdas/spark-streaming-benchmark/blob/master/Benchmark.scala
), I see no difference in execution time, number of processed records, or
delay whether I use one machine or two machines with the setup described
before (Spark standalone). Is this normal?
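
For what it's worth, one quick sanity check (my own addition, not part of
the benchmark code) would be to confirm, right after creating the
StreamingContext, that both workers actually registered as executors; if
only one shows up, identical numbers for one and two machines would be
expected:

«
// Sketch only: assumes the same sparkConf / batch duration as the benchmark.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Duration, StreamingContext}

val sparkConf = new SparkConf().setAppName("Benchmark")
val ssc = new StreamingContext(sparkConf, Duration(1000))

// getExecutorMemoryStatus maps "host:port" -> (max memory, remaining memory)
// for every block manager known to the driver, so with 1 master + 2 workers
// there should be one driver entry plus two executor entries.
ssc.sparkContext.getExecutorMemoryStatus.foreach {
  case (hostPort, (maxMem, remainingMem)) =>
    println(s"block manager $hostPort: max=$maxMem, remaining=$remainingMem")
}
»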



On Fri, Mar 27, 2015 at 5:32 PM, Tathagata Das <t...@databricks.com> wrote:

> If it is deterministically reproducible, could you generate full DEBUG-level
> logs from the driver and the workers and send them to me? Basically, I want
> to trace through what is happening to the block that is not being found.
> Also, which cluster manager are you using: Spark Standalone, Mesos, or YARN?
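>
> In case you don't already have it set up, with the stock configuration you
> can get DEBUG logs by copying conf/log4j.properties.template to
> conf/log4j.properties on the driver and on every worker and raising the
> root level (this assumes you haven't customized logging elsewhere):
>
> «
> # Based on conf/log4j.properties.template, with the root level raised
> log4j.rootCategory=DEBUG, console
> log4j.appender.console=org.apache.log4j.ConsoleAppender
> log4j.appender.console.target=System.err
> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
> »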
>
>
> On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am just running this simple example with:
>> machineA: 1 master + 1 worker
>> machineB: 1 worker
>> «
>>     val ssc = new StreamingContext(sparkConf, Duration(1000))
>>
>>     // One raw-socket receiver per stream; received blocks are kept
>>     // serialized in memory only.
>>     val rawStreams = (1 to numStreams).map(_ =>
>>       ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
>>     val union = ssc.union(rawStreams)
>>
>>     // Random.nextInt(1) is always 0, so the filter passes every line;
>>     // fib2 is a helper defined elsewhere in the job (not shown here).
>>     union.filter(line => Random.nextInt(1) == 0).map(line => {
>>       var sum = BigInt(0)
>>       line.toCharArray.foreach(chr => sum += chr.toInt)
>>       fib2(sum)
>>       sum
>>     }).reduceByWindow(_ + _, Seconds(1), Seconds(1))
>>       .map(s => s"### result: $s").print()
>> »
>>
>> And I'm getting the following exceptions:
>>
>> Log from machineB
>> «
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 132
>> 15/03/27 16:21:35 INFO Executor: Running task 0.0 in stage 27.0 (TID 132)
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 134
>> 15/03/27 16:21:35 INFO Executor: Running task 2.0 in stage 27.0 (TID 134)
>> 15/03/27 16:21:35 INFO TorrentBroadcast: Started reading broadcast variable 24
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 136
>> 15/03/27 16:21:35 INFO Executor: Running task 4.0 in stage 27.0 (TID 136)
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 138
>> 15/03/27 16:21:35 INFO Executor: Running task 6.0 in stage 27.0 (TID 138)
>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 140
>> 15/03/27 16:21:35 INFO Executor: Running task 8.0 in stage 27.0 (TID 140)
>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(1886) called with curMem=47117, maxMem=280248975
>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 1886.0 B, free 267.2 MB)
>> 15/03/27 16:21:35 INFO BlockManagerMaster: Updated info of block broadcast_24_piece0
>> 15/03/27 16:21:35 INFO TorrentBroadcast: Reading broadcast variable 24 took 19 ms
>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(3104) called with curMem=49003, maxMem=280248975
>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 3.0 KB, free 267.2 MB)
>> 15/03/27 16:21:35 ERROR Executor: Exception in task 8.0 in stage 27.0 (TID 140)
>> java.lang.Exception: Could not compute split, block input-0-1427473262420 not found
>>         at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:701)
>> 15/03/27 16:21:35 ERROR Executor: Exception in task 6.0 in stage 27.0 (TID 138)
>> java.lang.Exception: Could not compute split, block input-0-1427473262418 not found
>>         at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:701)
>> »
>>
>> Log from machineA
>> «
>> 15/03/27 16:21:35 INFO TorrentBroadcast: Reading broadcast variable 24 took 15 ms
>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(3104) called with curMem=269989249, maxMem=280248975
>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 3.0 KB, free 9.8 MB)
>> 15/03/27 16:21:35 ERROR Executor: Exception in task 3.0 in stage 27.0 (TID 135)
>> java.lang.Exception: Could not compute split, block input-0-1427473262415 not found
>>         at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> »
>>
>> It seems that blocks are not being shared between machines. Am I doing
>> something wrong?
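>>
>> One thing I wonder about (my own guess): the input blocks use
>> StorageLevel.MEMORY_ONLY_SER, and as far as I understand such blocks can
>> be dropped when an executor runs low on storage memory, which would also
>> produce "Could not compute split". A variant I could try, sketched below
>> with the only change being the storage level, is to let received blocks
>> spill to disk instead:
>>
>> «
>>     // Same receivers as before, but blocks spill to disk rather than
>>     // being dropped under memory pressure.
>>     val rawStreams = (1 to numStreams).map(_ =>
>>       ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_AND_DISK_SER)).toArray
>> »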
>>
>> Thanks,
>> Sergio
>>
>
>
