Are you running # of receivers = # machines?
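
Each rawSocketStream call creates one receiver, and each receiver runs on a
single executor, so input blocks only land on the machines that host
receivers; with fewer receivers than machines, the extra machines have no
input data stored locally. A minimal sketch, reusing the placeholders from
your snippet (numStreams, host, port):

«
// Run one receiver per worker machine so every machine ingests
// (and locally stores) a share of the input blocks.
val numStreams = 2 // assumption: one receiver per worker machine
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER))
// Union the per-receiver streams into a single DStream for processing.
val union = ssc.union(rawStreams)
»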

TD

On Thu, Apr 9, 2015 at 9:56 AM, Saiph Kappa <saiph.ka...@gmail.com> wrote:

> Sorry, I was getting those errors because my workload was not sustainable.
>
> However, I noticed that, just running the spark-streaming-benchmark (
> https://github.com/tdas/spark-streaming-benchmark/blob/master/Benchmark.scala
> ), I see no difference in execution time, number of processed records,
> or delay whether I use 1 machine or 2 machines with the setup
> described before (Spark standalone). Is that normal?
>
>
>
> On Fri, Mar 27, 2015 at 5:32 PM, Tathagata Das <t...@databricks.com>
> wrote:
>
>> If it is deterministically reproducible, could you generate full DEBUG-level
>> logs from the driver and the workers and send them to me? Basically I
>> want to trace through what is happening to the block that is not being
>> found.
>> Also, which cluster manager are you using: Spark Standalone,
>> Mesos, or YARN?
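>>
>> For reference, one way to get DEBUG-level logs (assuming you are using the
>> stock log4j setup from conf/log4j.properties.template) is to set the root
>> level in conf/log4j.properties on the driver and on each worker:
>>
>> «
>> # conf/log4j.properties (copied from conf/log4j.properties.template);
>> # only the root level is changed from INFO to DEBUG
>> log4j.rootCategory=DEBUG, console
>> »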
>>
>>
>> On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa <saiph.ka...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am just running this simple example with
>>> machineA: 1 master + 1 worker
>>> machineB: 1 worker
>>> «
>>>     val ssc = new StreamingContext(sparkConf, Duration(1000))
>>>
>>>     // numStreams receivers, all reading from the same host and port
>>>     val rawStreams = (1 to numStreams).map(_ =>
>>>       ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
>>>     val union = ssc.union(rawStreams)
>>>
>>>     // Random.nextInt(1) is always 0, so this filter keeps every line
>>>     union.filter(line => Random.nextInt(1) == 0).map(line => {
>>>       var sum = BigInt(0)
>>>       line.toCharArray.foreach(chr => sum += chr.toInt)
>>>       fib2(sum) // CPU-heavy work; fib2 is defined elsewhere in the app
>>>       sum
>>>     }).reduceByWindow(_ + _, Seconds(1), Seconds(1)).map(s => s"### result: $s").print()
>>> »
>>>
>>> And I'm getting the following exceptions:
>>>
>>> Log from machineB
>>> «
>>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 132
>>> 15/03/27 16:21:35 INFO Executor: Running task 0.0 in stage 27.0 (TID 132)
>>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 134
>>> 15/03/27 16:21:35 INFO Executor: Running task 2.0 in stage 27.0 (TID 134)
>>> 15/03/27 16:21:35 INFO TorrentBroadcast: Started reading broadcast variable 24
>>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 136
>>> 15/03/27 16:21:35 INFO Executor: Running task 4.0 in stage 27.0 (TID 136)
>>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 138
>>> 15/03/27 16:21:35 INFO Executor: Running task 6.0 in stage 27.0 (TID 138)
>>> 15/03/27 16:21:35 INFO CoarseGrainedExecutorBackend: Got assigned task 140
>>> 15/03/27 16:21:35 INFO Executor: Running task 8.0 in stage 27.0 (TID 140)
>>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(1886) called with curMem=47117, maxMem=280248975
>>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 1886.0 B, free 267.2 MB)
>>> 15/03/27 16:21:35 INFO BlockManagerMaster: Updated info of block broadcast_24_piece0
>>> 15/03/27 16:21:35 INFO TorrentBroadcast: Reading broadcast variable 24 took 19 ms
>>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(3104) called with curMem=49003, maxMem=280248975
>>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 3.0 KB, free 267.2 MB)
>>> 15/03/27 16:21:35 ERROR Executor: Exception in task 8.0 in stage 27.0 (TID 140)
>>> java.lang.Exception: Could not compute split, block input-0-1427473262420 not found
>>> at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:701)
>>> 15/03/27 16:21:35 ERROR Executor: Exception in task 6.0 in stage 27.0 (TID 138)
>>> java.lang.Exception: Could not compute split, block input-0-1427473262418 not found
>>> at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:701)
>>> »
>>>
>>> Log from machineA
>>> «
>>> 15/03/27 16:21:35 INFO TorrentBroadcast: Reading broadcast variable 24 took 15 ms
>>> 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(3104) called with curMem=269989249, maxMem=280248975
>>> 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 3.0 KB, free 9.8 MB)
>>> 15/03/27 16:21:35 ERROR Executor: Exception in task 3.0 in stage 27.0 (TID 135)
>>> java.lang.Exception: Could not compute split, block input-0-1427473262415 not found
>>>         at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>         at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> »
>>>
>>> It seems that blocks are not being shared between machines. Am I doing
>>> something wrong?
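>>>
>>> One thing I am wondering about (just a guess, I have not tested it): with
>>> StorageLevel.MEMORY_ONLY_SER each received block exists only on the one
>>> executor that received it. A replicated level would keep a second copy on
>>> another machine, e.g.:
>>>
>>> «
>>> // MEMORY_ONLY_SER_2 stores each received block on two executors,
>>> // so tasks on the other machine can also find the block locally.
>>> ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER_2)
>>> »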
>>>
>>> Thanks,
>>> Sergio
>>>
>>
>>
>
