I'm using Spark 1.5.1. When I turned on DEBUG, I didn't see anything that looks useful. Other than the INFO output, there is a ton of RPC-message-related logging, plus this bit:
16/01/13 05:53:43 DEBUG ClosureCleaner: +++ Cleaning closure <function1> (org.apache.spark.rdd.RDD$$anonfun$count$1) +++
16/01/13 05:53:43 DEBUG ClosureCleaner: + declared fields: 1
16/01/13 05:53:43 DEBUG ClosureCleaner: public static final long org.apache.spark.rdd.RDD$$anonfun$count$1.serialVersionUID
16/01/13 05:53:43 DEBUG ClosureCleaner: + declared methods: 2
16/01/13 05:53:43 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.rdd.RDD$$anonfun$count$1.apply(java.lang.Object)
16/01/13 05:53:43 DEBUG ClosureCleaner: public final long org.apache.spark.rdd.RDD$$anonfun$count$1.apply(scala.collection.Iterator)
16/01/13 05:53:43 DEBUG ClosureCleaner: + inner classes: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + outer classes: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + outer objects: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + populating accessed fields because this is the starting closure
16/01/13 05:53:43 DEBUG ClosureCleaner: + fields accessed by starting closure: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + there are no enclosing objects!
16/01/13 05:53:43 DEBUG ClosureCleaner: +++ closure <function1> (org.apache.spark.rdd.RDD$$anonfun$count$1) is now cleaned +++
16/01/13 05:53:43 DEBUG ClosureCleaner: +++ Cleaning closure <function2> (org.apache.spark.SparkContext$$anonfun$runJob$5) +++
16/01/13 05:53:43 DEBUG ClosureCleaner: + declared fields: 2
16/01/13 05:53:43 DEBUG ClosureCleaner: public static final long org.apache.spark.SparkContext$$anonfun$runJob$5.serialVersionUID
16/01/13 05:53:43 DEBUG ClosureCleaner: private final scala.Function1 org.apache.spark.SparkContext$$anonfun$runJob$5.cleanedFunc$1
16/01/13 05:53:43 DEBUG ClosureCleaner: + declared methods: 2
16/01/13 05:53:43 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.SparkContext$$anonfun$runJob$5.apply(java.lang.Object,java.lang.Object)
16/01/13 05:53:43 DEBUG ClosureCleaner: public final java.lang.Object org.apache.spark.SparkContext$$anonfun$runJob$5.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
16/01/13 05:53:43 DEBUG ClosureCleaner: + inner classes: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + outer classes: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + outer objects: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + populating accessed fields because this is the starting closure
16/01/13 05:53:43 DEBUG ClosureCleaner: + fields accessed by starting closure: 0
16/01/13 05:53:43 DEBUG ClosureCleaner: + there are no enclosing objects!
16/01/13 05:53:43 DEBUG ClosureCleaner: +++ closure <function2> (org.apache.spark.SparkContext$$anonfun$runJob$5) is now cleaned +++
16/01/13 05:53:43 INFO SparkContext: Starting job: count at transposeAvroToAvroChunks.scala:129
16/01/13 05:53:43 INFO DAGScheduler: Got job 3 (count at transposeAvroToAvroChunks.scala:129) with 928 output partitions
16/01/13 05:53:43 INFO DAGScheduler: Final stage: ResultStage 3 (count at transposeAvroToAvroChunks.scala:129)

On Tue, Jan 12, 2016 at 6:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Which release of Spark are you using?
>
> Can you turn on DEBUG logging to see if there is more of a clue?
>
> Thanks
>
> On Tue, Jan 12, 2016 at 6:37 PM, AlexG <swift...@gmail.com> wrote:
>
>> I transpose a matrix (colChunkOfA), stored as a 200-by-54843210 array of
>> rows in Array[Array[Float]] format, into another matrix (rowChunk), also
>> stored row-wise as a 54843210-by-200 Array[Array[Float]], using the
>> following code:
>>
>> val rowChunk = new Array[Tuple2[Int,Array[Float]]](numCols)
>> val colIndices = (0 until colChunkOfA.length).toArray
>>
>> (0 until numCols).foreach( rowIdx => {
>>   rowChunk(rowIdx) = Tuple2(rowIdx, colIndices.map(colChunkOfA(_)(rowIdx)))
>> })
>>
>> This succeeds, but the following code, which attempts to turn rowChunk
>> into an RDD, fails silently: spark-submit just ends, and none of the
>> executor logs indicate any errors occurring.
>>
>> val parallelRowChunkRDD = sc.parallelize(rowChunk).cache
>> parallelRowChunkRDD.count
>>
>> What is the culprit here?
>>
>> Here is the log output starting from the count instruction:
>>
>> 16/01/13 02:23:38 INFO SparkContext: Starting job: count at transposeAvroToAvroChunks.scala:129
>> 16/01/13 02:23:38 INFO DAGScheduler: Got job 3 (count at transposeAvroToAvroChunks.scala:129) with 928 output partitions
>> 16/01/13 02:23:38 INFO DAGScheduler: Final stage: ResultStage 3 (count at transposeAvroToAvroChunks.scala:129)
>> 16/01/13 02:23:38 INFO DAGScheduler: Parents of final stage: List()
>> 16/01/13 02:23:38 INFO DAGScheduler: Missing parents: List()
>> 16/01/13 02:23:38 INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[2448] at parallelize at transposeAvroToAvroChunks.scala:128), which has no missing parents
>> 16/01/13 02:23:38 INFO MemoryStore: ensureFreeSpace(1048) called with curMem=50917367, maxMem=127452201615
>> 16/01/13 02:23:38 INFO MemoryStore: Block broadcast_615 stored as values in memory (estimated size 1048.0 B, free 118.7 GB)
>> 16/01/13 02:23:38 INFO MemoryStore: ensureFreeSpace(740) called with curMem=50918415, maxMem=127452201615
>> 16/01/13 02:23:38 INFO MemoryStore: Block broadcast_615_piece0 stored as bytes in memory (estimated size 740.0 B, free 118.7 GB)
>> 16/01/13 02:23:38 INFO BlockManagerInfo: Added broadcast_615_piece0 in memory on 172.31.36.112:36581 (size: 740.0 B, free: 118.7 GB)
>> 16/01/13 02:23:38 INFO SparkContext: Created broadcast 615 from broadcast at DAGScheduler.scala:861
>> 16/01/13 02:23:38 INFO DAGScheduler: Submitting 928 missing tasks from ResultStage 3 (ParallelCollectionRDD[2448] at parallelize at transposeAvroToAvroChunks.scala:128)
>> 16/01/13 02:23:38 INFO TaskSchedulerImpl: Adding task set 3.0 with 928 tasks
>> 16/01/13 02:23:39 WARN TaskSetManager: Stage 3 contains a task of very large size (47027 KB). The maximum recommended task size is 100 KB.
>> 16/01/13 02:23:39 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 1219, 172.31.34.184, PROCESS_LOCAL, 48156290 bytes)
>> ...
>> 16/01/13 02:27:13 INFO TaskSetManager: Starting task 927.0 in stage 3.0 (TID 2146, 172.31.42.67, PROCESS_LOCAL, 48224789 bytes)
>> 16/01/13 02:27:17 INFO BlockManagerInfo: Removed broadcast_419_piece0 on 172.31.36.112:36581 in memory (size: 17.4 KB, free: 118.7 GB)
>> 16/01/13 02:27:21 INFO BlockManagerInfo: Removed broadcast_419_piece0 on 172.31.35.157:51059 in memory (size: 17.4 KB, free: 10.4 GB)
>> 16/01/13 02:27:21 INFO BlockManagerInfo: Removed broadcast_419_piece0 on 172.31.47.118:34888 in memory (size: 17.4 KB, free: 10.4 GB)
>> 16/01/13 02:27:22 INFO BlockManagerInfo: Removed broadcast_419_piece0 on 172.31.38.42:48582 in memory (size: 17.4 KB, free: 10.4 GB)
>> 16/01/13 02:27:38 INFO BlockManagerInfo: Added broadcast_615_piece0 in memory on 172.31.41.68:59281 (size: 740.0 B, free: 10.4 GB)
>> 16/01/13 02:27:55 INFO BlockManagerInfo: Added broadcast_615_piece0 in memory on 172.31.47.118:59575 (size: 740.0 B, free: 10.4 GB)
>> 16/01/13 02:28:47 INFO BlockManagerInfo: Added broadcast_615_piece0 in memory on 172.31.40.24:55643 (size: 740.0 B, free: 10.4 GB)
>> 16/01/13 02:28:49 INFO BlockManagerInfo: Added broadcast_615_piece0 in memory on 172.31.47.118:53671 (size: 740.0 B, free: 10.4 GB)
>>
>> This is the end of the log, so it looks like all 928 tasks got started,
>> but presumably they ran into an error somewhere while running. Nothing
>> shows up in the executor logs.
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/failure-to-parallelize-an-RDD-tp25950.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
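
[Editor's note] The most telling line in the quoted log is the WARN: "Stage 3 contains a task of very large size (47027 KB)." With sc.parallelize, the driver serializes slices of the local collection directly into the task binaries, so here 928 tasks of roughly 47 MB each (about 43 GB total) must be shipped from the driver. One common pattern for avoiding oversized tasks is to parallelize only small indices and build the heavy rows on the executors from a broadcast variable. The sketch below illustrates that pattern using the thread's names (sc, colChunkOfA, numCols); the explicit partition count of 928 is an assumption, and note that broadcasting this particular chunk (200 x 54,843,210 floats, roughly 44 GB) would itself be impractical, so this is a sketch of the technique, not a drop-in fix for data of this size.

```scala
// Sketch only: assumes a live SparkContext `sc` and the original
// colChunkOfA: Array[Array[Float]] from the thread. Broadcasting ships
// the data once per executor instead of once per task.
val colChunkBroadcast = sc.broadcast(colChunkOfA)

// Parallelize just the row indices (tiny tasks), then construct each
// transposed row on the executors from the broadcast column chunk.
val parallelRowChunkRDD = sc.parallelize(0 until numCols, 928).map { rowIdx =>
  val chunk = colChunkBroadcast.value
  (rowIdx, Array.tabulate(chunk.length)(colIdx => chunk(colIdx)(rowIdx)))
}.cache()

parallelRowChunkRDD.count()
```

For data this large, a better fit than either approach would be to keep the matrix in an RDD end to end (e.g. read the column chunks as an RDD and transpose with a shuffle), so the full matrix never sits on the driver at all.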