Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
manager. One problem was when using the com.databricks.spark.avro reader
and the error(1) was received, see stack trace below. The problem does not
occur with the "sort" shuffle manager.

Another problem was in a large complex job with lots of transformations
occurring simultaneously, i.e. 50+ or more maps each shuffling data.
Received error(2) about inability to acquire memory which seems to also
have to do with Tungsten. Possibly some setting available to increase that
memory, because there's lots of heap memory available.

Am running on Yarn 2.2 with about 400 executors. Hoping this will give some
hints for improving the upcoming release, or for me to get some hints to
fix the problems.

Thanks,
Anders

*Error(1) *

15/08/31 18:30:57 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID
3387, lon4-hadoopslave-c245.lon4.spotify.net): java.io.EOFException

       at java.io.DataInputStream.readInt(DataInputStream.java:392)

       at
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:121)

       at
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:109)

       at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)

       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

       at
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)

       at
org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)

       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$
1.org$apache$spark$sql$execution$aggregate$Tung

stenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)

       at
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:47)

       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)

       at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)

       at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)

       at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)

       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

       at org.apache.spark.scheduler.Task.run(Task.scala:88)

       at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

       at java.lang.Thread.run(Thread.java:745)

*Error(2) *

5/08/31 18:41:25 WARN TaskSetManager: Lost task 16.1 in stage 316.0 (TID
32686, lon4-hadoopslave-b925.lon4.spotify.net): java.io.IOException: Unable
to acquire 67108864 bytes of memory

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.acquireNewPageIfNecessary(UnsafeShuffleExternalSorter.java:385)

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.insertRecord(UnsafeShuffleExternalSorter.java:435)

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:246)

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:174)

       at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)

       at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

       at org.apache.spark.scheduler.Task.run(Task.scala:88)

       at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)

Reply via email to