I had sent out a PR [1] to fix 2), could you help to test that? [1] https://github.com/apache/spark/pull/8543
On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <arp...@spotify.com> wrote: > Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle > manager. One problem was when using the com.databricks.spark.avro reader and > the error(1) was received, see stack trace below. The problem does not occur > with the "sort" shuffle manager. > > Another problem was in a large complex job with lots of transformations > occurring simultaneously, i.e. 50+ or more maps each shuffling data. > Received error(2) about inability to acquire memory which seems to also have > to do with Tungsten. Possibly some setting available to increase that > memory, because there's lots of heap memory available. > > Am running on Yarn 2.2 with about 400 executors. Hoping this will give some > hints for improving the upcoming release, or for me to get some hints to fix > the problems. > > Thanks, > Anders > > Error(1) > > 15/08/31 18:30:57 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 3387, > lon4-hadoopslave-c245.lon4.spotify.net): java.io.EOFException > > at java.io.DataInputStream.readInt(DataInputStream.java:392) > > at > org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:121) > > at > org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:109) > > at scala.collection.Iterator$$anon$13.next(Iterator.scala:372) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30) > > at > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$Tung > > stenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > > at > org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:47) > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > > at org.apache.spark.scheduler.Task.run(Task.scala:88) > > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > > > Error(2) > > 5/08/31 18:41:25 WARN TaskSetManager: Lost task 16.1 in stage 316.0 (TID > 32686, lon4-hadoopslave-b925.lon4.spotify.net): java.io.IOException: Unable > to acquire 67108864 bytes of memory > > at > org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.acquireNewPageIfNecessary(UnsafeShuffleExternalSorter.java:385) > > at > org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.insertRecord(UnsafeShuffleExternalSorter.java:435) > > at > org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:246) > > at > org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:174) > > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > > at org.apache.spark.scheduler.Task.run(Task.scala:88) > > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org