I've sent out a PR [1] to fix (2). Could you help test it?

[1]  https://github.com/apache/spark/pull/8543
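If you don't have the exact job handy, a rough smoke test with a similar shape (many shuffle map stages running at once) is below. It is only a sketch: the sizes and key counts are illustrative, and it assumes spark-shell launched with spark.shuffle.manager=tungsten-sort and Kryo so the unsafe shuffle path is exercised.

    // Sketch of a job with many concurrent shuffle map stages, the shape
    // that hit error (2). Run in spark-shell launched with, e.g.:
    //   --conf spark.shuffle.manager=tungsten-sort \
    //   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
    val rdds = (1 to 50).map { i =>
      sc.parallelize(1 to 1000000, 100)     // illustrative sizes
        .map(x => ((x * i) % 10000, 1L))
        .reduceByKey(_ + _)                 // one shuffle per RDD
    }
    // union-ing the results submits all 50 independent shuffle map stages
    // as parents of a single job, so they can run concurrently
    sc.union(rdds).count()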

On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <arp...@spotify.com> wrote:
> I was trying out 1.5 RC2 and noticed some issues with the Tungsten shuffle
> manager. One problem occurred when using the com.databricks.spark.avro reader:
> error (1) was received, see the stack trace below. The problem does not occur
> with the "sort" shuffle manager.
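To make sure I'm looking at the same code path for (1), something along these lines is what I would run against an Avro dataset; the spark-avro version, input path, and column name are placeholders, not your actual job:

    // Sketch: read Avro via spark-avro and force an aggregation through the
    // Tungsten shuffle. Launch spark-shell with, e.g.:
    //   --packages com.databricks:spark-avro_2.10:2.0.1 \
    //   --conf spark.shuffle.manager=tungsten-sort
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("hdfs:///path/to/data.avro")      // placeholder path
    df.groupBy("someKey").count().show()      // TungstenAggregate + shuffle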
>
> Another problem showed up in a large, complex job with lots of transformations
> running simultaneously, i.e. 50+ maps each shuffling data. It received
> error (2) about being unable to acquire memory, which also seems related to
> Tungsten. Is there perhaps a setting to increase that memory? There is plenty
> of heap memory available.
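On the memory question in (2): as far as I can tell, in 1.5 the unsafe shuffle takes its memory from the shuffle pool (spark.shuffle.memoryFraction), not from the storage pool, so a big heap alone may not help. Below is the kind of thing I would experiment with while the fix is pending; the values are illustrative, and the page-size key name is from memory, so please double-check it against the 1.5 docs.

    // Sketch of 1.5 settings to try for the
    // "Unable to acquire ... bytes of memory" error.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // give shuffle/unsafe operators a larger share of the heap
      // (default is 0.2)
      .set("spark.shuffle.memoryFraction", "0.4")
      // smaller pages so each allocation request is well below the default
      // 64 MB; this key name is my assumption, verify before relying on it
      .set("spark.buffer.pageSize", "16m")

    // Workaround that avoids the unsafe shuffle path entirely:
    //   conf.set("spark.shuffle.manager", "sort")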
>
> I'm running on YARN 2.2 with about 400 executors. Hoping this gives some
> hints for improving the upcoming release, or that I can get some hints for
> fixing the problems.
>
> Thanks,
> Anders
>
> Error(1)
>
> 15/08/31 18:30:57 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 3387, lon4-hadoopslave-c245.lon4.spotify.net): java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>        at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:121)
>        at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:109)
>        at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>        at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>        at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>        at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366)
>        at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
>        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
>        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>        at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>        at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:47)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:88)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>
>
> Error(2)
>
> 15/08/31 18:41:25 WARN TaskSetManager: Lost task 16.1 in stage 316.0 (TID 32686, lon4-hadoopslave-b925.lon4.spotify.net): java.io.IOException: Unable to acquire 67108864 bytes of memory
>        at org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.acquireNewPageIfNecessary(UnsafeShuffleExternalSorter.java:385)
>        at org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.insertRecord(UnsafeShuffleExternalSorter.java:435)
>        at org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:246)
>        at org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:174)
>        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>        at org.apache.spark.scheduler.Task.run(Task.scala:88)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
