Executor memory requirement for reduceByKey

Sung Hwan Chung Fri, 13 May 2016 12:16:02 -0700

Hello,

I'm using Spark version 1.6.0 and have trouble with memory when trying to
do reducebykey on a dataset with as many as 75 million keys. I.e. I get the
following exception when I run the task.


There are 20 workers in the cluster. It is running under the standalone
mode with 12 GB assigned per executor and 4 cores per worker. The
spark.memory.fraction is set to 0.5 and I'm not using any caching.

What might be the problem here? Since I'm using the version 1.6.0, this
doesn't seem to be related to  SPARK-12155. This problem always happens
during the shuffle read phase.

Is there a minimum  amount of memory required for executor
(spark.memory.fraction) for shuffle read?

java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
        at 
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
        at 
org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
        at 
org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:197)
        at 
org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:212)
        at 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:103)
        at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:483)
        at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
        at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
        at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Executor memory requirement for reduceByKey

Reply via email to