[ https://issues.apache.org/jira/browse/SPARK-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated SPARK-1632:
------------------------------

    Summary: Avoid boxing in ExternalAppendOnlyMap compares  (was: Avoid boxing in ExternalAppendOnlyMap.KCComparator)

> Avoid boxing in ExternalAppendOnlyMap compares
> ----------------------------------------------
>
>                 Key: SPARK-1632
>                 URL: https://issues.apache.org/jira/browse/SPARK-1632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Hitting an OutOfMemoryError (OOME) in ExternalAppendOnlyMap.KCComparator while
> boxing an int. The boxing may not be the root cause of the OOME, but it is
> avoidable either way.
> Code:
> {code}
>     def compare(kc1: (K, C), kc2: (K, C)): Int = {
>       // compareTo is resolved through the implicit Predef.int2Integer, so both
>       // primitive hash codes are boxed to java.lang.Integer on every comparison.
>       kc1._1.hashCode().compareTo(kc2._1.hashCode())
>     }
> {code}
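> A boxing-free alternative is to compare the primitive hash codes directly with
> if/else (a minimal sketch of the idea, not necessarily the patch that will be
> committed):
> {code}
>     def compare(kc1: (K, C), kc2: (K, C)): Int = {
>       // hashCode() returns a primitive Int, and <, ==, > on Ints compile to
>       // primitive comparisons, so no Integer objects are allocated.
>       val hash1 = kc1._1.hashCode()
>       val hash2 = kc2._1.hashCode()
>       if (hash1 < hash2) -1 else if (hash1 == hash2) 0 else 1
>     }
> {code}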
> Error:
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>      at java.lang.Integer.valueOf(Integer.java:642)
>      at scala.Predef$.int2Integer(Predef.scala:370)
>      at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:432)
>      at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:430)
>      at org.apache.spark.util.collection.AppendOnlyMap$$anon$3.compare(AppendOnlyMap.scala:271)
>      at java.util.TimSort.mergeLo(TimSort.java:687)
>      at java.util.TimSort.mergeAt(TimSort.java:483)
>      at java.util.TimSort.mergeCollapse(TimSort.java:410)
>      at java.util.TimSort.sort(TimSort.java:214)
>      at java.util.Arrays.sort(Arrays.java:727)
>      at org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:274)
>      at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:188)
>      at org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:141)
>      at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>      at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>      at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>      at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>      at org.apache.spark.scheduler.Task.run(Task.scala:53)
>      at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>      at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>      at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>      at java.security.AccessController.doPrivileged(Native Method)
>      at javax.security.auth.Subject.doAs(Subject.java:415)
>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>      at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)