[ https://issues.apache.org/jira/browse/SPARK-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandy Ryza updated SPARK-1632:
------------------------------
    Summary: Avoid boxing in ExternalAppendOnlyMap compares  (was: Avoid boxing in ExternalAppendOnlyMap.KCComparator)

> Avoid boxing in ExternalAppendOnlyMap compares
> ----------------------------------------------
>
>                 Key: SPARK-1632
>                 URL: https://issues.apache.org/jira/browse/SPARK-1632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Hitting an OOME in ExternalAppendOnlyMap.KCComparator while boxing an int. I don't know if this is the root cause, but the boxing is also avoidable.
> Code:
> {code}
> def compare(kc1: (K, C), kc2: (K, C)): Int = {
>   kc1._1.hashCode().compareTo(kc2._1.hashCode())
> }
> {code}
> Error:
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.lang.Integer.valueOf(Integer.java:642)
>     at scala.Predef$.int2Integer(Predef.scala:370)
>     at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:432)
>     at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:430)
>     at org.apache.spark.util.collection.AppendOnlyMap$$anon$3.compare(AppendOnlyMap.scala:271)
>     at java.util.TimSort.mergeLo(TimSort.java:687)
>     at java.util.TimSort.mergeAt(TimSort.java:483)
>     at java.util.TimSort.mergeCollapse(TimSort.java:410)
>     at java.util.TimSort.sort(TimSort.java:214)
>     at java.util.Arrays.sort(Arrays.java:727)
>     at org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:274)
>     at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:188)
>     at org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:141)
>     at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>     at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>     at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>     at org.apache.spark.scheduler.Task.run(Task.scala:53)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> {code}
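The boxing shows up as the second frame of the trace: hashCode() returns a primitive Int, but compareTo is a method on java.lang.Integer, so Scala's implicit Predef.int2Integer conversion boxes both hash codes on every comparison, and a TimSort over a large spilling map makes millions of such calls. A boxing-free comparator can compare the raw Ints directly. The following is a minimal sketch, assuming KCComparator implements java.util.Comparator[(K, C)] as the TimSort frames suggest; it is illustrative, not necessarily the committed patch:

{code}
import java.util.Comparator

// Sketch of a boxing-free key-hash comparator (assumed shape, not
// necessarily the actual fix). Keeping the hash codes as primitive Ints
// sidesteps the implicit int2Integer conversion that Integer.compareTo
// triggers in the version quoted above.
class KCComparator[K, C] extends Comparator[(K, C)] {
  def compare(kc1: (K, C), kc2: (K, C)): Int = {
    val hash1 = kc1._1.hashCode()  // primitive Int, no java.lang.Integer allocated
    val hash2 = kc2._1.hashCode()
    // Explicit three-way compare: no boxing, and no overflow risk
    // from the tempting one-liner hash1 - hash2.
    if (hash1 < hash2) -1 else if (hash1 == hash2) 0 else 1
  }
}
{code}

Note that hash1 - hash2 would also avoid boxing but can overflow when the hash codes have opposite signs, so the explicit comparison is the safe choice. java.lang.Integer.compare(hash1, hash2) is equivalent, but was only added in Java 7.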