[ https://issues.apache.org/jira/browse/SPARK-30379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-30379.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27038
[https://github.com/apache/spark/pull/27038]

> Avoid OOM when using collection accumulator
> -------------------------------------------
>
>                 Key: SPARK-30379
>                 URL: https://issues.apache.org/jira/browse/SPARK-30379
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
>
> One Spark job on our cluster uses a collection accumulator to collect
> something and has encountered an exception like:
> ```
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:3332)
>     at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>     at java.lang.StringBuilder.append(StringBuilder.java:136)
>     at java.lang.StringBuilder.append(StringBuilder.java:131)
>     at java.util.AbstractCollection.toString(AbstractCollection.java:462)
>     at java.util.Collections$UnmodifiableCollection.toString(Collections.java:1035)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:596)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:591)
> ```
> `LiveEntityHelpers.newAccumulatorInfos` converts `AccumulableInfo`s to
> `v1.AccumulableInfo` by calling `toString` on the accumulator's value.
> For a collection accumulator, the string representation can take much more
> memory than the values themselves (for example, a collection accumulator of
> long values) and cause an OOM (in this job, the driver memory is 6g).
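
As a rough illustration of the problem described above, the sketch below shows one way to bound the size of the string built for a collection value: render only the first few elements and summarize the rest, instead of calling `toString` on the whole collection. The class `AccumulatorPreview`, the method `preview`, and the `maxItems` parameter are hypothetical names chosen for this example. They are not part of Spark's API, and this is not necessarily how pull request 27038 implements the fix.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical helper for illustration only; not part of Spark.
final class AccumulatorPreview {
    private AccumulatorPreview() {}

    // Returns the full toString() for small collections, otherwise a
    // bounded preview of the first `maxItems` elements plus a count of
    // the elements that were left out.
    static String preview(Collection<?> values, int maxItems) {
        int limit = Math.max(0, maxItems); // treat negative limits as zero
        if (values.size() <= limit) {
            return values.toString();
        }
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = values.iterator();
        for (int i = 0; i < limit && it.hasNext(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(it.next());
        }
        if (limit > 0) {
            sb.append(", ");
        }
        sb.append("... ").append(values.size() - limit).append(" more items]");
        return sb.toString();
    }
}
```

With this approach the size of the resulting string depends on `maxItems` rather than on the number of collected elements, so a very large accumulator no longer forces a proportionally large `StringBuilder` buffer on the driver.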