L. C. Hsieh created SPARK-30379: ----------------------------------- Summary: Avoid OOM when using collection accumulator Key: SPARK-30379 URL: https://issues.apache.org/jira/browse/SPARK-30379 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh
One Spark job on our cluster uses collection accumulator to collect something and has encountered an exception like: ``` java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) at java.lang.StringBuilder.append(StringBuilder.java:136) at java.lang.StringBuilder.append(StringBuilder.java:131) at java.util.AbstractCollection.toString(AbstractCollection.java:462) at java.util.Collections$UnmodifiableCollection.toString(Collections.java:1035) at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596) at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596) at scala.Option.map(Option.scala:146) at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:596) at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:591) ``` `LiveEntityHelpers.newAccumulatorInfos` converts `AccumulableInfo`s to `v1.AccumulableInfo` by calling `toString` on accumulator's value. For collection accumulator, it might take much more memory when in string representation, for example, collection accumulator of long values, and cause OOM (in this job, the driver memory is 6g). -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org