[ https://issues.apache.org/jira/browse/SPARK-30379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-30379.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27038
[https://github.com/apache/spark/pull/27038]

> Avoid OOM when using collection accumulator
> -------------------------------------------
>
>                 Key: SPARK-30379
>                 URL: https://issues.apache.org/jira/browse/SPARK-30379
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
>
> One Spark job on our cluster uses a collection accumulator and failed with 
> the following exception:
> ```
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:3332)
>     at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>     at java.lang.StringBuilder.append(StringBuilder.java:136)
>     at java.lang.StringBuilder.append(StringBuilder.java:131)
>     at java.util.AbstractCollection.toString(AbstractCollection.java:462)
>     at java.util.Collections$UnmodifiableCollection.toString(Collections.java:1035)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:596)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:591)
> ```
> `LiveEntityHelpers.newAccumulatorInfos` converts each `AccumulableInfo` to a 
> `v1.AccumulableInfo` by calling `toString` on the accumulator's value. For a 
> collection accumulator, the string representation can take much more memory 
> than the collection itself (for example, a collection accumulator of long 
> values), which can cause an OOM. In this job, the driver memory is 6g.
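
To illustrate the cost described above, here is a minimal Java sketch. It is not the change made in pull request 27038 and does not use Spark; the class `AccumulatorStringSketch` and the helper `boundedToString` are hypothetical names introduced only for this example. It follows the same call path as the stack trace (`Collections$UnmodifiableCollection.toString` delegating to `AbstractCollection.toString`), where every element is appended to a single `StringBuilder`. With 19-digit long values, each element contributes about 21 characters once the `", "` separator is counted, and the builder has to grow and copy its buffer repeatedly on the way there. The helper shows one possible way to keep such a string bounded, assuming a truncated rendering is acceptable for display.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class AccumulatorStringSketch {

    // Hypothetical helper (not part of Spark): renders at most maxElements
    // items and summarizes the rest, so the string stays bounded in size.
    public static String boundedToString(List<?> values, int maxElements) {
        int shown = Math.max(0, Math.min(values.size(), maxElements));
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = values.iterator();
        for (int i = 0; i < shown; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(it.next());
        }
        int remaining = values.size() - shown;
        if (remaining > 0) {
            if (shown > 0) {
                sb.append(", ");
            }
            sb.append("... (").append(remaining).append(" more)");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // 19-digit long values, standing in for a collection accumulator's contents.
        List<Long> values = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            values.add(Long.MAX_VALUE - i);
        }
        List<Long> view = Collections.unmodifiableList(values);

        // Same call path as in the stack trace: the full string grows
        // linearly with the number of elements.
        String full = view.toString();
        System.out.println("full length:    " + full.length());
        System.out.println("bounded output: " + boundedToString(view, 3));
    }
}
```

The full string's length scales with the element count, while the bounded rendering stays small regardless of how many values were collected. How many bytes each character occupies on the heap depends on the JVM version, so the exact memory figures will vary.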



