[ 
https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090017#comment-16090017
 ] 

Ryan Williams commented on SPARK-21425:
---------------------------------------

Yea, static accumulators; I was able to reproduce it in cluster mode where 
multiple tasks on the same executor are writing to that executor's single 
instance of the same static accumulator.

I'm now thinking of this mostly as a surprise (to me!) about interactions 
between closure-serialization and accumulators… I assumed (incorrectly) that 
references to accumulators in task closures were treated specially in some way, 
but I see now that they're not.

How unexpected/undesirable this behavior is debatable; I inadvertently fell 
into a situation where I was making static accumulators while working around a 
scalatest bug, as mentioned above, but in retrospect it feels pretty contrived 
to have ended up in that situation.

Docs update, precautionary threadsafe-ing of \{Long,Double\}Accumulator, and 
taking no action all seem reasonable to me.

> LongAccumulator, DoubleAccumulator not threadsafe
> -------------------------------------------------
>
>                 Key: SPARK-21425
>                 URL: https://issues.apache.org/jira/browse/SPARK-21425
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Ryan Williams
>            Priority: Minor
>
> [AccumulatorV2 
> docs|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L42-L43]
>  acknowledge that accumulators must be concurrent-read-safe, but afaict they 
> must also be concurrent-write-safe.
> The same docs imply that {{Int}} and {{Long}} meet either/both of these 
> criteria, when afaict they do not.
> Relatedly, the provided 
> [LongAccumulator|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L291]
>  and 
> [DoubleAccumulator|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L370]
>  are not thread-safe, and should be expected to behave undefinedly when 
> multiple concurrent tasks on the same executor write to them.
> [Here is a repro repo|https://github.com/ryan-williams/spark-bugs/tree/accum] 
> with some simple applications that demonstrate incorrect results from 
> {{LongAccumulator}}'s.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to