Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10835#discussion_r50778840
  
    --- Diff: core/src/main/scala/org/apache/spark/Accumulable.scala ---
    @@ -35,40 +35,67 @@ import org.apache.spark.util.Utils
      * [[org.apache.spark.Accumulator]]. They won't always be the same, though -- e.g., imagine you are
      * accumulating a set. You will add items to the set, and you will union two sets together.
      *
    + * All accumulators created on the driver to be used on the executors must be registered with
    + * [[Accumulators]]. This is already done automatically for accumulators created by the user.
    + * Internal accumulators must be explicitly registered by the caller.
    + *
    + * Operations are not thread-safe.
    + *
    + * @param id ID of this accumulator; for internal use only.
      * @param initialValue initial value of accumulator
      * @param param helper object defining how to add elements of type `R` and `T`
      * @param name human-readable name for use in Spark's web UI
      * @param internal if this [[Accumulable]] is internal. Internal [[Accumulable]]s will be reported
      *                 to the driver via heartbeats. For internal [[Accumulable]]s, `R` must be
      *                 thread safe so that they can be reported correctly.
    + * @param countFailedValues whether to accumulate values from failed tasks. This is set to true
    --- End diff --
    
    This reminds me: I believe there's a paragraph in the Spark Programming Guide which describes the current accumulator semantics (sort of); it would be good to add new documentation for these metric-style accumulators, since I imagine that a number of users would be interested in taking advantage of this new opt-in semantic in their own code.
    
    We shouldn't block merging this patch on those doc updates (since I'm worried about merge-conflict potential and want to get this in sooner rather than later), but let's file a followup ticket so we don't forget.
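    
    For reference, the guide paragraph in question covers the standard driver-side accumulator pattern, roughly like the sketch below (an illustration only, not from the patch; the app name, accumulator name, input path, and filter condition are placeholders). Accumulators created through `SparkContext` this way are the "created by the user" case that gets registered with [[Accumulators]] automatically:
    
        import org.apache.spark.{SparkConf, SparkContext}
        
        val sc = new SparkContext(new SparkConf().setAppName("accumulator-example"))
        
        // A named accumulator created on the driver; registration happens automatically.
        val errorCount = sc.accumulator(0, "errorCount")
        
        sc.textFile("hdfs:///logs").foreach { line =>
          // Task-side updates are merged back into the driver's copy via `+=`.
          if (line.contains("ERROR")) errorCount += 1
        }
        
        // Only the driver can read the merged result.
        println(s"errors seen: ${errorCount.value}")
    
    Documentation for the metric-style, `countFailedValues` semantics would build on that existing pattern.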
