Repository: spark
Updated Branches:
  refs/heads/master 07a98ca4c -> 2402b9146
[SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section

## What changes were proposed in this pull request?

Update the accumulator section of the programming guide document (Scala language only). The Java and Python versions are left unchanged because the corresponding APIs are not finished yet.

## How was this patch tested?

N/A

Author: WeichenXu <weichenxu...@outlook.com>

Closes #13441 from WeichenXu123/update_doc_accumulatorV2_clean.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2402b914
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2402b914
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2402b914

Branch: refs/heads/master
Commit: 2402b91461146289a78d6cbc53ed8338543e6de7
Parents: 07a98ca
Author: WeichenXu <weichenxu...@outlook.com>
Authored: Wed Jun 1 12:57:02 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed Jun 1 12:57:02 2016 -0700

----------------------------------------------------------------------
 docs/programming-guide.md | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/2402b914/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index d375926..70fd627 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1352,41 +1352,42 @@ The code below shows an accumulator being used to add up the elements of an arra

 <div data-lang="scala" markdown="1">
 {% highlight scala %}
-scala> val accum = sc.accumulator(0, "My Accumulator")
-accum: org.apache.spark.Accumulator[Int] = 0
+scala> val accum = sc.longAccumulator("My Accumulator")
+accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)

-scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
+scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
 ...
 10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s

 scala> accum.value
-res2: Int = 10
+res2: Long = 10
 {% endhighlight %}

-While this code used the built-in support for accumulators of type Int, programmers can also
-create their own types by subclassing [AccumulatorParam](api/scala/index.html#org.apache.spark.AccumulatorParam).
-The AccumulatorParam interface has two methods: `zero` for providing a "zero value" for your data
-type, and `addInPlace` for adding two values together. For example, supposing we had a `Vector` class
+While this code used the built-in support for accumulators of type Long, programmers can also
+create their own types by subclassing [AccumulatorV2](api/scala/index.html#org.apache.spark.AccumulatorV2).
+The AccumulatorV2 abstract class has several methods which must be overridden:
+`reset` for resetting the accumulator to zero, `add` for adding another value into the accumulator,
+and `merge` for merging another same-type accumulator into this one. The other methods that must be
+overridden are described in the Scala API documentation. For example, supposing we had a `MyVector` class
 representing mathematical vectors, we could write:

 {% highlight scala %}
-object VectorAccumulatorParam extends AccumulatorParam[Vector] {
-  def zero(initialValue: Vector): Vector = {
-    Vector.zeros(initialValue.size)
+class VectorAccumulatorV2 extends AccumulatorV2[MyVector, MyVector] {
+  private val vec_ : MyVector = MyVector.createZeroVector
+  def reset(): Unit = {
+    vec_.reset()
   }
-  def addInPlace(v1: Vector, v2: Vector): Vector = {
-    v1 += v2
+  def add(v: MyVector): Unit = {
+    vec_.add(v)
   }
+  ...
 }

 // Then, create an Accumulator of this type:
-val vecAccum = sc.accumulator(new Vector(...))(VectorAccumulatorParam)
+val myVectorAcc = new VectorAccumulatorV2
+// Then, register it into spark context:
+sc.register(myVectorAcc, "MyVectorAcc1")
 {% endhighlight %}

-In Scala, Spark also supports the more general [Accumulable](api/scala/index.html#org.apache.spark.Accumulable)
-interface to accumulate data where the resulting type is not the same as the elements added (e.g. build
-a list by collecting together elements), and the `SparkContext.accumulableCollection` method for accumulating
-common Scala collection types.
+Note that when programmers define their own type of AccumulatorV2, the resulting type can differ
+from the type of the elements added.

 </div>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
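For readers who want to try the `reset`/`add`/`merge` contract from the patched documentation without a Spark installation, here is a minimal self-contained Scala sketch. `SimpleAccumulator`, `VectorAccumulator`, and the array-backed vector are hypothetical stand-ins introduced only for illustration; the real abstract class is `org.apache.spark.util.AccumulatorV2[IN, OUT]` and additionally requires `isZero` and `copy` overrides.

```scala
// Hypothetical stand-in for the AccumulatorV2 contract, so this sketch runs
// without Spark on the classpath (the real class also declares isZero and copy).
trait SimpleAccumulator[IN, OUT] {
  def reset(): Unit                                   // return to the zero value
  def add(v: IN): Unit                                // fold one input value in
  def merge(other: SimpleAccumulator[IN, OUT]): Unit  // combine a same-type accumulator
  def value: OUT                                      // current accumulated result
}

// Element-wise vector-sum accumulator, mirroring the MyVector example in the diff.
class VectorAccumulator(size: Int) extends SimpleAccumulator[Array[Double], Array[Double]] {
  private val vec = Array.fill(size)(0.0)

  override def reset(): Unit = java.util.Arrays.fill(vec, 0.0)

  override def add(v: Array[Double]): Unit = {
    require(v.length == size, s"expected dimension $size, got ${v.length}")
    var i = 0
    while (i < size) { vec(i) += v(i); i += 1 }
  }

  // Merging is just adding the other accumulator's current value element-wise.
  override def merge(other: SimpleAccumulator[Array[Double], Array[Double]]): Unit =
    add(other.value)

  override def value: Array[Double] = vec.clone()
}

object VectorAccumulatorDemo extends App {
  val acc = new VectorAccumulator(2)
  Seq(Array(1.0, 2.0), Array(3.0, 4.0)).foreach(acc.add)
  println(acc.value.mkString("[", ", ", "]"))  // [4.0, 6.0]
}
```

In real Spark code one would instead extend `AccumulatorV2`, instantiate the subclass, and register it with `sc.register(acc, "name")` before using it in actions, exactly as the patched documentation shows.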