Hello guys, I've started to migrate my Spark jobs which use Accumulators V1 to AccumulatorV2 and faced with the following issues:
1. LegacyAccumulatorWrapper now requires the resulting type of AccumulableParam to implement equals. In other case the AccumulableParam, automatically wrapped into LegacyAccumulatorWrapper, will fail with AssertionError (SPARK-23697 [1]). 2. Existing AccumulatorV2 classes are hardly difficult to extend easily and correctly (SPARK-24154 [2]) due to its "copy" method which is called during serialization and usually loses type information of descendant classes which don't override "copy" (and it's easier to implement an accumulator from scratch than override it correctly) 3. The same instance of AccumulatorV2 cannot be used with the same SparkContext multiple times (unlike AccumulableParam) failing with "IllegalStateException: Cannot register an Accumulator twice" even after "reset" method called. So it's impossible to unregister already registered accumulator from user code. 4. AccumulableParam (V1) implementations are usually more or less stateless, while AccumulatorV2 implementations are almost always stateful, leading to (unnecessary?) type checks (unlike AccumulableParam). For example typical "merge" method of AccumulatorV2 requires to check whether current accumulator is of an appropriate type, like here [3] 5. AccumulatorV2 is more difficult to implement correctly unlike AccumulableParam. For example, in case of AccumulableParam I have to implement just 3 methods (addAccumulator, addInPlace, zero), in case of AccumulableParam - just 2 methods (addInPlace, zero) and in case of AccumulatorV2 - 6 methods (isZero, copy, reset, add, merge, value) 6. AccumulatorV2 classes are hardly possible to be anonymous classes, because of their "copy" and "merge" methods which typically require a concrete class to make a type check. I understand the motivation for AccumulatorV2 (SPARK-14654 [4]), but just wondering whether there is a way to simplify the API of AccumulatorV2 to meet the points described above and to be less error prone? [1] https://issues.apache.org/jira/browse/SPARK-23697 [2] https://issues.apache.org/jira/browse/SPARK-24154 [3] https://github.com/apache/spark/blob/4f5bad615b47d743b8932aea1071652293981604/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L348 [4] https://issues.apache.org/jira/browse/SPARK-14654 --------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org