Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/21971#discussion_r207898987 --- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala --- @@ -61,6 +62,36 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi (index, data) => results(index) = data, results.flatten.toSeq) } + + /** + * Returns a future of an aggregation across the RDD. + * + * @see [[RDD.aggregate]] which is the synchronous version of this method. + */ + def aggregateAsync[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): FutureAction[U] = + self.withScope { --- End diff -- IMHO one reason could be the rule of fail fast and early. As the cloning uses serialisation/deserialisation which anyway needed for sending the neutral element to the executors this way serialisation of the neutral element is tested when the operator is specified not during the lazy execution in the middle of a long chain of operations.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org