Hi, I recently extended the Spark SQL programming guide to cover user-defined aggregations, and in the example I modified the existing buffer objects and returned them in both reduce and merge. This approach worked, and it was approved by people who know the context.
Hope that helps.

2017-01-29 17:17 GMT+01:00 Koert Kuipers <ko...@tresata.com>:

> anyone? if not I will follow the trail and try to deduce it myself
>
> On Mon, Jan 23, 2017 at 2:31 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Looking at the docs for org.apache.spark.sql.expressions.Aggregator, it
>> says for the reduce method: "For performance, the function may modify `b`
>> and return it instead of constructing new object for b."
>>
>> It makes no such comment for the merge method.
>>
>> This is surprising to me because I know that for
>> PairRDDFunctions.aggregateByKey mutation is allowed in both seqOp and
>> combOp (which are the equivalents of reduce and merge in Aggregator).
>>
>> Is it safe to mutate b1 and return it in Aggregator.merge?
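For what it's worth, here is a minimal sketch of the pattern being discussed: a typed Aggregator that mutates its buffer and returns it in both reduce and merge. The `SumCount` buffer and `MutatingAvg` names are mine for illustration, not from the guide; this assumes the Spark 2.x+ Dataset Aggregator API.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Mutable intermediate buffer (hypothetical name for this sketch).
case class SumCount(var sum: Long, var count: Long)

// Computes an average, mutating the buffer in place in both
// reduce and merge instead of allocating a new SumCount each call.
object MutatingAvg extends Aggregator[Long, SumCount, Double] {
  def zero: SumCount = SumCount(0L, 0L)

  // Mutate b and return it, as the reduce scaladoc explicitly permits.
  def reduce(b: SumCount, a: Long): SumCount = {
    b.sum += a
    b.count += 1
    b
  }

  // Mutate b1 and return it -- the pattern the thread asks about.
  def merge(b1: SumCount, b2: SumCount): SumCount = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }

  def finish(r: SumCount): Double =
    if (r.count == 0L) 0.0 else r.sum.toDouble / r.count

  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

It would then be used as a typed column, e.g. `ds.select(MutatingAvg.toColumn)` on a `Dataset[Long]`. Running this requires a Spark session, so treat it as a sketch of the mutation pattern rather than a standalone program.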