On the PR review, there were questions about adding a new aggregating class, and whether or not Aggregator[IN,BUF,OUT] could be used. I added a proof of concept solution based on enhancing Aggregator to the pull-req: https://github.com/apache/spark/pull/25024/
I wrote up my findings on the PR but the gist is that Aggregator is a feasible option, however it does not provide *total* feature parity with UDAF. Note that this PR now includes two candidate solutions, for comparison purposes, as well as an extra test file (tdigest.scala). Eventually one of these solutions will be removed, depending on what option is selected. I'm pushing this forward now with the goal of getting a solution into the upcoming 3.0 branch cut On Wed, Mar 27, 2019 at 4:19 PM Erik Erlandson <eerla...@redhat.com> wrote: > I describe some of the details here: > https://issues.apache.org/jira/browse/SPARK-27296 > > The short version of the story is that aggregating data structures (UDTs) > used by UDAFs are serialized to a Row object, and de-serialized, for every > row in a data frame. > Cheers, > Erik > >