Yes this is known and an issue for performance. Do you have any thoughts on how to fix this?
On Wed, Mar 27, 2019 at 4:19 PM Erik Erlandson <eerla...@redhat.com> wrote: > I describe some of the details here: > https://issues.apache.org/jira/browse/SPARK-27296 > > The short version of the story is that aggregating data structures (UDTs) > used by UDAFs are serialized to a Row object, and de-serialized, for every > row in a data frame. > Cheers, > Erik > >