Hi, take a look at this pull request, which is not merged yet: https://github.com/apache/spark/pull/16329 . It contains examples in Java and Scala that may be helpful.
Best regards,
Anton Okolnychyi

On Jan 4, 2017 23:23, "Anil Langote" <anillangote0...@gmail.com> wrote:

> Hi All,
>
> I have been working on a use case where I have a DataFrame with 25 columns:
> 24 columns are of type string and the last column is an array of doubles. For a
> given set of columns I have to apply a group by and sum the arrays of doubles.
> I have implemented a UDAF which works fine, but it's expensive. To tune the
> solution I came across Aggregators, which can be implemented and used with the
> agg function. My question is: how can we implement an Aggregator that takes an
> array of doubles as input and returns an array of doubles?
>
> I learned that it's not possible to implement the Aggregator in Java; it can
> be done only in Scala. How can I define an Aggregator that takes an array of
> doubles as input? Note that my input is a Parquet file.
>
> Any pointers are highly appreciated. I read that a Spark UDAF is slow and
> Aggregators are the way to go.
>
> Best Regards,
>
> Anil Langote
>
> +1-425-633-9747
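For reference, here is a minimal Scala sketch of the kind of Aggregator the question describes: an element-wise sum over arrays of doubles. This is illustrative only, not taken from the linked PR. It assumes every array within a group has the same length, the object name `SumArrayOfDoubles` is made up, and `ExpressionEncoder` is an internal Spark API that was a common workaround for collection encoders in Spark 2.x:

```scala
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical sketch: sums Seq[Double] values element-wise within each group.
// Assumes all arrays in a group have equal length.
object SumArrayOfDoubles extends Aggregator[Seq[Double], Seq[Double], Seq[Double]] {

  // Start with an empty buffer; the first reduce call adopts the input as-is,
  // so we never need to know the array length up front.
  def zero: Seq[Double] = Seq.empty[Double]

  // Fold one input row into the running buffer.
  def reduce(buffer: Seq[Double], input: Seq[Double]): Seq[Double] =
    if (buffer.isEmpty) input
    else buffer.zip(input).map { case (a, b) => a + b }

  // Combine two partial buffers from different partitions.
  def merge(b1: Seq[Double], b2: Seq[Double]): Seq[Double] =
    if (b1.isEmpty) b2
    else if (b2.isEmpty) b1
    else b1.zip(b2).map { case (a, b) => a + b }

  // The final result is just the accumulated buffer.
  def finish(reduction: Seq[Double]): Seq[Double] = reduction

  // Encoders for the intermediate and output types; ExpressionEncoder is
  // internal API, used here as a pragmatic workaround for Seq encoders.
  def bufferEncoder: Encoder[Seq[Double]] = ExpressionEncoder()
  def outputEncoder: Encoder[Seq[Double]] = ExpressionEncoder()
}
```

With the typed Dataset API this could then be used roughly as `ds.groupByKey(row => ...).agg(SumArrayOfDoubles.toColumn)`; how well an Aggregator integrates with an untyped DataFrame `agg` call depends on the Spark version, which is the subject of the PR linked above.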