Eyal Allweil created DATAFU-173:
-----------------------------------

             Summary: Change UDAFs to use Aggregator instead of UserDefinedAggregateFunction
                 Key: DATAFU-173
                 URL: https://issues.apache.org/jira/browse/DATAFU-173
             Project: DataFu
          Issue Type: Improvement
            Reporter: Eyal Allweil
             Fix For: 2.0.0

Currently our UDAFs use the [UserDefinedAggregateFunction|https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/sql/expressions/UserDefinedAggregateFunction.html] class. There are two drawbacks with this:

1) It is less efficient than Aggregator
2) UserDefinedAggregateFunction is deprecated and removed from Spark 3.2.0.

This story is for changing them to use [Aggregator|https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/expressions/Aggregator.html].

The UDAFs are located here:

[https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkUDAFs.scala]

Here are some links explaining how to do this:

[https://stackoverflow.com/questions/48180598/spark-what-is-the-difference-between-aggregator-and-udaf]

[https://stackoverflow.com/questions/66808917/apache-spark-how-to-define-a-userdefinedaggregatefunction-after-3]

This change should be backwards compatible if possible; the tests in [TestSparkUDAFs|https://github.com/apache/datafu/blob/main/datafu-spark/src/test/scala/datafu/spark/TestSparkUDAFs.scala] should all still pass.
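
For illustration, here is a minimal sketch of what an Aggregator-based UDAF can look like. It is not one of the DataFu UDAFs: the LongSum object, the "long_sum" name and the sample data are hypothetical, and it assumes Spark 3.0 or later, where functions.udaf is available to wrap a typed Aggregator so that it can be applied to untyped columns.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical example aggregator (not part of DataFu): sums Long values.
// Type parameters are Aggregator[IN, BUF, OUT].
object LongSum extends Aggregator[Long, Long, Long] {
  // Initial (empty) buffer value
  def zero: Long = 0L
  // Fold one input value into the buffer
  def reduce(buffer: Long, value: Long): Long = buffer + value
  // Combine two partial buffers, e.g. from different partitions
  def merge(b1: Long, b2: Long): Long = b1 + b2
  // Turn the final buffer into the output value
  def finish(reduction: Long): Long = reduction
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object AggregatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("aggregator-sketch")
      .getOrCreate()
    import spark.implicits._

    // Wrap the typed Aggregator as a column-level function (assumes Spark 3.0+)
    val longSum = udaf(LongSum)
    // Optionally register it for use from SQL under a hypothetical name
    spark.udf.register("long_sum", longSum)

    val df = Seq(("a", 1L), ("a", 2L), ("b", 5L)).toDF("key", "value")
    df.groupBy("key").agg(longSum(col("value")).as("total")).show()

    spark.stop()
  }
}
```

Because the wrapped aggregator is applied to columns in the same way as before, existing call sites and tests may be able to stay unchanged, but that would need to be checked against each of the actual UDAFs.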