[ https://issues.apache.org/jira/browse/DATAFU-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eyal Allweil resolved DATAFU-173. --------------------------------- Resolution: Fixed This fix was reviewed and merged in the following pull request: https://github.com/apache/datafu/pull/41 > Change UDAFS to use Aggregator instead of UserDefinedAggregateFunction > ---------------------------------------------------------------------- > > Key: DATAFU-173 > URL: https://issues.apache.org/jira/browse/DATAFU-173 > Project: DataFu > Issue Type: Improvement > Reporter: Eyal Allweil > Assignee: Eyal Allweil > Priority: Major > Fix For: 2.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently our UDAFs use the > [UserDefinedAggregateFunction|https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/sql/expressions/UserDefinedAggregateFunction.html] > class. There are two drawbacks with this: > 1) It is less efficient than Aggregator > 2) UserDefinedAggregateFunction is deprecated and removed from Spark 3.2.0. > > This story is for changing them to use > [Aggregator|https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/expressions/Aggregator.html]. > > The UDAFs are located here: > [https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkUDAFs.scala] > > Here are some links explaining how to do this: > [https://stackoverflow.com/questions/48180598/spark-what-is-the-difference-between-aggregator-and-udaf] > [https://stackoverflow.com/questions/66808917/apache-spark-how-to-define-a-userdefinedaggregatefunction-after-3] > > This change should be backwards compatible if possible; the tests in > [TestSparkUDAFs|https://github.com/apache/datafu/blob/main/datafu-spark/src/test/scala/datafu/spark/TestSparkUDAFs.scala] > should all still pass. -- This message was sent by Atlassian Jira (v8.20.10#820010)