Eyal Allweil created DATAFU-173:
-----------------------------------

             Summary: Change UDAFs to use Aggregator instead of UserDefinedAggregateFunction
                 Key: DATAFU-173
                 URL: https://issues.apache.org/jira/browse/DATAFU-173
             Project: DataFu
          Issue Type: Improvement
            Reporter: Eyal Allweil
             Fix For: 2.0.0


Currently our UDAFs use the [UserDefinedAggregateFunction|https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/sql/expressions/UserDefinedAggregateFunction.html] class. There are two drawbacks to this:

1) It is less efficient than Aggregator, because the aggregation buffer is converted to and from Row objects for every input row.

2) UserDefinedAggregateFunction has been deprecated since Spark 3.0.0, in favor of an Aggregator registered via functions.udaf.
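For reference, a UDAF in the deprecated style looks roughly like the sketch below. MeanUDAF is a hypothetical mean UDAF made up for illustration, not one of the DataFu UDAFs. The input and buffer layouts are declared as schemas, and the buffer is an untyped Row whose fields are read and written by position (getDouble(0), getLong(1)) rather than a typed Scala value.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical mean UDAF in the deprecated Row-based style (not a DataFu class).
class MeanUDAF extends UserDefinedAggregateFunction {
  // Input and buffer layouts are declared as schemas, not Scala types.
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  // Buffer fields are set by position: 0 = sum, 1 = count.
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }

  // Called once per input row; nulls are skipped.
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  // Combines a partial buffer from another partition into buffer1.
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Returns null for an empty group.
  def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}
```

It would be applied as df.groupBy("key").agg(new MeanUDAF()(col("value"))).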

 

This story is for changing them to use [Aggregator|https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/expressions/Aggregator.html].

 

The UDAFs are located here:

[https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkUDAFs.scala]

 

Here are some links explaining how to do this:

[https://stackoverflow.com/questions/48180598/spark-what-is-the-difference-between-aggregator-and-udaf]

[https://stackoverflow.com/questions/66808917/apache-spark-how-to-define-a-userdefinedaggregatefunction-after-3]
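A minimal sketch of the conversion, assuming Spark 3.0+ (where functions.udaf is available) and a non-null Double input column. MeanAggregator, SumCount and MeanExample are hypothetical names for illustration, not existing DataFu classes. The buffer is a plain case class with an encoder, and udaf(...) wraps the typed Aggregator as an untyped function that can be applied to DataFrame columns.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical typed aggregation buffer: running sum and count.
case class SumCount(sum: Double, count: Long)

// Hypothetical example (not a DataFu UDAF): arithmetic mean of a Double column.
object MeanAggregator extends Aggregator[Double, SumCount, Double] {
  // Empty buffer; merge(zero, b) must equal b.
  def zero: SumCount = SumCount(0.0, 0L)
  // Folds one input value into the buffer; no Row conversion involved.
  def reduce(b: SumCount, v: Double): SumCount = SumCount(b.sum + v, b.count + 1L)
  // Combines partial buffers from different partitions.
  def merge(b1: SumCount, b2: SumCount): SumCount =
    SumCount(b1.sum + b2.sum, b1.count + b2.count)
  // A primitive Double result cannot be null, so an empty group yields NaN here.
  def finish(b: SumCount): Double = if (b.count == 0L) Double.NaN else b.sum / b.count
  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object MeanExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("aggregator-sketch").getOrCreate()
    import spark.implicits._
    val df = Seq(("a", 1.0), ("a", 3.0), ("b", 5.0)).toDF("key", "value")
    // udaf(...) returns an untyped UserDefinedFunction backed by the Aggregator.
    val mean = udaf(MeanAggregator)
    df.groupBy("key").agg(mean(col("value")).as("mean")).orderBy("key").show()
    spark.stop()
  }
}
```

Roughly, the UserDefinedAggregateFunction methods map as follows: initialize becomes zero, update becomes reduce, merge stays merge, evaluate becomes finish, and bufferSchema/dataType are replaced by bufferEncoder/outputEncoder. Instead of writing into a MutableAggregationBuffer, reduce and merge return the updated buffer (the Aggregator docs allow modifying and returning the first argument for performance).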

 

This change should be backward compatible if possible; the tests in [TestSparkUDAFs|https://github.com/apache/datafu/blob/main/datafu-spark/src/test/scala/datafu/spark/TestSparkUDAFs.scala] should all still pass.
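On backward compatibility, one observation, sketched under assumptions: the function returned by udaf(...) is a UserDefinedFunction, which, like the old UserDefinedAggregateFunction, can be applied as f(col(...)) and registered with spark.udf.register(name, f). Call sites of that form should therefore look the same after the change, while code that instantiates the UDAF classes directly (new X()) or depends on their type would still need changes. The sketch below uses a hypothetical LongSumAggregator, a hypothetical function name long_sum and a hypothetical temp view t, and checks that the DataFrame and SQL paths agree.

```scala
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical aggregator (not part of DataFu): sums Long values.
object LongSumAggregator extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, v: Long): Long = b + v
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object CompatCheck {
  private def toPairs(d: DataFrame): Seq[(String, Long)] =
    d.collect().map(r => (r.getString(0), r.getLong(1))).toSeq

  // Returns (DataFrame-API result, SQL result) as (key, total) pairs sorted by key.
  def run(spark: SparkSession): (Seq[(String, Long)], Seq[(String, Long)]) = {
    import spark.implicits._
    val df = Seq(("a", 1L), ("a", 2L), ("b", 10L)).toDF("key", "value")

    // Same call shape as an old UDAF instance: f(col(...)) returns a Column.
    val longSum = udaf(LongSumAggregator)
    val viaDf = df.groupBy("key").agg(longSum(col("value")).as("total")).orderBy("key")

    // Same registration shape: spark.udf.register(name, f).
    spark.udf.register("long_sum", longSum)
    df.createOrReplaceTempView("t")
    val viaSql = spark.sql("SELECT key, long_sum(value) AS total FROM t GROUP BY key ORDER BY key")

    (toPairs(viaDf), toPairs(viaSql))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udaf-compat").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```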



--
This message was sent by Atlassian Jira
(v8.20.10#820010)