[ https://issues.apache.org/jira/browse/SPARK-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583931#comment-14583931 ]
Herman van Hovell tot Westerflier commented on SPARK-4233:
----------------------------------------------------------

Ok, so not exactly a simplification, but more of a performance improvement. I can live with that. If I understand you and the code correctly, you want to remove all state from the aggregate expressions/functions and move it into the engine. The engine will also be responsible for the distinct functionality. Aggregates in essence move from blackbox to (more) whitebox implementations. If you look at GenerateAggregate, a similar approach is actually taken. I do think this is the right way of doing this, and it might, as a bonus, make writing aggregates a bit easier.

I still have a few questions:
- How does the new design allow for code generation? It seems that this will be one of the major improvement areas for 1.5. I think this is also an opportunity to add genCode-like methods to the aggregates. We would need four though: initialize, add, merge & terminate...
- How much of this is still needed, given that CodeGen will be enabled by default in 1.5 (conjecture...)? Shouldn't we just focus on supplying codegen functionality for each aggregate? That approach would not cover Hive UDAFs and custom aggregates... We could process Hive aggregates in a different engine...

> Simplify the Aggregation Function implementation
> ------------------------------------------------
>
>                 Key: SPARK-4233
>                 URL: https://issues.apache.org/jira/browse/SPARK-4233
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Hao
>
> Currently, the UDAF implementation is quite complicated, and we have to
> provide distinct & non-distinct versions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
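To illustrate the design being discussed: a minimal, hypothetical sketch (not Spark's actual API) of a "whitebox" aggregate whose state lives in an engine-owned buffer rather than inside the expression, exposing exactly the four operations named above: initialize, add, merge & terminate. The interface and class names here are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical whitebox aggregate contract: the engine owns the state
// buffer and drives the lifecycle; the aggregate only mutates the
// buffer it is handed. This is what makes map-side partial aggregation
// and code generation straightforward for the engine.
interface BufferedAggregate {
    int bufferSize();                     // slots the engine must allocate
    void initialize(long[] buffer);       // reset engine-owned state
    void add(long[] buffer, long input);  // fold one input row into the buffer
    void merge(long[] into, long[] from); // combine two partial buffers
    double terminate(long[] buffer);      // produce the final result
}

// Example: AVG carried as (sum, count) partial state.
class Average implements BufferedAggregate {
    public int bufferSize() { return 2; }
    public void initialize(long[] b) { Arrays.fill(b, 0L); }
    public void add(long[] b, long v) { b[0] += v; b[1] += 1; }
    public void merge(long[] into, long[] from) {
        into[0] += from[0];
        into[1] += from[1];
    }
    public double terminate(long[] b) {
        return b[1] == 0 ? 0.0 : (double) b[0] / b[1];
    }
}

public class Demo {
    public static void main(String[] args) {
        BufferedAggregate avg = new Average();
        // The engine, not the aggregate, allocates one buffer per partition.
        long[] partitionA = new long[avg.bufferSize()];
        long[] partitionB = new long[avg.bufferSize()];
        avg.initialize(partitionA);
        avg.initialize(partitionB);
        for (long v : new long[] {1, 2, 3}) avg.add(partitionA, v);
        for (long v : new long[] {4, 5})    avg.add(partitionB, v);
        avg.merge(partitionA, partitionB);             // reduce-side combine
        System.out.println(avg.terminate(partitionA)); // 3.0
    }
}
```

Because the four operations are plain buffer manipulations with no hidden state, each maps naturally onto a genCode-style method, and the engine can layer distinct handling on top without the aggregate knowing about it.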