[ 
https://issues.apache.org/jira/browse/SPARK-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583931#comment-14583931
 ] 

Herman van Hovell tot Westerflier commented on SPARK-4233:
----------------------------------------------------------

OK, so this is not so much a simplification as a performance improvement. I can 
live with that.

If I understand you and the code correctly, you want to move all state from the 
aggregate expressions/functions into the engine. The engine will also be 
responsible for the distinct functionality. In essence, aggregates move from 
black-box to (more) white-box implementations. If you look at 
GenerateAggregate, a similar approach is actually taken there. I do think this is the 
right way of doing this, and, as a bonus, it might make writing aggregates a 
bit easier.
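To make the "state lives in the engine" idea concrete, here is a minimal, language-agnostic sketch in Python (not Spark's Scala code). All names (`AggregateFunction`, `Sum`, `Count`, `run_aggregate`) are hypothetical. The aggregate only describes how to initialize, update and evaluate a buffer. The engine allocates and owns that buffer, and it implements DISTINCT by de-duplicating the input before feeding it to the same non-distinct function. This is one way to avoid a separate distinct version of every aggregate.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class AggregateFunction(ABC):
    """Hypothetical stateless aggregate: it holds no buffer of its own."""

    @abstractmethod
    def initialize(self) -> List[Any]:
        """Return a fresh buffer; the engine keeps it, not the function."""

    @abstractmethod
    def update(self, buffer: List[Any], value: Any) -> None:
        """Fold one input value into the engine-owned buffer in place."""

    @abstractmethod
    def evaluate(self, buffer: List[Any]) -> Any:
        """Turn the final buffer into the result value."""


class Sum(AggregateFunction):
    def initialize(self) -> List[Any]:
        return [0]

    def update(self, buffer: List[Any], value: Any) -> None:
        buffer[0] += value

    def evaluate(self, buffer: List[Any]) -> Any:
        return buffer[0]


class Count(AggregateFunction):
    def initialize(self) -> List[Any]:
        return [0]

    def update(self, buffer: List[Any], value: Any) -> None:
        buffer[0] += 1  # counts every input value

    def evaluate(self, buffer: List[Any]) -> Any:
        return buffer[0]


def run_aggregate(func: AggregateFunction, values: Iterable[Any],
                  distinct: bool = False) -> Any:
    """Engine-side loop: owns the buffer and handles DISTINCT itself."""
    buffer = func.initialize()
    if distinct:
        # Order-preserving de-duplication done by the engine, so the
        # aggregate function needs no distinct-specific code path.
        values = dict.fromkeys(values)
    for v in values:
        func.update(buffer, v)
    return func.evaluate(buffer)
```

With `data = [1, 2, 2, 3]`, `run_aggregate(Sum(), data)` gives 8, while `run_aggregate(Sum(), data, distinct=True)` gives 6 using the very same `Sum` class.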

I still have a few questions:
- How does the new design allow for code generation? It seems that this will be one 
of the major improvement areas for 1.5. I think this is also an opportunity to 
add genCode-like methods to the aggregates. We would need four, though: 
initialize, add, merge & terminate...
- How much of this is still needed, given that CodeGen will be 
enabled by default for 1.5 (conjecture...)? Shouldn't we just focus on 
supplying CodeGen functionality for each aggregate? That approach would not 
cover Hive UDAFs and custom aggregates... We could process Hive aggregates in a 
different engine...
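The four methods mentioned above (initialize, add, merge, terminate) map onto the usual two-phase, partial aggregation pattern. The sketch below is in Python for brevity. The `Aggregate` and `Average` classes and the `aggregate_partitions` driver are hypothetical illustrations, not Spark APIs. Each partition builds a partial buffer with `initialize` + `add`. The partial buffers are then combined with `merge`, and the result is finalized with `terminate`.

```python
from abc import ABC, abstractmethod
from functools import reduce
from typing import Any, Sequence


class Aggregate(ABC):
    """Hypothetical four-method aggregate contract."""

    @abstractmethod
    def initialize(self) -> Any: ...

    @abstractmethod
    def add(self, buffer: Any, value: Any) -> Any: ...

    @abstractmethod
    def merge(self, left: Any, right: Any) -> Any: ...

    @abstractmethod
    def terminate(self, buffer: Any) -> Any: ...


class Average(Aggregate):
    # The buffer is an immutable (sum, count) tuple.
    def initialize(self) -> Any:
        return (0, 0)

    def add(self, buffer: Any, value: Any) -> Any:
        s, c = buffer
        return (s + value, c + 1)

    def merge(self, left: Any, right: Any) -> Any:
        return (left[0] + right[0], left[1] + right[1])

    def terminate(self, buffer: Any) -> Any:
        s, c = buffer
        return s / c if c else None  # no rows -> no average


def aggregate_partitions(agg: Aggregate,
                         partitions: Sequence[Sequence[Any]]) -> Any:
    # Phase 1: per-partition partial buffers (initialize + add).
    partials = [reduce(agg.add, part, agg.initialize()) for part in partitions]
    # Phase 2: combine partial buffers (merge), then finalize (terminate).
    return agg.terminate(reduce(agg.merge, partials, agg.initialize()))
```

For example, `aggregate_partitions(Average(), [[1, 2], [3], [4, 5, 6]])` merges the partial buffers to `(21, 6)` and returns 3.5. Keeping the buffer a plain value makes each of the four steps a small, pure expression. Presumably that would also make them easier to emit from genCode-like methods, though this sketch does not attempt any code generation.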



> Simplify the Aggregation Function implementation
> ------------------------------------------------
>
>                 Key: SPARK-4233
>                 URL: https://issues.apache.org/jira/browse/SPARK-4233
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Hao
>
> Currently, the UDAF implementation is quite complicated, and we have to 
> provide distinct & non-distinct versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
