Matt Cheah created SPARK-13335:
----------------------------------

             Summary: Optimize collect_list and collect_set with declarative aggregates
                 Key: SPARK-13335
                 URL: https://issues.apache.org/jira/browse/SPARK-13335
             Project: Spark
          Issue Type: Improvement
            Reporter: Matt Cheah


Based on the discussion in SPARK-9301, we can optimize collect_set and 
collect_list by implementing them as declarative aggregate expressions instead 
of Hive UDAFs. The problem with Hive UDAFs is that they require repeatedly 
converting the data items from Catalyst types back to external types. 
Declarative aggregate expressions get around this.
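
To make the idea concrete, here is a toy sketch of what "declarative" means in 
this context: the aggregate is described by four pieces (initial buffer, 
update, merge, evaluate) that operate directly on the buffer. This is not 
Spark's Catalyst API; ToyAggregate, run_partitions and the two definitions 
below are hypothetical names used only for illustration. Skipping nulls and 
sorting the set output are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class ToyAggregate:
    """Hypothetical stand-in for a declarative aggregate (not Spark's API)."""
    initial: Callable[[], Any]          # builds an empty buffer
    update: Callable[[Any, Any], Any]   # folds one input value into a buffer
    merge: Callable[[Any, Any], Any]    # combines two partial buffers
    evaluate: Callable[[Any], Any]      # turns the final buffer into a result


# collect_list keeps duplicates; nulls (None) are skipped in this sketch.
collect_list = ToyAggregate(
    initial=list,
    update=lambda buf, v: buf if v is None else buf + [v],
    merge=lambda a, b: a + b,
    evaluate=lambda buf: buf,
)

# collect_set drops duplicates; the result is sorted only so that the
# output of this sketch is deterministic.
collect_set = ToyAggregate(
    initial=set,
    update=lambda buf, v: buf if v is None else buf | {v},
    merge=lambda a, b: a | b,
    evaluate=sorted,
)


def run_partitions(agg: ToyAggregate, partitions: Iterable[Iterable[Any]]) -> Any:
    """Aggregate each partition separately, then merge the partial buffers."""
    result = agg.initial()
    for rows in partitions:
        buf = agg.initial()
        for value in rows:
            buf = agg.update(buf, value)
        result = agg.merge(result, buf)
    return agg.evaluate(result)


if __name__ == "__main__":
    parts = [[1, 2, 2], [3, None, 1]]
    print(run_partitions(collect_list, parts))
    print(run_partitions(collect_set, parts))
```

Because every step works on the buffer's own representation, nothing in the 
loop needs to convert values to another type system and back, which is the 
overhead described above for Hive UDAFs.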




