Matt Cheah created SPARK-13335:
----------------------------------

             Summary: Optimize collect_list and collect_set with declarative aggregates
                 Key: SPARK-13335
                 URL: https://issues.apache.org/jira/browse/SPARK-13335
             Project: Spark
          Issue Type: Improvement
            Reporter: Matt Cheah

Based on the discussion in SPARK-9301, we can optimize collect_set and collect_list by implementing them as declarative aggregate expressions instead of relying on Hive UDAFs. The problem with Hive UDAFs is that they require repeatedly converting data items from Catalyst types back to external types. Declarative aggregate expressions get around this conversion.
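
To illustrate the conversion overhead described above, here is a minimal toy sketch in plain Python. It is not Spark or Hive code: `ConversionCounter`, `collect_list_via_external`, `collect_list_internal`, and `collect_set_internal` are hypothetical names, and byte strings stand in for an engine-internal representation. The first function mimics an aggregate that converts every row to an external type and converts the result back; the other two keep values in their internal form throughout.

```python
from dataclasses import dataclass


@dataclass
class ConversionCounter:
    """Counts conversions between the toy 'internal' (bytes) and 'external' (str) forms."""
    count: int = 0

    def to_external(self, value: bytes) -> str:
        self.count += 1
        return value.decode("utf-8")

    def to_internal(self, value: str) -> bytes:
        self.count += 1
        return value.encode("utf-8")


def collect_list_via_external(rows, counter):
    """Assumed UDAF-like path: convert each row on input, convert the result back."""
    buffer = []
    for value in rows:
        buffer.append(counter.to_external(value))  # one conversion per input row
    return [counter.to_internal(v) for v in buffer]  # one conversion per output item


def collect_list_internal(rows):
    """Buffers internal values as they are; no conversions at all."""
    buffer = []
    for value in rows:
        buffer.append(value)
    return buffer


def collect_set_internal(rows):
    """Same idea as collect_list_internal, but duplicates are dropped."""
    return set(rows)


rows = [b"a", b"b", b"a"]
counter = ConversionCounter()
via_external = collect_list_via_external(rows, counter)
internal_list = collect_list_internal(rows)
internal_set = collect_set_internal(rows)
```

In this toy model both list paths return the same values, but the external path performs two conversions per row while the internal path performs none. How large the real saving is in Spark depends on the data types involved and is not measured here.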