[GitHub] spark pull request: [SPARK-9301] [SQL] Add collect_set aggregate f...

nburoojy Fri, 06 Nov 2015 11:38:46 -0800

Github user nburoojy commented on the pull request:

    https://github.com/apache/spark/pull/8592#issuecomment-154511621
  
    Thanks for the review @marmbrus !
    
    I've opened https://github.com/apache/spark/pull/9526, which follows your 
suggestion to alias the Hive UDAFs, and I'd like to include it in the 1.6 release.
    
    Longer term (beyond 1.6), I'd like to solve the core issue.
    My particular use case needs the ability to aggregate compound types 
(struct and array), and it appears Hive 0.13.0 does not support this.
    
    What kind of major changes would we have to make to support O(1) array 
insertion? I was thinking that a strategy like 
[CollectHashSet](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala)
 uses would also work here; that is, I would implement a `CompactBufferUDT` 
(backed by 
[CompactBuffer](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala)),
 and `updateExpressions` would append to the buffer in amortized `O(1)`.
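
    A minimal sketch of the append-and-merge idea described above, assuming 
nothing about Spark's actual aggregation interfaces. `GrowableBuffer`, `append`, 
`appendAll`, and `CollectSketch` are hypothetical names used only for 
illustration; this is not Spark's `CompactBuffer`, and the proposed 
`CompactBufferUDT` is not shown.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for a buffer-backed aggregation state.
// NOT Spark's CompactBuffer; it only illustrates amortized O(1) append.
final class GrowableBuffer[T: ClassTag](initialCapacity: Int = 4) {
  private var data = new Array[T](math.max(1, initialCapacity))
  private var count = 0

  def size: Int = count

  // Append one element. The backing array doubles when it is full, so n
  // appends copy O(n) elements in total: amortized O(1) per append.
  def append(value: T): this.type = {
    if (count == data.length) {
      val grown = new Array[T](data.length * 2)
      Array.copy(data, 0, grown, 0, count)
      data = grown
    }
    data(count) = value
    count += 1
    this
  }

  // Fold another partial buffer into this one (the "merge" step).
  def appendAll(other: GrowableBuffer[T]): this.type = {
    val n = other.size // captured up front so merging a buffer with itself terminates
    var i = 0
    while (i < n) { append(other(i)); i += 1 }
    this
  }

  def apply(i: Int): T = {
    if (i < 0 || i >= count) throw new IndexOutOfBoundsException(i.toString)
    data(i)
  }

  def toList: List[T] = data.take(count).toList
}

object CollectSketch {
  // Simulates a two-phase aggregate: one partial buffer per partition
  // (the "update" phase), then a merge of the partial buffers.
  def collect[T: ClassTag](partitions: Seq[Seq[T]]): List[T] = {
    val partials = partitions.map { rows =>
      val buf = new GrowableBuffer[T]()
      rows.foreach(row => buf.append(row))
      buf
    }
    val result = new GrowableBuffer[T]()
    partials.foreach(p => result.appendAll(p))
    result.toList
  }
}
```

    Because the buffer is generic, the same sketch covers compound values: 
`CollectSketch.collect(Seq(Seq(("k", 1)), Seq(("k", 2))))` gathers both tuples 
into one list, in partition order.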
    
    Would this strategy break assumptions in the new aggregation framework? Do 
you think this change is larger than I expect?




