[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121275#comment-15121275
 ] 

Justin Uang commented on SPARK-9301:
------------------------------------

Do we have a plan for implementing these natively in Spark SQL? I imagine the 
UDAF below will perform terribly, since every call to update() probably makes 
a full copy of the array/Seq.

{code}
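import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class MyUDAF extends UserDefinedAggregateFunction {
  override def inputSchema: StructType =
    StructType(List(StructField("input", StringType)))

  // Reads the whole sequence, prepends one element, and writes it all back.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer.update(0, input.getString(0) +: buffer.getSeq[String](0))
  }

  override def bufferSchema: StructType =
    StructType(List(StructField("list", ArrayType(StringType))))

  // Concatenates two partial results, again building a new sequence.
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1.update(0, buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0))
  }

  // Start from an empty, explicitly typed sequence.
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer.update(0, Seq.empty[String])
  }

  override def deterministic: Boolean = true

  override def evaluate(buffer: Row): Any = {
    buffer.get(0)
  }

  override def dataType: DataType = ArrayType(StringType)
}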
{code}
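
To make the concern concrete, here is a minimal sketch, independent of Spark, 
of how the read-prepend-write pattern in MyUDAF.update adds up. CopyingSlot is 
a hypothetical stand-in and not Spark's implementation: it assumes the buffer 
copies the sequence on every read and every write, which is only suspected 
above. It counts copied elements instead of timing anything.

```scala
import scala.collection.mutable.ArrayBuffer

object CopyCostSketch {
  // Hypothetical buffer slot that copies on every access (an assumption,
  // not how Spark's aggregation buffer is known to behave).
  final class CopyingSlot {
    private var data: Array[String] = Array.empty[String]
    var elementsCopied: Long = 0L

    def getSeq: Seq[String] = {
      elementsCopied += data.length
      data.toVector // copy out
    }

    def update(value: Seq[String]): Unit = {
      elementsCopied += value.length
      data = value.toArray // copy in
    }
  }

  // Same shape as MyUDAF.update: read everything, prepend, write everything.
  // Returns the collected values and the elements copied during the updates.
  def collectByCopying(inputs: Seq[String]): (Seq[String], Long) = {
    val slot = new CopyingSlot
    inputs.foreach(in => slot.update(in +: slot.getSeq))
    val copiedDuringUpdates = slot.elementsCopied
    (slot.getSeq, copiedDuringUpdates)
  }

  // For comparison: append in place to a mutable buffer. The counter records
  // one element written per input and ignores amortised array growth.
  def collectInPlace(inputs: Seq[String]): (Seq[String], Long) = {
    val buf = ArrayBuffer.empty[String]
    var written = 0L
    inputs.foreach { in =>
      buf += in
      written += 1
    }
    (buf.toList, written)
  }
}
```

Under that assumption, n inputs cost n^2 element copies in collectByCopying 
versus n element writes in collectInPlace, which is why a native 
implementation with a mutable buffer would be attractive.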

> collect_set and collect_list aggregate functions
> ------------------------------------------------
>
>                 Key: SPARK-9301
>                 URL: https://issues.apache.org/jira/browse/SPARK-9301
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Nick Buroojy
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> A short introduction on how to build aggregate functions based on our new 
> interface can be found at 
> https://issues.apache.org/jira/browse/SPARK-4366?focusedCommentId=14639921&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14639921.


