[ https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121275#comment-15121275 ]
Justin Uang commented on SPARK-9301:
------------------------------------

Do we have a plan for how to implement these in native Spark SQL? I imagine this code will have terrible performance implications, since every time we call update() we are probably doing a full copy of the array/seq.

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class MyUDAF extends UserDefinedAggregateFunction {

  override def inputSchema: StructType =
    StructType(List(StructField("input", StringType)))

  override def bufferSchema: StructType =
    StructType(List(StructField("list", ArrayType(StringType))))

  override def dataType: DataType = ArrayType(StringType)

  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer.update(0, Array.empty[String])
  }

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    // Prepends the new value, copying the entire buffered sequence.
    buffer.update(0, input.get(0) +: buffer.getSeq(0))
  }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    // Concatenation copies both partial buffers.
    buffer1.update(0, buffer1.getSeq(0) ++ buffer2.getSeq(0))
  }

  override def evaluate(buffer: Row): Any = buffer.get(0)
}
{code}

> collect_set and collect_list aggregate functions
> ------------------------------------------------
>
>                 Key: SPARK-9301
>                 URL: https://issues.apache.org/jira/browse/SPARK-9301
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Nick Buroojy
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> A short introduction on how to build aggregate functions based on our new
> interface can be found at
> https://issues.apache.org/jira/browse/SPARK-4366?focusedCommentId=14639921&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14639921.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
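
The copying cost described in the comment above can be sketched outside Spark: prepending to an array-backed sequence, as the UDAF's update() does with {{+:}}, allocates a new collection and copies every existing element on each call, so n updates cost O(n^2) element copies in total. A minimal pure-Scala illustration (the object and method names here are hypothetical, not part of any Spark API):

{code}
object PrependCopyCost {
  // Mirrors the UDAF's update(): `x +: acc` on an Array allocates a new
  // array of size n+1 and copies all n existing elements, so n prepends
  // perform O(n^2) element copies in total.
  def naivePrepend(inputs: Seq[String]): Seq[String] =
    inputs.foldLeft(Array.empty[String])((acc, x) => x +: acc).toSeq

  // Amortized-linear alternative: append into one growable buffer,
  // then reverse once to match the prepend order.
  def builderCollect(inputs: Seq[String]): Seq[String] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    inputs.foreach(buf += _)
    buf.reverse.toSeq
  }
}
{code}

Both produce Seq("c", "b", "a") for the input Seq("a", "b", "c"); only the allocation behavior differs. A native implementation would presumably need a mutable, append-only buffer of this second kind rather than per-row immutable copies.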