[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122291#comment-15122291
 ] 

Herman van Hovell commented on SPARK-9301:
--

You could implement this as an {{ImperativeAggregate}}: make sure it does not 
support partial aggregation (override {{supportsPartial}}) and maintain state 
in the class itself. Look at {{org.apache.spark.sql.hive.HiveUDAFFunction}} for 
an example. It won't be quick, but it should work (as long as the size of the 
collection doesn't cause OOMEs).
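
A minimal, Spark-free sketch of the idea above, assuming a simplified stand-in 
trait. {{SimpleImperativeAggregate}}, {{CollectList}} and {{CollectListDemo}} 
are hypothetical names used only for illustration; this is not Spark's actual 
{{ImperativeAggregate}} API. The point is only that the aggregate opts out of 
partial aggregation and keeps its state in a mutable field of the class rather 
than in an aggregation buffer row.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an imperative aggregate; NOT Spark's real interface.
trait SimpleImperativeAggregate[In, Out] {
  def supportsPartial: Boolean
  def update(input: In): Unit
  def evaluate(): Out
}

// Keeps all collected values in the instance itself.
final class CollectList[T] extends SimpleImperativeAggregate[T, List[T]] {
  private val buffer = ArrayBuffer.empty[T]

  // No partial aggregation: one instance sees every row of a group.
  override def supportsPartial: Boolean = false

  // Appending to the mutable buffer avoids rebuilding the collection per row.
  override def update(input: T): Unit = {
    buffer += input
    ()
  }

  // Memory still grows with the group size, so very large groups can OOM.
  override def evaluate(): List[T] = buffer.toList
}

object CollectListDemo {
  // Feeds every row of a single group through one aggregate instance.
  def collect[T](rows: Iterator[T]): List[T] = {
    val agg = new CollectList[T]
    rows.foreach(row => agg.update(row))
    agg.evaluate()
  }
}
```

For instance, {{CollectListDemo.collect(Iterator("a", "b", "a"))}} yields the 
three values in input order, duplicates included.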

> collect_set and collect_list aggregate functions
>
> Key: SPARK-9301
> URL: https://issues.apache.org/jira/browse/SPARK-9301
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Nick Buroojy
> Priority: Critical
> Fix For: 1.6.0
>
>
> A short introduction on how to build aggregate functions based on our new 
> interface can be found at 
> https://issues.apache.org/jira/browse/SPARK-4366?focusedCommentId=14639921&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14639921.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Cristian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121968#comment-15121968
 ] 

Cristian commented on SPARK-9301:
-

Seconded. It looks like MutableAggregationBuffer is not so mutable after all: 
everything gets converted to Catalyst types and back every time, which makes it 
impossible to implement anything that collects a larger amount of data to 
evaluate later.




[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Justin Uang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121476#comment-15121476
 ] 

Justin Uang commented on SPARK-9301:


Yeah, my workaround has been to JSON-ify the struct into a string first, then 
do the aggregate, then unpack it, which is obviously far from ideal. Also, 
using Hive makes my unit tests take 25 seconds to start up instead of 3 
seconds.




[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Maciej Bryński (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121473#comment-15121473
 ] 

Maciej Bryński commented on SPARK-9301:
---

Moreover, the version from Hive doesn't work with struct types:
https://issues.apache.org/jira/browse/SPARK-10605




[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2016-01-28 Thread Justin Uang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121275#comment-15121275
 ] 

Justin Uang commented on SPARK-9301:


Do we have a plan for implementing these in native Spark SQL? I imagine that 
the following code will have terrible performance implications, since every 
time we call update(), we're probably doing a full copy of the array/seq.

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class MyUDAF extends UserDefinedAggregateFunction {
  override def inputSchema: StructType =
    StructType(List(StructField("input", StringType)))

  // Prepending builds a new Seq and writes it back into the buffer for every row.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer.update(0, input.getString(0) +: buffer.getSeq[String](0))
  }

  override def bufferSchema: StructType =
    StructType(List(StructField("list", ArrayType(StringType))))

  // Concatenates the partial lists from two buffers.
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1.update(0, buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0))
  }

  // Start from an empty list of strings.
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer.update(0, Seq.empty[String])
  }

  override def deterministic: Boolean = true

  override def evaluate(buffer: Row): Any = {
    buffer.get(0)
  }

  override def dataType: DataType = ArrayType(StringType)
}
{code}
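
To make the concern concrete, here is a small Spark-free sketch of the 
arithmetic, under the assumption (stated above only as "probably") that each 
update() copies the whole collected sequence. {{CopyCostSketch}} and its 
methods are hypothetical names for illustration; nothing here measures Spark 
itself. If the buffer holds k elements when a row arrives, k elements are 
copied, so n rows copy 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements in total.

```scala
object CopyCostSketch {
  // Counts elements copied when every update rebuilds the whole array,
  // mimicking a prepend that copies the existing contents.
  def copiedWithFullCopy(n: Int): Long = {
    var buffer = Array.empty[Int]
    var copied = 0L
    for (i <- 0 until n) {
      val next = new Array[Int](buffer.length + 1)
      // Copy the old contents behind the new head element.
      System.arraycopy(buffer, 0, next, 1, buffer.length)
      next(0) = i
      copied += buffer.length
      buffer = next
    }
    copied
  }

  // Closed form of 0 + 1 + ... + (n - 1).
  def closedForm(n: Int): Long = n.toLong * (n - 1) / 2
}
```

Under that assumption, collecting 1000 rows into one group copies 499500 
elements, i.e. the work grows quadratically with the group size.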




[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2015-11-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994262#comment-14994262
 ] 

Apache Spark commented on SPARK-9301:
-

User 'nburoojy' has created a pull request for this issue:
https://github.com/apache/spark/pull/9526




[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2015-09-21 Thread Nick Buroojy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900823#comment-14900823
 ] 

Nick Buroojy commented on SPARK-9301:
-

I sent a pull request to add these aggregates on the new API; however, I now 
see that this may be blocked by SPARK-9830 
(https://issues.apache.org/jira/browse/SPARK-4366?focusedCommentId=14728451&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728451).

Let me know if the next step here is to wait for the blocking change.





[jira] [Commented] (SPARK-9301) collect_set and collect_list aggregate functions

2015-09-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729852#comment-14729852
 ] 

Apache Spark commented on SPARK-9301:
-

User 'nburoojy' has created a pull request for this issue:
https://github.com/apache/spark/pull/8592
