[ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647445#comment-15647445 ]
Jayadevan M commented on SPARK-17691:
-------------------------------------

I am interested in looking into this.

> Add aggregate function to collect list with maximum number of elements
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17691
>                 URL: https://issues.apache.org/jira/browse/SPARK-17691
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Assaf Mendelson
>            Priority: Minor
>
> One of the aggregate functions we have today is the collect_list function.
> This is a useful tool to do a "catch all" aggregation which doesn't really
> fit anywhere else.
> The problem with collect_list is that it is unbounded. I would like to see a
> means to do a collect_list where we limit the maximum number of elements.
> I would see that the input for this would be the maximum number of elements
> to use and the method of choosing (pick whatever, pick the top N, pick the
> bottom B)
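
To make the requested semantics concrete, here is a minimal sketch in plain Python of a bounded "collect list" aggregation. It is not Spark code and not a proposed API: the function name collect_list_bounded, the max_elements parameter, and the method names "any", "top" and "bottom" are all hypothetical, chosen only to mirror the three choices named in the description (pick whatever, pick the top N, pick the bottom ones). It assumes the values within a group are mutually comparable.

```python
from bisect import insort
from collections import defaultdict


def collect_list_bounded(rows, max_elements, method="any"):
    """Group (key, value) pairs, keeping at most max_elements values per key.

    Hypothetical illustration only, not a Spark API.
    method="any"    keeps the first values encountered for each key
    method="top"    keeps the largest values, returned in descending order
    method="bottom" keeps the smallest values, returned in ascending order
    """
    if max_elements < 0:
        raise ValueError("max_elements must be non-negative")
    if method not in ("any", "top", "bottom"):
        raise ValueError("unknown method: %r" % (method,))

    buffers = defaultdict(list)
    for key, value in rows:
        buf = buffers[key]
        if method == "any":
            # Keep whatever arrives first; ignore the rest.
            if len(buf) < max_elements:
                buf.append(value)
        else:
            # Keep the buffer sorted ascending and trim it after each insert,
            # so it never holds more than max_elements + 1 values.
            insort(buf, value)
            if len(buf) > max_elements:
                if method == "top":
                    buf.pop(0)   # drop the smallest value
                else:
                    buf.pop()    # drop the largest value

    if method == "top":
        return {key: buf[::-1] for key, buf in buffers.items()}
    return dict(buffers)


rows = [("a", 3), ("a", 1), ("a", 5), ("a", 2), ("b", 7)]
print(collect_list_bounded(rows, 2, "any"))     # {'a': [3, 1], 'b': [7]}
print(collect_list_bounded(rows, 2, "top"))     # {'a': [5, 3], 'b': [7]}
print(collect_list_bounded(rows, 2, "bottom"))  # {'a': [1, 2], 'b': [7]}
```

The point of the sketch is that each per-key buffer stays bounded while rows stream in, which is the property the unbounded collect_list lacks. How such a bound would be expressed inside Spark's aggregation framework is left open here.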