[ https://issues.apache.org/jira/browse/FLINK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704498#comment-14704498 ]
Stephan Ewen commented on FLINK-2549:
-------------------------------------

You can implement topK on top of sort()/first(). It will be much less efficient than it could be, though.

In that strategy, you need to sort the whole input, which is computationally more intensive and may need to spill to disk for large data.

Using a heap, you can simply always keep the lowest k elements. That way, you avoid the sort operations for most elements (which can be discarded immediately) and require little memory (only enough for k elements), most likely never spilling.

> Add topK operator for DataSet
> -----------------------------
>
>                 Key: FLINK-2549
>                 URL: https://issues.apache.org/jira/browse/FLINK-2549
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core, Java API, Scala API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>
> topK is a common operation for users; it would be great to have it in Flink.
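
The heap-based strategy described in the comment above can be sketched in plain Java as follows. This is only an illustration of the idea, not Flink's implementation or a proposed operator API: the class name `TopKSketch` and the method `lowestK` are hypothetical, and the sketch assumes the elements are `Comparable` and that the input fits a simple single-threaded iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper illustrating the bounded-heap idea; not part of Flink.
public class TopKSketch {

    // Returns the k smallest elements of the input, in ascending order.
    public static <T extends Comparable<T>> List<T> lowestK(Iterable<T> input, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the root is the largest of the (at most k) elements kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k, Collections.reverseOrder());
        for (T value : input) {
            if (heap.size() < k) {
                // Still filling the heap: keep every element.
                heap.add(value);
            } else if (value.compareTo(heap.peek()) < 0) {
                // Smaller than the current k-th smallest: replace the root.
                heap.poll();
                heap.add(value);
            }
            // Otherwise the element is discarded immediately, without any sorting.
        }
        // Only the k survivors are sorted, not the whole input.
        List<T> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(42, 7, 19, 3, 88, 1, 56);
        System.out.println(lowestK(data, 3)); // prints [1, 3, 7]
    }
}
```

The heap never holds more than k elements, so memory use is independent of the input size, and each input element costs at most one comparison against the root plus an O(log k) heap update when it is kept.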