[ 
https://issues.apache.org/jira/browse/SPARK-18275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18275.
-------------------------------
    Resolution: Not A Problem

Questions should go to user@

Priority queues are not necessarily sorted and in fact are generally not sorted 
because they use a heap representation internally. It's not true that inserts 
are O(log k) if you are maintaining a sorted data structure.

> Why does not use an ordered queue in takeOrdered?
> -------------------------------------------------
>
>                 Key: SPARK-18275
>                 URL: https://issues.apache.org/jira/browse/SPARK-18275
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.0.1
>            Reporter: xubo245
>            Priority: Minor
>
> Every partition in mapRDDs is defined as BoundedPriorityQueue object :
> val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
> in org.apache.spark.rdd.RDD#takeOrdered
> After mapRDDs.reduce,only return one queue and the queue is 
> BoundedPriorityQueue ,so  after toArray , Is it necessary to use sorted in 
> takeOrdered? If we can keep the queue is ordered,we can only use reverse
> The same as in org.apache.spark.util.collection.Utils#takeOrdered, the 
> leastOf method also use a unordered buffer ,why does not use a ordered queue? 
> we can insert a num in O(log k) ,but the traditional quickselect algorithm 
> take  O(k) time. Also we do not need a sort after selecting and  save  O(k * 
> log k)
> the leastOf is call 
> com.google.common.collect.Ordering#leastOf(java.util.Iterator<E>, int)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to