[ https://issues.apache.org/jira/browse/SPARK-18275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-18275. ------------------------------- Resolution: Not A Problem Questions should go to user@ Priority queues are not necessarily sorted and in fact are generally not sorted because they use a heap representation internally. It's not true that inserts are O(log k) if you are maintaining a sorted data structure. > Why does not use an ordered queue in takeOrdered? > ------------------------------------------------- > > Key: SPARK-18275 > URL: https://issues.apache.org/jira/browse/SPARK-18275 > Project: Spark > Issue Type: Question > Components: Spark Core > Affects Versions: 2.0.1 > Reporter: xubo245 > Priority: Minor > > Every partition in mapRDDs is defined as BoundedPriorityQueue object : > val queue = new BoundedPriorityQueue[T](num)(ord.reverse) > in org.apache.spark.rdd.RDD#takeOrdered > After mapRDDs.reduce,only return one queue and the queue is > BoundedPriorityQueue ,so after toArray , Is it necessary to use sorted in > takeOrdered? If we can keep the queue is ordered,we can only use reverse > The same as in org.apache.spark.util.collection.Utils#takeOrdered, the > leastOf method also use a unordered buffer ,why does not use a ordered queue? > we can insert a num in O(log k) ,but the traditional quickselect algorithm > take O(k) time. Also we do not need a sort after selecting and save O(k * > log k) > the leastOf is call > com.google.common.collect.Ordering#leastOf(java.util.Iterator<E>, int) -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org