GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18620
[MINOR][ML][MLLIB] add poll function for BoundedPriorityQueue
## What changes were proposed in this pull request?
Most uses of BoundedPriorityQueue in ML/MLLIB get the queue's values and then sort them.
For example, Word2Vec uses pq.toSeq.sortBy(-_._2),
and ALS uses pq.toArray.sorted().
Test results show that retrieving the values with pq.poll is much faster than sorting them,
so it would be good to add a poll function to BoundedPriorityQueue.
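
The two retrieval styles compared above can be sketched as follows. This is only an illustration, not the code in this patch: `PollSketch`, `topK`, `sortedBySort`, and `sortedByPoll` are hypothetical names, the (String, Int) element type is an assumption, and a plain java.util.PriorityQueue stands in for BoundedPriorityQueue.

```scala
import java.util.{PriorityQueue => JPriorityQueue}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch; a plain java.util.PriorityQueue stands in for
// Spark's BoundedPriorityQueue, and all names here are illustrative.
object PollSketch {
  type Scored = (String, Int)

  // Min-heap ordering on the score, so the smallest kept element is at the head.
  private val byScore: Ordering[Scored] = Ordering.by[Scored, Int](_._2)

  // Keep only the k highest-scoring items (k must be at least 1).
  def topK(items: Seq[Scored], k: Int): JPriorityQueue[Scored] = {
    val pq = new JPriorityQueue[Scored](k, byScore)
    items.foreach { item =>
      if (pq.size < k) {
        pq.offer(item)
      } else if (byScore.gt(item, pq.peek())) {
        pq.poll()      // evict the current minimum
        pq.offer(item)
      }
    }
    pq
  }

  // Style used by existing callers: copy the values out, then sort them.
  def sortedBySort(pq: JPriorityQueue[Scored]): Seq[Scored] =
    pq.toArray(new Array[Scored](0)).toSeq.sortBy(-_._2)

  // Style proposed here: poll repeatedly. Elements come out in ascending
  // order, so reverse them to get descending order. This empties the queue.
  def sortedByPoll(pq: JPriorityQueue[Scored]): Seq[Scored] = {
    val buf = ArrayBuffer.empty[Scored]
    while (!pq.isEmpty) buf += pq.poll()
    buf.reverse.toSeq
  }
}
```

Note that polling is destructive: the queue is empty afterwards, whereas the copy-and-sort style leaves it intact. With tied scores the two styles may also order equal elements differently.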
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mpjlu/spark add-poll
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18620.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18620
----
commit 5c80798a99bb330e508469b64f45d974bb1184bb
Author: Peng Meng <[email protected]>
Date: 2017-07-13T07:33:45Z
add poll for PriorityQueue
----