[
https://issues.apache.org/jira/browse/LUCENE-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946187#comment-14946187
]
Shai Erera commented on LUCENE-6828:
------------------------------------
I read the post [~toke], very interesting! I've got a couple of comments.
First, if you want to avoid the micro benchmark, you could implement your own
Collector, copying most of TopScoreDocsCollector's logic and use the packed
HitQueue version. That will compare end-to-end query performance, which is
better than the micro benchmark in that I believe the majority of the time
spent during search is at traversing posting lists, reading up DocValues values
and computing the scores, and *not* sorting the heap. So I think it'd be nice
to see how all 3 compare in an end-to-end query. I don't know how easy it is to
implement it in Solr (that is, how a custom Collector can easily be
plugged-in), but in Lucene it's straightforward. In Solr though you will be
able to compare other facets such as deep paging and grouping, that others
mentioned on this issue.
About Sentinel values: those were added in LUCENE-1593 with the purpose of
avoiding the "is the queue full" checks in the collector's code. At the time it
showed improvements, but the code has changed a lot since. Also, once any
ScoreDoc object is added to the queue, it stays there and its values are
modified in case a better ScoreDoc should replace it. Therefore GC-wise, there
are only X ScoreDoc objects allocated (where X is the same as top-X). In your
post I wasn't sure if you thought that the sentinel values are discarded and
new ones allocated instead, so just wanted to clarify that.
I also think that we may not need to choose a one-queue-to-rule-them-all
solution here. What about adding a VeryLargeTopScoreDocsCollector which Solr,
and maybe even Lucene's {{searcher.search(q, numHits)}} API can do so
automatically, uses when X is too large (100K taking an example from your
post). It will use a packed HitQueue, it can even just throw the results in
unsorted and heap-sort them if needed (or merge-sort in the end). It only needs
to expose a TopDocs-like API. If we need to, let's make it so it can extend
TopDocsCollector directly (such that you won't have to use a PQ at all).
That is all still pending end-to-end query benchmark results. If the Sentinel
approach is better for small X, and the packed for large X, let's make the
choice dynamically in the code, so users get the best performance per their
search request.
> Speed up requests for many rows
> -------------------------------
>
> Key: LUCENE-6828
> URL: https://issues.apache.org/jira/browse/LUCENE-6828
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 4.10.4, 5.3
> Reporter: Toke Eskildsen
> Priority: Minor
> Labels: memory, performance
>
> Standard relevance ranked searches for top-X results uses the HitQueue class
> to keep track of the highest scoring documents. The HitQueue is a binary heap
> of ScoreDocs and is pre-filled with sentinel objects upon creation.
> Binary heaps of Objects in Java does not scale well: The HitQueue uses 28
> bytes/element and memory access is scattered due to the binary heap algorithm
> and the use of Objects. To make matters worse, the use of sentinel objects
> means that even if only a tiny number of documents matches, the full amount
> of Objects is still allocated.
> As long as the HitQueue is small (< 1000), it performs very well. If top-1M
> results are requested, it performs poorly and leaves 1M ScoreDocs to be
> garbage collected.
> An alternative is to replace the ScoreDocs with a single array of packed
> longs, each long holding the score and the document ID. This strategy
> requires only 8 bytes/element and is a lot lighter on the GC.
> Some preliminary tests has been done and published at
> https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/
> These indicate that a long[]-backed implementation is at least 3x faster than
> vanilla HitDocs for top-1M requests.
> For smaller requests, such as top-10, the packed version also seems
> competitive, when the amount of matched documents exceeds 1M. This needs to
> be investigated further.
> Going forward with this idea requires some refactoring as Lucene is currently
> hardwired to the abstract PriorityQueue. Before attempting this, it seems
> prudent to discuss whether speeding up large top-X requests has any value?
> Paging seems an obvious contender for requesting large result sets, but I
> guess the two could work in tandem, opening up for efficient large pages.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]