Tim Underwood created LUCENE-8834:
-------------------------------------

             Summary: Cache the SortedNumericDocValues.docValueCount() value 
whenever it is used in a loop
                 Key: LUCENE-8834
                 URL: https://issues.apache.org/jira/browse/LUCENE-8834
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Tim Underwood


While troubleshooting some multi-valued facet performance problems in Solr, I 
noticed that caching the SortedNumericDocValues.docValueCount() value when it 
is used as a loop condition improves performance.

Specifically, going from something like this:
{code:java}
for (int i = 1; i < longs.docValueCount(); i++) {
  ...
}
{code}
to this:
{code:java}
final int docValueCount = longs.docValueCount();
for (int i = 1; i < docValueCount; i++) {
  ...
}
{code}
provides a faceting performance improvement when trying to facet using doc 
values on a multi-valued field with more than a few values per document.
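
To make the difference concrete, here is a minimal, self-contained sketch that counts how often docValueCount() is invoked in each loop shape. MultiValues and ArrayValues are hypothetical stand-ins written for this illustration, not the Lucene classes, and the loop starts at 0 rather than 1 for simplicity. The sketch only shows the call counts; it does not measure time, and whether the JIT hoists the call on its own in any given case is not something it demonstrates.

```java
// Hypothetical stand-in for a multi-valued doc values iterator (NOT the Lucene API).
interface MultiValues {
  int docValueCount();
  long nextValue();
}

// Array-backed implementation that records how often docValueCount() is called.
final class ArrayValues implements MultiValues {
  private final long[] values;
  private int next = 0;
  int countCalls = 0;

  ArrayValues(long... values) {
    this.values = values;
  }

  @Override
  public int docValueCount() {
    countCalls++;
    return values.length;
  }

  @Override
  public long nextValue() {
    return values[next++];
  }
}

final class DocValueCountDemo {
  // Calls docValueCount() on every evaluation of the loop condition.
  static long sumUncached(MultiValues v) {
    long sum = 0;
    for (int i = 0; i < v.docValueCount(); i++) {
      sum += v.nextValue();
    }
    return sum;
  }

  // Calls docValueCount() exactly once, before the loop starts.
  static long sumCached(MultiValues v) {
    final int docValueCount = v.docValueCount();
    long sum = 0;
    for (int i = 0; i < docValueCount; i++) {
      sum += v.nextValue();
    }
    return sum;
  }

  public static void main(String[] args) {
    ArrayValues a = new ArrayValues(10, 20, 30);
    ArrayValues b = new ArrayValues(10, 20, 30);
    System.out.println(sumUncached(a) + " with " + a.countCalls + " docValueCount() calls");
    System.out.println(sumCached(b) + " with " + b.countCalls + " docValueCount() calls");
  }
}
```

For a document with n values, the first form makes n + 1 calls and the second makes one, so the gap grows with the number of values per document.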

This patch modifies most of the places in Lucene/Solr that were not already 
using this pattern.

h2. Unscientific Manual Benchmarks

I focused on the changes to NumericFacets.java and 
FacetFieldProcessorByHashDV.java since those are what I was specifically trying 
to improve.

Details about my setup:

* Index was created using Lucene/Solr 7.6.0 (I'm in the process of upgrading to 
8.1.1)
* Total Docs: 5,736,951
* I'm faceting on a single multi-valued field that has 63,070,176 total values 
indexed (10.99 values per document on average)
* OpenJDK 11

h3. Lucene/Solr 7.6.0:
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1042ms|854ms|
|JSON Facets|823ms|783ms|
h3. Lucene/Solr 8.1.1 (using the 7.6.0 index):
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1043ms|777ms|
|JSON Facets|827ms|792ms|
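
To put the two tables on a common scale, the relative QTime reduction can be computed as 100 * (before - after) / before. The following is a small sketch of that arithmetic using the figures from the tables; the class and method names are made up for this example.

```java
final class QTimeImprovement {
  // Percentage by which QTime dropped after applying the patch.
  static double percentReduction(long beforeMs, long afterMs) {
    return 100.0 * (beforeMs - afterMs) / beforeMs;
  }

  public static void main(String[] args) {
    // Figures copied from the 7.6.0 and 8.1.1 tables.
    System.out.printf("7.6.0 Legacy Facets: %.1f%%%n", percentReduction(1042, 854));
    System.out.printf("7.6.0 JSON Facets:   %.1f%%%n", percentReduction(823, 783));
    System.out.printf("8.1.1 Legacy Facets: %.1f%%%n", percentReduction(1043, 777));
    System.out.printf("8.1.1 JSON Facets:   %.1f%%%n", percentReduction(827, 792));
  }
}
```

That works out to roughly 18.0% (legacy) and 4.9% (JSON) on 7.6.0, and 25.5% (legacy) and 4.2% (JSON) on 8.1.1.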

The reported QTime is simply the lowest QTime I was able to get after repeating 
the query a few dozen times. This is not very scientific, but the results were 
repeatable: removing the patch increased the times and reapplying it decreased 
them.
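
The "lowest of N runs" approach can be sketched as follows. This is only an illustration of the idea, not the procedure I actually scripted: the lowest() helper and the LongSupplier that stands in for "run the query and read its QTime" are hypothetical, and the sample values in main are made up.

```java
import java.util.function.LongSupplier;

final class LowestQTime {
  // Runs the measurement `repeats` times and returns the smallest value seen.
  static long lowest(int repeats, LongSupplier measurement) {
    if (repeats < 1) {
      throw new IllegalArgumentException("repeats must be at least 1");
    }
    long best = Long.MAX_VALUE;
    for (int i = 0; i < repeats; i++) {
      best = Math.min(best, measurement.getAsLong());
    }
    return best;
  }

  public static void main(String[] args) {
    // Made-up sample values standing in for the QTime of each repeated query.
    long[] samples = {5, 3, 9};
    int[] index = {0};
    System.out.println(lowest(samples.length, () -> samples[index[0]++]));
  }
}
```

Taking the minimum rather than the mean discards runs slowed down by GC pauses or other background noise, at the cost of saying nothing about variance.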

The patch touches both Lucene and Solr code, which is why I have filed this as 
a LUCENE issue. I can reorganize it and break it apart if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
