[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-06-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866437#comment-16866437
 ] 

ASF subversion and git services commented on LUCENE-8796:
-

Commit 327a6dfeb45e2443d5c2f325441a6b4eb18e096b in lucene-solr's branch 
refs/heads/branch_8x from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=327a6df ]

LUCENE-8796: Use exponential search in IntArrayDocIdSetIterator#advance (#667)




> Use exponential search in IntArrayDocIdSet advance method
> -
>
> Key: LUCENE-8796
> URL: https://issues.apache.org/jira/browse/LUCENE-8796
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Chatting with [~jpountz] , he suggested to improve IntArrayDocIdSet by making 
> its advance method use exponential search instead of binary search. This 
> should help performance of queries including conjunctions: given that 
> ConjunctionDISI uses leap frog, it advances through doc ids in small steps, 
> hence exponential search should be faster when advancing on average compared 
> to binary search.
>  
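To illustrate why a conjunction tends to call advance in small steps, here is a minimal leapfrog intersection over two sorted arrays. It is a sketch under assumptions: `LeapFrog`, `intersect` and this `advance` helper are hypothetical names for illustration, not Lucene's ConjunctionDISI or IntArrayDocIdSet code, and the helper uses a plain binary search over the remaining range, as the issue describes the pre-patch behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of leapfrog intersection; not Lucene code. */
final class LeapFrog {
    /** Index of the first element >= target in docs[from..], or docs.length if none. */
    static int advance(int[] docs, int from, int target) {
        // Binary search over the whole remaining range [from, docs.length).
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -1 - idx;
    }

    /** Doc ids present in both sorted, duplicate-free arrays. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                hits.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);   // a leaps forward to b's candidate
            } else {
                j = advance(b, j, a[i]);   // b leaps forward to a's candidate
            }
        }
        return hits;
    }
}
```

Each leap targets the other side's current candidate, so when both lists are dense the landing index is usually only a few entries past the starting one. That is the access pattern the issue expects exponential search to favour.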



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-06-18 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866345#comment-16866345
 ] 

ASF subversion and git services commented on LUCENE-8796:
-

Commit 4fd09eb3e386d000ac9e8871c4a5178e66476540 in lucene-solr's branch 
refs/heads/master from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4fd09eb ]

LUCENE-8796: Use exponential search in IntArrayDocIdSetIterator#advance (#667)







[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-06-04 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855534#comment-16855534
 ] 

Adrien Grand commented on LUCENE-8796:
--

Hmm it's disappointing it's not making things noticeably faster, but on the 
other hand your patch provides a better worst-case cost, in the case that 
advance is always called with very small increments. If I'm not mistaken, this 
worst-case scenario runs in O(cost * log(cost)) today without your patch and 
O(cost) with your patch.
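
The two bounds can be reconstructed with a short counting argument. This is a sketch under assumptions the comment does not spell out: n stands for cost (the number of doc ids in the array), advance is called at most n times, and call k moves the cursor forward by d_k >= 1 positions.

```latex
% Without the patch: every call binary-searches the whole remaining range.
T_{\text{binary}} \;\le\; \sum_{k=1}^{n} O(\log n) \;=\; O(n \log n)

% With exponential search: call k costs O(1 + \log d_k), and the steps
% cannot run past the end of the array, so \sum_k d_k \le n.
T_{\text{exp}} \;=\; \sum_{k=1}^{n} O(1 + \log d_k)
  \;\le\; O\!\Big(n + \sum_{k=1}^{n} d_k\Big) \;=\; O(n)
```

The second bound uses log d <= d. When every step is small (d_k = O(1)), each exponential-search call is O(1), while each binary-search call still pays the full logarithm.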




[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-31 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852876#comment-16852876
 ] 

Luca Cavanna commented on LUCENE-8796:
--

I updated the PR and addressed all the comments. Here are the latest benchmark 
results:

{noformat}
Report after iter 19:
Task                    QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
MedTerm                      1510.74   (6.8%)                  1457.20   (8.4%)     -3.5% ( -17% -   12%)
Fuzzy1                         70.49   (8.5%)                    68.11   (9.8%)     -3.4% ( -19% -   16%)
OrHighNotMed                  650.57   (5.8%)                   629.81   (6.0%)     -3.2% ( -14% -    9%)
OrHighLow                     447.13   (4.2%)                   433.05   (4.5%)     -3.2% ( -11% -    5%)
OrNotHighMed                  623.22   (6.3%)                   605.19   (6.1%)     -2.9% ( -14% -   10%)
OrHighNotLow                  720.89   (7.0%)                   701.26   (7.9%)     -2.7% ( -16% -   13%)
OrNotHighHigh                 558.43   (6.3%)                   544.82   (4.9%)     -2.4% ( -12% -    9%)
LowTerm                      1279.34   (4.9%)                  1248.60   (5.2%)     -2.4% ( -11% -    8%)
AndHighLow                    690.75   (4.0%)                   675.22   (5.3%)     -2.2% ( -11% -    7%)
LowPhrase                     358.90   (2.3%)                   351.28   (4.0%)     -2.1% (  -8% -    4%)
PKLookup                      139.97   (3.0%)                   137.32   (3.5%)     -1.9% (  -8% -    4%)
OrNotHighLow                  728.48   (6.8%)                   714.79   (6.5%)     -1.9% ( -14% -   12%)
HighTerm                     1222.38   (6.3%)                  1199.77   (7.1%)     -1.8% ( -14% -   12%)
AndHighHigh                    58.93   (6.2%)                    58.01   (5.8%)     -1.6% ( -12% -   11%)
Prefix3                       152.21   (4.5%)                   150.00   (5.0%)     -1.5% ( -10% -    8%)
IntNRQConjMedTerm              79.15  (10.7%)                    78.06  (10.5%)     -1.4% ( -20% -   22%)
HighTermDayOfYearSort          95.28   (5.1%)                    94.10   (7.8%)     -1.2% ( -13% -   12%)
Wildcard                       64.23   (2.3%)                    63.45   (2.3%)     -1.2% (  -5% -    3%)
MedSpanNear                    81.15   (2.2%)                    80.19   (2.8%)     -1.2% (  -6% -    3%)
HighSpanNear                   10.20   (3.9%)                    10.08   (4.2%)     -1.2% (  -8% -    7%)
HighIntervalsOrdered            4.07   (1.8%)                     4.03   (2.2%)     -1.1% (  -4% -    2%)
LowSpanNear                    41.62   (3.1%)                    41.20   (3.6%)     -1.0% (  -7% -    5%)
IntNRQConjLowTerm              20.36   (4.1%)                    20.15   (4.5%)     -1.0% (  -9% -    7%)
IntNRQConjHighTerm             64.84   (9.6%)                    64.21   (9.4%)     -1.0% ( -18% -   19%)
AndHighMed                    229.08   (2.8%)                   227.00   (2.5%)     -0.9% (  -6% -    4%)
MedPhrase                      18.73   (1.5%)                    18.57   (2.3%)     -0.8% (  -4% -    2%)
LowSloppyPhrase               124.52   (2.3%)                   123.48   (2.6%)     -0.8% (  -5% -    4%)
Respell                        69.26   (3.0%)                    68.68   (2.9%)     -0.8% (  -6% -    5%)
HighPhrase                     12.98   (1.6%)                    12.88   (2.2%)     -0.7% (  -4% -    3%)
PrefixConjLowTerm              42.11   (2.6%)                    41.81   (3.0%)     -0.7% (  -6% -    5%)
OrHighNotHigh                 680.34   (6.1%)                   676.16   (7.6%)     -0.6% ( -13% -   13%)
MedSloppyPhrase                34.06   (4.9%)                    33.89   (4.5%)     -0.5% (  -9% -    9%)
IntNRQ                         89.97  (12.4%)                    89.62  (12.0%)     -0.4% ( -22% -   27%)
HighSloppyPhrase                8.28   (4.0%)                     8.25   (3.9%)     -0.3% (  -7% -    7%)
WildcardConjLowTerm            36.35   (2.7%)                    36.26   (2.7%)     -0.3% (  -5% -    5%)
OrHighHigh                     27.89   (2.6%)                    27.85   (3.1%)     -0.1% (  -5% -    5%)
Fuzzy2                         44.19   (3.8%)                    44.17   (3.1%)     -0.1% (  -6% -    7%)
OrHighMed                      90.42   (2.8%)                    90.57   (2.8%)      0.2% (  -5% -    6%)
PrefixConjMedTerm              45.56   (2.8%)                    45.79   (2.9%)      0.5% (  -5% -    6%)
WildcardConjHighTerm           33.08   (2.6%)                    33.47   (3.0%)      1.2% (  -4% -    6%)
PrefixConjHighTerm             83.65   (2.6%)                    86.23   (3.7%)      3.1% (  -3% -    9%)
HighTermMonthSort             130.35  (15.8%)                   135.08  (12.1%)      3.6% ( -20% -   37%)
WildcardConjMedTerm            99.19   (3.6%)                   103.37   (4.1%)      4.2% (  -3% -   12%)
{noformat}


[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-11 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837979#comment-16837979
 ] 

David Smiley commented on LUCENE-8796:
--

Luca, you should reference this JIRA issue in your PR title in order to 
properly link them.  It'll trigger a bot to immediately notice and add the link 
here in JIRA to point to GitHub.




[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-09 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836467#comment-16836467
 ] 

Luca Cavanna commented on LUCENE-8796:
--

I have updated the PR after applying Yonik's suggestion and re-ran the 
benchmarks a few times. The run with the least noise had these results (note 
that I disabled the bitset optimization on both sides):

{{
Report after iter 19:
Task                    QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
HighTerm                     1575.07   (5.9%)                  1541.27   (6.9%)     -2.1% ( -14% -   11%)
MedTerm                      1363.22   (6.5%)                  1337.03   (7.0%)     -1.9% ( -14% -   12%)
LowTerm                      1441.86   (4.2%)                  1420.77   (5.2%)     -1.5% ( -10% -    8%)
IntNRQConjMedTerm             280.55   (4.0%)                   277.64   (4.1%)     -1.0% (  -8% -    7%)
MedPhrase                     153.84   (3.5%)                   152.44   (3.3%)     -0.9% (  -7% -    6%)
Prefix3                       224.92   (4.0%)                   223.13   (3.7%)     -0.8% (  -8% -    7%)
HighSloppyPhrase               19.70   (3.7%)                    19.56   (4.5%)     -0.7% (  -8% -    7%)
MedSloppyPhrase                18.23   (4.3%)                    18.11   (4.7%)     -0.7% (  -9% -    8%)
OrNotHighMed                  586.33   (3.4%)                   582.47   (4.9%)     -0.7% (  -8% -    7%)
LowSloppyPhrase                18.56   (3.6%)                    18.46   (3.9%)     -0.5% (  -7% -    7%)
HighPhrase                     22.64   (2.7%)                    22.54   (3.0%)     -0.4% (  -6% -    5%)
LowPhrase                     144.10   (3.8%)                   143.55   (3.3%)     -0.4% (  -7% -    6%)
AndHighLow                    539.26   (3.7%)                   537.25   (3.2%)     -0.4% (  -7% -    6%)
PKLookup                      132.96   (3.0%)                   132.48   (4.6%)     -0.4% (  -7% -    7%)
OrHighMed                     115.79   (2.7%)                   115.49   (3.5%)     -0.3% (  -6% -    6%)
PrefixConjHighTerm             36.98   (2.8%)                    36.93   (3.4%)     -0.1% (  -6% -    6%)
WildcardConjHighTerm           45.79   (3.0%)                    45.73   (3.1%)     -0.1% (  -6% -    6%)
OrHighLow                     448.91   (3.7%)                   448.70   (6.3%)     -0.0% (  -9% -   10%)
Wildcard                       78.89   (3.2%)                    78.95   (3.6%)      0.1% (  -6% -    7%)
IntNRQConjHighTerm             78.35   (2.3%)                    78.48   (2.4%)      0.2% (  -4% -    4%)
IntNRQ                        100.56   (2.7%)                   100.84   (2.8%)      0.3% (  -5% -    5%)
OrHighNotLow                  732.45   (2.8%)                   734.56   (5.3%)      0.3% (  -7% -    8%)
OrHighNotHigh                 544.87   (2.8%)                   546.47   (4.6%)      0.3% (  -6% -    7%)
IntNRQConjLowTerm             249.20   (4.2%)                   249.99   (3.8%)      0.3% (  -7% -    8%)
Respell                        73.05   (3.1%)                    73.28   (3.4%)      0.3% (  -6% -    7%)
OrHighHigh                     35.56   (3.0%)                    35.68   (4.2%)      0.3% (  -6% -    7%)
OrNotHighLow                  695.41   (4.8%)                   697.88   (6.5%)      0.4% ( -10% -   12%)
MedSpanNear                    59.99   (3.8%)                    60.30   (4.0%)      0.5% (  -7% -    8%)
AndHighMed                    190.02   (3.1%)                   191.04   (3.6%)      0.5% (  -5% -    7%)
LowSpanNear                    12.73   (3.9%)                    12.81   (4.2%)      0.6% (  -7% -    8%)
HighTermDayOfYearSort          88.42   (7.0%)                    89.09   (7.1%)      0.8% ( -12% -   15%)
PrefixConjLowTerm              54.95   (3.7%)                    55.43   (3.8%)      0.9% (  -6% -    8%)
OrHighNotMed                  628.44   (3.4%)                   634.02   (6.1%)      0.9% (  -8% -   10%)
HighSpanNear                   28.86   (3.2%)                    29.11   (3.5%)      0.9% (  -5% -    7%)
WildcardConjMedTerm            72.48   (3.4%)                    73.19   (4.8%)      1.0% (  -7% -    9%)
Fuzzy2                         49.17   (9.9%)                    49.68  (11.7%)      1.0% ( -18% -   25%)
AndHighHigh                    63.44   (3.8%)                    64.11   (3.8%)      1.1% (  -6% -    9%)
Fuzzy1                         79.43   (9.9%)                    80.55   (9.7%)      1.4% ( -16% -   23%)
OrNotHighHigh                 574.89   (3.6%)                   584.43   (5.5%)      1.7% (  -7% -   11%)
PrefixConjMedTerm              79.00   (3.2%)                    80.50   (3.6%)      1.9% (  -4% -    8%)
WildcardConjLowTerm            90.67   (2.9%)                    92.49   (3.7%)      2.0% (  -4% -    8%)
HighTermMonthSort              86.13  (11.8%)                    88.79  (12.4%)      3.1% ( -18% -   30%)
}}

I also ran benchmarks with the bitset optimization in place on both ends:

{{
Report after iter 19:
Task                    QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
IntNRQ                         63.46  (24.6%)                    62.28  (24.2%)     -1.9% ( -40% -

[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-08 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835629#comment-16835629
 ] 

Atri Sharma commented on LUCENE-8796:
-

bq. We could potentially reduce the number of comparisons (on average) using 
the property that binary search uses the same number of comparisons to search 
for a value in ranges of size 2^n and 2^(n+1) - 1. Could we adjust the lower 
bound of the search space based on that?

Something like:


{code:java}
int lowerBound = 0;
// each step grows the probe offset from bound to 2 * bound + 1
while (bound < length && docs[bound] < target) {
  lowerBound += bound;
  bound = Math.min((bound + 1) * 2 - 1, length);
}
i = Arrays.binarySearch(docs, lowerBound, lowerBound + Math.min(bound, length), target);
{code}

This might be wrong (I have not run a bunch of tests), but gives the general 
idea.




[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-08 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835628#comment-16835628
 ] 

Luca Cavanna commented on LUCENE-8796:
--

You are right [~ysee...@gmail.com] I will make that change and re-run 
benchmarks.




[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-08 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835613#comment-16835613
 ] 

Atri Sharma commented on LUCENE-8796:
-

+1, nice change!

A few thoughts:

1) We could potentially reduce the number of comparisons (on average) using the 
property that binary search uses the same number of comparisons to search for 
a value in ranges of size 2^n and 2^(n+1) - 1. Could we adjust the lower bound 
of the search space based on that?

2) Could we improve things here for equal values?
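
The property in point 1 can be checked with a small helper. This is a sketch, assuming a classic midpoint binary search in which an unsuccessful probe leaves at most floor(n/2) candidates; `BinarySearchProbes` and `worstCaseProbes` are hypothetical names used only for illustration.

```java
/** Hypothetical helper for illustration; not part of Lucene. */
final class BinarySearchProbes {
    /** Worst-case number of midpoint probes a binary search makes over n sorted values. */
    static int worstCaseProbes(int n) {
        int probes = 0;
        while (n > 0) {
            probes++;
            n /= 2;  // after a miss, the larger remaining half has floor(n/2) entries
        }
        return probes;
    }
}
```

Under that assumption every range size from 2^n up to 2^(n+1) - 1 needs the same n + 1 probes in the worst case. Sizes 8 and 15 both give 4, and the count only increases to 5 at size 16.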




[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-08 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835597#comment-16835597
 ] 

Yonik Seeley commented on LUCENE-8796:
--

Hmmm, that looks like it's searching the whole space each time instead of 
starting at the current point?

Presumably this:
{code}
  while(bound < length && docs[bound] < target) {
{code}
Should be something like this:
{code}
  while(i+bound < length && docs[i+bound] < target) {
{code}
And also adjust the bounds of the following binary search to match as well.
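
A minimal sketch of what the suggestion above could look like end to end, assuming a sorted array `docs` with `length` valid entries and a cursor `i` that points at the next candidate. The class name `ExpSearchIterator`, the `NO_MORE_DOCS` sentinel value and the exact bounds handling are illustrative assumptions, not the code from the PR.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an array-backed doc id iterator; not Lucene's actual class. */
final class ExpSearchIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;   // sorted, distinct doc ids
    private final int length;   // number of valid entries in docs
    private int i = 0;          // index of the next candidate; every search starts here

    ExpSearchIterator(int[] docs, int length) {
        this.docs = docs;
        this.length = length;
    }

    /** Returns the first doc id >= target at or after the current position. */
    int advance(int target) {
        // Gallop relative to i: probe i+1, i+2, i+4, ... until a probe is
        // >= target or would fall off the end of the valid range.
        int bound = 1;
        while (i + bound < length && docs[i + bound] < target) {
            bound *= 2;
        }
        // The answer now lies between i + bound/2 and min(i + bound, length).
        int from = i + bound / 2;
        int to = Math.min(i + bound + 1, length);  // exclusive upper bound
        int idx = Arrays.binarySearch(docs, from, to, target);
        if (idx < 0) {
            idx = -1 - idx;  // insertion point: first element greater than target
        }
        i = idx;
        return i < length ? docs[i] : NO_MORE_DOCS;
    }
}
```

Because both the galloping loop and the binary search are offset by `i`, a call that lands d entries ahead inspects a number of elements that grows with log(d), not with the size of the whole array.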





[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-08 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835542#comment-16835542
 ] 

Luca Cavanna commented on LUCENE-8796:
--

I have made the change and played with luceneutil to run some benchmarks. I 
opened a PR here: https://github.com/apache/lucene-solr/pull/667 .

Luceneutil does not currently benchmark the queries that should be affected by 
this change, hence I added benchmarks for numeric range queries, prefix queries 
and wildcard queries in conjunction with term queries (low, medium and high 
frequency). See the changes I made to my luceneutil fork: 
[https://github.com/mikemccand/luceneutil/compare/master...javanna:conjunctions].
Also, for the benchmarks I temporarily modified DocIdSetBuilder#grow to never 
call upgradeToBitSet (on both baseline and modified version), so that the 
updated code is exercised as much as possible during the benchmark runs. 
Otherwise in many cases we would use bitsets instead and the changed code would 
not be exercised at all.

I ran the wikimedium10m benchmarks a few times. Here is probably the run with 
the least noise; results show a little improvement for some queries, and no 
regressions in general:
 
Report after iter 19:
 Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff
 WildcardConjMedTerm 75.49 (2.2%) 72.79 (2.0%) -3.6% ( -7% - 0%)
 OrHighNotMed 607.01 (5.7%) 593.10 (4.4%) -2.3% ( -11% - 8%)
 WildcardConjHighTerm 64.00 (1.7%) 62.55 (1.4%) -2.3% ( -5% - 0%)
 Fuzzy2 20.14 (3.4%) 19.72 (4.6%) -2.1% ( -9% - 6%)
 HighTerm 1174.41 (4.7%) 1150.11 (4.2%) -2.1% ( -10% - 7%)
 OrHighLow 483.40 (5.1%) 473.69 (6.9%) -2.0% ( -13% - 10%)
 OrNotHighLow 526.75 (3.6%) 516.47 (3.6%) -2.0% ( -8% - 5%)
 OrNotHighHigh 600.38 (4.9%) 590.21 (3.7%) -1.7% ( -9% - 7%)
 HighTermMonthSort 110.05 (11.7%) 108.58 (11.5%) -1.3% ( -21% - 24%)
 OrHighMed 107.83 (2.6%) 106.48 (4.7%) -1.3% ( -8% - 6%)
 PrefixConjMedTerm 56.98 (2.5%) 56.33 (1.7%) -1.1% ( -5% - 3%)
 AndHighLow 432.27 (3.6%) 427.46 (3.2%) -1.1% ( -7% - 5%)
 PrefixConjLowTerm 44.43 (2.8%) 43.98 (1.8%) -1.0% ( -5% - 3%)
 MedTerm 1409.97 (5.5%) 1396.33 (4.9%) -1.0% ( -10% - 9%)
 HighSloppyPhrase 11.98 (4.3%) 11.87 (5.1%) -0.9% ( -9% - 8%)
 OrNotHighMed 614.19 (4.6%) 608.74 (3.8%) -0.9% ( -8% - 7%)
 Respell 58.11 (2.4%) 57.61 (2.4%) -0.9% ( -5% - 3%)
 LowTerm 1342.33 (4.8%) 1330.86 (4.0%) -0.9% ( -9% - 8%)
 PrefixConjHighTerm 68.50 (2.9%) 67.93 (1.8%) -0.8% ( -5% - 3%)
 OrHighNotHigh 566.30 (5.2%) 561.88 (4.5%) -0.8% ( -9% - 9%)
 WildcardConjLowTerm 32.75 (2.5%) 32.56 (2.1%) -0.6% ( -5% - 4%)
 PKLookup 131.80 (2.4%) 131.28 (2.3%) -0.4% ( -5% - 4%)
 OrHighHigh 29.90 (3.4%) 29.79 (5.3%) -0.4% ( -8% - 8%)
 OrHighNotLow 497.65 (6.6%) 495.84 (5.2%) -0.4% ( -11% - 12%)
 AndHighMed 175.08 (3.5%) 174.58 (3.0%) -0.3% ( -6% - 6%)
 LowSpanNear 15.17 (1.8%) 15.13 (2.5%) -0.2% ( -4% - 4%)
 Fuzzy1 71.14 (5.9%) 70.97 (6.3%) -0.2% ( -11% - 12%)
 LowSloppyPhrase 35.23 (2.0%) 35.16 (2.6%) -0.2% ( -4% - 4%)
 LowPhrase 74.10 (1.7%) 73.98 (1.8%) -0.2% ( -3% - 3%)
 HighPhrase 34.18 (2.1%) 34.13 (2.0%) -0.1% ( -4% - 3%)
 Prefix3 45.33 (2.3%) 45.28 (2.1%) -0.1% ( -4% - 4%)
 MedPhrase 28.30 (2.1%) 28.27 (1.7%) -0.1% ( -3% - 3%)
 MedSloppyPhrase 6.80 (3.6%) 6.80 (3.2%) -0.0% ( -6% - 6%)
 AndHighHigh 53.79 (3.9%) 53.79 (4.0%) -0.0% ( -7% - 8%)
 MedSpanNear 61.78 (2.2%) 61.83 (1.7%) 0.1% ( -3% - 4%)
 Wildcard 37.83 (2.5%) 37.91 (1.7%) 0.2% ( -3% - 4%)
 IntNRQConjHighTerm 20.17 (3.8%) 20.24 (4.9%) 0.3% ( -8% - 9%)
 HighTermDayOfYearSort 53.55 (7.8%) 53.76 (7.3%) 0.4% ( -13% - 16%)
 HighSpanNear 5.39 (2.6%) 5.42 (2.6%) 0.5% ( -4% - 5%)
 IntNRQConjLowTerm 19.69 (4.3%) 19.86 (4.3%) 0.9% ( -7% - 9%)
 IntNRQConjMedTerm 15.93 (4.5%) 16.12 (5.4%) 1.2% ( -8% - 11%)
 IntNRQ 114.28 (10.3%) 116.41 (14.0%) 1.9% ( -20% - 29%)

 

 
