[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-10 Thread Adrien Grand (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881967#comment-16881967
 ] 

Adrien Grand commented on LUCENE-8311:
--

This made exact phrase queries 3x faster in the nightly benchmarks 
http://people.apache.org/~mikemccand/lucenebench/Phrase.html and term queries 
about 10% slower http://people.apache.org/~mikemccand/lucenebench/Term.html.

> Leverage impacts for phrase queries
> ---
>
> Key: LUCENE-8311
> URL: https://issues.apache.org/jira/browse/LUCENE-8311
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8311.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.
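
The idea in the description above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions, not Lucene's implementation: the class name `PhraseImpactSketch`, the method `mergeForExactPhrase`, and the representation of one term's impacts as a map from norm value to the highest frequency seen for that norm are all hypothetical. A norm that is missing for any term is dropped, since a document must contain every term for the phrase to match.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; none of these names exist in Lucene. */
class PhraseImpactSketch {

  /**
   * Each input map describes one term of the phrase over the same block of documents:
   * key = norm value, value = highest frequency of that term among documents with that norm.
   * Returns, for each norm present for every term, an upper bound of the exact-phrase frequency.
   */
  static Map<Long, Integer> mergeForExactPhrase(List<Map<Long, Integer>> perTermImpacts) {
    Map<Long, Integer> merged = new TreeMap<>();
    if (perTermImpacts.isEmpty()) {
      return merged;
    }
    for (Map.Entry<Long, Integer> entry : perTermImpacts.get(0).entrySet()) {
      long norm = entry.getKey();
      int minFreq = entry.getValue();
      boolean presentForAllTerms = true;
      for (int i = 1; i < perTermImpacts.size(); i++) {
        Integer freq = perTermImpacts.get(i).get(norm);
        if (freq == null) {
          // No document with this norm contains term i, so the phrase cannot match there.
          presentForAllTerms = false;
          break;
        }
        // An exact phrase cannot occur more often than its rarest term.
        minFreq = Math.min(minFreq, freq);
      }
      if (presentForAllTerms) {
        merged.put(norm, minFreq);
      }
    }
    return merged;
  }
}
```

With the example used later in this thread ({{the}} with frequency 5 and {{fox}} with frequency 1 for the same norm), the merged frequency is 1, and a scorer would only need to evaluate the similarity at that frequency to bound the phrase score for that norm.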



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-09 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881292#comment-16881292
 ] 

ASF subversion and git services commented on LUCENE-8311:
-

Commit a80b5164d1695d58115b78e832df0b722860b22c in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a80b516 ]

LUCENE-8311: Add CHANGES entry.



[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-09 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881294#comment-16881294
 ] 

ASF subversion and git services commented on LUCENE-8311:
-

Commit 437090c3028d9cf85dee45fb65df29248126d2ea in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=437090c ]

LUCENE-8311: Add CHANGES entry.



[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-09 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881288#comment-16881288
 ] 

ASF subversion and git services commented on LUCENE-8311:
-

Commit d271770ed133995186f6a1667b36ee623e6cefc0 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d271770 ]

LUCENE-8311: Phrase impacts (#760)





[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-09 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881245#comment-16881245
 ] 

ASF subversion and git services commented on LUCENE-8311:
-

Commit cfac486afd7bce64c10497a3b9e541d64ee4f1fd in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cfac486 ]

LUCENE-8311: Phrase impacts (#760)





[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-09 Thread Michael McCandless (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881203#comment-16881203
 ] 

Michael McCandless commented on LUCENE-8311:


+1 to merge ... that is a good tradeoff!  Astronomical speedups for 
{{PhraseQuery}} and some small slowdowns in others.  It's important that all of 
our common queries properly handle impacts.


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-03 Thread Adrien Grand (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877919#comment-16877919
 ] 

Adrien Grand commented on LUCENE-8311:
--

I opened https://github.com/apache/lucene-solr/pull/760. Performance is a bit 
better than what we had before:

{noformat}
                 Task  QPS baseline   StdDev  QPS patch   StdDev  Pct diff
             HighTerm       1395.12   (5.1%)    1230.78   (4.3%)    -11.8%  ( -20% -   -2%)
              MedTerm       2352.56   (4.7%)    2170.42   (3.9%)     -7.7%  ( -15% -    0%)
          LowSpanNear         13.70   (7.0%)      12.67   (4.9%)     -7.5%  ( -18% -    4%)
         HighSpanNear          5.69   (5.3%)       5.31   (3.2%)     -6.5%  ( -14% -    2%)
          MedSpanNear         23.33   (4.2%)      21.97   (2.4%)     -5.8%  ( -11% -    0%)
           AndHighMed        114.70   (2.9%)     109.40   (4.1%)     -4.6%  ( -11% -    2%)
          AndHighHigh         35.08   (3.2%)      33.51   (4.1%)     -4.5%  ( -11% -    2%)
              LowTerm       3014.11   (4.7%)    2893.44   (4.7%)     -4.0%  ( -12% -    5%)
            OrHighMed         60.26   (2.5%)      57.96   (2.1%)     -3.8%  (  -8% -    0%)
           OrHighHigh         15.45   (2.5%)      14.87   (2.3%)     -3.8%  (  -8% -    1%)
            LowPhrase         25.81   (3.4%)      24.89   (2.8%)     -3.6%  (  -9% -    2%)
     HighSloppyPhrase          7.44   (6.3%)       7.20   (5.7%)     -3.3%  ( -14% -    9%)
      MedSloppyPhrase         12.76   (5.1%)      12.51   (4.6%)     -1.9%  ( -10% -    8%)
      LowSloppyPhrase         34.24   (4.1%)      33.59   (3.8%)     -1.9%  (  -9% -    6%)
    HighTermMonthSort         70.86  (10.9%)      69.98  (10.7%)     -1.2%  ( -20% -   22%)
               Fuzzy1        211.28   (3.5%)     208.86   (2.2%)     -1.1%  (  -6% -    4%)
               Fuzzy2        180.97   (4.4%)     179.47   (2.6%)     -0.8%  (  -7% -    6%)
            OrHighLow        467.25   (2.9%)     467.94   (2.0%)      0.1%  (  -4% -    5%)
              Prefix3         91.35   (8.1%)      91.52   (7.2%)      0.2%  ( -14% -   16%)
HighTermDayOfYearSort         62.77   (6.9%)      62.96   (7.5%)      0.3%  ( -13% -   15%)
             Wildcard        129.49   (4.3%)     129.99   (2.8%)      0.4%  (  -6% -    7%)
              Respell        210.68   (1.9%)     211.58   (2.4%)      0.4%  (  -3% -    4%)
           AndHighLow        541.64   (3.1%)     544.44   (3.2%)      0.5%  (  -5% -    7%)
               IntNRQ        148.56   (8.3%)     149.44  (10.4%)      0.6%  ( -16% -   21%)
           HighPhrase         10.86   (9.0%)      13.92  (15.2%)     28.2%  (   3% -   57%)
            MedPhrase         62.22   (2.1%)      97.61   (4.6%)     56.9%  (  49% -   64%)
{noformat}

But there is a lot of variance across runs because it depends a lot on which 
query gets picked up. For instance on another run I got

{noformat}
            LowPhrase         39.39   (1.9%)      51.21   (2.2%)     30.0%  (  25% -   34%)
           HighPhrase         13.09   (3.2%)     192.76  (26.8%)   1372.5%  (1301% - 1448%)
{noformat}

In spite of some queries that get slightly slower, I think we should merge this 
since we need phrases to expose good impacts if we want to give boolean queries 
a chance to speed up queries that include phrases. Term queries appear to be a 
bit slower. I'm assuming that this is because the JVM cannot do as much 
inlining as before, since phrases now use classes that were previously only 
used for term queries.


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2019-07-03 Thread Adrien Grand (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877849#comment-16877849
 ] 

Adrien Grand commented on LUCENE-8311:
--

It turns out that part of the reason why the patch is making things slower is 
that it is moving phrase queries from BlockPostingsEnum, which is specialized 
to read freqs and positions only, to BlockImpactsEverythingEnum, which can read 
any of docs+freqs, docs+freqs+positions or docs+freqs+positions+offsets. Maybe 
we should remove BlockPostingsEnum and have a specialized impacts enum for 
positions instead.

The merged impacts look like they have some room for improvement as well. I'm 
looking into those issues so that we can then do better testing of LUCENE-8806.


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2018-05-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484120#comment-16484120
 ] 

Adrien Grand commented on LUCENE-8311:
--

Here is a run with DFR I(ne)L1:

{noformat}
            LowPhrase         19.89   (1.2%)      16.59   (1.0%)    -16.6%  ( -18% -  -14%)
            MedPhrase         15.94   (1.2%)      13.36   (1.1%)    -16.1%  ( -18% -  -14%)
    HighTermMonthSort         90.26  (10.9%)      81.72  (11.6%)     -9.5%  ( -28% -   14%)
     HighSloppyPhrase          1.84   (1.9%)       1.69   (2.2%)     -7.9%  ( -11% -   -3%)
      LowSloppyPhrase          7.87   (2.0%)       7.28   (2.5%)     -7.4%  ( -11% -   -3%)
      MedSloppyPhrase         10.17   (1.6%)       9.43   (2.0%)     -7.3%  ( -10% -   -3%)
HighTermDayOfYearSort         64.33  (11.6%)      60.25  (10.4%)     -6.3%  ( -25% -   17%)
             HighTerm        476.13   (2.5%)     452.30   (1.8%)     -5.0%  (  -9% -    0%)
               Fuzzy1        211.47   (4.1%)     203.28   (3.3%)     -3.9%  ( -10% -    3%)
               IntNRQ         31.99   (2.5%)      30.96   (7.6%)     -3.2%  ( -12% -    6%)
              MedTerm        653.93   (2.4%)     634.02   (1.8%)     -3.0%  (  -7% -    1%)
               Fuzzy2        218.64   (5.9%)     212.25   (5.4%)     -2.9%  ( -13% -    8%)
           OrHighHigh         17.28   (1.6%)      16.93   (1.7%)     -2.0%  (  -5% -    1%)
              LowTerm       1405.19   (2.9%)    1380.15   (2.3%)     -1.8%  (  -6% -    3%)
          AndHighHigh         21.96   (2.1%)      21.62   (2.5%)     -1.5%  (  -5% -    3%)
            OrHighMed         59.73   (1.5%)      58.89   (1.7%)     -1.4%  (  -4% -    1%)
              Prefix3         73.07   (4.8%)      72.07   (5.8%)     -1.4%  ( -11% -    9%)
             Wildcard         64.42   (3.6%)      63.72   (4.5%)     -1.1%  (  -8% -    7%)
              Respell        181.31   (2.4%)     180.69   (2.3%)     -0.3%  (  -4% -    4%)
           AndHighLow        982.32   (2.5%)     981.63   (3.1%)     -0.1%  (  -5% -    5%)
           AndHighMed         47.62   (2.0%)      47.60   (2.5%)     -0.0%  (  -4% -    4%)
          LowSpanNear         49.59   (3.4%)      49.65   (3.0%)      0.1%  (  -6% -    6%)
            OrHighLow        314.16   (2.2%)     314.60   (1.7%)      0.1%  (  -3% -    4%)
         HighSpanNear          5.92   (4.6%)       5.98   (4.1%)      1.0%  (  -7% -   10%)
          MedSpanNear          5.53   (6.7%)       5.66   (5.5%)      2.2%  (  -9% -   15%)
           HighPhrase          3.87   (1.5%)       4.36   (1.6%)     12.6%  (   9% -   15%)
{noformat}


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2018-05-22 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483907#comment-16483907
 ] 

Robert Muir commented on LUCENE-8311:
-

Yeah, I was thinking more along the lines of LowPhrase (still exact scoring). 
Sloppy is a whole nother beast :)


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2018-05-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483899#comment-16483899
 ] 

Adrien Grand commented on LUCENE-8311:
--

Unfortunately I don't think this is due to this scoring issue, but rather to 
the fact that a single position of a given term is allowed to be part of 
several matches in sloppy phrases. For instance, say the query is {{"the 
fox"~4}}, and {{the}} and {{fox}} have respective term frequencies of 5 and 1. 
For an exact phrase, we can assume that the maximum frequency is 1 (the min 
of both freqs). But if the query is a sloppy phrase query, we could have a 
frequency of 4 if a document has 5 occurrences of {{the}} at position N (as 
synonyms of each other) and 1 occurrence of {{fox}} at position {{N+1}}. Yet 
such documents that trigger the maximum frequency do not exist in practice, 
so the score upper bounds that we compute are significantly higher than the 
scores that are computed in practice, and no block of documents ever has an 
upper bound low enough to be skipped as non-competitive.
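
A small Java sketch may help show why a loose frequency bound defeats block skipping. It is an illustration under stated assumptions, not Lucene code: the class `BlockSkipSketch` and its methods are hypothetical, the saturating function `freq / (freq + K)` is only a stand-in for a real similarity, and the summed bound models the "summing up frequencies" attempt for sloppy phrases mentioned elsewhere in this thread.

```java
/** Illustrative sketch only; none of these names exist in Lucene. */
class BlockSkipSketch {

  // Stand-in saturation constant; real similarities also depend on norms and IDF.
  static final double K = 1.2;

  /** Upper bound for an exact phrase: it cannot occur more often than its rarest term. */
  static int exactPhraseBound(int... termFreqs) {
    int min = Integer.MAX_VALUE;
    for (int freq : termFreqs) {
      min = Math.min(min, freq);
    }
    return min;
  }

  /** Much looser bound obtained by summing the term frequencies. */
  static int summedBound(int... termFreqs) {
    int sum = 0;
    for (int freq : termFreqs) {
      sum += freq;
    }
    return sum;
  }

  /** Stand-in score that grows with frequency and saturates below 1. */
  static double maxScore(int maxFreq) {
    return maxFreq / (maxFreq + K);
  }

  /** A block can be skipped only if even its best possible score is not competitive. */
  static boolean canSkipBlock(int maxFreq, double minCompetitiveScore) {
    return maxScore(maxFreq) < minCompetitiveScore;
  }
}
```

With term frequencies 5 and 1 and a made-up minimum competitive score of 0.6, `canSkipBlock(exactPhraseBound(5, 1), 0.6)` is true because the bound is 1, while `canSkipBlock(summedBound(5, 1), 0.6)` is false because the bound is 6. The looser the frequency bound, the fewer blocks can be skipped.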


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2018-05-22 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483764#comment-16483764
 ] 

Robert Muir commented on LUCENE-8311:
-

I wonder if it's difficult to test with another similarity such as a DFR model? 
I'm only asking because I'm a little concerned that the bogus way we compute 
"phrase IDF" for BM25Similarity & ClassicSimilarity is getting in your way. 

All the other models use a more sane approach (scores like a disjunction 
internally). BM25 carried along the brain damage of ClassicSimilarity just 
because it was trying to minimize differences, but not for any particular good 
reason.


[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries

2018-05-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483737#comment-16483737
 ] 

Adrien Grand commented on LUCENE-8311:
--

Here is a patch that builds on LUCENE-8312 and the output of a luceneutil run:

{noformat}
            LowPhrase         23.35   (2.1%)      16.05   (1.1%)    -31.3%  ( -33% -  -28%)
     HighSloppyPhrase         26.90   (5.1%)      23.84   (3.8%)    -11.4%  ( -19% -   -2%)
    HighTermMonthSort        155.27  (13.1%)     138.14  (11.0%)    -11.0%  ( -31% -   15%)
      MedSloppyPhrase         18.12   (4.6%)      16.20   (3.2%)    -10.6%  ( -17% -   -2%)
      LowSloppyPhrase        236.36   (5.4%)     218.12   (4.5%)     -7.7%  ( -16% -    2%)
HighTermDayOfYearSort         89.47  (11.5%)      84.16  (10.1%)     -5.9%  ( -24% -   17%)
             HighTerm       1463.31   (3.9%)    1402.12   (3.4%)     -4.2%  ( -11% -    3%)
               IntNRQ         29.88   (6.8%)      28.65   (6.8%)     -4.1%  ( -16% -   10%)
              MedTerm       1721.26   (3.8%)    1672.73   (3.2%)     -2.8%  (  -9% -    4%)
               Fuzzy2        112.51   (5.1%)     109.41   (4.9%)     -2.8%  ( -12% -    7%)
              LowTerm       2469.28   (3.8%)    2414.68   (3.5%)     -2.2%  (  -9% -    5%)
          MedSpanNear         85.48   (4.1%)      84.02   (3.9%)     -1.7%  (  -9% -    6%)
         HighSpanNear         10.03   (4.4%)       9.86   (4.1%)     -1.7%  (  -9% -    7%)
               Fuzzy1        153.76   (4.9%)     151.56   (4.0%)     -1.4%  (  -9% -    7%)
           OrHighHigh         20.38   (3.2%)      20.18   (3.0%)     -1.0%  (  -6% -    5%)
            OrHighMed         72.71   (2.5%)      72.05   (2.4%)     -0.9%  (  -5% -    4%)
              Respell        163.99   (2.1%)     162.75   (2.3%)     -0.8%  (  -5% -    3%)
             Wildcard         39.17   (5.7%)      38.90   (5.0%)     -0.7%  ( -10% -   10%)
              Prefix3         45.93   (7.2%)      45.72   (6.6%)     -0.5%  ( -13% -   14%)
           AndHighMed        147.08   (2.0%)     146.55   (3.1%)     -0.4%  (  -5% -    4%)
          AndHighHigh         52.33   (2.0%)      52.25   (3.6%)     -0.2%  (  -5% -    5%)
            OrHighLow        331.39   (3.4%)     334.43   (2.5%)      0.9%  (  -4% -    7%)
           AndHighLow        603.54   (3.6%)     611.77   (3.8%)      1.4%  (  -5% -    9%)
          LowSpanNear          7.87  (11.1%)       8.04   (6.9%)      2.2%  ( -14% -   22%)
            MedPhrase         94.59   (1.6%)     108.41   (1.9%)     14.6%  (  10% -   18%)
           HighPhrase         11.74   (2.8%)     109.04  (24.6%)    828.7%  ( 779% -  880%)
{noformat}

It helps HighPhrase a lot, but hurts LowPhrase a bit. More generally, this 
change helps most when at least one of the searched terms mostly occurs within 
the phrase. For instance "york" mostly appears in the "new york" phrase in the 
wikipedia corpus that we use, so the "new york" phrase gets a huge speedup. 
This is not the case for LowPhrase entries like "median age" or "his family", 
which get worse latencies because they need to read impacts from the index and 
compute score upper bounds.

I tried to implement impacts on sloppy phrases by summing up frequencies but it 
didn't help since the score upper bounds were way higher than the scores that 
were actually computed. The reason why they are slower according to luceneutil 
is that the refactoring made them use the impacts enums rather than simple 
postings enums to iterate doc ids.

