[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881967#comment-16881967 ]

Adrien Grand commented on LUCENE-8311:
--------------------------------------

This made exact phrase queries 3x faster in the nightly benchmarks http://people.apache.org/~mikemccand/lucenebench/Phrase.html and term queries about 10% slower http://people.apache.org/~mikemccand/lucenebench/Term.html.

> Leverage impacts for phrase queries
> -----------------------------------
>
>                 Key: LUCENE-8311
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8311
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8311.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for
> each unique norm value in order to get upper bounds of the score for the
> phrase.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
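The idea in the issue description (take the minimum term frequency for each unique norm value) can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code from the patch: the class `ExactPhraseImpactSketch`, the method `phraseFreqUpperBounds`, and the representation of a term's impacts as a plain map from norm value to the maximum term frequency seen for that norm in a block are all hypothetical. Lucene's real impacts are lists of competitive (freq, norm) pairs and are merged differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not Lucene's API, just the "min freq per norm" idea.
final class ExactPhraseImpactSketch {

  /**
   * perTermMaxFreq holds one map per phrase term: norm value -> maximum term
   * frequency among the documents of the block that have this norm
   * (a simplified stand-in for real impacts).
   * Returns norm value -> upper bound of the exact phrase frequency.
   */
  static TreeMap<Long, Integer> phraseFreqUpperBounds(List<Map<Long, Integer>> perTermMaxFreq) {
    TreeMap<Long, Integer> bounds = new TreeMap<>();
    if (perTermMaxFreq.isEmpty()) {
      return bounds;
    }
    for (Map.Entry<Long, Integer> entry : perTermMaxFreq.get(0).entrySet()) {
      long norm = entry.getKey();
      int minFreq = entry.getValue();
      boolean presentForAllTerms = true;
      for (Map<Long, Integer> other : perTermMaxFreq.subList(1, perTermMaxFreq.size())) {
        Integer freq = other.get(norm);
        if (freq == null) {
          // No document with this norm contains the other term, so no phrase match.
          presentForAllTerms = false;
          break;
        }
        // An exact phrase cannot occur more often than its rarest term.
        minFreq = Math.min(minFreq, freq);
      }
      if (presentForAllTerms) {
        bounds.put(norm, minFreq);
      }
    }
    return bounds;
  }

  public static void main(String[] args) {
    Map<Long, Integer> the = Map.of(10L, 5, 20L, 3);
    Map<Long, Integer> fox = Map.of(10L, 1, 20L, 4, 30L, 2);
    System.out.println(phraseFreqUpperBounds(List.of(the, fox))); // prints {10=1, 20=3}
  }
}
```

Each (norm, frequency bound) pair could then be fed to the similarity to obtain a score upper bound for the block, which is what makes skipping non-competitive blocks possible.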
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881292#comment-16881292 ]

ASF subversion and git services commented on LUCENE-8311:
---------------------------------------------------------

Commit a80b5164d1695d58115b78e832df0b722860b22c in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a80b516 ]

LUCENE-8311: Add CHANGES entry.
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881294#comment-16881294 ]

ASF subversion and git services commented on LUCENE-8311:
---------------------------------------------------------

Commit 437090c3028d9cf85dee45fb65df29248126d2ea in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=437090c ]

LUCENE-8311: Add CHANGES entry.
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881288#comment-16881288 ]

ASF subversion and git services commented on LUCENE-8311:
---------------------------------------------------------

Commit d271770ed133995186f6a1667b36ee623e6cefc0 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d271770 ]

LUCENE-8311: Phrase impacts (#760)
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881245#comment-16881245 ]

ASF subversion and git services commented on LUCENE-8311:
---------------------------------------------------------

Commit cfac486afd7bce64c10497a3b9e541d64ee4f1fd in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cfac486 ]

LUCENE-8311: Phrase impacts (#760)
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881203#comment-16881203 ]

Michael McCandless commented on LUCENE-8311:
--------------------------------------------

+1 to merge ... that is a good tradeoff! Astronomical speedups for {{PhraseQuery}} and some small slowdowns in others. It's important that all of our common queries properly handle impacts.
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877919#comment-16877919 ]

Adrien Grand commented on LUCENE-8311:
--------------------------------------

I opened https://github.com/apache/lucene-solr/pull/760. Performance is a bit better than what we had before:

{noformat}
Task                  QPS baseline  StdDev  QPS patch  StdDev  Pct diff
HighTerm                   1395.12  (5.1%)    1230.78  (4.3%)    -11.8% ( -20% -   -2%)
MedTerm                    2352.56  (4.7%)    2170.42  (3.9%)     -7.7% ( -15% -    0%)
LowSpanNear                  13.70  (7.0%)      12.67  (4.9%)     -7.5% ( -18% -    4%)
HighSpanNear                  5.69  (5.3%)       5.31  (3.2%)     -6.5% ( -14% -    2%)
MedSpanNear                  23.33  (4.2%)      21.97  (2.4%)     -5.8% ( -11% -    0%)
AndHighMed                  114.70  (2.9%)     109.40  (4.1%)     -4.6% ( -11% -    2%)
AndHighHigh                  35.08  (3.2%)      33.51  (4.1%)     -4.5% ( -11% -    2%)
LowTerm                    3014.11  (4.7%)    2893.44  (4.7%)     -4.0% ( -12% -    5%)
OrHighMed                    60.26  (2.5%)      57.96  (2.1%)     -3.8% (  -8% -    0%)
OrHighHigh                   15.45  (2.5%)      14.87  (2.3%)     -3.8% (  -8% -    1%)
LowPhrase                    25.81  (3.4%)      24.89  (2.8%)     -3.6% (  -9% -    2%)
HighSloppyPhrase              7.44  (6.3%)       7.20  (5.7%)     -3.3% ( -14% -    9%)
MedSloppyPhrase              12.76  (5.1%)      12.51  (4.6%)     -1.9% ( -10% -    8%)
LowSloppyPhrase              34.24  (4.1%)      33.59  (3.8%)     -1.9% (  -9% -    6%)
HighTermMonthSort            70.86 (10.9%)      69.98 (10.7%)     -1.2% ( -20% -   22%)
Fuzzy1                      211.28  (3.5%)     208.86  (2.2%)     -1.1% (  -6% -    4%)
Fuzzy2                      180.97  (4.4%)     179.47  (2.6%)     -0.8% (  -7% -    6%)
OrHighLow                   467.25  (2.9%)     467.94  (2.0%)      0.1% (  -4% -    5%)
Prefix3                      91.35  (8.1%)      91.52  (7.2%)      0.2% ( -14% -   16%)
HighTermDayOfYearSort        62.77  (6.9%)      62.96  (7.5%)      0.3% ( -13% -   15%)
Wildcard                    129.49  (4.3%)     129.99  (2.8%)      0.4% (  -6% -    7%)
Respell                     210.68  (1.9%)     211.58  (2.4%)      0.4% (  -3% -    4%)
AndHighLow                  541.64  (3.1%)     544.44  (3.2%)      0.5% (  -5% -    7%)
IntNRQ                      148.56  (8.3%)     149.44 (10.4%)      0.6% ( -16% -   21%)
HighPhrase                   10.86  (9.0%)      13.92 (15.2%)     28.2% (   3% -   57%)
MedPhrase                    62.22  (2.1%)      97.61  (4.6%)     56.9% (  49% -   64%)
{noformat}

But there is a lot of variance across runs because it depends a lot on which query gets picked up. For instance on another run I got

{noformat}
LowPhrase                    39.39  (1.9%)      51.21  (2.2%)     30.0% (  25% -   34%)
HighPhrase                   13.09  (3.2%)     192.76 (26.8%)   1372.5% (1301% - 1448%)
{noformat}

In spite of some queries that get slightly slower, I think we should merge this since we need phrases to expose good impacts if we want to give boolean queries a chance to speed up queries that include phrases. Term queries appear to be a bit slower; I'm assuming that this is due to the fact that the JVM cannot do as much inlining as before since we are starting to use classes for phrases that were only used for term queries before.
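As a reading aid for the tables above, the "Pct diff" column appears to be the relative change of the patched QPS against the baseline QPS. The sketch below assumes that formula (it reproduces the reported values for the rows checked here, but it is not taken from the benchmark tool's source); the class name `PctDiffSketch` and the method `pctDiff` are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: assumes Pct diff = 100 * (patch - baseline) / baseline.
final class PctDiffSketch {

  /** Relative change of the patched QPS versus the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double patchQps) {
    return 100.0 * (patchQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the first run reported above.
    System.out.printf(Locale.ROOT, "HighTerm:  %.1f%%%n", pctDiff(1395.12, 1230.78)); // -11.8%
    System.out.printf(Locale.ROOT, "MedPhrase: %.1f%%%n", pctDiff(62.22, 97.61));     // 56.9%
  }
}
```

Because the tables show rounded QPS values, recomputing a row this way can differ from the reported figure in the last digit.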
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877849#comment-16877849 ]

Adrien Grand commented on LUCENE-8311:
--------------------------------------

It turns out that part of the reason why the patch is making things slower is that it is moving phrase queries from BlockPostingsEnum, which is specialized to read freqs and positions only, to BlockImpactsEverythingEnum, which can read any of docs+freqs, docs+freqs+positions or docs+freqs+positions+offsets. Maybe we should remove BlockPostingsEnum and have a specialized impacts enum for positions instead. The merged impacts look like they have some room for improvement as well. I'm looking into those issues so that we can then do better testing of LUCENE-8806.
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484120#comment-16484120 ]

Adrien Grand commented on LUCENE-8311:
--------------------------------------

Here is a run with DFR I(ne)L1:

{noformat}
LowPhrase                    19.89  (1.2%)      16.59  (1.0%)    -16.6% ( -18% -  -14%)
MedPhrase                    15.94  (1.2%)      13.36  (1.1%)    -16.1% ( -18% -  -14%)
HighTermMonthSort            90.26 (10.9%)      81.72 (11.6%)     -9.5% ( -28% -   14%)
HighSloppyPhrase              1.84  (1.9%)       1.69  (2.2%)     -7.9% ( -11% -   -3%)
LowSloppyPhrase               7.87  (2.0%)       7.28  (2.5%)     -7.4% ( -11% -   -3%)
MedSloppyPhrase              10.17  (1.6%)       9.43  (2.0%)     -7.3% ( -10% -   -3%)
HighTermDayOfYearSort        64.33 (11.6%)      60.25 (10.4%)     -6.3% ( -25% -   17%)
HighTerm                    476.13  (2.5%)     452.30  (1.8%)     -5.0% (  -9% -    0%)
Fuzzy1                      211.47  (4.1%)     203.28  (3.3%)     -3.9% ( -10% -    3%)
IntNRQ                       31.99  (2.5%)      30.96  (7.6%)     -3.2% ( -12% -    6%)
MedTerm                     653.93  (2.4%)     634.02  (1.8%)     -3.0% (  -7% -    1%)
Fuzzy2                      218.64  (5.9%)     212.25  (5.4%)     -2.9% ( -13% -    8%)
OrHighHigh                   17.28  (1.6%)      16.93  (1.7%)     -2.0% (  -5% -    1%)
LowTerm                    1405.19  (2.9%)    1380.15  (2.3%)     -1.8% (  -6% -    3%)
AndHighHigh                  21.96  (2.1%)      21.62  (2.5%)     -1.5% (  -5% -    3%)
OrHighMed                    59.73  (1.5%)      58.89  (1.7%)     -1.4% (  -4% -    1%)
Prefix3                      73.07  (4.8%)      72.07  (5.8%)     -1.4% ( -11% -    9%)
Wildcard                     64.42  (3.6%)      63.72  (4.5%)     -1.1% (  -8% -    7%)
Respell                     181.31  (2.4%)     180.69  (2.3%)     -0.3% (  -4% -    4%)
AndHighLow                  982.32  (2.5%)     981.63  (3.1%)     -0.1% (  -5% -    5%)
AndHighMed                   47.62  (2.0%)      47.60  (2.5%)     -0.0% (  -4% -    4%)
LowSpanNear                  49.59  (3.4%)      49.65  (3.0%)      0.1% (  -6% -    6%)
OrHighLow                   314.16  (2.2%)     314.60  (1.7%)      0.1% (  -3% -    4%)
HighSpanNear                  5.92  (4.6%)       5.98  (4.1%)      1.0% (  -7% -   10%)
MedSpanNear                   5.53  (6.7%)       5.66  (5.5%)      2.2% (  -9% -   15%)
HighPhrase                    3.87  (1.5%)       4.36  (1.6%)     12.6% (   9% -   15%)
{noformat}
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483907#comment-16483907 ]

Robert Muir commented on LUCENE-8311:
-------------------------------------

Yeah, I was thinking more along the lines of LowPhrase (still exact scoring). Sloppy is a whole nother beast :)
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483899#comment-16483899 ]

Adrien Grand commented on LUCENE-8311:
--------------------------------------

Unfortunately I don't think this is due to this scoring issue, but rather to the fact that a single position of a given term is allowed to be part of several matches in sloppy phrases. For instance, say the query is {{"the fox"~4}}, and {{the}} and {{fox}} have respective term frequencies of 5 and 1. For an exact phrase we can assume that the maximum frequency is 1 (the min of both freqs). But if the query is a sloppy phrase query, we could have a frequency of 4 if a document has 5 occurrences of {{the}} at position N (as synonyms of each other) and 1 occurrence of {{fox}} at position {{N+1}}. Yet such documents that trigger the maximum frequency do not exist in practice, which causes the score upper bounds that we compute to be significantly higher than the scores that are computed in practice, so we never get to skip a block of documents on the grounds that its scores are not competitive.
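To make the effect of a loose frequency bound concrete, here is a small sketch of the skipping decision. Everything in it is an assumption made for illustration: the class `BlockSkipSketch`, its methods, the simplified BM25-like saturation formula (length normalization is ignored), and the weight, k1 and minimum competitive score values. Only the frequency bounds of 1 (exact phrase) and 4 (sloppy phrase) come from the example above.

```java
// Hypothetical sketch of block skipping based on score upper bounds.
final class BlockSkipSketch {

  /**
   * Score upper bound for a block given an upper bound on the phrase frequency.
   * Simplified BM25-like saturation without length normalization (assumption).
   */
  static double scoreUpperBound(double weight, int maxFreq, double k1) {
    return weight * maxFreq / (maxFreq + k1);
  }

  /** A block can be skipped only if even its best possible score is not competitive. */
  static boolean canSkipBlock(double scoreUpperBound, double minCompetitiveScore) {
    return scoreUpperBound < minCompetitiveScore;
  }

  public static void main(String[] args) {
    double weight = 1.0;              // illustrative value
    double k1 = 1.2;                  // illustrative value
    double minCompetitiveScore = 0.6; // illustrative value

    double exactBound = scoreUpperBound(weight, 1, k1);  // freq bound 1: min of both freqs
    double sloppyBound = scoreUpperBound(weight, 4, k1); // freq bound 4: looser sloppy bound

    System.out.println("exact phrase block skipped:  " + canSkipBlock(exactBound, minCompetitiveScore));
    System.out.println("sloppy phrase block skipped: " + canSkipBlock(sloppyBound, minCompetitiveScore));
  }
}
```

With these illustrative numbers the tight exact-phrase bound falls below the competitive threshold while the looser sloppy bound does not, which mirrors why no blocks end up being skipped for sloppy phrases.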
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483764#comment-16483764 ]

Robert Muir commented on LUCENE-8311:
-------------------------------------

I wonder if it's difficult to test with another similarity such as a DFR model? I'm only asking because I'm a little concerned that the bogus way we compute "phrase IDF" for BM25Similarity & ClassicSimilarity is getting in your way. All the other models use a more sane approach (scores like a disjunction internally). BM25 carried along the brain damage of ClassicSimilarity just because it was trying to minimize differences, but not for any particular good reason.
[jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483737#comment-16483737 ]

Adrien Grand commented on LUCENE-8311:
--------------------------------------

Here is a patch that builds on LUCENE-8312 and the output of a luceneutil run:

{noformat}
LowPhrase                    23.35  (2.1%)      16.05  (1.1%)    -31.3% ( -33% -  -28%)
HighSloppyPhrase             26.90  (5.1%)      23.84  (3.8%)    -11.4% ( -19% -   -2%)
HighTermMonthSort           155.27 (13.1%)     138.14 (11.0%)    -11.0% ( -31% -   15%)
MedSloppyPhrase              18.12  (4.6%)      16.20  (3.2%)    -10.6% ( -17% -   -2%)
LowSloppyPhrase             236.36  (5.4%)     218.12  (4.5%)     -7.7% ( -16% -    2%)
HighTermDayOfYearSort        89.47 (11.5%)      84.16 (10.1%)     -5.9% ( -24% -   17%)
HighTerm                   1463.31  (3.9%)    1402.12  (3.4%)     -4.2% ( -11% -    3%)
IntNRQ                       29.88  (6.8%)      28.65  (6.8%)     -4.1% ( -16% -   10%)
MedTerm                    1721.26  (3.8%)    1672.73  (3.2%)     -2.8% (  -9% -    4%)
Fuzzy2                      112.51  (5.1%)     109.41  (4.9%)     -2.8% ( -12% -    7%)
LowTerm                    2469.28  (3.8%)    2414.68  (3.5%)     -2.2% (  -9% -    5%)
MedSpanNear                  85.48  (4.1%)      84.02  (3.9%)     -1.7% (  -9% -    6%)
HighSpanNear                 10.03  (4.4%)       9.86  (4.1%)     -1.7% (  -9% -    7%)
Fuzzy1                      153.76  (4.9%)     151.56  (4.0%)     -1.4% (  -9% -    7%)
OrHighHigh                   20.38  (3.2%)      20.18  (3.0%)     -1.0% (  -6% -    5%)
OrHighMed                    72.71  (2.5%)      72.05  (2.4%)     -0.9% (  -5% -    4%)
Respell                     163.99  (2.1%)     162.75  (2.3%)     -0.8% (  -5% -    3%)
Wildcard                     39.17  (5.7%)      38.90  (5.0%)     -0.7% ( -10% -   10%)
Prefix3                      45.93  (7.2%)      45.72  (6.6%)     -0.5% ( -13% -   14%)
AndHighMed                  147.08  (2.0%)     146.55  (3.1%)     -0.4% (  -5% -    4%)
AndHighHigh                  52.33  (2.0%)      52.25  (3.6%)     -0.2% (  -5% -    5%)
OrHighLow                   331.39  (3.4%)     334.43  (2.5%)      0.9% (  -4% -    7%)
AndHighLow                  603.54  (3.6%)     611.77  (3.8%)      1.4% (  -5% -    9%)
LowSpanNear                   7.87 (11.1%)       8.04  (6.9%)      2.2% ( -14% -   22%)
MedPhrase                    94.59  (1.6%)     108.41  (1.9%)     14.6% (  10% -   18%)
HighPhrase                   11.74  (2.8%)     109.04 (24.6%)    828.7% ( 779% -  880%)
{noformat}

It helps HighPhrase a lot, but hurts LowPhrase a bit. More generally, this change helps most when at least one of the searched terms mostly occurs within the phrase. For instance "york" mostly appears in the "new york" phrase in the Wikipedia corpus that we use, so the "new york" phrase gets a huge speedup. This is not the case for LowPhrase entries like "median age" or "his family", which get worse latencies because they need to read impacts from the index and compute score upper bounds.

I tried to implement impacts on sloppy phrases by summing up frequencies but it didn't help since the score upper bounds were way higher than the scores that were actually computed. The reason why sloppy phrases are slower according to luceneutil is that the refactoring made them use the impacts enums rather than simple postings enums to iterate doc ids.
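The difference between the two frequency bounds discussed in this thread (minimum of the term frequencies for exact phrases, sum of the term frequencies as tried for sloppy phrases) can be sketched as below. This is a hypothetical illustration, not the patch: the class `PhraseFreqBoundSketch`, its method names, and the sample frequencies are made up for the example.

```java
import java.util.Arrays;

// Hypothetical sketch comparing the two frequency upper bounds.
final class PhraseFreqBoundSketch {

  /** Exact phrase: the phrase cannot occur more often than its rarest term. */
  static int exactPhraseBound(int... termFreqs) {
    return Arrays.stream(termFreqs).min().orElse(0);
  }

  /** Sloppy phrase, "summing up frequencies": safe but usually far too generous. */
  static long sloppyPhraseBoundBySum(int... termFreqs) {
    return Arrays.stream(termFreqs).asLongStream().sum();
  }

  public static void main(String[] args) {
    int[] freqs = {5, 1}; // illustrative term frequencies for a two-term phrase
    System.out.println("exact bound:  " + exactPhraseBound(freqs));       // prints 1
    System.out.println("sloppy bound: " + sloppyPhraseBoundBySum(freqs)); // prints 6
  }
}
```

The larger the gap between the bound and the frequencies that documents actually reach, the higher the score upper bound sits above real scores, and the less often a block can be ruled out.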