[ https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877919#comment-16877919 ]
Adrien Grand commented on LUCENE-8311:
--------------------------------------

I opened https://github.com/apache/lucene-solr/pull/760. Performance is a bit better than what we had before:

{noformat}
Task                    QPS baseline   StdDev   QPS patch   StdDev   Pct diff
HighTerm                     1395.12   (5.1%)     1230.78   (4.3%)    -11.8% ( -20% -  -2%)
MedTerm                      2352.56   (4.7%)     2170.42   (3.9%)     -7.7% ( -15% -   0%)
LowSpanNear                    13.70   (7.0%)       12.67   (4.9%)     -7.5% ( -18% -   4%)
HighSpanNear                    5.69   (5.3%)        5.31   (3.2%)     -6.5% ( -14% -   2%)
MedSpanNear                    23.33   (4.2%)       21.97   (2.4%)     -5.8% ( -11% -   0%)
AndHighMed                    114.70   (2.9%)      109.40   (4.1%)     -4.6% ( -11% -   2%)
AndHighHigh                    35.08   (3.2%)       33.51   (4.1%)     -4.5% ( -11% -   2%)
LowTerm                      3014.11   (4.7%)     2893.44   (4.7%)     -4.0% ( -12% -   5%)
OrHighMed                      60.26   (2.5%)       57.96   (2.1%)     -3.8% (  -8% -   0%)
OrHighHigh                     15.45   (2.5%)       14.87   (2.3%)     -3.8% (  -8% -   1%)
LowPhrase                      25.81   (3.4%)       24.89   (2.8%)     -3.6% (  -9% -   2%)
HighSloppyPhrase                7.44   (6.3%)        7.20   (5.7%)     -3.3% ( -14% -   9%)
MedSloppyPhrase                12.76   (5.1%)       12.51   (4.6%)     -1.9% ( -10% -   8%)
LowSloppyPhrase                34.24   (4.1%)       33.59   (3.8%)     -1.9% (  -9% -   6%)
HighTermMonthSort              70.86  (10.9%)       69.98  (10.7%)     -1.2% ( -20% -  22%)
Fuzzy1                        211.28   (3.5%)      208.86   (2.2%)     -1.1% (  -6% -   4%)
Fuzzy2                        180.97   (4.4%)      179.47   (2.6%)     -0.8% (  -7% -   6%)
OrHighLow                     467.25   (2.9%)      467.94   (2.0%)      0.1% (  -4% -   5%)
Prefix3                        91.35   (8.1%)       91.52   (7.2%)      0.2% ( -14% -  16%)
HighTermDayOfYearSort          62.77   (6.9%)       62.96   (7.5%)      0.3% ( -13% -  15%)
Wildcard                      129.49   (4.3%)      129.99   (2.8%)      0.4% (  -6% -   7%)
Respell                       210.68   (1.9%)      211.58   (2.4%)      0.4% (  -3% -   4%)
AndHighLow                    541.64   (3.1%)      544.44   (3.2%)      0.5% (  -5% -   7%)
IntNRQ                        148.56   (8.3%)      149.44  (10.4%)      0.6% ( -16% -  21%)
HighPhrase                     10.86   (9.0%)       13.92  (15.2%)     28.2% (   3% -  57%)
MedPhrase                      62.22   (2.1%)       97.61   (4.6%)     56.9% (  49% -  64%)
{noformat}

But there is a lot of variance across runs because it depends a lot on which query gets picked up.
For instance, on another run I got:

{noformat}
LowPhrase                      39.39   (1.9%)       51.21   (2.2%)     30.0% (  25% -  34%)
HighPhrase                     13.09   (3.2%)      192.76  (26.8%)   1372.5% (1301% - 1448%)
{noformat}

Even though some queries get slightly slower, I think we should merge this: phrases need to expose good impacts if we want boolean queries to have a chance to speed up queries that include phrases. Term queries appear to be a bit slower; I assume this is because the JVM cannot inline as much as before, now that phrases use classes that were previously used only by term queries.

> Leverage impacts for phrase queries
> -----------------------------------
>
>                 Key: LUCENE-8311
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8311
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8311.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for
> each unique norm value in order to get upper bounds of the score for the
> phrase.
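To make the idea in the issue description concrete, here is a minimal sketch of how per-term impacts might be combined into phrase impacts for an exact phrase. It is not the code from the pull request. `PhraseImpactsSketch`, its `Impact` class, and both methods are hypothetical stand-ins rather than Lucene's API, and the sketch assumes that all terms cover the same block of documents, that norms are non-negative, and that an impact (freq, norm) means "some documents may have up to this freq at this norm or higher".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration only; not Lucene's actual implementation. */
public class PhraseImpactsSketch {

    /** Simplified stand-in for an impact: a (freq, norm) pair. */
    public static final class Impact {
        public final int freq;
        public final long norm;

        public Impact(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }

        @Override
        public String toString() {
            return "{freq=" + freq + ", norm=" + norm + "}";
        }
    }

    /** Largest freq among impacts whose norm is <= the given norm, or 0 if there is none. */
    public static int maxFreqForNorm(List<Impact> impacts, long norm) {
        int max = 0;
        for (Impact impact : impacts) {
            if (impact.norm <= norm) {
                max = Math.max(max, impact.freq);
            }
        }
        return max;
    }

    /**
     * For each unique norm value, the phrase frequency cannot exceed the
     * smallest of the per-term frequency bounds, so take the minimum.
     */
    public static List<Impact> phraseImpacts(List<List<Impact>> perTermImpacts) {
        List<Impact> result = new ArrayList<>();
        if (perTermImpacts.isEmpty()) {
            return result;
        }
        // Collect every norm value that appears in any term's impacts, in ascending order.
        TreeSet<Long> norms = new TreeSet<>();
        for (List<Impact> impacts : perTermImpacts) {
            for (Impact impact : impacts) {
                norms.add(impact.norm);
            }
        }
        for (long norm : norms) {
            int minFreq = Integer.MAX_VALUE;
            for (List<Impact> impacts : perTermImpacts) {
                minFreq = Math.min(minFreq, maxFreqForNorm(impacts, norm));
            }
            // Skip norms that some term cannot match, and entries that do not
            // raise the freq bound (same freq at a higher norm scores no better).
            boolean raisesBound = result.isEmpty() || minFreq > result.get(result.size() - 1).freq;
            if (minFreq > 0 && raisesBound) {
                result.add(new Impact(minFreq, norm));
            }
        }
        return result;
    }
}
```

With one term exposing impacts (3, 10) and (7, 20) and another exposing (5, 10) and (6, 30), `phraseImpacts` yields (3, 10), (5, 20), (6, 30). Each pair can then be passed to the similarity to get an upper bound on the phrase score for the block.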