[ https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365106#comment-17365106 ]
Michael Gibney commented on LUCENE-9204:
----------------------------------------

I hope it's ok to post this here; I've [added benchmarks|https://github.com/mikemccand/luceneutil/pull/133] with the goal of quantifying performance for these different approaches. 500k docs from wikimedium; baseline and candidate code are the same, since I'm initially seeking to compare different queries, not different code.

First, a realistic use-case, somewhat contrived to exercise {{pullUpDisjunctions()}}:

{code}
# (body:us|united-states health|health-care policy|public-policy law|legal-aspects)~10
Task            QPS baseline   StdDev  QPS candidate   StdDev               Pct diff  p-value
IntervalDis            20.34  (11.4%)          19.83   (9.2%)   -2.5% ( -20% -  20%)    0.446
IntervalMinDis         34.03   (9.9%)          35.22   (9.5%)    3.5% ( -14% -  25%)    0.251
SpanDis                63.63  (10.4%)          68.56  (11.0%)    7.8% ( -12% -  32%)    0.022
{code}

Next, an intensive use-case, contrived to push/illustrate the performance profile of increasing the numbers of internal disjunctions:

{code}
# (body:smith a|in-the)~10
# (body:smith a|in-the the|in-the)~10
# (body:smith a|in-the the|in-the a|in-the)~10
# (body:smith a|in-the the|in-the a|in-the the|in-the)~10
# (body:smith a|in-the the|in-the a|in-the the|in-the a|in-the)~10
# (body:smith a|in-the the|in-the a|in-the the|in-the a|in-the the|in-the)~10
# NOTE: "smith" is arbitrary; just to push QPS numbers into a more human-friendly range
Task            QPS baseline   StdDev  QPS candidate   StdDev               Pct diff  p-value
IntervalDis1           82.47   (2.3%)          81.27   (1.9%)   -1.5% (  -5% -   2%)    0.276
IntervalDis2           25.96   (1.3%)          25.91   (1.7%)   -0.2% (  -3% -   2%)    0.851
IntervalDis3            9.46   (2.3%)           9.46   (3.4%)   -0.0% (  -5% -   5%)    0.986
IntervalDis4            3.69   (2.1%)           3.69   (2.3%)    0.1% (  -4% -   4%)    0.962
IntervalDis5            1.57   (1.1%)           1.56   (0.9%)   -0.7% (  -2% -   1%)    0.282
IntervalDis6            0.66   (0.6%)           0.66   (1.5%)   -0.6% (  -2% -   1%)    0.414
IntervalMinDis1       130.06   (5.6%)         129.07   (4.8%)   -0.8% ( -10% -  10%)    0.817
IntervalMinDis2       115.44   (6.3%)         116.59   (4.2%)    1.0% (  -8% -  12%)    0.769
IntervalMinDis3        97.24   (5.0%)          99.19   (7.6%)    2.0% ( -10% -  15%)    0.625
IntervalMinDis4       100.28   (8.0%)         101.31   (3.1%)    1.0% (  -9% -  13%)    0.791
IntervalMinDis5       102.01   (8.0%)         101.34   (6.2%)   -0.6% ( -13% -  14%)    0.886
IntervalMinDis6        99.96   (2.2%)          97.27   (7.0%)   -2.7% ( -11% -   6%)    0.410
SpanDis1               81.13   (4.0%)          80.34   (2.1%)   -1.0% (  -6% -   5%)    0.630
SpanDis2               45.01   (1.6%)          44.21   (1.5%)   -1.8% (  -4% -   1%)    0.068
SpanDis3               31.01   (2.0%)          31.21   (1.9%)    0.6% (  -3% -   4%)    0.608
SpanDis4               24.36   (2.2%)          23.01   (5.7%)   -5.6% ( -13% -   2%)    0.042
SpanDis5               19.76   (4.0%)          20.22   (3.5%)    2.3% (  -4% -  10%)    0.324
SpanDis6               17.29   (4.5%)          16.74   (5.9%)   -3.2% ( -12% -   7%)    0.340
{code}

For good measure, I added two tasks that compare non-positional disjunctions across different implementations: SpanOrQuery and DisjunctionIntervalsSource. (fwiw, I'd guess the performance gap between straight disjunctions could probably be closed without too much work?)

{code}
# (body:trash|waste|garbage|recycling|refuse)
Task            QPS baseline   StdDev  QPS candidate   StdDev               Pct diff  p-value
PlainSpanDis           80.92  (11.3%)          82.80  (17.5%)    2.3% ( -23% -  35%)    0.619
PlainIntervalDis      142.66  (10.8%)         154.38  (13.6%)    8.2% ( -14% -  36%)    0.035
{code}

> Move span queries to the queries module
> ---------------------------------------
>
>                 Key: LUCENE-9204
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9204
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have a slightly odd situation currently, with two parallel query
> structures for building complex positional queries: the long-standing span
> queries, in core; and interval queries, in the queries module. Given that
> interval queries solve at least some of the problems we've had with Spans, I
> think we should be pushing users more towards these implementations. It's
> counter-intuitive to do that when Spans are in core though.
> I've opened this issue to discuss moving the spans package as a whole to the
> queries module.

--
This message was sent by Atlassian Jira (v8.3.4#803005)