[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning

Zach Chen (Jira) Fri, 14 May 2021 23:26:10 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344973#comment-17344973
 ]


Zach Chen commented on LUCENE-9335:
-----------------------------------

I just want to provide a quick summary of the latest progress on this issue. 
Currently there are 3 different BMM implementations from 2 PRs:
 # Scorer based implementation
 ## PR: [https://github.com/apache/lucene/pull/101] 
 ## wikibigall benchmark results: 
[https://github.com/apache/lucene/pull/101#issuecomment-840255508]
 ### On average it improves _OrHighHigh_ by 40%+, and _OrHighMed_ by around 20%
 ### In 1 out of 3 runs it hurt _AndMedOrHighHigh_ and _OrHighMed_ performance 
by around 16%
 # BulkScorer based implementation with fixed window size
 ## PR: [https://github.com/apache/lucene/pull/113] 
 ## wikibigall benchmark with window size 1024 results: 
[https://github.com/apache/lucene/pull/113#issuecomment-840293637]
 ### On average it improves _OrHighHigh_ by 3-8%, and _OrHighMed_ by 23%+
 ### For some reason it hurt Fuzzy1 & Fuzzy2 performance by around 8%, even 
though it wasn't used for those queries 
 # BulkScorer based implementation without a window, using the scorer 
implementation from option 1
 ## Commit: 
[https://github.com/zacharymorn/lucene/commit/3bcdbb31a7d55b00cb53e4be40a4adc93b9f30db]
 
 ## wikibigall benchmark results: 
[https://github.com/apache/lucene/pull/113#discussion_r631568912]
 ### On average it improves _OrHighHigh_ by 52%, and _OrHighMed_ by 10-18%
 ### For some reason it consistently hurt Fuzzy1 & Fuzzy2 performance by 
around 8-13%, even though it wasn't used for those queries 

[~jpountz] what do you think about the above results and the latest changes, 
and are there any other ideas we should try? From the current results, it 
appears option 1 might be the one to go with. I can start productionizing the 
changes and adding tests once we have settled on the implementation approach 
here.

 

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: wikimedium.10M.nostopwords.tasks, 
> wikimedium.10M.nostopwords.tasks.5OrMeds
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
>  Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more. I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
