[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319972#comment-17319972 ]

Adrien Grand commented on LUCENE-9335:
--------------------------------------

bq. I made the following changes, and actually still saw varying benchmark 
results across runs (randomized queries?)

Indeed, the benchmark randomly picks queries from the tasks file.

bq. Changes in benchUtil.py to not verify counts

Actually, you should be able to do this without modifying the benchmarking 
code, by configuring your Competition object not to verify counts in your 
localrun file: {{comp = competition.Competition(verifyCounts=False)}}

bq. When I run luceneutil, I see further errors from the verifyScores section 
of the code, which may indicate bugs in my changes:

Indeed, this indicates that the query returns different top hits with your 
change. If the scores differed by about one ulp, this could be explained by 
the fact that a floating-point sum may depend on the order in which the 
clauses' scores are added up, but given how large the score difference is, 
there must be a bigger problem. Have you run the tests with this change? They 
could help figure out where the bug is.
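To tell ordering noise from a real bug, one option is to check whether two scores differ by at most a few ulps. The following is a minimal sketch under stated assumptions: `left_to_right_sum` and `within_ulps` are hypothetical helpers, not part of luceneutil, and Python floats are 64-bit doubles whereas Lucene scores are 32-bit floats, although the order-dependence of the sum is the same.

```python
import math

def left_to_right_sum(scores):
    """Add per-clause scores one at a time, in iteration order (hypothetical helper)."""
    total = 0.0
    for score in scores:
        total += score
    return total

def within_ulps(a, b, max_ulps=1):
    """True if a and b differ by at most max_ulps units in the last place."""
    return abs(a - b) <= max_ulps * math.ulp(max(abs(a), abs(b)))

# Hypothetical per-clause scores of a single matching document.
clause_scores = [0.1, 0.2, 0.3]
forward = left_to_right_sum(clause_scores)             # 0.6000000000000001
backward = left_to_right_sum(reversed(clause_scores))  # 0.6

print(forward == backward)             # False: summation order changed the result
print(within_ulps(forward, backward))  # True: the gap is a single ulp
print(within_ulps(forward, 0.61))      # False: far too large to be reordering noise
```

A one-ulp tolerance is deliberately strict; with many clauses, rounding errors can accumulate, so a slightly larger `max_ulps` may be needed before concluding that a difference is a bug.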

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
>  Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more. I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.
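The issue does not spell out a design, so the following is only a hypothetical sketch of how bulk scoring and dynamic pruning could combine for a pure disjunction whose score is the sum of the matching clauses' scores. Doc IDs are processed in fixed-size windows, each clause's scores for a window are accumulated into an array in one pass, and a whole window is skipped when the sum of the clauses' best scores in it cannot beat the current k-th best hit. `top_k_windowed`, the `(doc_id, score)` postings format and the tiny default window are assumptions for illustration and are not Lucene's `BulkScorer` API. A real implementation would presumably take per-window bounds from precomputed score upper bounds rather than by reading the postings.

```python
import heapq
from typing import List, Tuple

# Hypothetical postings format: (doc_id, score) pairs sorted by doc_id.
Postings = List[Tuple[int, float]]

def top_k_windowed(clauses: List[Postings], k: int, window: int = 4) -> List[Tuple[int, float]]:
    """Top-k (doc_id, score) hits of a sum-of-scores disjunction, one window at a time."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap of (score, -doc): smaller doc wins ties
    cursors = [0] * len(clauses)
    max_doc = max((p[-1][0] for p in clauses if p), default=-1)
    for start in range(0, max_doc + 1, window):
        end = start + window
        # Advance each clause's cursor past this window, collecting its postings.
        slices = []
        for c, postings in enumerate(clauses):
            i = j = cursors[c]
            while j < len(postings) and postings[j][0] < end:
                j += 1
            slices.append(postings[i:j])
            cursors[c] = j
        # Window-level upper bound: the sum of each clause's best score in the window.
        bound = sum(max((s for _, s in sl), default=0.0) for sl in slices)
        if len(heap) == k and bound <= heap[0][0]:
            continue  # prune: no doc in this window can enter the top k
        # Bulk step: accumulate every clause's scores into per-window arrays.
        acc = [0.0] * window
        matched = [False] * window
        for sl in slices:
            for doc, score in sl:
                acc[doc - start] += score
                matched[doc - start] = True
        # Offer the window's hits to the top-k heap.
        for offset in range(window):
            if not matched[offset]:
                continue
            entry = (acc[offset], -(start + offset))
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
    return [(-neg_doc, score) for score, neg_doc in sorted(heap, reverse=True)]

clauses = [
    [(0, 1.0), (2, 3.0), (5, 0.5), (9, 2.0)],
    [(2, 1.0), (6, 0.5), (9, 1.5)],
]
print(top_k_windowed(clauses, k=2))  # [(2, 4.0), (9, 3.5)]
```

In this example the middle window (docs 4 to 7) is skipped without accumulating any scores, because its bound of 1.0 cannot beat the current second-best score of 1.0 (ties go to the smaller doc ID, which was already collected). The per-document work in the accumulation loop is plain array arithmetic, which is the kind of saving a bulk scorer could aim for.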



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
