mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672

Update: these results are wrong. Please disregard them.

@msokolov Thanks for suggesting additional benchmarks that we can use. Below are the results on the dataset `wikimedium10m`.

First I will repeat the results from the previous round of benchmarking:

topN=10, taskRepeatCount = 20, concurrentSearchers = False

| Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
| --------------------- | -----------: | --------: | ----------------------: | --------: |
| **TermDTSort**        |       147.64 |   (11.5%) |                  547.80 |    (6.6%) |
| HighTermMonthSort     |       147.85 |   (12.2%) |                  239.28 |    (7.3%) |
| HighTermDayOfYearSort |        74.44 |    (7.7%) |                   42.56 |   (12.1%) |

---

topN=10, **taskRepeatCount = 500**, concurrentSearchers = False

| Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
| --------------------- | -----------: | --------: | ----------------------: | --------: |
| **TermDTSort**        |       184.60 |    (8.2%) |                 3046.19 |    (4.4%) |
| HighTermMonthSort     |       209.43 |    (6.5%) |                  253.90 |   (10.5%) |
| HighTermDayOfYearSort |       130.97 |    (5.8%) |                   73.25 |   (11.8%) |

Increasing `taskRepeatCount` seemed to speed up all operations, and here the speedup for `TermDTSort` is even bigger: 16.5x. There also seems to be more regression for `HighTermDayOfYearSort`.

---

**topN=500**, taskRepeatCount = 20, concurrentSearchers = False

| Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
| --------------------- | -----------: | --------: | ----------------------: | --------: |
| **TermDTSort**        |       210.24 |    (9.7%) |                  537.65 |    (6.7%) |
| HighTermMonthSort     |       116.02 |    (8.9%) |                  189.96 |   (13.5%) |
| HighTermDayOfYearSort |        42.33 |    (7.6%) |                   67.93 |    (9.3%) |

With increased `topN`, the sort optimization gives smaller speedups, at most about 2.5x (`TermDTSort`). This is expected, since the optimization can only start running after `topN` docs have been collected.
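To illustrate why a larger `topN` leaves less room for skipping, here is a minimal sketch in plain Java. It is not the PR's implementation and uses no Lucene classes: `TopNSkipSketch` and `countNoncompetitive` are hypothetical names, the sort is assumed to be ascending on a long field, and docs are checked one at a time rather than skipped in blocks. It only shows that no doc can be rejected until the queue holds `topN` entries and a bottom value exists.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Lucene code and not the PR's approach.
final class TopNSkipSketch {

    /** Counts docs that cannot enter the top N of an ascending sort. */
    static int countNoncompetitive(long[] values, int topN) {
        // Max-heap: the head is the "bottom" (worst) value of the current top N.
        PriorityQueue<Long> queue = new PriorityQueue<>(Collections.reverseOrder());
        int noncompetitive = 0;
        for (long value : values) {
            if (queue.size() < topN) {
                // Queue not full yet: there is no bottom value, so nothing can be skipped.
                queue.add(value);
            } else if (value >= queue.peek()) {
                // Cannot beat the current bottom value: noncompetitive.
                noncompetitive++;
            } else {
                // Competitive: replace the current bottom.
                queue.poll();
                queue.add(value);
            }
        }
        return noncompetitive;
    }

    public static void main(String[] args) {
        long[] values = {5, 3, 8, 1, 9, 7, 2, 6, 4, 10};
        System.out.println(countNoncompetitive(values, 2)); // prints 6
        System.out.println(countNoncompetitive(values, 5)); // prints 1
    }
}
```

On this toy input, raising `topN` from 2 to 5 drops the number of rejectable docs from 6 to 1, which is the same direction as the smaller speedups seen with `topN=500`.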
---

topN=10, taskRepeatCount = 20, **concurrentSearchers = True**

| Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
| --------------------- | -----------: | --------: | ----------------------: | --------: |
| **TermDTSort**        |       132.09 |   (14.3%) |                  287.93 |   (11.8%) |
| HighTermMonthSort     |       211.01 |   (12.2%) |                  116.46 |    (7.1%) |
| HighTermDayOfYearSort |        72.28 |    (6.1%) |                   68.21 |   (11.4%) |

With concurrent searchers the speedups are also smaller, up to about 2x. This is expected, as segments are now spread across several `TopFieldCollector`s/comparators that don't exchange bottom values. As a follow-up to this PR, we can think about how to maintain a global bottom value, similar to how `MaxScoreAccumulator` is used to set a global competitive min score.

---

with **indexSort='lastModNDV:long'**

topN=10, taskRepeatCount = 20, concurrentSearchers = False

| Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
| --------------------- | -----------: | --------: | ----------------------: | --------: |
| **TermDTSort**        |       321.75 |   (11.5%) |                  364.83 |    (7.8%) |
| HighTermMonthSort     |       205.20 |    (5.7%) |                  178.16 |    (7.8%) |
| HighTermDayOfYearSort |        66.07 |   (12.0%) |                   58.84 |    (9.3%) |
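The global bottom value idea mentioned for concurrent searchers could look roughly like the sketch below. This is only an assumption about one possible shape, not the design of `MaxScoreAccumulator` or of any follow-up PR: `GlobalBottomSketch`, `accumulate`, `current` and `isCompetitive` are hypothetical names, and the sort is assumed to be ascending on a long field. Each collector would publish its local bottom once its own queue is full, and the smallest published bottom becomes the shared bound.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not an existing Lucene class.
final class GlobalBottomSketch {
    // Starts at MAX_VALUE so every value is competitive until a bottom is published.
    private final AtomicLong bottom = new AtomicLong(Long.MAX_VALUE);

    /** Called by a collector whose queue is full; keeps the smallest bottom seen so far. */
    void accumulate(long localBottom) {
        bottom.accumulateAndGet(localBottom, Math::min);
    }

    /** Returns the current shared bound. */
    long current() {
        return bottom.get();
    }

    /**
     * Equal values are kept as competitive, because tie-breaking by doc ID
     * is ignored in this sketch.
     */
    boolean isCompetitive(long value) {
        return value <= bottom.get();
    }

    public static void main(String[] args) {
        GlobalBottomSketch global = new GlobalBottomSketch();
        global.accumulate(70); // bottom published by one slice
        global.accumulate(50); // a tighter bottom published by another slice
        System.out.println(global.current());         // prints 50
        System.out.println(global.isCompetitive(60)); // prints false
    }
}
```

Taking the minimum is safe under these assumptions because a slice that publishes a bottom already holds `topN` docs at or below it, so any doc above the smallest published bottom cannot be in the overall top N.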
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org