gf2121 opened a new pull request, #12324:
URL: https://github.com/apache/lucene/pull/12324

   Today `Sparse#AdvanceExactWithinBlock` always needs to read the next doc and 
seek back if the target doc does not exist. This can hurt performance in dense 
hit queries. 
   
   For example, a field exists in docs 1 and 5. When `advanceExact` is called 
with 2, 3 and 4, each call needs to read the next doc (5) and seek back.
   
   I think caching the next existing doc in the block can help dense hit 
queries without too much harm to other cases.
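
   The idea can be illustrated with a minimal sketch. This is not the code in 
this PR: `CachedSparseBlock`, its fields, and the `reads` counter are 
hypothetical names, and the block is modelled as a sorted in-memory array 
rather than the on-disk format. It only shows how remembering the doc that was 
read ahead lets later misses be answered without another read.

```java
/** Hypothetical model of one sparse block: the sorted doc IDs that have a value. */
final class CachedSparseBlock {
  private final int[] docs;          // sorted doc IDs present in this block
  private int index = 0;             // position of the next unread doc ID
  private int nextExistingDoc = -1;  // last doc ID read ahead; -1 means nothing cached yet
  int reads = 0;                     // number of simulated doc ID reads (for illustration)

  CachedSparseBlock(int[] sortedDocs) {
    this.docs = sortedDocs.clone();
  }

  /**
   * Returns true if {@code target} has a value.
   * Assumes targets are passed in non-decreasing order.
   */
  boolean advanceExact(int target) {
    if (target < nextExistingDoc) {
      return false; // the cached doc is already past the target: no read, no seek back
    }
    if (target == nextExistingDoc) {
      return true; // the cached doc is exactly the target
    }
    // Read forward until a doc ID at or past the target is found, and cache it.
    while (index < docs.length) {
      int doc = docs[index++];
      reads++;
      if (doc >= target) {
        nextExistingDoc = doc;
        return doc == target;
      }
    }
    nextExistingDoc = Integer.MAX_VALUE; // block exhausted: every later target misses
    return false;
  }
}
```

   With docs 1 and 5, calling `advanceExact` with 1, 2, 3, 4, 5 in this sketch 
performs two reads in total: the miss on 2 reads doc 5 and caches it, so 3 and 
4 return immediately and 5 is answered from the cache.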
   
   I ran a benchmark with `MatchAllDocsQuery` on some fields with different 
sparsity:
   
   > sparsity=n means the field only exists when `doc % n == 0`
   
   sparsity | baseline(ms) | candidate(ms) | diff
   -- | -- | -- | --
   32 | 255 | 112 | -56.08%
   64 | 260 | 95 | -63.46%
   128 | 264 | 94 | -64.39%
   256 | 262 | 93 | -64.50%
   512 | 260 | 91 | -65.00%
   1024 | 259 | 90 | -65.25%
   2048 | 258 | 90 | -65.12%
   4096 | 253 | 90 | -64.43%
   8192 | 243 | 90 | -62.96%
   16384 | 224 | 90 | -59.82%
   32768 | 184 | 90 | -51.09%
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

