[GitHub] [lucene] gf2121 opened a new pull request, #12324: Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance

via GitHub Mon, 22 May 2023 00:17:42 -0700


gf2121 opened a new pull request, #12324:
URL: https://github.com/apache/lucene/pull/12324


   Today, `Sparse#AdvanceExactWithinBlock` always needs to read the next doc and seek 
back when the target doc does not exist. This can hurt performance for queries 
that hit many docs (dense hits). 
   
   For example, suppose a field exists only in docs 1 and 5. Each call to 
`advanceExact` for 2, 3 or 4 has to read the next doc (5) and then seek back.
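   To make that cost concrete, here is a simplified, hypothetical model of a sparse block. The docs that have a value are kept in a sorted array, and an index stands in for the file pointer. It is not Lucene's `IndexedDISI` code; `SparseBlockBaseline`, `reads` and `seeksBack` are illustrative names. On a miss it reads the next stored doc, finds it is past the target, and moves back so that doc can be read again later.

```java
import java.util.Arrays;

/** Hypothetical model of a SPARSE block; not Lucene's actual IndexedDISI implementation. */
class SparseBlockBaseline {
  private final int[] docs; // sorted doc IDs that have a value in this block
  private int index = 0;    // next unread entry (stands in for the file pointer)
  int reads = 0;            // entries read so far
  int seeksBack = 0;        // times the cursor was moved backwards

  SparseBlockBaseline(int[] docs) {
    this.docs = docs.clone();
  }

  /** Returns true if target has a value; targets must be non-decreasing. */
  boolean advanceExact(int target) {
    while (index < docs.length) {
      int doc = docs[index++]; // read the next stored doc
      reads++;
      if (doc == target) {
        return true;
      }
      if (doc > target) {
        index--;               // seek back so this doc is read again next time
        seeksBack++;
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    SparseBlockBaseline block = new SparseBlockBaseline(new int[] {1, 5});
    boolean[] hits = new boolean[6];
    for (int target = 1; target <= 5; target++) {
      hits[target] = block.advanceExact(target);
    }
    System.out.println(Arrays.toString(hits) + " reads=" + block.reads + " seeksBack=" + block.seeksBack);
  }
}
```

   In this model, targets 1 to 5 cost 5 reads and 3 seeks back to produce only 2 hits: doc 5 is read once for each of the misses 2, 3 and 4, and once more for the hit.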
   
   I think caching the next existing doc in the block can speed up dense-hit queries 
without hurting other cases much.
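   A minimal sketch of the caching idea, using a hypothetical sorted-array model of a sparse block rather than the PR's actual code (`SparseBlockCached` and `nextExistingDoc` are illustrative names). After a miss, the doc that was read ahead is remembered instead of seeking back, so later targets below it are answered without touching the data again.

```java
import java.util.Arrays;

/** Hypothetical model of a SPARSE block that caches the next existing doc; not the PR's code. */
class SparseBlockCached {
  private static final int NONE = -1; // doc IDs are non-negative, so -1 means "no cached doc"
  private final int[] docs;           // sorted doc IDs that have a value in this block
  private int index = 0;              // next unread entry
  private int nextExistingDoc = NONE; // doc read ahead on the last miss, if any
  int reads = 0;                      // entries read so far

  SparseBlockCached(int[] docs) {
    this.docs = docs.clone();
  }

  /** Returns true if target has a value; targets must be non-decreasing. */
  boolean advanceExact(int target) {
    if (nextExistingDoc != NONE) {
      if (target < nextExistingDoc) {
        return false;                 // answered from the cache: no read, no seek back
      }
      boolean hit = nextExistingDoc == target;
      nextExistingDoc = NONE;         // the cached doc is consumed either way
      if (hit) {
        return true;
      }
    }
    while (index < docs.length) {
      int doc = docs[index++];        // read the next stored doc
      reads++;
      if (doc == target) {
        return true;
      }
      if (doc > target) {
        nextExistingDoc = doc;        // remember it instead of seeking back
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    SparseBlockCached block = new SparseBlockCached(new int[] {1, 5});
    boolean[] hits = new boolean[6];
    for (int target = 1; target <= 5; target++) {
      hits[target] = block.advanceExact(target);
    }
    System.out.println(Arrays.toString(hits) + " reads=" + block.reads);
  }
}
```

   For docs 1 and 5, the answers are unchanged, but each stored doc is read exactly once. The only extra state is one int, which is why the change should cost little when hits are rare.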
   
   I ran a benchmark with `MatchAllDocsQuery` on several fields with different 
sparsities:
   
   > sparsity=n means the field only exists in docs where `doc % n == 0`
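   Assuming the benchmark calls `advanceExact` on every doc that `MatchAllDocsQuery` matches, only 1 in n calls is a hit and the remaining n - 1 are misses, which is where the read-and-seek-back cost adds up. A small sketch of that count; `SparsityStats`, `hitsAndMisses` and the `maxDoc` value are hypothetical, since the PR does not state the index size.

```java
/** Counts hits and misses for a field present only when doc % n == 0 (illustrative only). */
class SparsityStats {
  /** Returns {hits, misses} when advanceExact is called on every doc in [0, maxDoc). */
  static long[] hitsAndMisses(int maxDoc, int n) {
    long hits = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      if (doc % n == 0) {
        hits++; // the field exists in this doc
      }
    }
    return new long[] {hits, maxDoc - hits};
  }

  public static void main(String[] args) {
    int maxDoc = 1 << 20; // hypothetical index size
    for (int n : new int[] {32, 1024, 32768}) {
      long[] stats = hitsAndMisses(maxDoc, n);
      System.out.println("sparsity=" + n + " hits=" + stats[0] + " misses=" + stats[1]);
    }
  }
}
```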
   
   sparsity | baseline (ms) | candidate (ms) | diff
   -- | -- | -- | --
   32 | 255 | 112 | -56.08%
   64 | 260 | 95 | -63.46%
   128 | 264 | 94 | -64.39%
   256 | 262 | 93 | -64.50%
   512 | 260 | 91 | -65.00%
   1024 | 259 | 90 | -65.25%
   2048 | 258 | 90 | -65.12%
   4096 | 253 | 90 | -64.43%
   8192 | 243 | 90 | -62.96%
   16384 | 224 | 90 | -59.82%
   32768 | 184 | 90 | -51.09%
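   The diff column is consistent with (candidate - baseline) / baseline, expressed as a percentage. A quick check against a few rows (`DiffCheck` is an illustrative name):

```java
/** Recomputes the diff column as (candidate - baseline) / baseline, in percent. */
class DiffCheck {
  static double diffPercent(double baselineMs, double candidateMs) {
    return (candidateMs - baselineMs) / baselineMs * 100.0;
  }

  public static void main(String[] args) {
    // {sparsity, baseline ms, candidate ms} copied from the table
    int[][] rows = {{32, 255, 112}, {512, 260, 91}, {32768, 184, 90}};
    for (int[] r : rows) {
      System.out.printf("sparsity=%d diff=%.2f%%%n", r[0], diffPercent(r[1], r[2]));
    }
  }
}
```

   All eleven rows round to the reported values under this formula.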


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

