msokolov commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2678496890
I pushed a version that re-uses scores *and* limits per-leaf topK to global
topK. The former didn't make very much difference, but the latter change did
improve things quite a bit. Here are some numbers from cohere/768d:
### mainline
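| recall | latency (ms) | nDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index s | index docs/s | num segments | index size (MB) | vec disk (MB) | vec RAM (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.954 | 12.919 | 500000 | 50 | 0 | 64 | 250 | no | 13786 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.981 | 18.488 | 500000 | 50 | 50 | 64 | 250 | no | 20371 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.989 | 22.948 | 500000 | 50 | 100 | 64 | 250 | no | 24963 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |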
### with reused scores *and* limiting perLeafK <= K
Results:
| recall | latency (ms) | nDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index s | index docs/s | num segments | index size (MB) | vec disk (MB) | vec RAM (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.959 | 11.375 | 500000 | 50 | 0 | 64 | 250 | no | 12086 | 308.23 | 1622.15 | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.979 | 14.926 | 500000 | 50 | 50 | 64 | 250 | no | 16724 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
| 0.987 | 17.858 | 500000 | 50 | 100 | 64 | 250 | no | 20277 | 0.00 | Infinity | 8 | 1501.70 | 1464.844 | 1464.844 |
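
For a quick side-by-side reading of the two runs, here is a small sketch that computes the per-fanout change in recall, latency, and visited count. The numbers are copied from the rows above; the helper names (`pct_change`, `compare`) are made up for illustration, and the sketch assumes the value following `quantized` on each mainline row is the visited count, as the header of the second table indicates.

```python
# Rows copied from the tables above: (fanout, recall, latency_ms, visited).
# Assumption: the 9th value in each mainline row is the visited count.
mainline = [
    (0, 0.954, 12.919, 13786),
    (50, 0.981, 18.488, 20371),
    (100, 0.989, 22.948, 24963),
]
patched = [
    (0, 0.959, 11.375, 12086),
    (50, 0.979, 14.926, 16724),
    (100, 0.987, 17.858, 20277),
]


def pct_change(before, after):
    """Relative change from before to after, in percent (negative = lower)."""
    return 100.0 * (after - before) / before


def compare(base_rows, new_rows):
    """Pair rows by fanout and report the deltas for each pair."""
    results = []
    for (fanout, r0, l0, v0), (fanout_new, r1, l1, v1) in zip(base_rows, new_rows):
        if fanout != fanout_new:
            raise ValueError("rows must be listed in the same fanout order")
        results.append({
            "fanout": fanout,
            "recall_delta": round(r1 - r0, 3),
            "latency_pct": pct_change(l0, l1),
            "visited_pct": pct_change(v0, v1),
        })
    return results


rows = compare(mainline, patched)
for row in rows:
    print(
        f"fanout={row['fanout']:>3}  "
        f"recall {row['recall_delta']:+.3f}  "
        f"latency {row['latency_pct']:+.1f}%  "
        f"visited {row['visited_pct']:+.1f}%"
    )
```

On these numbers, latency drops by roughly 12%, 19%, and 22% at fanout 0, 50, and 100, while recall moves by at most 0.005 in either direction.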
It would be awesome if you could produce similar comparisons for this
version, @dungba88!