dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2676175583
I ran some benchmarks on the Cohere 768 dataset for 3 different algorithms:
(1) the baseline "greedy", (2) this PR's "optimistic", and (3) "pro-rata" only.
(2) and (3) converge at fanout >= 20, which kind of makes sense: as we increase
the `ef` parameter, the search already looks deeper into the graph. (2)
improves recall by ~6% over (3) at fanout=0, and the effect diminishes as
fanout grows: ~3% at fanout=5 and ~1% at fanout=10.
(2) still misses some matches (about 8%) compared to (1). I'm wondering why
that is, given that we try to search the segments exhaustively.
Other params: ndoc=500K, topK=100
```
recall  latency_(ms)  fanout  quantized  greediness  index_s  index_docs/s  force_merge_s  num_segments  index_size_(MB)  vec_disk_(MB)  vec_RAM_(MB)  algorithm
 0.936        13.285       0         no        0.00   117.24       4264.94           0.00             7          1490.05       1464.844      1464.844  greediness
 0.933        12.766       0         no        0.10   116.45       4293.84           0.00             7          1490.14       1464.844      1464.844  greediness
 0.927        12.045       0         no        0.30   116.69       4285.04           0.00             7          1490.12       1464.844      1464.844  greediness
 0.923        11.709       0         no        0.50   117.74       4246.79           0.00             7          1490.06       1464.844      1464.844  greediness
 0.917        10.950       0         no        0.70   117.68       4248.67           0.00             7          1490.06       1464.844      1464.844  greediness
 0.907        10.996       0         no        0.90   117.40       4258.80           0.00             7          1489.98       1464.844      1464.844  greediness
 0.836        10.349       0         no        1.00   117.07       4271.13           0.00             7          1490.03       1464.844      1464.844  greediness
 0.846         6.682       0         no       -1.00   116.50       4291.77           0.00             7          1490.09       1464.844      1464.844  optimistic
 0.862         6.699       5         no       -1.00   118.33       4225.44           0.00             7          1489.97       1464.844      1464.844  optimistic
 0.871         7.172      10         no       -1.00   116.10       4306.63           0.00             7          1490.05       1464.844      1464.844  optimistic
 0.897         7.962      20         no       -1.00   116.11       4306.15           0.00             7          1490.06       1464.844      1464.844  optimistic
 0.925        10.975      50         no       -1.00   116.71       4284.23           0.00             7          1490.10       1464.844      1464.844  optimistic
 0.941        15.635     100         no       -1.00   116.58       4289.08           0.00             7          1490.13       1464.844      1464.844  optimistic
 0.786         5.727       0         no       -1.00   117.86       4242.25           0.00             7          1490.10       1464.844      1464.844  prorata
 0.831         6.176       5         no       -1.00   117.27       4263.70           0.00             7          1490.07       1464.844      1464.844  prorata
 0.857         6.746      10         no       -1.00   116.79       4281.04           0.00             7          1490.08       1464.844      1464.844  prorata
 0.891         7.763      20         no       -1.00   116.21       4302.59           0.00             7          1490.09       1464.844      1464.844  prorata
 0.921        11.083      50         no       -1.00   118.02       4236.46           0.00             7          1490.06       1464.844      1464.844  prorata
 0.940        15.694     100         no       -1.00   116.63       4286.91           0.00             7          1490.00       1464.844      1464.844  prorata
```
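For anyone who wants to double-check the recall gaps quoted above, here is a
minimal Python sketch that recomputes "optimistic" minus "prorata" per fanout.
The recall values are copied from the table; the names `optimistic`, `prorata`,
and `recall_gap` are just illustrative and not part of luceneutil or this PR.

```python
# Recall at each fanout, copied from the benchmark table above.
optimistic = {0: 0.846, 5: 0.862, 10: 0.871, 20: 0.897, 50: 0.925, 100: 0.941}
prorata = {0: 0.786, 5: 0.831, 10: 0.857, 20: 0.891, 50: 0.921, 100: 0.940}


def recall_gap(a, b):
    """Return {fanout: a - b} in absolute recall, rounded to 3 decimals."""
    return {fanout: round(a[fanout] - b[fanout], 3) for fanout in sorted(a)}


gaps = recall_gap(optimistic, prorata)
for fanout, gap in gaps.items():
    print(f"fanout={fanout:>3}  optimistic - prorata = {gap:+.3f}")
```

The gap shrinks steadily as fanout grows, which matches the observation that
the two approaches converge at fanout >= 20.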
<img width="713" alt="Screenshot 2025-02-22 at 21 17 40"
src="https://github.com/user-attachments/assets/51daedd7-53f8-4222-9c3a-88c7e5e4a733"
/>