zhaih opened a new pull request, #15208:
URL: https://github.com/apache/lucene/pull/15208
### Description
In #14331 @mayya-sharipova made a great change that introduced a smart
algorithm for computing a join set and using it to accelerate the HNSW graph merge.
This PR tries to make that algorithm work with the concurrent graph merge.
### Progress
The first version naively lets each thread handle one segment, then handles
the rest of the nodes as before. The benchmark results are quite mixed and
interesting: the changed code only performs better (in force-merge time) in the
10M-doc case. That roughly makes sense: since each thread handles one segment,
when there are not enough segments the merge cannot make use of all the threads.
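To make the thread-utilization argument concrete, here is a rough sketch of the first version's two-phase scheduling in plain Java. It is hypothetical and not the PR's code. No Lucene APIs are used, each segment is reduced to a node count, and the "insertion" work is just a counter. `SegmentPerThreadMerge`, `segmentPhaseParallelism`, and `merge` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: "one segment per thread, then the rest of the nodes as before".
// Real graph insertion is replaced by a counter; nothing here is Lucene code.
class SegmentPerThreadMerge {

  // In the segment phase at most one thread works per segment, so parallelism is capped.
  static int segmentPhaseParallelism(int numSegments, int numThreads) {
    return Math.min(numSegments, numThreads);
  }

  // Phase 1: one task per segment (stand-in for the join-set based merge of that segment).
  // Phase 2: remaining nodes are chunked across all threads (stand-in for the existing
  // concurrent insertion). Returns the total number of nodes "inserted".
  // numThreads must be >= 1 (newFixedThreadPool rejects 0).
  static long merge(int[] segmentSizes, int remainingNodes, int numThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    AtomicLong inserted = new AtomicLong();
    try {
      // Phase 1: with fewer segments than threads, the extra threads sit idle here.
      List<Future<?>> phase1 = new ArrayList<>();
      for (int size : segmentSizes) {
        phase1.add(pool.submit(() -> inserted.addAndGet(size)));
      }
      for (Future<?> f : phase1) {
        f.get(); // wait for every segment task before moving on
      }
      // Phase 2: split the remaining nodes into roughly equal chunks, one per thread.
      List<Future<?>> phase2 = new ArrayList<>();
      int chunk = (remainingNodes + numThreads - 1) / numThreads;
      for (int start = 0; start < remainingNodes; start += chunk) {
        int count = Math.min(start + chunk, remainingNodes) - start;
        phase2.add(pool.submit(() -> inserted.addAndGet(count)));
      }
      for (Future<?> f : phase2) {
        f.get();
      }
    } finally {
      pool.shutdown();
    }
    return inserted.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(segmentPhaseParallelism(3, 8));
    System.out.println(merge(new int[] {1000, 500, 250}, 10_000, 8));
  }
}
```

With 3 segments and 8 threads, `segmentPhaseParallelism(3, 8)` is 3, so five threads have nothing to do until the remaining-node phase starts.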
### Benchmark
#### Candidate (this PR)
```
recall  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.784        0.254   0.249        0.980   1000000   100      50       64        250         no     28.48      35108.66           98.94             1          435.66       381.470      381.470       HNSW
 0.761        0.266   0.261        0.981   2000000   100      50       64        250         no     61.42      32562.15          231.87             1          875.69       762.939      762.939       HNSW
 0.738        0.281   0.276        0.982   5000000   100      50       64        250         no    182.75      27359.33          624.20             1         2213.94      1907.349     1907.349       HNSW
 0.716        0.310   0.295        0.952  10000000   100      50       64        250         no    394.47      25350.47          782.60             1         4456.40      3814.697     3814.697       HNSW
```
#### Baseline
```
recall  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.817        0.284   0.270        0.951   1000000   100      50       64        250         no     28.31      35318.22           66.91             1          444.55       381.470      381.470       HNSW
 0.794        0.303   0.288        0.950   2000000   100      50       64        250         no     63.60      31447.53          158.45             1          894.32       762.939      762.939       HNSW
 0.774        0.306   0.301        0.984   5000000   100      50       64        250         no    180.95      27632.40          253.59             1         2274.45      1907.349     1907.349       HNSW
 0.755        0.334   0.329        0.985  10000000   100      50       64        250         no    433.43      23071.94         1022.34             1         4591.37      3814.697     3814.697       HNSW
```
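Since the comparison mostly turns on the `force_merge(s)` column, here is a small sketch that divides the candidate force-merge times by the baseline ones. The values are copied by hand from the two tables. A ratio below 1 means the candidate merged faster. `ForceMergeRatios` is a made-up helper name.

```java
// Hypothetical helper: candidate / baseline force_merge(s), values copied from the tables.
class ForceMergeRatios {
  static final long[] N_DOCS = {1_000_000L, 2_000_000L, 5_000_000L, 10_000_000L};
  static final double[] CAND = {98.94, 231.87, 624.20, 782.60};
  static final double[] BASELINE = {66.91, 158.45, 253.59, 1022.34};

  // Ratio for row i; < 1.0 means the candidate's force merge was faster.
  static double ratio(int i) {
    return CAND[i] / BASELINE[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < N_DOCS.length; i++) {
      System.out.printf("%,d docs: cand/baseline force merge = %.2f%n", N_DOCS[i], ratio(i));
    }
  }
}
```

Only the 10M-doc row comes out below 1.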