shubhamvishu commented on PR #14963:
URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242383393
To verify that this isn't a red herring, I deliberately increased the
`HNSW_GRAPH_THRESHOLD` to `1000000`, effectively preventing the creation of any
HNSW graphs (as confirmed by the very low indexing time and the absence of
graph layers in the structure). As expected, latency increased significantly
due to a fallback to exact search. This validates that the earlier results with
`HNSW_GRAPH_THRESHOLD` set to `10` or `100` are a win-win, i.e., we achieve
~4x faster indexing without compromising on latency (in fact, latency improves
due to fewer segments).
#### Candidate (with `HNSW_GRAPH_THRESHOLD` = 1000000)
```
:
.
.
:
Leaf 0 has 0 layers
Leaf 0 has 309324 documents
Leaf 1 has 0 layers
Leaf 1 has 154373 documents
Leaf 2 has 0 layers
Leaf 2 has 36303 documents
Results:
recall  latency(ms)   netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.514       89.174   89.153        1.000  500000   100      50       64        250     4 bits     26.93      18565.96             3         1651.81      1649.857      185.013       HNSW
 0.890      223.908  223.798        1.000  500000   100      50       64        250     7 bits     26.93      18569.41             3         1834.91      1832.962      368.118       HNSW
 1.000      140.186  140.176        1.000  500000   100      50       64        250         no     24.19      20667.99             3         1466.79      1464.844     1464.844       HNSW
```
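The "0 layers" lines in the output above can be illustrated with a minimal sketch. It is not the PR's implementation: it assumes the threshold is compared against the number of vectors in a segment and that a graph is built only when the count reaches the threshold. The function names `builds_graph`, `leaves_with_graph`, and `exact_top_k` are hypothetical, and the dot-product scoring is only a stand-in for whatever similarity the index uses.

```python
from typing import List, Sequence


def builds_graph(num_vectors: int, threshold: int) -> bool:
    # Assumption: a segment gets an HNSW graph only if it holds at least
    # `threshold` vectors. The real comparison in the PR may differ.
    return num_vectors >= threshold


def leaves_with_graph(leaf_sizes: Sequence[int], threshold: int) -> List[bool]:
    # One decision per leaf (segment), in the order given.
    return [builds_graph(n, threshold) for n in leaf_sizes]


def exact_top_k(query: Sequence[float],
                vectors: Sequence[Sequence[float]],
                k: int) -> List[int]:
    # Fallback when no graph exists: score every vector in the segment.
    # The cost grows linearly with segment size, which is why latency rises
    # when large segments have no graph.
    scored = [(sum(q * v for q, v in zip(query, vec)), i)
              for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    leaves = [309324, 154373, 36303]  # leaf sizes from the output above
    print(leaves_with_graph(leaves, 1_000_000))
    print(leaves_with_graph(leaves, 100))
```

Under this assumption, none of the three leaves reaches a threshold of `1000000`, so every query scans all vectors in each leaf, whereas a threshold of `10` or `100` leaves graphs in place for all of them.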