abernardi597 commented on PR #15472:
URL: https://github.com/apache/lucene/pull/15472#issuecomment-3656620956
> Do you have the 8 bit quantization results for the last table?
I didn't run that benchmark on 8-bit quantization, since empirically that
seems to substantially increase the indexing time and query latency without
much benefit compared to 4-bit.
The way I am drawing comparisons from scalar to product quantization is by
compression level. For example, 8-bit quantization represents 4x compression;
for PQ that means using sub-vectors of dimension 1, since each sub-vector has
one byte's worth of centroids (so each 4-byte float compresses to a one-byte
centroid index). Similarly, 1-bit quantization represents 32x compression,
where PQ uses sub-vectors of dimension 8, so eight 4-byte floats compress into
a one-byte centroid index. Even higher compression rates are theoretically
possible with PQ than with scalar quantization, but I haven't explored that
yet.
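To make the mapping concrete, here's a back-of-the-envelope sketch (pure
arithmetic; the dimension is a made-up example, and it assumes 1-byte centroid
indexes, i.e. 256 centroids per codebook, ignoring codebook and graph overhead
on both sides):
```python
# Illustrative compression-ratio arithmetic for a float32 vector.
DIM = 768            # hypothetical vector dimension
RAW_BYTES = DIM * 4  # float32 baseline

def scalar_compression(bits: int) -> float:
    """Scalar quantization: each dimension stored in `bits` bits."""
    return RAW_BYTES / (DIM * bits / 8)

def pq_compression(subvector_dim: int) -> float:
    """PQ: each sub-vector of `subvector_dim` floats -> one 1-byte index."""
    return RAW_BYTES / (DIM / subvector_dim)

print(scalar_compression(8), pq_compression(1))  # both 4.0  -> 4x
print(scalar_compression(1), pq_compression(8))  # both 32.0 -> 32x
```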
The second-to-last table does include 8-bit results for JVector: nearly 5x
slower query latency and 16x slower indexing than raw vectors. It's also
nearly 3x slower to query than 4-bit and takes more than 2x longer to index.
> How did you make that table, with the alternating HNSW/jVector rows?
`knnPerfTest.py` seems to support this already by specifying a tuple for
`indexType`! Once I made the two codecs comparable with the same merge policy
and concurrency parameters, it spit out the table for me.
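For reference, the relevant bit of the params dict looks something like the
following; only the tuple-valued `indexType` is the point here, the other keys
are illustrative placeholders rather than the exact names:
```python
# Hypothetical excerpt from a knnPerfTest.py configuration: passing a tuple
# for indexType runs the benchmark once per codec and emits one comparison
# row per entry in the output table.
PARAMS = {
    'indexType': ('hnsw', 'jvector'),  # one row per codec
    # ... merge policy / concurrency kept identical so the rows are comparable
}
```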
> Does this mean the HNSW graph is still bushier in JVector?
I did wire this up while investigating some of the recall disparity; I set the
parameters to make the two graphs comparable (e.g. `alpha=1`,
`useHierarchy=true`) in order to validate that they aren't structurally
dissimilar.
For example, compare the results for HNSW (top) and JVector (bottom) on 500K
docs without quantization:
```
Leaf 0 has 3 layers
Leaf 0 has 500000 documents
Graph level=2 size=62, Fanout min=1, mean=7.35, max=19, meandelta=29631.33
% 0 10 20 30 40 50 60 70 80 90 100
0 2 4 4 5 6 7 9 11 12 19
Graph level=1 size=7113, Fanout min=1, mean=19.75, max=64, meandelta=20463.54
% 0 10 20 30 40 50 60 70 80 90 100
0 6 8 11 13 16 20 24 30 40 64
Graph level=0 size=500000, Fanout min=1, mean=23.12, max=128, meandelta=18093.16
% 0 10 20 30 40 50 60 70 80 90 100
0 7 9 12 14 17 21 26 33 47 128
Graph level=2 size=62, connectedness=1.00
Graph level=1 size=7113, connectedness=1.00
Graph level=0 size=500000, connectedness=1.00
```
```
Leaf 0 has 3 layers
Leaf 0 has 500000 documents
Graph level=2 size=21, Fanout min=1, mean=5.43, max=12, meandelta=-22546.25
% 0 10 20 30 40 50 60 70 80 90 100
0 1 2 3 4 5 7 7 7 8 12
Graph level=1 size=4001, Fanout min=1, mean=18.79, max=64, meandelta=-3287.41
% 0 10 20 30 40 50 60 70 80 90 100
0 5 8 10 13 16 19 23 29 37 64
Graph level=0 size=500000, Fanout min=1, mean=23.11, max=128, meandelta=-1101.85
% 0 10 20 30 40 50 60 70 80 90 100
0 7 9 12 14 17 21 26 33 47 128
Graph level=2 size=21, connectedness=1.00
Graph level=1 size=4001, connectedness=1.00
Graph level=0 size=500000, connectedness=1.00
```
Interestingly, the base layers show the same distribution of degrees (and
nearly identical mean fanout), while the upper layers start to diverge, most
notably in size.
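For anyone reading the dumps: the `%` rows are the fanout (out-degree)
distribution at each level, sampled at deciles. A minimal sketch of how such a
row could be reproduced from a list of per-node degrees (hypothetical input,
not the actual Lucene graph-stats tooling):
```python
# Sketch: reproduce a "% 0 10 ... 100" row from per-node out-degrees.
def fanout_percentiles(degrees: list[int]) -> list[int]:
    """Fanout at the 0%, 10%, ..., 100% points of the sorted degrees."""
    s = sorted(degrees)
    n = len(s)
    return [s[min(n - 1, p * n // 100)] for p in range(0, 101, 10)]
```
Read this way, both level-0 rows above show a median fanout of 17 and a max of
128, which is why I say the base layers match.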