abernardi597 commented on PR #15472:
URL: https://github.com/apache/lucene/pull/15472#issuecomment-3656620956

   > Do you have the 8 bit quantization results for the last table?
   
   I didn't run that benchmark with 8-bit quantization, since empirically it substantially increases indexing time and query latency without much benefit over 4-bit.
   The way I am comparing scalar quantization to product quantization is by compression level. For example, 8-bit scalar quantization gives 4x compression, which for PQ means sub-vectors of dimension 1, since each sub-vector has one byte worth of centroids (each 4-byte float compresses to a one-byte centroid index). Similarly, 1-bit scalar quantization gives 32x compression, where PQ uses sub-vectors of dimension 8, so eight 4-byte floats compress into a single one-byte centroid index. Even higher compression rates are theoretically possible with PQ than with scalar quantization, but I have not explored that here.
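   
   To make that mapping concrete, here is a small sketch of the arithmetic behind those compression levels (not from the benchmark code; the 768-dim value is just an illustrative choice, and codebook overhead is ignored):
    ```python
    # Hypothetical helpers (not part of this PR) lining up scalar and product
    # quantization by compression level.  Assumes 4-byte float32 source vectors
    # and one 1-byte centroid index per PQ sub-vector; codebook storage ignored.
    def scalar_compression(bits_per_component: int, dims: int = 768) -> float:
        raw_bytes = 4 * dims
        quantized_bytes = dims * bits_per_component / 8
        return raw_bytes / quantized_bytes

    def pq_compression(subvector_dim: int, dims: int = 768) -> float:
        raw_bytes = 4 * dims
        code_bytes = dims / subvector_dim  # one 1-byte centroid index per sub-vector
        return raw_bytes / code_bytes

    print(scalar_compression(8), pq_compression(1))  # 4.0 4.0   -> 4x compression
    print(scalar_compression(1), pq_compression(8))  # 32.0 32.0 -> 32x compression
    ```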
   
   The second-to-last table does include 8-bit results for JVector, which show nearly 5x higher query latency and 16x slower indexing than raw vectors. It's also nearly 3x slower to query than 4-bit and takes more than 2x longer to index.
   
   > How did you make that table, with the alternating HNSW/jVector rows?
   
   `knnPerfTest.py` seems to support this already by specifying a tuple for `indexType`! Once I made the two codecs comparable with the same merge policy and concurrency parameters, it spit out the table for me.
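   
   Concretely, the tweak was along these lines (a sketch from memory; the parameter-dict name and exact key spellings in `knnPerfTest.py` may differ):
    ```python
    # Hypothetical excerpt of the benchmark parameter dict in knnPerfTest.py.
    # A tuple value makes the script run once per entry, so both codecs share
    # the same merge policy and concurrency settings and land in one table.
    PARAMS = {
        # ... other parameters left at their shared values ...
        'indexType': ('hnsw', 'jvector'),
    }
    ```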
   
   > Does this mean the HNSW graph is still bushier in JVector?
   
   I did wire this up while investigating some of the recall disparity, configuring JVector to build a comparable graph (e.g. `alpha=1`, `useHierarchy=true`) and checking that the two graphs aren't wildly dissimilar.
   
   For example, compare the results for HNSW (top) and JVector (bottom) on 500K 
docs without quantization:
    ```
    Leaf 0 has 3 layers
    Leaf 0 has 500000 documents
    Graph level=2 size=62, Fanout min=1, mean=7.35, max=19, meandelta=29631.33
    %   0  10  20  30  40  50  60  70  80  90 100
        0   2   4   4   5   6   7   9  11  12  19
    Graph level=1 size=7113, Fanout min=1, mean=19.75, max=64, meandelta=20463.54
    %   0  10  20  30  40  50  60  70  80  90 100
        0   6   8  11  13  16  20  24  30  40  64
    Graph level=0 size=500000, Fanout min=1, mean=23.12, max=128, meandelta=18093.16
    %   0  10  20  30  40  50  60  70  80  90 100
        0   7   9  12  14  17  21  26  33  47 128
    Graph level=2 size=62, connectedness=1.00
    Graph level=1 size=7113, connectedness=1.00
    Graph level=0 size=500000, connectedness=1.00
    ```
   
    ```
    Leaf 0 has 3 layers
    Leaf 0 has 500000 documents
    Graph level=2 size=21, Fanout min=1, mean=5.43, max=12, meandelta=-22546.25
    %   0  10  20  30  40  50  60  70  80  90 100
        0   1   2   3   4   5   7   7   7   8  12
    Graph level=1 size=4001, Fanout min=1, mean=18.79, max=64, meandelta=-3287.41
    %   0  10  20  30  40  50  60  70  80  90 100
        0   5   8  10  13  16  19  23  29  37  64
    Graph level=0 size=500000, Fanout min=1, mean=23.11, max=128, meandelta=-1101.85
    %   0  10  20  30  40  50  60  70  80  90 100
        0   7   9  12  14  17  21  26  33  47 128
    Graph level=2 size=21, connectedness=1.00
    Graph level=1 size=4001, connectedness=1.00
    Graph level=0 size=500000, connectedness=1.00
    ```
   
   Interestingly, the base layers show the same distribution of degrees (and 
nearly identical mean fanout), while the upper layers start to diverge, most 
notably in size.

