kaivalnp commented on issue #15734:
URL: https://github.com/apache/lucene/issues/15734#issuecomment-3929639459

   However, I understand that packing vectors with arbitrary quantization 
levels + supporting various comparisons in Lucene is a challenge -- so I tried 
a slightly different approach:
   
   What if vectors were first expanded to a higher dimension, and then 
quantized to one of the supported options? (e.g. represent 1024-dimension 
vectors in 1536 dimensions, then quantize to 4 bits: we get the memory 
equivalent of 6-bit quantization)
   
   A dimension "expansion" from X -> Y could be as simple as padding the 
original vector with (Y - X) zeros, then applying an arbitrary rotation to the 
new Y-dimension vector. We need a rotation because quantization occurs 
per-dimension, so a dimension that is all zeros is not very useful -- we 
basically need to spread the information in X dimensions across all Y, which 
is where a random rotation helps.
   
   I took some 1024-dimension Cohere v3 docs + queries and "expanded" them to 
1536 dimensions using the above strategy, with a simple Python script like:
   
   ```python
   import numpy as np

   def random_rotation_matrix(n):
       """Generate a random rotation matrix in n dimensions using QR decomposition."""
       H = np.random.randn(n, n)
       Q, R = np.linalg.qr(H)
       # fix signs so the result is uniformly distributed over rotations
       Q = Q @ np.diag(np.sign(np.diag(R)))
       return Q.astype("<f4")

   def expand(size):
       rotation_matrix = random_rotation_matrix(size)
       def _func(arr):
           # pad with zeros up to `size`, then rotate to spread information across all dimensions
           return np.matmul(np.pad(arr, (0, size - arr.size)), rotation_matrix)
       return _func

   dim = 1024
   num_docs = 100_000
   doc_path = "../cohere-v3-wikipedia-en-scattered-1024d.docs.first1M.vec"
   docs = np.fromfile(doc_path, dtype=f"<{dim}f4", count=num_docs)

   expanded_dim = 1536
   expand_func = expand(expanded_dim)

   docs_expanded = np.apply_along_axis(expand_func, 1, docs)
   docs_expanded.astype("<f4").tofile(f"cohere-v3-docs-{expanded_dim}d.vec")

   # use the same expand_func for queries -- because it needs the same rotation as docs
   ```
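   As a quick sanity check (my own sketch, not part of the benchmark above): because the rotation matrix is orthogonal and padding only appends zeros, the expansion preserves dot products (and norms) between vectors, so similarity comparisons are unchanged before quantization:

   ```python
   import numpy as np

   rng = np.random.default_rng(0)

   # Orthogonal rotation via QR decomposition, as in the script above
   n = 256  # small size so the check runs quickly
   H = rng.standard_normal((n, n))
   Q, R = np.linalg.qr(H)
   Q = Q @ np.diag(np.sign(np.diag(R)))

   def expand(v):
       # pad with zeros up to n, then rotate
       return np.pad(v, (0, n - v.size)) @ Q

   a = rng.standard_normal(128)
   b = rng.standard_normal(128)

   # <a, b> is preserved: padding appends zeros and Q is orthogonal
   assert np.isclose(a @ b, expand(a) @ expand(b))
   assert np.isclose(np.linalg.norm(a), np.linalg.norm(expand(a)))
   ```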
   
   Then I performed a recall test on the Cohere v3 vectors: numDocs = 100K, 
numQueries = 10K, force-merged to 1 segment.
   
   Baseline (supported quantization options):
   
   ```
   recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
    0.993        0.141   4.448       31.504         no     6858     19.29       5185.11           20.32          403.00       390.625      390.625
    0.983        0.088   2.760       31.288     8 bits     6868     20.35       4913.28           28.67          502.19       489.807       99.182
    0.927        0.059   1.822       31.037     4 bits     6954     20.59       4857.67           21.85          453.41       440.979       50.354
    0.835        0.037   1.152       30.970     2 bits     7299     20.23       4943.15           20.04          429.30       416.374       25.749
    0.717        0.032   0.976       30.599     1 bits     8197     19.86       5034.23           17.10          418.30       404.167       13.542
   ```
   
   Candidate (non-standard quantization equivalent):
   
   ```
   recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
    0.990        0.102   3.218       31.429    10 bits     6857     22.57       4429.88           40.30          624.26       611.877      123.596
    0.953        0.073   2.260       31.051     6 bits     6868     21.30       4694.62           32.77          673.11       660.706       74.768
    0.950        0.066   2.027       30.901     5 bits     6878     20.84       4798.69           29.12          563.26       550.842       62.561
    0.882        0.047   1.453       31.178     3 bits     7044     22.29       4486.12           28.81          636.63       623.894       37.956
   ```
   
   The candidate uses "expanded" vectors of different dimensions + quantization 
options:
   1. 10 bits == expand 1024 -> 1280 dimensions + 8 bit quantization
   2. 6 bits == expand 1024 -> 1536 dimensions + 4 bit quantization
   3. 5 bits == expand 1024 -> 1280 dimensions + 4 bit quantization
   4. 3 bits == expand 1024 -> 1536 dimensions + 2 bit quantization
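   These equivalences are just the total bit budget held fixed: effective bits per original dimension = expanded_dim * quantization_bits / 1024. A quick arithmetic check (my own snippet, names are illustrative):

   ```python
   # effective bits per original (1024) dimension = expanded_dim * bits / 1024
   for expanded_dim, bits in [(1280, 8), (1536, 4), (1280, 4), (1536, 2)]:
       eff = expanded_dim * bits / 1024
       print(f"{expanded_dim}d @ {bits} bits -> {eff:g} effective bits")
   # -> 10, 6, 5, and 3 effective bits respectively
   ```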
   
   The exact-KNN baseline in _all_ runs is the one computed from the original 
1024-dimension vectors.
   
   It was encouraging to see the _equivalent_ quantization options land at 
roughly the interpolated recall and latency values!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

