[PR] Implement data-blind scalar quantization [lucene]

via GitHub Sun, 03 May 2026 22:02:44 -0700


mccullocht opened a new pull request, #16030:
URL: https://github.com/apache/lucene/pull/16030


   Add an option to the quantization format to enable or disable centering 
(enabled by default). When centering is disabled we also stop writing the float 
vectors which can lead to significant storage savings. Special handling is 
included during merges -- we check that all of the input is in the same 
encoding, and handle transcoding if some of the input is float vectors.
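
For illustration only, the difference between the centered and uncentered modes can be sketched in Java as below. This is not the code in this PR: the class `ScalarQuantSketch`, its methods, and the simple per-vector min/max interval are all hypothetical choices made for the example. The sketch only shows why the uncentered mode needs no shared, data-dependent state to decode.

```java
/** Hypothetical sketch of 8-bit scalar quantization with optional centering. */
public final class ScalarQuantSketch {

  /** 8-bit codes plus the per-vector interval needed to decode them. */
  static final class Quantized {
    final byte[] codes;
    final float lower;
    final float upper;

    Quantized(byte[] codes, float lower, float upper) {
      this.codes = codes;
      this.lower = lower;
      this.upper = upper;
    }
  }

  /**
   * Quantizes v to one unsigned byte per dimension. A non-null center is
   * subtracted first (centered mode); a null center leaves v as-is.
   * Assumption: the interval is the plain min/max of the (shifted) vector.
   */
  static Quantized quantize(float[] v, float[] center) {
    float[] x = new float[v.length];
    float lo = Float.POSITIVE_INFINITY;
    float hi = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < v.length; i++) {
      x[i] = center == null ? v[i] : v[i] - center[i];
      lo = Math.min(lo, x[i]);
      hi = Math.max(hi, x[i]);
    }
    float step = (hi - lo) / 255f;
    byte[] codes = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      int q = step == 0f ? 0 : Math.round((x[i] - lo) / step);
      codes[i] = (byte) Math.max(0, Math.min(255, q)); // stored as unsigned
    }
    return new Quantized(codes, lo, hi);
  }

  /** Inverse of quantize; the same center (or null) must be supplied. */
  static float[] dequantize(Quantized q, float[] center) {
    float step = (q.upper - q.lower) / 255f;
    float[] out = new float[q.codes.length];
    for (int i = 0; i < out.length; i++) {
      out[i] = q.lower + (q.codes[i] & 0xFF) * step;
      if (center != null) {
        out[i] += center[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    float[] v = {0.1f, -0.5f, 0.9f};
    Quantized q = quantize(v, null); // uncentered: no shared state required
    float[] back = dequantize(q, null);
    for (int i = 0; i < v.length; i++) {
      System.out.printf("%.4f -> %d -> %.4f%n", v[i], q.codes[i] & 0xFF, back[i]);
    }
  }
}
```

In the centered mode every code is only meaningful together with the center it was computed against, so a change of center implies re-quantizing from the float vectors. In the uncentered mode each vector carries everything needed to decode it.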
   
   Large portions of this change were generated using Claude Code. I reviewed, 
tweaked, and tested the code before putting it up for review.
   
   This change is implemented as a new codec because the format now drops the 
center vector when centering is disabled. A new codec is not strictly necessary, 
since we could write a zero vector instead, but I plan to make other format 
changes related to data blindness, see #16029.
   
   luceneutil results -- 1M Cohere vectors, 8-bit quantization.
   before:
   ```
   recall  latency(ms)  netCPU  avgCpuCount     nDoc  searchType  topK  fanout  resultSimilarity  decay  resultCount  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  filterStrategy  filterSelectivity  overSample  vec_disk(MB)  vec_RAM(MB)  bp-reorder  indexType
    0.974        2.304   2.297        0.997  1000000         KNN   100     100               N/A    N/A      100.000       64        250     8 bits     8619    132.85       7527.40          235.00             1         5047.27            null                N/A       1.000      4898.071      991.821       false       HNSW
   ```
   
   after:
   ```
   recall  latency(ms)  netCPU  avgCpuCount     nDoc  searchType  topK  fanout  resultSimilarity  decay  resultCount  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  filterStrategy  filterSelectivity  overSample  vec_disk(MB)  vec_RAM(MB)  bp-reorder  indexType
    0.972        2.281   2.274        0.997  1000000         KNN   100     100               N/A    N/A      100.000       64        250     8 bits     8612    143.06       6990.07          160.33             1         1140.98            null                N/A       1.000      4898.071      991.821       false       HNSW
   ```
   
   The harness extrapolates vec_disk(MB) from the input size, so trust 
index_size(MB) instead -- the index is about 4.4x smaller (5047.27 MB vs. 
1140.98 MB). Force merge is faster (235.00 s vs. 160.33 s) since we don't have 
to re-quantize vectors on merge. Recall is very similar (0.974 vs. 0.972) but 
YMMV.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


