Re: [PR] Add TurboQuant rotation-based vector quantization codec to sandbox [lucene]

via GitHub Tue, 07 Apr 2026 10:22:50 -0700


shbhar commented on PR #15903:
URL: https://github.com/apache/lucene/pull/15903#issuecomment-4200938919


   Here are the updated results after some fixes. TQ-8bit is now comparable to SQ-8bit, while TQ-4bit is still slower than SQ-4bit (though at comparable recall and with a much smaller index). Changes since the last run:
   1. Fix for an int overflow in merge
   2. Byte-copy merge
   3. int8 SIMD scorer for TQ-8bit
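The comment does not say where the merge overflow occurred. As an illustration only (not the PR's code), the arithmetic below shows why 32-bit `int` byte offsets are a hazard at this benchmark's scale: 5M TQ-8bit codes, assumed here to be exactly 1024 bytes each, span 5,120,000,000 bytes, well past `Integer.MAX_VALUE` (2,147,483,647). The helper `to_int32` is hypothetical and just emulates Java's two's-complement wraparound.

```python
# Illustration only: emulate Java's 32-bit signed int arithmetic to show how a
# byte offset computed as ord * bytesPerVector wraps at this benchmark's scale.
INT32_MAX = 2**31 - 1


def to_int32(x: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value, like a Java int."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT32_MAX else x


num_vectors = 5_000_000
bytes_per_vector = 1024  # assumption: 1024 dims x 8 bits, no per-vector metadata

last_ord = num_vectors - 1
offset_64 = last_ord * bytes_per_vector  # correct offset, as a Java long would hold it
offset_32 = to_int32(offset_64)          # what an int multiplication would produce

print(f"total bytes:         {num_vectors * bytes_per_vector:,}")  # 5,120,000,000
print(f"last offset as long: {offset_64:,}")                       # 5,119,998,976
print(f"last offset as int:  {offset_32:,}")                       # 825,031,680 (wrong)
```

The wrapped value is positive here, so this kind of bug can silently read the wrong vector instead of failing loudly. In Java the usual remedy is to widen before multiplying, e.g. `(long) ord * bytesPerVector`.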
   
   ## Benchmark data: 5M Cohere Wikipedia, 1024d
   
   5M Cohere Wikipedia vectors at 1024 dimensions. HNSW (M=32, beamWidth=100, 
topK=10, fanout=50, forceMerge to 1 segment). 
   
   | Method | R@10 | Latency (ms) | Indexing (docs/s) | Force merge (s) | Index size (MB) |
   |--------|------|--------------|-------------------|-----------------|-----------------|
   | Float32 | 0.928 | 1.60 | 11,881 | 2,267 | 20,020 |
   | SQ-4bit | 0.855 | 0.86 | 19,025 | 1,347 | 22,538 |
   | SQ-4bit+5×rsc | 0.986 | 3.17 | 19,133 | 1,361 | 22,539 |
   | SQ-8bit | 0.918 | 1.23 | 15,784 | 1,791 | 24,980 |
   | SQ-8bit+5×rsc | 0.987 | 4.13 | 15,318 | 1,776 | 24,979 |
   | BBQ-1bit | 0.631 | 0.82 | 22,840 | 1,208 | 20,743 |
   | BBQ-1bit+5×rsc | 0.944 | 2.59 | 23,257 | 1,215 | 20,744 |
   | **TQ-1bit** | **0.608** | **0.77** | 30,381 | 1,897 | **1,064** |
   | **TQ-1bit+5×rsc** | **0.928** | **2.83** | 30,405 | 1,532 | **1,066** |
   | **TQ-4bit** | **0.852** | **1.19** | 17,915 | 2,459 | **2,851** |
   | **TQ-4bit+5×rsc** | **0.983** | **3.91** | 17,195 | 3,858 | **2,849** |
   | **TQ-8bit** | **0.902** | **0.94** | **19,369** | **2,659** | **5,293** |
   | **TQ-8bit+5×rsc** | **0.983** | **3.13** | **18,471** | **3,110** | **5,293** |
   
   TQ-1bit vs BBQ-1bit: TQ-1bit's raw recall (0.608) is close to BBQ-1bit's (0.631), with a 19× smaller index (1,064 MB vs 20,743 MB) and 1.3× faster indexing (30.4K vs 22.8K docs/s). With 5× rescore, TQ-1bit+rsc (0.928) nearly matches BBQ-1bit+rsc (0.944); the gap narrows further at higher dimensions (see the ASIN 4096d results below).
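The `+5×rsc` and `+10×rsc` rows use rescoring. The comment doesn't detail TQ's rescoring path, so this is only a generic sketch of the usual oversample-then-rescore idea, assuming N× means the first pass keeps N × topK candidates: rank every doc by a cheap quantized score, keep `oversample * top_k` candidates, then re-rank just those by the full-precision dot product. A plain sign (1-bit) quantizer and a brute-force scan stand in for TQ/BBQ and HNSW, and no rotation is applied. All function names are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sign_quantize(v):
    # 1-bit code: keep only the sign of each dimension (stand-in for a real quantizer).
    return [1 if x >= 0 else -1 for x in v]


def exact_top_k(query, docs, top_k):
    return sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)[:top_k]


def search_with_rescore(query, docs, top_k, oversample):
    """Approximate pass on 1-bit codes, then exact float rescoring of the candidates."""
    q_code = sign_quantize(query)
    codes = [sign_quantize(d) for d in docs]  # would be precomputed at index time
    # Phase 1: rank all docs by the cheap quantized score, keep oversample * top_k.
    n_candidates = min(len(docs), oversample * top_k)
    candidates = sorted(range(len(docs)), key=lambda i: dot(q_code, codes[i]), reverse=True)
    candidates = candidates[:n_candidates]
    # Phase 2: re-rank only the candidates with the full-precision vectors.
    candidates.sort(key=lambda i: dot(query, docs[i]), reverse=True)
    return candidates[:top_k]


def recall(found, truth):
    return len(set(found) & set(truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n = 64, 2000
    docs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    query = [rng.gauss(0, 1) for _ in range(dim)]
    truth = exact_top_k(query, docs, 10)
    for oversample in (1, 5, 10):
        found = search_with_rescore(query, docs, 10, oversample)
        print(f"oversample={oversample:>2}  recall@10={recall(found, truth):.2f}")
```

Because a larger candidate pool always contains the smaller one, recall can only rise as `oversample` grows, matching the pattern in both tables; the cost is the extra full-precision scoring, visible in the higher latencies of the `+rsc` rows.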
   
   TQ-8bit vs SQ-8bit: TQ-8bit's raw recall (0.902) is close to SQ-8bit's (0.918), at 0.94 ms vs 1.23 ms latency (1.3× faster) and with a 4.7× smaller index (5,293 MB vs 24,980 MB). With 5× rescore, TQ-8bit+rsc (0.983) nearly matches SQ-8bit+rsc (0.987) at 24% lower latency (3.13 ms vs 4.13 ms).
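A back-of-envelope check on the index sizes in the 5M table. It rests on assumptions that are ours, not stated in the comment: "MB" means MiB (2^20 bytes), the TQ indexes hold only bit-packed `bits × dims` codes per vector plus the HNSW graph, and the Float32/SQ/BBQ indexes also keep the raw float32 vectors. Under those assumptions the leftover per method lands in a narrow band of roughly 400 to 600 MiB, which would be graph plus metadata; that remainder is inferred from the table, not measured.

```python
MIB = 2**20


def code_mib(num_vectors: int, dims: int, bits: int) -> float:
    """Size of bit-packed codes alone, in MiB (ignores per-vector correction terms)."""
    return num_vectors * dims * bits / 8 / MIB


N, DIMS = 5_000_000, 1024
raw_float32 = code_mib(N, DIMS, 32)  # full-precision vectors

# (method, reported index size from the table, estimated vector storage in MiB)
rows = [
    ("Float32", 20_020, raw_float32),
    ("SQ-8bit", 24_980, raw_float32 + code_mib(N, DIMS, 8)),   # assumes raw floats kept
    ("BBQ-1bit", 20_743, raw_float32 + code_mib(N, DIMS, 1)),  # assumes raw floats kept
    ("TQ-1bit", 1_064, code_mib(N, DIMS, 1)),
    ("TQ-4bit", 2_851, code_mib(N, DIMS, 4)),
    ("TQ-8bit", 5_293, code_mib(N, DIMS, 8)),
]

for method, reported, vectors in rows:
    # Whatever is left over is attributed to the HNSW graph and metadata.
    print(f"{method:9s} vectors~{vectors:8,.0f} MiB  "
          f"reported {reported:6,} MiB  remainder~{reported - vectors:4,.0f} MiB")
```

The same arithmetic explains the size ratios quoted in the comment: the TQ indexes scale with the code width alone, while the other quantized formats sit on top of roughly 19.5K MiB of float32 vectors.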
   
   ## Benchmark data: 1M ASIN Vectors, Qwen3-8B, 4096d
   
   1M Amazon product ASINs embedded with Qwen3-Embedding-8B at its native 4096 dimensions, queried with 5K real product search queries. HNSW (M=32, beamWidth=200, topK=10, fanout=50, forceMerge to 1 segment).
   
   | Method | R@10 | Latency (ms) | Indexing (docs/s) | Force merge (s) | Index size (MB) |
   |--------|------|--------------|-------------------|-----------------|-----------------|
   | Float32 | 0.925 | 0.85 | 5,430 | 389 | 15,674 |
   | SQ-4bit | 0.883 | 0.84 | 9,287 | 504 | 17,642 |
   | SQ-4bit+5×rsc | 0.978 | 2.47 | 9,049 | 512 | 17,642 |
   | SQ-8bit | 0.902 | 1.31 | 6,752 | 680 | 19,595 |
   | SQ-8bit+5×rsc | 0.980 | 3.91 | 6,947 | 672 | 19,595 |
   | BBQ-1bit | 0.774 | 0.56 | 13,200 | 417 | 16,178 |
   | BBQ-1bit+5×rsc | 0.976 | 1.53 | 13,235 | 419 | 16,178 |
   | BBQ-1bit+10×rsc | 0.987 | 2.26 | 13,120 | 422 | 16,178 |
   | **TQ-1bit** | **0.741** | **0.49** | **20,020** | **210** | **539** |
   | **TQ-1bit+5×rsc** | **0.970** | **1.58** | **19,376** | **353** | **539** |
   | **TQ-1bit+10×rsc** | **0.984** | **2.47** | **19,460** | **352** | **538** |
   | **TQ-4bit** | **0.866** | **1.33** | **8,226** | **1,397** | **2,000** |
   | **TQ-4bit+5×rsc** | **0.974** | **4.18** | **8,181** | **1,409** | **2,000** |
   | **TQ-8bit** | **0.908** | **0.94** | **10,537** | **667** | **3,954** |
   | **TQ-8bit+5×rsc** | **0.974** | **2.89** | **10,564** | **724** | **3,954** |
   
   Without rescoring, TQ-8bit beats SQ-8bit on every axis at 4096d: higher recall (0.908 vs 0.902), lower latency (0.94 ms vs 1.31 ms), faster indexing (10.5K vs 6.8K docs/s), slightly faster force merge (667 s vs 680 s), and a 5× smaller index (3,954 MB vs 19,595 MB).
   
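   TQ-1bit+10×rsc (0.984) nearly matches BBQ-1bit+10×rsc (0.987) with a 30× smaller index (538 MB vs 16,178 MB) and 1.5× faster indexing (19.5K vs 13.1K docs/s). Force merge is 1.2× faster (352 s vs 422 s), or 2× faster for the runs without rescoring (210 s vs 417 s), while query latency is slightly higher (2.47 ms vs 2.26 ms).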
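The 4096d comparisons can be recomputed directly from the table. This small helper (hypothetical; values copied from the rows above) expresses one method relative to another so that every ratio above 1 favors the first method, which makes the direction of each claim explicit.

```python
# Rows copied from the 4096d table:
# (recall@10, latency ms, indexing docs/s, force merge s, index size MB)
ROWS = {
    "SQ-8bit": (0.902, 1.31, 6_752, 680, 19_595),
    "TQ-8bit": (0.908, 0.94, 10_537, 667, 3_954),
    "BBQ-1bit+10xrsc": (0.987, 2.26, 13_120, 422, 16_178),
    "TQ-1bit+10xrsc": (0.984, 2.47, 19_460, 352, 538),
}


def compare(a: str, b: str) -> dict:
    """Express method `a` relative to method `b`; every ratio > 1 favors `a`."""
    recall_a, lat_a, docs_a, merge_a, size_a = ROWS[a]
    recall_b, lat_b, docs_b, merge_b, size_b = ROWS[b]
    return {
        "recall_delta": recall_a - recall_b,
        "latency_speedup": lat_b / lat_a,    # lower latency is better
        "indexing_speedup": docs_a / docs_b,  # higher throughput is better
        "merge_speedup": merge_b / merge_a,   # shorter merge is better
        "size_reduction": size_b / size_a,    # smaller index is better
    }


for a, b in [("TQ-8bit", "SQ-8bit"), ("TQ-1bit+10xrsc", "BBQ-1bit+10xrsc")]:
    stats = compare(a, b)
    print(f"{a} vs {b}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```

For TQ-1bit+10×rsc vs BBQ-1bit+10×rsc this gives a latency ratio below 1 (about 0.92), i.e. roughly 9% higher query latency for TQ, alongside the indexing, merge, and size advantages.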
   
   If anyone wants to replicate these results:
   - lucene: https://github.com/shbhar/lucene/tree/turboquant-v1 (commit for these tests: 62cce045b61556484517542e43e6c0c7ddfec8ee)
   - luceneutil: https://github.com/shbhar/luceneutil/tree/turboquant-v1 (commit for this test: 911b947dab95a6164ba38c875eca5a1d72298b3c). This branch is hacky: make sure to run fp32 first, because the TQ ground truth depends on that index.
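A small sketch for pinning both checkouts to the commits quoted above. The clone URLs are derived from the branch links using GitHub's usual `.git` pattern; the working directory name and the `dry_run` switch are hypothetical. It only prepares the source trees: running the benchmarks themselves (fp32 first) follows the luceneutil fork's own setup, which is not reproduced here.

```python
import subprocess
from pathlib import Path

# Branch and commits quoted in the comment.
BRANCH = "turboquant-v1"
CHECKOUTS = {
    "lucene": ("https://github.com/shbhar/lucene.git",
               "62cce045b61556484517542e43e6c0c7ddfec8ee"),
    "luceneutil": ("https://github.com/shbhar/luceneutil.git",
                   "911b947dab95a6164ba38c875eca5a1d72298b3c"),
}


def checkout_commands(workdir: Path) -> list[list[str]]:
    """Build git commands that clone each repo and pin it to the benchmarked commit."""
    cmds = []
    for name, (url, commit) in CHECKOUTS.items():
        dest = workdir / name
        cmds.append(["git", "clone", "--branch", BRANCH, url, str(dest)])
        cmds.append(["git", "-C", str(dest), "checkout", commit])
    return cmds


def main(workdir: str = "turboquant-repro", dry_run: bool = True) -> None:
    for cmd in checkout_commands(Path(workdir)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    main()
```

With the default `dry_run=True` it only prints the commands; pass `dry_run=False` to execute them.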

