kaivalnp commented on issue #15734:
URL: https://github.com/apache/lucene/issues/15734#issuecomment-3929639459
However, I understand that packing vectors with arbitrary quantization
levels + supporting the various comparisons in Lucene is a challenge -- so I
tried a slightly different approach:
What if vectors were first expanded to a higher dimension, and then
quantized to one of the supported bit widths? (e.g. represent 1024-dimensional
vectors in 1536 dimensions, then quantize to 4 bits: we get the memory
equivalent of 6-bit quantization, since 1536 × 4 / 1024 = 6)
A dimension "expansion" from X -> Y could be as simple as padding the
original vector with (Y - X) zeros, then applying an arbitrary rotation to the
new Y-dimensional vector. We need the rotation because quantization occurs
per dimension, so a dimension that is zero in every vector carries no
information -- we basically need to spread the information from the X original
dimensions across all Y, which is where a random rotation helps.
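A quick numpy sketch (toy sizes, not the actual benchmark script) of why the rotation matters -- padding alone leaves the new dimensions identically zero, while a rotation spreads variance across all of them:

```python
import numpy as np

# Hypothetical toy sizes standing in for 1024 -> 1536
rng = np.random.default_rng(0)
X, Y = 4, 6

vecs = rng.standard_normal((1000, X))
padded = np.pad(vecs, ((0, 0), (0, Y - X)))  # zero-pad each vector to Y dims

# Random rotation via QR decomposition (signs fixed so Q is uniform over rotations)
Q, R = np.linalg.qr(rng.standard_normal((Y, Y)))
rotation = Q @ np.diag(np.sign(np.diag(R)))
rotated = padded @ rotation

print(padded.var(axis=0))   # last Y - X columns have exactly zero variance
print(rotated.var(axis=0))  # every column now carries some of the signal
```

A per-dimension quantizer gains nothing from the all-zero columns of `padded`, but every column of `rotated` is informative.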
I took some 1024-dimensional Cohere v3 docs + queries and "expanded" them to
1536 dimensions using the above strategy, with a simple Python script like:
```python
import numpy as np

def random_rotation_matrix(n):
    """Generate a random rotation matrix in n dimensions using QR decomposition."""
    H = np.random.randn(n, n)
    Q, R = np.linalg.qr(H)
    # Fix the signs so Q is uniformly distributed over rotations
    Q = Q @ np.diag(np.sign(np.diag(R)))
    return Q.astype("<f4")

def expand(size):
    rotation_matrix = random_rotation_matrix(size)
    def _func(arr):
        # Zero-pad to `size` dimensions, then rotate
        return np.matmul(np.pad(arr, (0, size - arr.size)), rotation_matrix)
    return _func

dim = 1024
num_docs = 100_000
doc_path = "../cohere-v3-wikipedia-en-scattered-1024d.docs.first1M.vec"
docs = np.fromfile(doc_path, dtype=f"<{dim}f4", count=num_docs)

expanded_dim = 1536
expand_func = expand(expanded_dim)
docs_expanded = np.apply_along_axis(expand_func, 1, docs)
docs_expanded.astype("<f4").tofile(f"cohere-v3-docs-{expanded_dim}d.vec")
# use the same expand_func for queries -- because it needs the same rotation as docs
```
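Since the rotation is orthogonal and padding only appends zeros, the expansion preserves dot products (and hence norms and cosine scores) exactly -- so full-precision scores are unchanged, and only the subsequent quantization differs. A self-contained sanity check (toy sizes, not part of the original script):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_rotation_matrix(n):
    # Same QR-based construction as above
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q @ np.diag(np.sign(np.diag(R)))

dim, expanded_dim = 8, 12  # toy stand-ins for 1024 -> 1536
rotation = random_rotation_matrix(expanded_dim)

doc = rng.standard_normal(dim)
query = rng.standard_normal(dim)
doc_exp = np.pad(doc, (0, expanded_dim - dim)) @ rotation
query_exp = np.pad(query, (0, expanded_dim - dim)) @ rotation

assert np.isclose(doc @ query, doc_exp @ query_exp)               # dot product preserved
assert np.isclose(np.linalg.norm(doc), np.linalg.norm(doc_exp))   # norm preserved
```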
Then I performed a recall test on the Cohere v3 vectors (numDocs = 100K,
numQueries = 10K, forceMerge to 1 segment).
Baseline (supported quantization options):
```
recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
 0.993        0.141   4.448       31.504         no     6858     19.29       5185.11           20.32          403.00       390.625      390.625
 0.983        0.088   2.760       31.288     8 bits     6868     20.35       4913.28           28.67          502.19       489.807       99.182
 0.927        0.059   1.822       31.037     4 bits     6954     20.59       4857.67           21.85          453.41       440.979       50.354
 0.835        0.037   1.152       30.970     2 bits     7299     20.23       4943.15           20.04          429.30       416.374       25.749
 0.717        0.032   0.976       30.599     1 bits     8197     19.86       5034.23           17.10          418.30       404.167       13.542
```
Candidate (non-standard quantization equivalent):
```
recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  force_merge(s)  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
 0.990        0.102   3.218       31.429    10 bits     6857     22.57       4429.88           40.30          624.26       611.877      123.596
 0.953        0.073   2.260       31.051     6 bits     6868     21.30       4694.62           32.77          673.11       660.706       74.768
 0.950        0.066   2.027       30.901     5 bits     6878     20.84       4798.69           29.12          563.26       550.842       62.561
 0.882        0.047   1.453       31.178     3 bits     7044     22.29       4486.12           28.81          636.63       623.894       37.956
```
The candidate uses "expanded" vectors with different combinations of dimension
and quantization:
1. 10 bits == expand 1024 -> 1280 dimensions + 8 bit quantization
2. 6 bits == expand 1024 -> 1536 dimensions + 4 bit quantization
3. 5 bits == expand 1024 -> 1280 dimensions + 4 bit quantization
4. 3 bits == expand 1024 -> 1536 dimensions + 2 bit quantization
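The memory equivalence above is just `expanded_dim * quant_bits / original_dim`; a quick check that the four configurations map to the claimed effective bit widths:

```python
# Effective bits per *original* dimension for each candidate configuration
original_dim = 1024
configs = [(1280, 8), (1536, 4), (1280, 4), (1536, 2)]
for expanded_dim, bits in configs:
    eff = expanded_dim * bits / original_dim
    print(f"{expanded_dim}d @ {bits} bits -> {eff:g} effective bits")
# -> 10, 6, 5, 3 effective bits, matching the list above
```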
The exact KNN baseline in _all_ runs is the one computed from the original
1024-dimensional vectors.
It was encouraging to see the _equivalent_ quantization options land at
roughly the interpolated recall and latency values!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]