lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2698569925
Hi @benwtrent
Thanks again for your previous comment. I was able to modify luceneutil and
run some benchmarks. I am quite new to Lucene, so I would appreciate some help
understanding the results I'm getting.
First, I was trying to run a quantized and a non-quantized benchmark on the
Cohere 768 dataset on my local machine.
Here are the results for the quantized benchmark (with
`Lucene102HnswBinaryQuantizedVectorsFormat`):
```
recall  latency (ms)  nDoc      topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
0.452   11.655        10000000  100   50      16       100        1 bits     1914.86  5222.32       10112.13       1             30934.82         30212.402      915.527
```
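For context, this is roughly how I plugged the quantized format into the index writer. It is only a sketch of my setup, not the exact luceneutil change; the codec class I extend and the default-parameter comment are assumptions based on my checkout:
```java
// Sketch: write the vector field with the binary-quantized HNSW format,
// keeping everything else on the default codec. The no-arg format constructor
// uses the default HNSW parameters, which matched my run (maxConn=16, beamWidth=100).
var codec = new Lucene101Codec() {
  @Override
  public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
    return new Lucene102HnswBinaryQuantizedVectorsFormat();
  }
};
var iwc = new IndexWriterConfig().setCodec(codec);
```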
Unfortunately I didn’t save the non-quantized results, but the recall was
something around 0.73.
Then I ran the same tests on a dedicated server with more CPU and RAM, and
the results were weird. They were much, much faster, but the recall was now
much lower:
Non-quantized:
```
recall  latency (ms)  nDoc      topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
0.203   7.143         10000000  100   50      16       100        no         1403.40  7125.53       769.29         1             29470.29         29296.875      29296.875
```
Quantized:
```
recall  latency (ms)  nDoc      topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
0.191   7.721         10000000  100   50      16       100        1 bits     511.40   19554.09      1116.80        1             30597.43         30212.402      915.527
```
So, my questions are:
1. What exactly do the numbers in the description of this pull request mean?
When you say that the recall for Cohere 768 is 0.938, is it the absolute recall
value that you got from the benchmark, or is it some sort of ratio between the
quantized and non-quantized recalls?
2. Do you have any idea what could cause such a huge recall difference
between benchmark results in different environments?
3. I was also trying to do some benchmarking with other public datasets
(without luceneutil), and I got a little confused about how to correctly
calculate the recall. I understand that recall is the ratio between the number of
correct responses and the total number of responses. The total number of
responses is straightforward, but the number of correct ones is a bit confusing
to me (I sketch my current computation at the end of this question).
`luceneutil` finds the correct neighbors with a query roughly like the following
(not the exact code, but my variation):
```java
// Score every document by its exact similarity to the query vector;
// the top-k hits of this query are treated as the true nearest neighbors.
var queryVector = new ConstKnnByteVectorValueSource(queryEmb);
var docVectors = new ByteKnnVectorFieldSource("vector");
var exactQuery = new BooleanQuery.Builder()
    .add(
        new FunctionQuery(
            new ByteVectorSimilarityFunction(similarity, queryVector, docVectors)),
        BooleanClause.Occur.SHOULD)
    .add(new MatchAllDocsQuery(), BooleanClause.Occur.FILTER)
    .build();
```
However, the `lucene` unit tests use a different query to get the correct
neighbors from the index:
```java
var exactQuery = new KnnByteVectorQuery("vector", queryEmb, size, new MatchAllDocsQuery());
```
I would appreciate it if you could give some insight into which query is the
correct one, because they return different results.
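For completeness, this is roughly how I am computing the ratio once I have the two result sets. It is just my own sketch (the method and parameter names are mine), with the doc ids taken from the exact and approximate searches above:
```java
import java.util.Set;

// Recall@k: the fraction of the true top-k neighbors (from the exact query)
// that also appear in the approximate (HNSW) top-k results.
static double recallAtK(Set<Integer> exactTopK, Set<Integer> approxTopK, int k) {
  long found = approxTopK.stream().filter(exactTopK::contains).count();
  return (double) found / k;
}
```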
Thanks for your time!