[
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538467#comment-17538467
]
Michael Sokolov edited comment on LUCENE-10577 at 5/17/22 8:57 PM:
-------------------------------------------------------------------
Okay, thanks for the link [~rcmuir] - I do see ByteVector.mul(ByteVector) and so
on. And yes, [~jpountz], I think that could work for an API. It would be nice to
let users worry about getting their data into the right shape. I think it might
make more sense to expect signed values, though?
There do seem to be 8-bit vectorized instructions for Intel chips at least
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-vector-extensions-2/intrinsics-for-arithmetic-operations-2/mm256-add-epi8-16-32-64.html
I agree we should measure, but the JDK support here also seems to be a moving
target. Perhaps it's time to give it another whirl and see where we are now
with JDK 18/19.
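For the record, here is roughly what a byte dot product could look like on the
incubating Vector API - just a sketch, assuming JDK 18/19 with
{{--add-modules jdk.incubator.vector}}; the widen-to-int step is my way of
avoiding byte overflow, there may well be a cheaper reduction:
{code:java}
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

static int dotProduct(byte[] a, byte[] b) {
  VectorSpecies<Byte> species = ByteVector.SPECIES_PREFERRED;
  int sum = 0;
  int i = 0;
  int bound = species.loopBound(a.length);
  for (; i < bound; i += species.length()) {
    ByteVector va = ByteVector.fromArray(species, a, i);
    ByteVector vb = ByteVector.fromArray(species, b, i);
    // widen each quarter of the byte lanes to int before multiplying,
    // since a byte*byte product overflows the byte range
    for (int part = 0; part < 4; part++) {
      IntVector ia = (IntVector) va.convert(VectorOperators.B2I, part);
      IntVector ib = (IntVector) vb.convert(VectorOperators.B2I, part);
      sum += ia.mul(ib).reduceLanes(VectorOperators.ADD);
    }
  }
  for (; i < a.length; i++) { // scalar tail
    sum += a[i] * b[i];
  }
  return sum;
}
{code}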
> Quantize vector values
> ----------------------
>
> Key: LUCENE-10577
> URL: https://issues.apache.org/jira/browse/LUCENE-10577
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Michael Sokolov
> Priority: Major
>
> The {{KnnVectorField}} api handles vectors with 4-byte floating point values.
> These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values
> 2. Approximate nearest-neighbor search
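> As a concrete sketch of those two paths under the current API (the field name
> and vector values here are purely illustrative):
> {code:java}
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.KnnVectorField;
> import org.apache.lucene.index.VectorSimilarityFunction;
> import org.apache.lucene.search.KnnVectorQuery;
>
> // indexing: each document carries a fixed-dimension float[] vector
> // (unit-length when used with DOT_PRODUCT similarity)
> Document doc = new Document();
> doc.add(new KnnVectorField("embedding", new float[] {0.6f, 0.8f, 0.0f},
>     VectorSimilarityFunction.DOT_PRODUCT));
>
> // searching: approximate nearest-neighbor query over the same field
> KnnVectorQuery query =
>     new KnnVectorQuery("embedding", new float[] {0.8f, 0.6f, 0.0f}, 10);
> {code}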
> The main point of this addition was to provide the search capability, and to
> support that it is not really necessary to store vectors in full precision.
> Users may also be willing to retrieve values in lower precision, for whatever
> purposes they serve, if it lets them store more samples. We know that 8 bits
> is enough to provide a very near approximation of the recall/performance
> tradeoff achieved with the full-precision vectors. I'd like to explore how we
> could enable 4:1 compression of these fields by reducing their precision.
> A few ways I can imagine this being done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide
> their data in reduced-precision format and give control over the quantization
> to them. It would have a major impact on the Lucene API surface though,
> essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would
> require no or perhaps very limited change to the existing API to enable the
> feature.
> I've been exploring (2), and what I find is that we can achieve very good
> recall results using dot-product similarity scoring by simple linear scaling
> + quantization of the vector values, so long as we choose the scale that
> minimizes the quantization error. Dot-product is amenable to this treatment
> since vectors are required to be unit-length when used with that similarity
> function.
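> A minimal sketch of that scale-and-round step (the clamping is my own guard
> against outliers wrapping around the signed-byte range, not part of the
> experiment itself):
> {code:java}
> // hypothetical helper: linearly scale a float vector and round to bytes
> static byte[] quantize(float[] v, float scale) {
>   byte[] out = new byte[v.length];
>   for (int i = 0; i < v.length; i++) {
>     int q = Math.round(v[i] * scale);
>     // clamp so outliers don't wrap around the signed byte range
>     out[i] = (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, q));
>   }
>   return out;
> }
> {code}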
> Even so, there is variability in the ideal scale across data sets. A good
> choice seems to be max(abs(min-value), abs(max-value)), though of course this
> assumes the data set doesn't have a few outlier data points. A theoretical
> range can be derived from 1/sqrt(dimension), but this is only useful when the
> samples are normally distributed. We could in theory determine the ideal
> scale when flushing a segment and manage this quantization per-segment, but
> then numerical error could creep in when merging.
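> That max-absolute-value heuristic would look something like the following
> (a sketch; computed per vector here for simplicity, whereas the discussion
> above is really about choosing one scale per segment or data set, and 127 is
> just the edge of the signed-byte range):
> {code:java}
> // hypothetical helper: choose the scale that maps the largest-magnitude
> // component onto the edge of the signed byte range
> static float quantizationScale(float[] v) {
>   float maxAbs = 0f;
>   for (float x : v) {
>     maxAbs = Math.max(maxAbs, Math.abs(x));
>   }
>   return 127f / maxAbs;
> }
> {code}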
> I'll post a patch/PR with an experimental setup I've been using for
> evaluation purposes. It is pretty self-contained and simple, but has some
> drawbacks that need to be addressed:
> 1. No automated mechanism for determining quantization scale (it's a constant
> that I have been playing with)
> 2. Converts bytes back to floats when computing the dot-product instead of
> computing directly on the byte values
> I'd like to get people's feedback on the approach and whether in general we
> should think about doing this compression under the hood, or expose a
> byte-oriented API. Whatever we do, I think a 4:1 compression ratio is pretty
> compelling and we should pursue something.