msokolov commented on code in PR #899:
URL: https://github.com/apache/lucene/pull/899#discussion_r877085508
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene92/ExpandingRandomAccessVectorValues.java:
##########
@@ -0,0 +1,57 @@
+package org.apache.lucene.codecs.lucene92;
+
+import org.apache.lucene.index.RandomAccessVectorValues;
+import org.apache.lucene.index.RandomAccessVectorValuesProducer;
+import org.apache.lucene.util.BytesRef;
+
+import java.io.IOException;
+
+public class ExpandingRandomAccessVectorValues implements
RandomAccessVectorValuesProducer {
+
+ private final RandomAccessVectorValuesProducer delegate;
+ private final float scale;
+
+ /**
+ * Wraps an existing vector values producer. Floating point vector values
will be produced by scaling
+ * byte-quantized values read from the values produced by the input.
+ */
+ protected ExpandingRandomAccessVectorValues(RandomAccessVectorValuesProducer
in, float scale) {
+ this.delegate = in;
+ assert scale != 0;
+ this.scale = scale;
+ }
+
+ @Override
+ public RandomAccessVectorValues randomAccess() throws IOException {
+ RandomAccessVectorValues delegateValues = delegate.randomAccess();
+ float[] value = new float[delegateValues.dimension()];;
+
+ return new RandomAccessVectorValues() {
+
+ @Override
+ public int size() {
+ return delegateValues.size();
+ }
+
+ @Override
+ public int dimension() {
+ return delegateValues.dimension();
+ }
+
+ @Override
+ public float[] vectorValue(int targetOrd) throws IOException {
+ BytesRef binaryValue = delegateValues.binaryValue(targetOrd);
+ byte[] bytes = binaryValue.bytes;
+ for (int i = 0, j = binaryValue.offset; i < value.length; i++, j++) {
+ value[i] = bytes[j] * scale;
Review Comment:
Well, there are definitely byte-oriented vectors. I don't think we should
try to use some kind of 8-bit floating point (what would that do, have an
exponent and a mantissa?) rather if we can scale to use the entire 8 bits as
significant bits (no exponent) then we maximize the precision and can use the
native instruction set, which ByteVector does seem to be accessing (based on
some preliminary JMH I ran it seems quite a bit faster than 32-bit FloatVector).
For now I see two directions: (1) figure out how to scale vectors and store
as signed bytes at full precision (this issue), and (2) work up performant
Vector API implementations of that scaling and the dot product in anticipation
of the vector API hatching along the lines of the patch you already did,
@rcmuir. Maybe we can push that to a branch that builds with `--add-modules
jdk.incubator.vector` for posterity and ease of testing. I don't think we need
to block (1) for (2) since it seems clear there will be a path to
vectorization. In the meantime we can implement a byte-at-a-time dot product to
replace the float-at-a-time dot product?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]