If it's an unqualified win, we should modify the VectorScorer to do it, and then we wouldn't need to expose the quantized values. I do think we would rather not expose the details of quantization since we want to be free to innovate without back-compat considerations, and generally don't want to expose API surface when we don't need to.
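
To make the asymmetric scoring idea from the quoted message below concrete, here is a minimal plain-Java sketch. It is not Lucene code and does not reflect how Lucene's quantized formats store or score vectors. It assumes a simple 7-bit scalar quantization over a fixed [min, max] range, and the class and method names (AsymmetricDotProduct, quantize7Bit, asymmetricDot) are hypothetical. The point is only that the float query never needs to be quantized: because each document component is approximately min + alpha * b, the dot product splits into min * sum(q) + alpha * sum(q * b).

```java
final class AsymmetricDotProduct {

    /** Scalar-quantizes each component to an integer in [0, 127] (7 bits) over the assumed range [min, max]. */
    static byte[] quantize7Bit(float[] v, float min, float max) {
        float scale = 127f / (max - min);
        byte[] out = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.max(min, Math.min(max, v[i]));
            out[i] = (byte) Math.round((clamped - min) * scale);
        }
        return out;
    }

    /**
     * Dot product of a full-precision query with a quantized document vector.
     * Each document component is approximated as min + alpha * b, with alpha = (max - min) / 127,
     * so dot(q, d) is approximately min * sum(q_i) + alpha * sum(q_i * b_i).
     */
    static float asymmetricDot(float[] query, byte[] doc, float min, float max) {
        float alpha = (max - min) / 127f;
        float querySum = 0f;
        float mixed = 0f;
        for (int i = 0; i < query.length; i++) {
            querySum += query[i];
            mixed += query[i] * doc[i]; // stored bytes are in [0, 127], so no sign fix-up is needed
        }
        return min * querySum + alpha * mixed;
    }

    /** Exact float dot product, used here only as a reference value. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] query = {0.12f, -0.40f, 0.33f, 0.05f};
        float[] doc = {0.50f, -0.25f, 0.10f, -0.80f};
        byte[] quantized = quantize7Bit(doc, -1f, 1f);
        System.out.println("exact  = " + dot(query, doc));
        System.out.println("approx = " + asymmetricDot(query, quantized, -1f, 1f));
    }
}
```

In this model, querySum depends only on the query, so it could be computed once and reused for every candidate in the rescoring phase.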

On Sun, Jul 27, 2025 at 10:35 PM Anh Dũng Bùi <dungba...@gmail.com> wrote:
>
> Hi all,
>
> I have a follow-up question on this. Would it make sense to expose the
> quantized vector values as well? Currently even if we are quantizing the
> vectors, calling vectorValue() will return the full precision vectors while
> the quantized vectors are only used for scorer(). Do we consider the
> quantized vectors as private information that should not be exposed?
>
> For the context, I'm thinking about a way to run 2-phase rescoring using
> the 32-bit query vector and 7-bit or 4-bit document vectors (matching phase
> will use a more aggressive quantization). During the rescoring phase, if we
> use the quantized scorer(), the main cost is actually the quantization, not
> the dot product score computation (since we only run it a small number of
> docs). Doing asymmetric quantization (inspired by BBQ) at the rescoring
> phase, not only would we improve the recall but also the latency.
>
> On Tue, Feb 11, 2025 at 11:50 PM Michael Sokolov <msoko...@gmail.com> wrote:
>
> > Stored fields is a separate format that stores data in a row-wise
> > fashion: all the stored data for a single document is written
> > together. Vectors aren't *also* copied into stored fields storage, so
> > the stored fields API can't be used to retrieve them. If we did allow
> > that it would result in massive duplication for no purpose aside from
> > making things look simpler. But do you think that it would be more
> > convenient to use the stored fields API to retrieve the vectors? Does
> > it hide the details of the leaf structure? Maybe there's an
> > opportunity to create some convenience API for vectors, not sure.
> >
> > On Tue, Feb 11, 2025 at 8:45 AM Viliam Ďurina <viliam.dur...@gmail.com>
> > wrote:
> > >
> > > Thanks Adrien!
> > >
> > > The code has one issue:
> > >     if (iterator.advance(leafDocID) == docID)
> > > should have been:
> > >     if (iterator.advance(leafDocID) == leafDocID)
> > >
> > > After fixing this, it works (for reference, I'm using Lucene 10.1). But I
> > > still wonder why can't we retrieve vectors just as we retrieve any other
> > > field. I was unable to figure the code out myself, this way it's pretty
> > > complicated. Is there any reason the vectors are not available through
> > > `storedFields()`?
> > >
> > > Viliam
> > >
> > > On Mon, Feb 10, 2025 at 9:21 PM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > > Hi Viliam,
> > > >
> > > > Your logic is mostly correct, here is a version that should be a bit
> > > > simpler and correct (but beware, untested):
> > > >
> > > > IndexReader reader; // your multi-reader
> > > > int docID; // top-level doc ID
> > > > int readerID = ReaderUtil.subIndex(docID, reader.leaves());
> > > > LeafReaderContext leafContext = reader.leaves().get(readerID);
> > > > int leafDocID = docID - leafContext.docBase;
> > > > FloatVectorValues values =
> > > >     leafContext.reader().getFloatVectorValues("my_vector_field");
> > > > DocIndexIterator iterator = values.iterator();
> > > > float[] vector;
> > > > if (iterator.advance(leafDocID) == docID) { // this doc ID has a vector
> > > >   vector = values.vectorValue(iterator.index());
> > > > } else {
> > > >   vector = null;
> > > > }
> > > >
> > > > On Mon, Feb 10, 2025 at 5:01 PM Viliam Ďurina <viliam.dur...@gmail.com>
> > > > wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > when indexing vector fields, Lucene doesn't allow specifying the vector
> > > > > field as stored (it throws `IllegalStateException: Cannot store value of
> > > > > type class [F`). When trying to retrieve the value using
> > > > > `IndexReader.storedFields()`, the vector field isn't stored.
> > > > >
> > > > > However, Lucene 10 stores the vectors in `.vec` files. I was able to
> > > > > retrieve them using this complicated code, for which I had to make the
> > > > > `readerIndex` and `readerBase` methods in `BaseCompositeReader` public
> > > > > (they are protected):
> > > > >
> > > > >     int docId = ...; // the docId to retrieve, e.g. coming out of a search
> > > > >     IndexReader node = reader.getContext().reader();
> > > > >     while (node instanceof BaseCompositeReader) {
> > > > >         int index = ((BaseCompositeReader) node).readerIndex(docId);
> > > > >         int base = ((BaseCompositeReader) node).readerBase(index);
> > > > >         docId -= base;
> > > > >         node = ((BaseCompositeReader) node).getContext().children().get(index).reader();
> > > > >     }
> > > > >     assert node instanceof LeafReader;
> > > > >     assert node.leaves().size() == 1;
> > > > >     FloatVectorValues vectorValues =
> > > > >         node.leaves().getFirst().reader().getFloatVectorValues("myVectorField");
> > > > >     float[] vector = vectorValues.vectorValue(docId);
> > > > >
> > > > > My reader is a `MultiReader`, composed of multiple `DirectoryReader`s.
> > > > >
> > > > > Is there any public API to retrieve the vector values? If not, is there any
> > > > > particular reason to not make the vectors available, if Lucene stores them
> > > > > anyway? Even if the vectors are quantized, original raw vectors are stored,
> > > > > though they are never used.
> > > > >
> > > > > Thanks,
> > > > > Viliam
> > > > >
> > > >
> > > > --
> > > > Adrien
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
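
The doc ID correction discussed in the thread above (compare the result of advance() against leafDocID, not the top-level docID) can be illustrated without any Lucene dependency. The following is a plain-Java model under stated assumptions, not Lucene's implementation: leaves are represented only by a sorted array of docBase offsets starting at 0, a leaf's sparse vector field is represented by a sorted array of the leaf-local doc IDs that have a vector, and the names LeafDocIdLookup and vectorOrdinal are hypothetical. The subIndex method here only plays the role that ReaderUtil.subIndex plays in the thread's snippet.

```java
import java.util.Arrays;

final class LeafDocIdLookup {

    /**
     * Returns the index of the leaf containing the top-level docID.
     * Assumes docBases is sorted, docBases[0] == 0, and docID >= 0.
     */
    static int subIndex(int docID, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, docID);
        if (pos >= 0) {
            // Empty leaves share a base with their successor; pick the last leaf with this base.
            while (pos + 1 < docBases.length && docBases[pos + 1] == docID) {
                pos++;
            }
            return pos;
        }
        // Not an exact base: the owning leaf is the one just before the insertion point.
        return -pos - 2;
    }

    /**
     * Models advance(leafDocID) on a sparse iterator and the corrected check.
     * Returns the vector ordinal for leafDocID, or -1 if that document has no vector.
     */
    static int vectorOrdinal(int[] docsWithVectors, int leafDocID) {
        int pos = Arrays.binarySearch(docsWithVectors, leafDocID);
        int ord = pos >= 0 ? pos : -pos - 1; // first entry >= leafDocID, like advance()
        // The iterator works in leaf-local IDs, so compare against leafDocID, not the top-level ID.
        boolean hasVector = ord < docsWithVectors.length && docsWithVectors[ord] == leafDocID;
        return hasVector ? ord : -1;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250};   // three leaves starting at these top-level IDs
        int[] docsWithVectors = {3, 30, 77}; // leaf-local IDs with a vector in leaf 1
        int docID = 130;                   // top-level doc ID, e.g. from a search hit
        int leaf = subIndex(docID, docBases);
        int leafDocID = docID - docBases[leaf];
        System.out.println("leaf=" + leaf + " leafDocID=" + leafDocID
            + " ordinal=" + vectorOrdinal(docsWithVectors, leafDocID));
    }
}
```

With real readers, ReaderUtil.subIndex, leafContext.docBase, and iterator.advance() from the thread's snippet fill these roles. Comparing against the top-level docID only happens to work for the first leaf, where docBase is 0.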