If it's an unqualified win, we should modify the VectorScorer to do
it, and then we wouldn't need to expose the quantized values.  I do
think we would rather not expose the details of quantization since we
want to be free to innovate without back-compat considerations, and
generally don't want to expose API surface when we don't need to.
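
To make the asymmetric scoring idea in the message quoted below concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class name, the method names, and the simple min/max 7-bit scalar quantization are all assumptions chosen for illustration, and the real quantized formats may derive their ranges and corrections differently. The point is only that the float query is used as-is, so no query quantization is needed at rescoring time.

```java
public class AsymmetricDotProduct {

    /** Hypothetical helper: maps each component to a 7-bit bucket in [0, 127] over [min, max]. */
    public static byte[] quantize7Bit(float[] v, float min, float max) {
        float scale = (max - min) / 127f;
        byte[] out = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.max(min, Math.min(max, v[i]));
            out[i] = (byte) Math.round((clamped - min) / scale);
        }
        return out;
    }

    /** Dot product of a full-precision query with a quantized document vector. */
    public static float dot(float[] query, byte[] doc, float min, float max) {
        float scale = (max - min) / 127f;
        float querySum = 0f;
        float weighted = 0f;
        for (int i = 0; i < query.length; i++) {
            querySum += query[i];
            weighted += query[i] * doc[i];
        }
        // sum_i q_i * (min + scale * b_i) = min * sum_i q_i + scale * sum_i (q_i * b_i)
        return min * querySum + scale * weighted;
    }

    public static void main(String[] args) {
        float[] query = {0.5f, -0.25f, 1.0f};
        float[] doc = {1.0f, 0.0f, -1.0f}; // exact dot product with query is -0.5
        byte[] quantized = quantize7Bit(doc, -1f, 1f);
        System.out.println(dot(query, quantized, -1f, 1f));
    }
}
```

The printed value should land close to the exact dot product of -0.5, with the gap coming only from quantizing the document side.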

On Sun, Jul 27, 2025 at 10:35 PM Anh Dũng Bùi <dungba...@gmail.com> wrote:
>
> Hi all,
>
> I have a follow-up question on this. Would it make sense to expose the
> quantized vector values as well? Currently even if we are quantizing the
> vectors, calling vectorValue() will return the full precision vectors while
> the quantized vectors are only used for scorer(). Do we consider the
> quantized vectors as private information that should not be exposed?
>
> For context, I'm thinking about a way to run 2-phase rescoring using
> the 32-bit query vector and 7-bit or 4-bit document vectors (the matching
> phase will use a more aggressive quantization). During the rescoring phase,
> if we use the quantized scorer(), the main cost is actually the quantization,
> not the dot product score computation (since we only run it on a small number
> of docs). Doing asymmetric quantization (inspired by BBQ) at the rescoring
> phase would improve not only the recall but also the latency.
>
> On Tue, Feb 11, 2025 at 11:50 PM Michael Sokolov <msoko...@gmail.com> wrote:
>
> > Stored fields is a separate format that stores data in a row-wise
> > fashion: all the stored data for a single document is written
> > together.  Vectors aren't *also* copied into stored fields storage, so
> > the stored fields API can't be used to retrieve them. If we did allow
> > that it would result in massive duplication for no purpose aside from
> > making things look simpler. But do you think that it would be more
> > convenient to use the stored fields API to retrieve the vectors?  Does
> > it hide the details of the leaf structure? Maybe there's an
> > opportunity to create some convenience API for vectors, not sure.
> >
> > On Tue, Feb 11, 2025 at 8:45 AM Viliam Ďurina <viliam.dur...@gmail.com>
> > wrote:
> > >
> > > Thanks Adrien!
> > >
> > > The code has one issue:
> > >     if (iterator.advance(leafDocID) == docID)
> > > should have been:
> > >     if (iterator.advance(leafDocID) == leafDocID)
> > >
> > > After fixing this, it works (for reference, I'm using Lucene 10.1). But I
> > > still wonder why we can't retrieve vectors just as we retrieve any other
> > > field. I was unable to figure the code out myself; this way is pretty
> > > complicated. Is there any reason the vectors are not available through
> > > `storedFields()`?
> > >
> > > Viliam
> > >
> > > On Mon, Feb 10, 2025 at 9:21 PM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > > Hi Viliam,
> > > >
> > > > Your logic is mostly correct. Here is a version that should be a bit
> > > > simpler and correct (but beware, untested):
> > > >
> > > > IndexReader reader; // your multi-reader
> > > > int docID; // top-level doc ID
> > > > int readerID = ReaderUtil.subIndex(docID, reader.leaves());
> > > > LeafReaderContext leafContext = reader.leaves().get(readerID);
> > > > int leafDocID = docID - leafContext.docBase;
> > > > FloatVectorValues values =
> > > >     leafContext.reader().getFloatVectorValues("my_vector_field");
> > > > DocIndexIterator iterator = values.iterator();
> > > > float[] vector;
> > > > if (iterator.advance(leafDocID) == docID) { // this doc ID has a vector
> > > >   vector = values.vectorValue(iterator.index());
> > > > } else {
> > > >   vector = null;
> > > > }
> > > >
> > > > On Mon, Feb 10, 2025 at 5:01 PM Viliam Ďurina <viliam.dur...@gmail.com>
> > > > wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > when indexing vector fields, Lucene doesn't allow specifying the vector
> > > > > field as stored (it throws `IllegalStateException: Cannot store value of
> > > > > type class [F`). When trying to retrieve the value using
> > > > > `IndexReader.storedFields()`, the vector field isn't stored.
> > > > >
> > > > > However, Lucene 10 stores the vectors in `.vec` files. I was able to
> > > > > retrieve them using this complicated code, for which I had to make the
> > > > > `readerIndex` and `readerBase` methods in `BaseCompositeReader` public
> > > > > (they are protected):
> > > > >
> > > > >     int docId = ...; // the docId to retrieve, e.g. coming out of a search
> > > > >     IndexReader node = reader.getContext().reader();
> > > > >     while (node instanceof BaseCompositeReader) {
> > > > >       int index = ((BaseCompositeReader) node).readerIndex(docId);
> > > > >       int base = ((BaseCompositeReader) node).readerBase(index);
> > > > >       docId -= base;
> > > > >       node = ((BaseCompositeReader) node).getContext().children().get(index).reader();
> > > > >     }
> > > > >     assert node instanceof LeafReader;
> > > > >     assert node.leaves().size() == 1;
> > > > >     FloatVectorValues vectorValues =
> > > > >         node.leaves().getFirst().reader().getFloatVectorValues("myVectorField");
> > > > >     float[] vector = vectorValues.vectorValue(docId);
> > > > >
> > > > > My reader is a `MultiReader`, composed of multiple `DirectoryReader`s.
> > > > >
> > > > > Is there any public API to retrieve the vector values? If not, is there any
> > > > > particular reason to not make the vectors available, if Lucene stores them
> > > > > anyway? Even if the vectors are quantized, original raw vectors are stored,
> > > > > though they are never used.
> > > > >
> > > > > Thanks,
> > > > > Viliam
> > > > >
> > > >
> > > >
> > > > --
> > > > Adrien
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
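
The doc ID arithmetic discussed in the thread (find the leaf, subtract its docBase, then compare against the leaf-local ID) can be tried without Lucene on the classpath. The sketch below is a stand-in only: `LeafLocator` and its array of docBase values are hypothetical, and the binary search merely mirrors what `ReaderUtil.subIndex` is used for in the quoted snippet.

```java
public class LeafLocator {

    /** Returns the index of the last leaf whose docBase is <= docID. */
    public static int subIndex(int docID, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (docBases[mid] <= docID) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // hypothetical: three leaves starting at these top-level IDs
        int docID = 180;                // top-level doc ID, e.g. from a search hit
        int leaf = subIndex(docID, docBases);
        int leafDocID = docID - docBases[leaf];
        System.out.println("leaf=" + leaf + " leafDocID=" + leafDocID); // prints "leaf=1 leafDocID=80"
    }
}
```

Because the per-leaf iterator works with leaf-local IDs, the value returned by advance() has to be compared with leafDocID (80 here), not with the top-level docID (180), which is the fix Viliam pointed out.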
