There's also some good discussion on
https://issues.apache.org/jira/browse/LUCENE-9583 about the random-access
vs. iterator pattern that never got fully resolved. We said we would
revisit it after KNN (LUCENE-9004) landed, and now it has. Random access
is pretty well established there, so maybe we should abandon the
iterator API, since it is redundant (you can always iterate over a
random access API if you know the size)?
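
To make the redundancy argument concrete, here is a minimal sketch of an
iterator built purely on top of a random access view. The names
RandomAccessVectors, OrdinalIterator and ArrayVectors are hypothetical and
made up for this illustration; they are not the actual Lucene classes. A
real adapter would probably also have to map ordinals to doc IDs, which
this sketch ignores.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical interface (not the Lucene API): vectors addressed by dense ordinal.
interface RandomAccessVectors {
  int size();                     // number of vectors
  float[] vectorValue(int ord);   // vector stored at ordinal ord, 0 <= ord < size()
}

// Forward iteration implemented using only size() and vectorValue(ord).
final class OrdinalIterator implements Iterator<float[]> {
  private final RandomAccessVectors vectors;
  private int ord = 0;

  OrdinalIterator(RandomAccessVectors vectors) {
    this.vectors = vectors;
  }

  @Override
  public boolean hasNext() {
    return ord < vectors.size();
  }

  @Override
  public float[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return vectors.vectorValue(ord++);
  }
}

// Toy in-memory implementation, for illustration only.
final class ArrayVectors implements RandomAccessVectors {
  private final float[][] data;

  ArrayVectors(float[][] data) {
    this.data = data;
  }

  @Override
  public int size() {
    return data.length;
  }

  @Override
  public float[] vectorValue(int ord) {
    return data[ord];
  }
}
```

The point is only that the iterator needs nothing beyond size() and
vectorValue(ord); the reverse direction (random access on top of a
forward-only iterator) is not possible without buffering or re-scanning.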

On Tue, Mar 16, 2021 at 12:10 PM Michael Sokolov <msoko...@gmail.com> wrote:
>
> Also, Tomoko, re: LUCENE-9322: did it succeed? I guess we won't know for
> sure unless someone revives
> https://issues.apache.org/jira/browse/LUCENE-9136 or something like
> that
>
> On Tue, Mar 16, 2021 at 12:04 PM Michael Sokolov <msoko...@gmail.com> wrote:
> >
> > Consistent plural naming makes sense to me. I think it ended up
> > singular because I am biased to avoid plural names unless there is a
> > useful distinction to be made. But consistency should trump my
> > predilections.
> >
> > I think the reason we have search() on VectorValues is that we have
> > LeafReader.getVectorValues() (by analogy to the DocValues iterators),
> > but no way to access the VectorReader. Do you think we should also
> > have LeafReader.getVectorReader()? Today it's only on CodecReader.
> >
> > Re: SearchStrategy.NONE; the idea is we support efficient access to
> > floating point values. Using BinaryDocValues for this will always
> > require an additional decoding step. I can see that the naming is
> > confusing there. The intent is that you index the vector values, but
> > build no additional indexing data structure. Also: the reason HNSW is
> > mentioned in these SearchStrategy enums is to make room for other
> > vector indexing approaches, like LSH. There was a lot of discussion
> > that we wanted an API that allowed for experimenting with other
> > techniques for indexing and searching vector values.
> >
> > Adrien, you made an analogy to PerFieldPostingsFormat (and DocValues),
> > but I think the situation is more akin to Points, where we have the
> > options on IndexableField. The metadata we store there (dimension and
> > score function) doesn't really result in different formats, i.e. code
> > paths for indexing and storage; it is more like a set of parameters to
> > the format, in my mind. Perhaps the situation will look different when we
> > get our second vector indexing strategy (like LSH).
> >
> >
> > On Tue, Mar 16, 2021 at 10:19 AM Tomoko Uchida
> > <tomoko.uchida.1...@gmail.com> wrote:
> > >
> > > > Should we rename VectorFormat to VectorsFormat? This would be more 
> > > > consistent with other file formats that use the plural, like 
> > > > PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
> > >
> > > +1 for using plural form for consistency - if we reconsider the names, 
> > > how about VectorValuesFormat so that it follows the naming convention for 
> > > XXXValues?
> > >
> > > DocValuesFormat / DocValues
> > > PointValuesFormat / PointValues
> > > VectorValuesFormat / VectorValues (currently, VectorFormat / VectorValues)
> > >
> > > > Should SearchStrategy constants avoid explicit references to HNSW?
> > >
> > > Also +1 for decoupling HNSW specific implementations from general 
> > > vectors, though I am not fully sure if we can strictly separate the 
> > > similarity metrics and search algorithms for vectors.
> > > LUCENE-9322 (unified vectors API) was resolved months ago; did it 
> > > achieve its goal? I haven't followed the issue in months because of my 
> > > laziness...
> > >
> > > Thanks,
> > > Tomoko
> > >
> > >
> > > On Tue, Mar 16, 2021 at 19:32 Adrien Grand <jpou...@gmail.com> wrote:
> > >>
> > >> Hello,
> > >>
> > >> I've tried to catch up on the vector API and I have the following 
> > >> questions. I've tried to read through discussions on JIRA first in case 
> > >> it had been covered, but it's possible I missed some relevant ones.
> > >>
> > >> Should VectorValues#search be on VectorReader instead? It felt a bit odd 
> > >> to me to have the search logic on the iterator.
> > >>
> > >> Do we need SearchStrategy.NONE? Documentation suggests that it allows 
> > >> storing vectors but that NN search won't be supported. This looks like a 
> > >> use-case for binary doc values to me? It also slightly caught me by 
> > >> surprise due to the inconsistency with IndexOptions.NONE, which means 
> > >> "do not index this field" (and likewise for DocValuesType.NONE), so I 
> > >> first assumed that SearchStrategy.NONE also meant "do not index this 
> > >> field as a vector".
> > >>
> > >> While postings and doc-value formats allow per-field configuration via 
> > >> PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors use a different 
> > >> mechanism where VectorField#createHnswType sets attributes on the field 
> > >> type that the vectors writer then reads. Should we have a 
> > >> PerFieldVectorsFormat instead and configure these options via the 
> > >> vectors format?
> > >>
> > >> Should SearchStrategy constants avoid explicit references to HNSW? The 
> > >> rest of the API seems to try to be agnostic of the way that NN search is 
> > >> implemented. Could we make SearchStrategy only about the similarity 
> > >> metric that is used for vectors? This particular point seems discussed 
> > >> on LUCENE-9322 but I couldn't find the conclusion.
> > >>
> > >> Should we rename VectorFormat to VectorsFormat? This would be more 
> > >> consistent with other file formats that use the plural, like 
> > >> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
> > >>
> > >> --
> > >> Adrien

