Also, Tomoko, re: LUCENE-9322: did it succeed? I guess we won't know for
sure unless someone revives
https://issues.apache.org/jira/browse/LUCENE-9136 or something like
that.

On Tue, Mar 16, 2021 at 12:04 PM Michael Sokolov <msoko...@gmail.com> wrote:
>
> Consistent plural naming makes sense to me. I think it ended up
> singular because I am biased against plural names unless there is a
> useful distinction to be made. But consistency should trump my
> predilections.
>
> I think the reason we have search() on VectorValues is that we have
> LeafReader.getVectorValues() (by analogy to the DocValues iterators),
> but no way to access the VectorReader. Do you think we should also
> have LeafReader.getVectorReader()? Today it's only on CodecReader.
>
> Re: SearchStrategy.NONE: the idea is that we support efficient access to
> floating-point values. Using BinaryDocValues for this will always
> require an additional decoding step. I can see that the naming is
> confusing there. The intent is that you index the vector values, but
> build no additional indexing data structure. Also, the reason HNSW is
> mentioned in these SearchStrategy enums is to make room for other
> vector indexing approaches, like LSH. There was a lot of discussion
> that we wanted an API that allowed for experimenting with other
> techniques for indexing and searching vector values.
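The "additional decoding step" Michael mentions can be made concrete. What follows is a minimal sketch that assumes a vector is stored in a BinaryDocValues payload as little-endian IEEE 754 floats. The encoding, the class name `BinaryVectorCodec`, and its methods are hypothetical and are not Lucene's API. The `(bytes, offset, length)` triple simply mirrors the fields of a `BytesRef`. Every read has to turn bytes back into a `float[]` before any similarity can be computed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper: NOT Lucene code, only an illustration of the decode cost.
public class BinaryVectorCodec {

    // Assumed layout: each float written as 4 little-endian bytes.
    static byte[] encode(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buffer.asFloatBuffer().put(vector); // the view writes through to the backing array
        return buffer.array();
    }

    // The extra step a binary doc value needs on every access: bytes -> float[].
    static float[] decode(byte[] bytes, int offset, int length) {
        if (length % Float.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 4: " + length);
        }
        float[] vector = new float[length / Float.BYTES];
        ByteBuffer.wrap(bytes, offset, length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asFloatBuffer()
                .get(vector);
        return vector;
    }

    public static void main(String[] args) {
        float[] original = {0.5f, -1.25f, 3.0f};
        byte[] stored = encode(original);
        float[] decoded = decode(stored, 0, stored.length);
        System.out.println(Arrays.toString(decoded)); // prints [0.5, -1.25, 3.0]
    }
}
```

A dedicated vector format can hand out float values directly and avoid repeating this copy on every document visited.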
>
> Adrien, you made an analogy to PerFieldPostingsFormat (and DocValues),
> but I think the situation is more akin to Points, where we have the
> options on IndexableField. The metadata we store there (dimension and
> score function) doesn't really result in different formats, i.e. code
> paths for indexing and storage; they are more like parameters to the
> format, in my mind. Perhaps the situation will look different when we
> get our second vector indexing strategy (like LSH).
>
>
> On Tue, Mar 16, 2021 at 10:19 AM Tomoko Uchida
> <tomoko.uchida.1...@gmail.com> wrote:
> >
> > > Should we rename VectorFormat to VectorsFormat? This would be more 
> > > consistent with other file formats that use the plural, like 
> > > PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
> >
> > +1 for using the plural form for consistency. If we reconsider the names,
> > how about VectorValuesFormat, so that it follows the naming convention
> > for XXXValues?
> >
> > DocValuesFormat / DocValues
> > PointValuesFormat / PointValues
> > VectorValuesFormat / VectorValues (currently, VectorFormat / VectorValues)
> >
> > > Should SearchStrategy constants avoid explicit references to HNSW?
> >
> > Also +1 for decoupling HNSW-specific implementations from general vectors,
> > though I am not entirely sure we can strictly separate the similarity
> > metrics and search algorithms for vectors.
> > LUCENE-9322 (unified vectors API) was resolved months ago; did it achieve
> > its goal? I haven't followed the issue for months, out of laziness...
> >
> > Thanks,
> > Tomoko
> >
> >
> > On Tue, Mar 16, 2021 at 19:32 Adrien Grand <jpou...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I've tried to catch up on the vector API and have the following
> >> questions. I read through the discussions on JIRA first in case they
> >> had already been covered, but it's possible I missed some relevant ones.
> >>
> >> Should VectorValues#search be on VectorReader instead? It felt a bit odd 
> >> to me to have the search logic on the iterator.
> >>
> >> Do we need SearchStrategy.NONE? Documentation suggests that it allows 
> >> storing vectors but that NN search won't be supported. This looks like a 
> >> use case for binary doc values to me? It also caught me slightly by
> >> surprise due to the inconsistency with IndexOptions.NONE, which means "do
> >> not index this field" (and likewise for DocValuesType.NONE), so I first 
> >> assumed that SearchStrategy.NONE also meant "do not index this field as a 
> >> vector".
> >>
> >> While postings and doc-value formats allow per-field configuration via 
> >> PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors use a different 
> >> mechanism where VectorField#createHnswType sets attributes on the field 
> >> type that the vectors writer then reads. Should we have a 
> >> PerFieldVectorsFormat instead and configure these options via the vectors 
> >> format?
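Adrien's alternative follows the PerFieldPostingsFormat pattern, where the codec asks a per-field hook which format to use. The sketch below illustrates that pattern in plain Java. Every type in it is hypothetical: `VectorsFormat`, `PerFieldVectorsFormat`, `getVectorsFormatForField`, and the `maxConn`/`beamWidth` parameters are illustrative names, not Lucene classes. In this design the HNSW parameters live on the format instance instead of on field-type attributes.

```java
import java.util.Map;

public class PerFieldVectorsSketch {

    // Hypothetical vectors format abstraction (not Lucene's actual class).
    interface VectorsFormat {
        String describe();
    }

    // Hypothetical HNSW-backed format: its parameters are configured on the format itself.
    static final class HnswVectorsFormat implements VectorsFormat {
        private final int maxConn;
        private final int beamWidth;

        HnswVectorsFormat(int maxConn, int beamWidth) {
            this.maxConn = maxConn;
            this.beamWidth = beamWidth;
        }

        @Override
        public String describe() {
            return "hnsw(maxConn=" + maxConn + ", beamWidth=" + beamWidth + ")";
        }
    }

    // Hypothetical format that stores vectors with no search structure.
    static final class FlatVectorsFormat implements VectorsFormat {
        @Override
        public String describe() {
            return "flat";
        }
    }

    // Per-field dispatch in the style of PerFieldPostingsFormat: the codec asks which
    // format handles each field, instead of the writer reading field-type attributes.
    abstract static class PerFieldVectorsFormat {
        abstract VectorsFormat getVectorsFormatForField(String field);
    }

    // Convenience factory: explicit per-field choices, with a fallback for other fields.
    static PerFieldVectorsFormat fromMap(Map<String, VectorsFormat> perField, VectorsFormat fallback) {
        return new PerFieldVectorsFormat() {
            @Override
            VectorsFormat getVectorsFormatForField(String field) {
                return perField.getOrDefault(field, fallback);
            }
        };
    }

    public static void main(String[] args) {
        PerFieldVectorsFormat formats = fromMap(
                Map.of("title_embedding", new HnswVectorsFormat(16, 100)),
                new FlatVectorsFormat());
        System.out.println(formats.getVectorsFormatForField("title_embedding").describe());
        System.out.println(formats.getVectorsFormatForField("raw_features").describe());
    }
}
```

Whether parameters like these are separate formats or just parameters to one format is the open question in this thread. The sketch only shows where the configuration would live under Adrien's proposal.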
> >>
> >> Should SearchStrategy constants avoid explicit references to HNSW? The 
> >> rest of the API seems to try to be agnostic of the way that NN search is 
> >> implemented. Could we make SearchStrategy only about the similarity metric 
> >> that is used for vectors? This particular point seems to have been
> >> discussed on LUCENE-9322, but I couldn't find the conclusion.
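The following shows what "SearchStrategy only about the similarity metric" could look like. It is a minimal sketch under stated assumptions: `SimilarityMetric`, `NearestNeighborSearcher`, and `bruteForce` are hypothetical names, and the `1 / (1 + d²)` mapping for Euclidean distance is just one possible way to make larger scores mean "more similar". The enum names only the metric. Any NN algorithm (HNSW, LSH, or the brute-force search used here for brevity) takes the metric as a parameter, so no algorithm is baked into the enum.

```java
import java.util.Arrays;

public class SimilaritySketch {

    // Hypothetical enum: only the similarity metric, no reference to HNSW or any algorithm.
    enum SimilarityMetric {
        EUCLIDEAN {
            @Override
            float score(float[] a, float[] b) {
                float sum = 0f;
                for (int i = 0; i < a.length; i++) {
                    float d = a[i] - b[i];
                    sum += d * d;
                }
                // Assumed mapping: squared distance -> score where larger means closer.
                return 1f / (1f + sum);
            }
        },
        DOT_PRODUCT {
            @Override
            float score(float[] a, float[] b) {
                float sum = 0f;
                for (int i = 0; i < a.length; i++) {
                    sum += a[i] * b[i];
                }
                return sum;
            }
        };

        abstract float score(float[] a, float[] b);
    }

    // Hypothetical search-side abstraction: the algorithm receives the metric as input.
    interface NearestNeighborSearcher {
        int[] topK(float[] query, int k, SimilarityMetric metric);
    }

    // Exhaustive search, used only because it is the simplest correct searcher.
    static NearestNeighborSearcher bruteForce(float[][] vectors) {
        return (query, k, metric) -> {
            Integer[] ids = new Integer[vectors.length];
            for (int i = 0; i < ids.length; i++) {
                ids[i] = i;
            }
            // Sort document ids by descending score for this query and metric.
            Arrays.sort(ids, (x, y) -> Float.compare(
                    metric.score(query, vectors[y]), metric.score(query, vectors[x])));
            int n = Math.min(k, ids.length);
            int[] result = new int[n];
            for (int i = 0; i < n; i++) {
                result[i] = ids[i];
            }
            return result;
        };
    }

    public static void main(String[] args) {
        float[][] docs = {{1f, 0f}, {0f, 1f}, {0.9f, 0.1f}};
        NearestNeighborSearcher searcher = bruteForce(docs);
        float[] query = {1f, 0f};
        System.out.println(Arrays.toString(searcher.topK(query, 2, SimilarityMetric.EUCLIDEAN)));   // [0, 2]
        System.out.println(Arrays.toString(searcher.topK(query, 2, SimilarityMetric.DOT_PRODUCT))); // [0, 2]
    }
}
```

Swapping `bruteForce` for an HNSW or LSH implementation would leave `SimilarityMetric` untouched, which is the separation the question asks about. Tomoko's caveat still applies: some algorithms may only work well with certain metrics.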
> >>
> >> Should we rename VectorFormat to VectorsFormat? This would be more 
> >> consistent with other file formats that use the plural, like 
> >> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
> >>
> >> --
> >> Adrien
