Consistent plural naming makes sense to me. I think it ended up singular because I am biased to avoid plural names unless there is a useful distinction to be made. But consistency should trump my predilections.

I think the reason we have search() on VectorValues is that we have LeafReader.getVectorValues() (by analogy to the DocValues iterators), but no way to access the VectorReader. Do you think we should also have LeafReader.getVectorReader()? Today it's only on CodecReader.

Re: SearchStrategy.NONE: the idea is that we support efficient access to floating-point values. Using BinaryDocValues for this will always require an additional decoding step. I can see that the naming is confusing there. The intent is that you index the vector values, but build no additional indexing data structure.

Also: the reason HNSW is mentioned in these SearchStrategy enums is to make room for other vector indexing approaches, like LSH. There was a lot of discussion that we wanted an API that allowed for experimenting with other techniques for indexing and searching vector values.

Adrien, you made an analogy to PerFieldPostingsFormat (and DocValues), but I think the situation is more akin to Points, where we have the options on IndexableField. The metadata we store there (dimension and score function) don't really result in different formats, i.e. code paths for indexing and storage; they are more like parameters to the format, in my mind. Perhaps the situation will look different when we get our second vector indexing strategy (like LSH).

On Tue, Mar 16, 2021 at 10:19 AM Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
>
> > Should we rename VectorFormat to VectorsFormat? This would be more
> > consistent with other file formats that use the plural, like
> > PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
>
> +1 for using plural form for consistency - if we reconsider the names, how
> about VectorValuesFormat so that it follows the naming convention for
> XXXValues?
>
> DocValuesFormat / DocValues
> PointValuesFormat / PointValues
> VectorValuesFormat / VectorValues (currently, VectorFormat / VectorValues)
>
> > Should SearchStrategy constants avoid explicit references to HNSW?
>
> Also +1 for decoupling HNSW specific implementations from general vectors,
> though I am not fully sure if we can strictly separate the similarity metrics
> and search algorithms for vectors.
> LUCENE-9322 (unified vectors API) was resolved months ago, does it achieve
> its goal? I haven't followed the issue in months because of my laziness...
>
> Thanks,
> Tomoko
>
>
> On Tue, Mar 16, 2021 at 19:32, Adrien Grand <jpou...@gmail.com> wrote:
>>
>> Hello,
>>
>> I've tried to catch up on the vector API and I have the following questions.
>> I've tried to read through discussions on JIRA first in case it had been
>> covered, but it's possible I missed some relevant ones.
>>
>> Should VectorValues#search be on VectorReader instead? It felt a bit odd to
>> me to have the search logic on the iterator.
>>
>> Do we need SearchStrategy.NONE? Documentation suggests that it allows
>> storing vectors but that NN search won't be supported. This looks like a
>> use-case for binary doc values to me? It also slightly caught me by surprise
>> due to the inconsistency with IndexOptions.NONE, which means "do not index
>> this field" (and likewise for DocValuesType.NONE), so I first assumed that
>> SearchStrategy.NONE also meant "do not index this field as a vector".
>>
>> While postings and doc-value formats allow per-field configuration via
>> PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors use a different
>> mechanism where VectorField#createHnswType sets attributes on the field type
>> that the vectors writer then reads. Should we have a PerFieldVectorsFormat
>> instead and configure these options via the vectors format?
>>
>> Should SearchStrategy constants avoid explicit references to HNSW? The rest
>> of the API seems to try to be agnostic of the way that NN search is
>> implemented. Could we make SearchStrategy only about the similarity metric
>> that is used for vectors? This particular point seems discussed on
>> LUCENE-9322 but I couldn't find the conclusion.
>>
>> Should we rename VectorFormat to VectorsFormat? This would be more
>> consistent with other file formats that use the plural, like PostingsFormat,
>> DocValuesFormat, TermVectorsFormat, etc.?
>>
>> --
>> Adrien