> Should we rename VectorFormat to VectorsFormat? This would be more consistent with other file formats that use the plural, like PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
+1 for using the plural form for consistency. If we reconsider the names, how about VectorValuesFormat, so that it follows the naming convention for XXXValues?

DocValuesFormat / DocValues
PointValuesFormat / PointValues
VectorValuesFormat / VectorValues (currently, VectorFormat / VectorValues)

> Should SearchStrategy constants avoid explicit references to HNSW?

Also +1 for decoupling HNSW-specific implementations from general vectors, though I am not fully sure whether we can strictly separate the similarity metrics from the search algorithms for vectors.
LUCENE-9322 (unified vectors API) was resolved months ago; does it achieve its goal? I haven't followed the issue in months because of my laziness...

Thanks,
Tomoko

On Tue, Mar 16, 2021 at 19:32, Adrien Grand <jpou...@gmail.com> wrote:

> Hello,
>
> I've tried to catch up on the vector API and I have the following
> questions. I've tried to read through discussions on JIRA first in case it
> had been covered, but it's possible I missed some relevant ones.
>
> Should VectorValues#search be on VectorReader instead? It felt a bit odd
> to me to have the search logic on the iterator.
>
> Do we need SearchStrategy.NONE? Documentation suggests that it allows
> storing vectors but that NN search won't be supported. This looks like a
> use-case for binary doc values to me? It also slightly caught me by
> surprise due to the inconsistency with IndexOptions.NONE, which means "do
> not index this field" (and likewise for DocValuesType.NONE), so I first
> assumed that SearchStrategy.NONE also meant "do not index this field as a
> vector".
>
> While postings and doc-value formats allow per-field configuration via
> PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors use a different
> mechanism where VectorField#createHnswType sets attributes on the field
> type that the vectors writer then reads. Should we have a
> PerFieldVectorsFormat instead and configure these options via the vectors
> format?
>
> Should SearchStrategy constants avoid explicit references to HNSW? The
> rest of the API seems to try to be agnostic of the way that NN search is
> implemented. Could we make SearchStrategy only about the similarity metric
> that is used for vectors? This particular point seems discussed on
> LUCENE-9322 <https://issues.apache.org/jira/browse/LUCENE-9322> but I
> couldn't find the conclusion.
>
> Should we rename VectorFormat to VectorsFormat? This would be more
> consistent with other file formats that use the plural, like
> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
>
> --
> Adrien
>
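
To make the decoupling question concrete (a SearchStrategy that is only about the similarity metric), here is a rough Java sketch. The names SimilaritySketch and VectorSimilarity are hypothetical, chosen for illustration; they are not taken from Lucene's API, and the scoring conventions are an assumption. The point is only that the enum names a metric and carries no reference to HNSW or any other search algorithm.

```java
/** Hypothetical sketch; none of these names are taken from Lucene. */
public class SimilaritySketch {

  /** Names a similarity metric only; says nothing about how NN search is implemented. */
  enum VectorSimilarity {
    EUCLIDEAN {
      @Override
      float score(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
          float diff = a[i] - b[i];
          sum += diff * diff;
        }
        // Assumption: negate the squared distance so that higher means closer.
        return -sum;
      }
    },
    DOT_PRODUCT {
      @Override
      float score(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
          sum += a[i] * b[i];
        }
        return sum;
      }
    };

    /** Scores two vectors of equal length; higher means more similar. */
    abstract float score(float[] a, float[] b);
  }

  public static void main(String[] args) {
    float[] query = {1f, 0f};
    float[] doc = {0f, 1f};
    System.out.println(VectorSimilarity.EUCLIDEAN.score(query, doc));   // prints -2.0
    System.out.println(VectorSimilarity.DOT_PRODUCT.score(query, doc)); // prints 0.0
  }
}
```

Under this assumption, algorithm-specific settings (for example HNSW graph parameters) would be configured elsewhere, such as on a per-field vectors format, which is the other option raised in the quoted message.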