> Should we rename VectorFormat to VectorsFormat? This would be more
> consistent with other file formats that use the plural, like
> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?

+1 for using the plural form for consistency. If we reconsider the names,
how about VectorValuesFormat, so that it follows the naming convention for
XXXValues?

DocValuesFormat / DocValues
PointValuesFormat / PointValues
VectorValuesFormat / VectorValues (currently, VectorFormat / VectorValues)

> Should SearchStrategy constants avoid explicit references to HNSW?

Also +1 for decoupling HNSW-specific implementations from general vectors,
though I am not fully sure whether we can strictly separate the similarity
metrics from the search algorithms for vectors.
LUCENE-9322 (unified vectors API) was resolved months ago. Did it achieve
its goal? I haven't followed the issue for months, which is my own fault...
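
To make the decoupling question concrete, here is a minimal sketch of what
a separation could look like. Everything in it is hypothetical and not the
current Lucene API: SimilarityFunction, NearestNeighborSearcher and
BruteForceSearcher are names invented for illustration. The metric only
knows how to compare two vectors, and the search algorithm takes the metric
as a parameter, so an HNSW-based searcher could sit behind the same
interface.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical: describes only how two vectors are compared; no reference to HNSW. */
enum SimilarityFunction {
    EUCLIDEAN {
        @Override
        double compare(float[] a, float[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            // Negated squared distance, so that a higher score always means "closer".
            return -sum;
        }
    },
    DOT_PRODUCT {
        @Override
        double compare(float[] a, float[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i] * b[i];
            }
            return sum;
        }
    };

    abstract double compare(float[] a, float[] b);
}

/** Hypothetical: the search algorithm, independent of the metric it is given. */
interface NearestNeighborSearcher {
    int[] search(float[][] vectors, float[] query, int k, SimilarityFunction similarity);
}

/** Exhaustive search; stands in for any algorithm, including a graph-based one. */
final class BruteForceSearcher implements NearestNeighborSearcher {
    @Override
    public int[] search(float[][] vectors, float[] query, int k, SimilarityFunction similarity) {
        Integer[] ords = new Integer[vectors.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
        // Sort ordinals by descending similarity to the query.
        Arrays.sort(ords,
            Comparator.comparingDouble((Integer ord) -> -similarity.compare(vectors[ord], query)));
        int n = Math.min(k, ords.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = ords[i];
        }
        return top;
    }
}
```

Whether a real graph-based implementation can stay this indifferent to the
metric is exactly the part I am unsure about.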

Thanks,
Tomoko


On Tue, Mar 16, 2021 at 19:32, Adrien Grand <jpou...@gmail.com> wrote:

> Hello,
>
> I've tried to catch up on the vector API and I have the following
> questions. I've tried to read through discussions on JIRA first in case it
> had been covered, but it's possible I missed some relevant ones.
>
> Should VectorValues#search be on VectorReader instead? It felt a bit odd
> to me to have the search logic on the iterator.
>
> Do we need SearchStrategy.NONE? Documentation suggests that it allows
> storing vectors but that NN search won't be supported. This looks like a
> use-case for binary doc values to me? It also slightly caught me by
> surprise due to the inconsistency with IndexOptions.NONE, which means "do
> not index this field" (and likewise for DocValuesType.NONE), so I first
> assumed that SearchStrategy.NONE also meant "do not index this field as a
> vector".
>
> While postings and doc-value formats allow per-field configuration via
> PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors use a different
> mechanism where VectorField#createHnswType sets attributes on the field
> type that the vectors writer then reads. Should we have a
> PerFieldVectorsFormat instead and configure these options via the vectors
> format?
>
> Should SearchStrategy constants avoid explicit references to HNSW? The
> rest of the API seems to try to be agnostic of the way that NN search is
> implemented. Could we make SearchStrategy only about the similarity metric
> that is used for vectors? This particular point seems discussed on
> LUCENE-9322 <https://issues.apache.org/jira/browse/LUCENE-9322> but I
> couldn't find the conclusion.
>
> Should we rename VectorFormat to VectorsFormat? This would be more
> consistent with other file formats that use the plural, like
> PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.?
>
> --
> Adrien
>
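
For the per-field configuration question in the quoted message, a rough
sketch of the idea follows. It is modeled on the per-field lookup that
PerFieldPostingsFormat does through getPostingsFormatForField, but every
class here (SketchVectorsFormat, SketchPerFieldVectorsFormat,
MapBackedPerFieldVectorsFormat) and the maxConn/beamWidth fields are
assumptions made up for illustration. No such PerFieldVectorsFormat exists
at the time of this thread.

```java
import java.util.Map;

/** Hypothetical stand-in for a vectors format that carries its own tuning options. */
final class SketchVectorsFormat {
    final String name;
    final int maxConn;   // assumed graph tuning option, for illustration only
    final int beamWidth; // assumed graph tuning option, for illustration only

    SketchVectorsFormat(String name, int maxConn, int beamWidth) {
        this.name = name;
        this.maxConn = maxConn;
        this.beamWidth = beamWidth;
    }
}

/** Hypothetical: picks a format per field instead of reading field-type attributes. */
abstract class SketchPerFieldVectorsFormat {
    abstract SketchVectorsFormat getVectorsFormatForField(String field);
}

/** One simple way to implement the lookup: a map with a default fallback. */
final class MapBackedPerFieldVectorsFormat extends SketchPerFieldVectorsFormat {
    private final Map<String, SketchVectorsFormat> perField;
    private final SketchVectorsFormat defaultFormat;

    MapBackedPerFieldVectorsFormat(Map<String, SketchVectorsFormat> perField,
                                   SketchVectorsFormat defaultFormat) {
        this.perField = perField;
        this.defaultFormat = defaultFormat;
    }

    @Override
    SketchVectorsFormat getVectorsFormatForField(String field) {
        return perField.getOrDefault(field, defaultFormat);
    }
}
```

With this shape, the options that VectorField#createHnswType currently
places on the field type would live in the format chosen for each field.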
