IIUC, KnnVectorField is deprecated and one is supposed to use
KnnFloatVectorField when using floats as vector values, right?
On 17.05.23 at 16:41, Michael Sokolov wrote:
see https://markmail.org/message/kf4nzoqyhwacb7ri
On Wed, May 17, 2023 at 10:09 AM David Smiley <dsmi...@apache.org> wrote:
> easily be circumvented by a user
This is a revelation to me and others, if true. Michael, please
point to a test or code snippet that shows the Lucene user
community what they want to see, so they are unblocked in their
explorations of vector search.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Wed, May 17, 2023 at 7:51 AM Michael Sokolov
<msoko...@gmail.com> wrote:
I think I've said before on this list that we don't actually
enforce the limit in any way that can't easily be circumvented
by a user. The codec already supports vectors of any size: it
doesn't impose any limit. The way the API is written, you can
*already today* create an index with max-int-sized vectors, and,
as Robert points out, our backwards compatibility policy commits
us to supporting that going forward. This wasn't intentional, I
think, but those are the facts.
Given that, I think this whole discussion is not really necessary.
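Michael's argument is about where the check lives rather than about the codec. The toy model below sketches the general pattern he describes: a maximum enforced only in one convenience constructor, while the lower-level writer trusts whatever dimension it is handed. Every class name here is hypothetical, and this is not how Lucene's classes are actually organized; the markmail thread Michael links is the authoritative source for the Lucene-specific details.

```java
// Toy model only: every name here is hypothetical and is NOT a Lucene class.
// A dimension limit is checked in one convenience constructor, while the
// lower-level writer trusts whatever the field spec reports.

interface ToyVectorFieldSpec {
    String name();
    int dimension();
}

// Convenience path: enforces a hardcoded maximum, like the current 1024 limit.
final class CheckedToyVectorFieldSpec implements ToyVectorFieldSpec {
    static final int MAX_DIMENSIONS = 1024;
    private final String name;
    private final int dimension;

    CheckedToyVectorFieldSpec(String name, int dimension) {
        if (dimension <= 0 || dimension > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "dimension must be in [1, " + MAX_DIMENSIONS + "], got " + dimension);
        }
        this.name = name;
        this.dimension = dimension;
    }

    @Override public String name() { return name; }
    @Override public int dimension() { return dimension; }
}

// Lower-level writer: only checks consistency, never the maximum.
final class ToyVectorWriter {
    static int write(ToyVectorFieldSpec spec, float[] vector) {
        if (vector.length != spec.dimension()) {
            throw new IllegalArgumentException("vector length does not match spec");
        }
        return vector.length; // stand-in for "stored"; returns the dimension written
    }
}

final class ToyBypassDemo {
    public static void main(String[] args) {
        // A user-supplied implementation of the interface never runs the check.
        ToyVectorFieldSpec custom = new ToyVectorFieldSpec() {
            @Override public String name() { return "embedding"; }
            @Override public int dimension() { return 2048; }
        };
        System.out.println(ToyVectorWriter.write(custom, new float[2048])); // prints 2048
    }
}
```

In this model, the only way to make the limit binding would be to repeat the check in the writer itself, which is the layer that the rest of the thread argues about.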
On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti
<a.benede...@sease.io> wrote:
Hi all,
we have finalized all the options proposed by the
community, and we are ready to vote for the preferred one
and then proceed with the implementation.
*Option 1*
Keep it as it is (dimension limit hardcoded to 1024)
*Motivation*:
We are close to improving on many fronts. Given the
criticality of Lucene in computing infrastructure and the
concerns raised by one of the most active stewards of the
project, I think we should keep working toward improving
the feature as is and raise the limit only once we can
demonstrate improvement unambiguously.
*Option 2*
Make the limit configurable, for example through a system
property.
*Motivation*:
The system administrator can enforce a limit that users
need to respect, in line with whatever the admin has
decided is acceptable for them.
The default can stay the current one (1024).
This should open the doors for Apache Solr, Elasticsearch,
OpenSearch, and any sort of plugin development.
*Option 3*
Move the max dimension limit to a lower level, into an
HNSW-specific implementation. Once there, this limit would not
bind any other potential alternative or evolution of the vector
engine.
*Motivation*:
There seem to be contradictory interpretations of the current
HNSW implementation's performance. Some consider it OK, some do
not, and it depends on the target data set and use case.
Increasing the max dimension limit where it currently lives (in
the top-level FloatVectorValues) would prevent potential
alternatives (e.g. for other use cases) from being based on a
lower limit.
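A minimal sketch of what Option 3 means structurally, assuming the limit becomes a property of each vector format rather than a single top-level constant. All names below (ToyVectorFormat, ToyHnswFormat, ToyAltFormat, ToyFieldValidator) are hypothetical and are not Lucene APIs, and the 512 limit for the alternative format is an arbitrary illustrative value.

```java
// Hypothetical sketch of Option 3: the maximum dimension belongs to a specific
// vector format instead of a top-level constant. None of these are Lucene APIs.

interface ToyVectorFormat {
    String name();
    int maxDimensions(); // limit owned by this format alone
}

final class ToyHnswFormat implements ToyVectorFormat {
    @Override public String name() { return "hnsw"; }
    @Override public int maxDimensions() { return 1024; } // today's value, now HNSW-scoped
}

// A hypothetical alternative engine that picks its own, lower limit.
final class ToyAltFormat implements ToyVectorFormat {
    @Override public String name() { return "alt"; }
    @Override public int maxDimensions() { return 512; } // arbitrary illustrative value
}

final class ToyFieldValidator {
    // Validation asks the chosen format rather than a global constant.
    static void checkDimension(ToyVectorFormat format, int dimension) {
        if (dimension <= 0 || dimension > format.maxDimensions()) {
            throw new IllegalArgumentException(format.name() + " supports 1.."
                + format.maxDimensions() + " dimensions, got " + dimension);
        }
    }

    static boolean accepts(ToyVectorFormat format, int dimension) {
        try {
            checkDimension(format, dimension);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(new ToyHnswFormat(), 768)); // true
        System.out.println(accepts(new ToyAltFormat(), 768));  // false
    }
}
```

With this shape, raising or lowering the HNSW value touches only ToyHnswFormat, and other formats keep whatever limit suits their own use case.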
*Option 4*
Make it configurable and move it to an appropriate place.
In particular, a
simple Integer.getInteger("lucene.hnsw.maxDimensions",
1024) should be enough.
*Motivation*:
Both are good and not mutually exclusive and could happen
in any order.
Someone suggested to perfect what the _default_ limit
should be, but I've not seen an argument _against_
configurability. Especially in this way -- a toggle that
doesn't bind Lucene's APIs in any way.
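The one-liner in Option 4 relies on standard JDK behavior: Integer.getInteger(name, default) reads a Java system property (typically set with -D on the JVM command line, not an environment variable) and returns the default when the property is absent or is not a valid integer. The sketch below exercises that behavior; the wrapper class name MaxDimsProperty is hypothetical, and only the property name and default come from the proposal.

```java
// Exercises the JDK semantics of the one-liner proposed in Option 4.
// MaxDimsProperty is a hypothetical wrapper; only the key and default are from the proposal.

final class MaxDimsProperty {
    static final String KEY = "lucene.hnsw.maxDimensions";

    // Re-reads the system property on every call; falls back to 1024.
    static int maxDimensions() {
        return Integer.getInteger(KEY, 1024);
    }

    public static void main(String[] args) {
        System.clearProperty(KEY);
        System.out.println(maxDimensions()); // 1024 (property absent)

        System.setProperty(KEY, "2048");
        System.out.println(maxDimensions()); // 2048

        System.setProperty(KEY, "not-a-number");
        System.out.println(maxDimensions()); // 1024 (unparsable, so the default is used)

        System.clearProperty(KEY);
    }
}
```

One detail worth settling during implementation: this sketch re-reads the property on every call. If the value were instead cached in a static final field, it would be read once at class initialization, so it would have to be set at JVM startup, e.g. with -Dlucene.hnsw.maxDimensions=2048.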
I'll keep this [VOTE] open for a week and then proceed to
the implementation.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
Apache Lucene/Solr Committer
Apache Solr PMC Member
e-mail: a.benede...@sease.io
*Sease* - Information Retrieval Applied
Consulting | Training | Open Source
Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> |
Twitter <https://twitter.com/seaseltd> |
Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> |
Github <https://github.com/seaseltd>