For simplicity's sake, let's treat Options 2 and 4 as equivalent: they are not mutually exclusive and differ only in a minor implementation detail.
On Tue, 16 May 2023, 10:24 Alessandro Benedetti, <a.benede...@sease.io> wrote:

> Option 4 also aims to refactor the limit into an appropriate place in the
> code (short answer is yes, implementation details).
>
> Cheers
>
> On Tue, 16 May 2023, 10:04 Michael Wechner, <michael.wech...@wyona.com>
> wrote:
>
>> Hi Alessandro
>>
>> Thank you very much for summarizing and starting the vote.
>>
>> I am not sure whether I really understand the difference between Option 2
>> and Option 4, or is it just about implementation details?
>>
>> Thanks
>>
>> Michael
>>
>>
>> On 16.05.23 at 10:50, Alessandro Benedetti wrote:
>>
>> Hi all,
>> we have finalized all the options proposed by the community and we are
>> ready to vote for the preferred one and then proceed with the
>> implementation.
>>
>> *Option 1*
>> Keep it as it is (dimension limit hardcoded to 1024).
>> *Motivation*:
>> We are close to improving on many fronts. Given the criticality of Lucene
>> in computing infrastructure and the concerns raised by one of the most
>> active stewards of the project, I think we should keep working toward
>> improving the feature as is and move to raise the limit only after we can
>> demonstrate improvement unambiguously.
>>
>> *Option 2*
>> Make the limit configurable, for example through a system property.
>> *Motivation*:
>> The system administrator can enforce a limit that its users need to
>> respect and that is in line with whatever the admin decided is
>> acceptable for them.
>> The default can stay the current one.
>> This should open the doors for Apache Solr, Elasticsearch, OpenSearch,
>> and any sort of plugin development.
>>
>> *Option 3*
>> Move the max dimension limit to a lower level, into the HNSW-specific
>> implementation. Once there, this limit would not bind any other potential
>> vector engine alternative/evolution.
>> *Motivation:*
>> There seem to be contradictory performance interpretations of the current
>> HNSW implementation. Some consider its performance OK, some do not, and
>> it depends on the target data set and use case. Increasing the max
>> dimension limit where it currently lives (in the top-level
>> FloatVectorValues) would not allow potential alternatives (e.g. for other
>> use cases) to be based on a lower limit.
>>
>> *Option 4*
>> Make it configurable and move it to an appropriate place.
>> In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
>> 1024) should be enough.
>> *Motivation*:
>> Both are good and not mutually exclusive and could happen in any order.
>> Someone suggested perfecting what the _default_ limit should be, but I've
>> not seen an argument _against_ configurability. Especially in this way --
>> a toggle that doesn't bind Lucene's APIs in any way.
>>
>> I'll keep this [VOTE] open for a week and then proceed to the
>> implementation.
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
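For anyone following along, here is a minimal sketch of the pattern Options 2 and 4 hinge on: reading the limit from a system property via Integer.getInteger, with the current hardcoded value of 1024 as the default. The property name "lucene.hnsw.maxDimensions" is taken from the proposal above; the class and method names are illustrative only, not actual Lucene code.

```java
// Sketch of a configurable max-dimension limit (Options 2/4).
// The limit is read once at class-load time from a JVM system property,
// falling back to the current hardcoded default of 1024.
public class MaxDimensionsSketch {

    // Integer.getInteger(name, default) returns the parsed system property
    // if it is set and a valid integer, otherwise the supplied default.
    static final int MAX_DIMENSIONS =
        Integer.getInteger("lucene.hnsw.maxDimensions", 1024);

    // Hypothetical validation hook, analogous to the check Lucene performs
    // when a vector field is created.
    static void checkDimension(int dimension) {
        if (dimension > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector dimension " + dimension + " exceeds limit "
                + MAX_DIMENSIONS
                + " (override with -Dlucene.hnsw.maxDimensions=N)");
        }
    }

    public static void main(String[] args) {
        checkDimension(768); // fine under the default limit
        System.out.println("max=" + MAX_DIMENSIONS);
    }
}
```

An admin would raise the limit at startup with, e.g., `-Dlucene.hnsw.maxDimensions=2048`, without any API change -- which is the "toggle that doesn't bind Lucene's APIs" point made for Option 4.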