> easily be circumvented by a user

This is a revelation to me and others, if true. Michael, please point to a test or code snippet that shows the Lucene user community what they want to see, so they are unblocked from their explorations of vector search.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, May 17, 2023 at 7:51 AM Michael Sokolov <msoko...@gmail.com> wrote:

> I think I've said before on this list that we don't actually enforce the
> limit in any way that can't easily be circumvented by a user. The codec
> already supports any size vector; it doesn't impose any limit. The way the
> API is written, you can *already today* create an index with max-int-sized
> vectors, and we are committed to supporting that going forward by our
> backwards compatibility policy, as Robert points out. This wasn't
> intentional, I think, but those are the facts.
>
> Given that, I think this whole discussion is not really necessary.
>
> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
>
>> Hi all,
>> We have finalized all the options proposed by the community and are
>> ready to vote for the preferred one and then proceed with the
>> implementation.
>>
>> *Option 1*
>> Keep it as it is (dimension limit hardcoded to 1024).
>> *Motivation*:
>> We are close to improving on many fronts. Given the criticality of
>> Lucene in computing infrastructure and the concerns raised by one of the
>> most active stewards of the project, I think we should keep working
>> toward improving the feature as is, and only raise the limit after we can
>> demonstrate improvement unambiguously.
>>
>> *Option 2*
>> Make the limit configurable, for example through a system property.
>> *Motivation*:
>> The system administrator can enforce a limit that their users need to
>> respect, in line with whatever the admin decided is acceptable for them.
>> The default can stay the current one. This should open the doors for
>> Apache Solr, Elasticsearch, OpenSearch, and any sort of plugin
>> development.
>>
>> *Option 3*
>> Move the max dimension limit down to the HNSW-specific implementation.
>> Once there, this limit would not bind any other potential vector engine
>> alternative/evolution.
>> *Motivation*:
>> There seem to be contradictory performance interpretations of the
>> current HNSW implementation: some consider its performance OK, some do
>> not, and it depends on the target data set and use case. Keeping the max
>> dimension limit where it is currently (in the top-level
>> FloatVectorValues) would not allow potential alternatives (e.g. for
>> other use cases) to be based on a lower limit.
>>
>> *Option 4*
>> Make it configurable and move it to an appropriate place. In particular,
>> a simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024) should be
>> enough.
>> *Motivation*:
>> Both are good, not mutually exclusive, and could happen in any order.
>> Someone suggested perfecting what the _default_ limit should be, but
>> I've not seen an argument _against_ configurability -- especially in
>> this form: a toggle that doesn't bind Lucene's APIs in any way.
>>
>> I'll keep this [VOTE] open for a week and then proceed to the
>> implementation.
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
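[Editor's note: Option 4's proposal hinges on a one-line pattern, Integer.getInteger("lucene.hnsw.maxDimensions", 1024), which reads an int system property with a fallback default. The following is a minimal sketch of how such a configurable limit check could look; the class and method names (VectorLimit, checkDimension) are hypothetical, not actual Lucene code — only the property name and the 1024 default come from the thread.]

```java
// Hypothetical sketch of Option 4: a max-dimension check whose ceiling is
// read once from a JVM system property, defaulting to the current hardcoded
// Lucene limit of 1024. Class/method names are illustrative only.
public class VectorLimit {
    // Integer.getInteger returns the property's int value, or the default
    // (1024) if the property is unset or unparseable. Configured at JVM
    // startup, e.g.: java -Dlucene.hnsw.maxDimensions=2048 ...
    static final int MAX_DIMENSIONS =
        Integer.getInteger("lucene.hnsw.maxDimensions", 1024);

    // Reject vectors whose dimension is non-positive or over the limit.
    static void checkDimension(int dimension) {
        if (dimension <= 0 || dimension > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector dimension must be in (0, " + MAX_DIMENSIONS
                    + "], got " + dimension);
        }
    }

    public static void main(String[] args) {
        checkDimension(768); // fine under the default limit
        try {
            checkDimension(2048); // exceeds the default of 1024
            throw new AssertionError("expected rejection");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Because the toggle lives behind a plain property read rather than a method parameter, it changes no public API surface — which is the "doesn't bind Lucene's APIs in any way" point made in the vote.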