Alessandro,
Thanks for raising the code of conduct; it is very discouraging and
intimidating to participate in discussions where such language is used,
especially by senior members.

Michael S.,
Thanks for your suggestion; that is the approach we used in Elasticsearch
to raise the dims limit. Alessandro, perhaps you can use it in Solr as well
for the time being.
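
For anyone who needs larger vectors before anything changes upstream, here
is a rough sketch of that kind of workaround (my own illustration, not the
exact Elasticsearch code; the class and field names are made up). It relies
on the observation below that the limit is only checked in
KnnFloatVectorField/KnnByteVectorField#createType and
FieldType#setVectorAttributes, while the codec itself accepts any
dimension:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.document.KnnFloatVectorField;
    import org.apache.lucene.index.VectorEncoding;
    import org.apache.lucene.index.VectorSimilarityFunction;

    public class OverLimitVectorDemo {

      // A FieldType that reports its vector attributes from overridden
      // getters instead of going through setVectorAttributes(), so the
      // hardcoded dimension check never runs.
      static final class UncheckedVectorFieldType extends FieldType {
        private final int dims;
        private final VectorSimilarityFunction similarity;

        UncheckedVectorFieldType(int dims, VectorSimilarityFunction similarity) {
          this.dims = dims;
          this.similarity = similarity;
          freeze();
        }

        @Override
        public int vectorDimension() {
          return dims;
        }

        @Override
        public VectorEncoding vectorEncoding() {
          return VectorEncoding.FLOAT32;
        }

        @Override
        public VectorSimilarityFunction vectorSimilarityFunction() {
          return similarity;
        }
      }

      public static void main(String[] args) {
        float[] vector = new float[2048]; // larger than the 1024 default
        Document doc = new Document();
        doc.add(new KnnFloatVectorField("embedding", vector,
            new UncheckedVectorFieldType(vector.length,
                VectorSimilarityFunction.EUCLIDEAN)));
        // Feed `doc` to an IndexWriter as usual; the codec itself does not
        // enforce any dimension limit.
      }
    }

This is exactly the kind of ceremony that making the limit configurable
would remove.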

On Wed, May 17, 2023 at 11:03 AM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Thanks, Michael,
> that example makes an even stronger case for cleaning this up and making
> the limit configurable without the need for custom field types, I guess.
> (I was taking a look at the code again, and it seems the limit is also
> checked twice: in org.apache.lucene.document.KnnByteVectorField#createType
> and then in org.apache.lucene.document.FieldType#setVectorAttributes, for
> both the byte and float variants.)
> This should help people vote, great!
>
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Wed, 17 May 2023 at 15:42, Michael Sokolov <msoko...@gmail.com> wrote:
>
>> see https://markmail.org/message/kf4nzoqyhwacb7ri
>>
>> On Wed, May 17, 2023 at 10:09 AM David Smiley <dsmi...@apache.org> wrote:
>>
>>> > easily be circumvented by a user
>>>
>>> This is a revelation to me and others, if true.  Michael, in that case,
>>> please point to a test or code snippet that shows the Lucene user
>>> community what they want to see, so they are unblocked in their
>>> explorations of vector search.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Wed, May 17, 2023 at 7:51 AM Michael Sokolov <msoko...@gmail.com>
>>> wrote:
>>>
>>>> I think I've said before on this list that we don't actually enforce the
>>>> limit in any way that can't easily be circumvented by a user. The codec
>>>> already supports any size vector - it doesn't impose any limit. The way the
>>>> API is written, you can *already today* create an index with max-int sized
>>>> vectors, and we are committed to supporting that going forward under our
>>>> backwards compatibility policy, as Robert points out. This wasn't
>>>> intentional, I think, but those are the facts.
>>>>
>>>> Given that, I think this whole discussion is not really necessary.
>>>>
>>>> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <
>>>> a.benede...@sease.io> wrote:
>>>>
>>>>> Hi all,
>>>>> we have finalized all the options proposed by the community and we are
>>>>> ready to vote for the preferred one and then proceed with the
>>>>> implementation.
>>>>>
>>>>> *Option 1*
>>>>> Keep it as it is (dimension limit hardcoded to 1024)
>>>>> *Motivation*:
>>>>> We are close to improvements on many fronts. Given the criticality of
>>>>> Lucene in computing infrastructure and the concerns raised by one of the
>>>>> most active stewards of the project, I think we should keep working toward
>>>>> improving the feature as is and move to raise the limit after we can
>>>>> demonstrate improvement unambiguously.
>>>>>
>>>>> *Option 2*
>>>>> Make the limit configurable, for example through a system property.
>>>>> *Motivation*:
>>>>> The system administrator can enforce a limit that their users need to
>>>>> respect, in line with whatever the admin decides is acceptable for them.
>>>>> The default can stay the current one.
>>>>> This should open the doors for Apache Solr, Elasticsearch, OpenSearch,
>>>>> and any sort of plugin development.
>>>>>
>>>>> *Option 3*
>>>>> Move the max dimension limit down into the HNSW-specific
>>>>> implementation. Once there, this limit would not bind any other potential
>>>>> vector engine alternative/evolution.
>>>>> *Motivation:* There seem to be contradictory performance
>>>>> interpretations of the current HNSW implementation. Some consider its
>>>>> performance OK, some do not, and it depends on the target data set and use
>>>>> case. Raising the max dimension limit where it currently lives (in the
>>>>> top-level FloatVectorValues) would not allow potential alternatives (e.g.,
>>>>> for other use cases) to be based on a lower limit.
>>>>>
>>>>> *Option 4*
>>>>> Make it configurable and move it to an appropriate place.
>>>>> In particular, a
>>>>> simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024) should be
>>>>> enough.
>>>>> *Motivation*:
>>>>> Both are good, not mutually exclusive, and could happen in any order.
>>>>> Someone suggested refining what the _default_ limit should be, but
>>>>> I've not seen an argument _against_ configurability, especially in this
>>>>> form: a toggle that doesn't bind Lucene's APIs in any way.
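>>>>>
>>>>> For illustration only (the property name here is just the one proposed
>>>>> above; Lucene does not read any such property today), the entire toggle
>>>>> could be as small as:
>>>>>
>>>>>     // inside whatever HNSW-specific class currently hardcodes 1024
>>>>>     private static final int MAX_DIMENSIONS =
>>>>>         Integer.getInteger("lucene.hnsw.maxDimensions", 1024);
>>>>>
>>>>> The default stays at 1024, and operators who accept larger vectors
>>>>> start the JVM with -Dlucene.hnsw.maxDimensions=2048 (or whatever
>>>>> ceiling they choose).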
>>>>>
>>>>> I'll keep this [VOTE] open for a week and then proceed to the
>>>>> implementation.
>>>>> --------------------------
>>>>> *Alessandro Benedetti*
>>>>> Director @ Sease Ltd.
>>>>> *Apache Lucene/Solr Committer*
>>>>> *Apache Solr PMC Member*
>>>>>
>>>>> e-mail: a.benede...@sease.io
>>>>>
>>>>>
>>>>> *Sease* - Information Retrieval Applied
>>>>> Consulting | Training | Open Source
>>>>>
>>>>> Website: Sease.io <http://sease.io/>
>>>>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>>>>> <https://twitter.com/seaseltd> | Youtube
>>>>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>>>>> <https://github.com/seaseltd>
>>>>>
>>>>
