I'm for option 3 (limit at algorithm level), with the default there tunable
via property (option 4).

I understand Robert's concerns and I'd love to contribute a faster
implementation but the reality is - I can't do it at the moment. I feel
like experiments are good though and we shouldn't just ban people from
trying - if somebody changes the (sane) default and gets burned by
performance, perhaps it'll be an itch to work on speeding things up (much
like it's already happening with Jonathan's patch).

Dawid

On Tue, May 16, 2023 at 10:50 AM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi all,
> we have finalized all the options proposed by the community and we are
> ready to vote for the preferred one and then proceed with the
> implementation.
>
> *Option 1*
> Keep it as it is (dimension limit hardcoded to 1024)
> *Motivation*:
> We are close to improving on many fronts. Given the criticality of Lucene
> in computing infrastructure and the concerns raised by one of the most
> active stewards of the project, I think we should keep working toward
> improving the feature as is and move to up the limit after we can
> demonstrate improvement unambiguously.
>
> *Option 2*
> make the limit configurable, for example through a system property
> *Motivation*:
> The system administrator can enforce a limit its users need to respect
> that it's in line with whatever the admin decided to be acceptable for
> them.
> The default can stay the current one.
> This should open the doors for Apache Solr, Elasticsearch, OpenSearch, and
> any sort of plugin development
>
> *Option 3*
> Move the max dimension limit lower level to a HNSW specific
> implementation. Once there, this limit would not bind any other potential
> vector engine alternative/evolution.
> *Motivation:* There seem to be contradictory performance interpretations
> about the current HNSW implementation. Some consider its performance ok,
> some not, and it depends on the target data set and use case. Increasing
> the max dimension limit where it is currently (in top level
> FloatVectorValues) would not allow potential alternatives (e.g. for other
> use-cases) to be based on a lower limit.
>
> *Option 4*
> Make it configurable and move it to an appropriate place.
> In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
> 1024) should be enough.
> *Motivation*:
> Both are good and not mutually exclusive and could happen in any order.
> Someone suggested to perfect what the _default_ limit should be, but I've
> not seen an argument _against_ configurability.  Especially in this way --
> a toggle that doesn't bind Lucene's APIs in any way.
>
> I'll keep this [VOTE] open for a week and then proceed to the
> implementation.
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>

Reply via email to