My vote goes to *Option 4*.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>


On Tue, 16 May 2023 at 09:50, Alessandro Benedetti <a.benede...@sease.io> wrote:

> Hi all,
> we have finalized all the options proposed by the community and we are
> ready to vote for the preferred one and then proceed with the
> implementation.
>
> *Option 1*
> Keep it as it is (dimension limit hardcoded to 1024).
> *Motivation*:
> We are close to improving on many fronts. Given the criticality of Lucene
> in computing infrastructure and the concerns raised by one of the most
> active stewards of the project, I think we should keep working toward
> improving the feature as is and move to raise the limit after we can
> demonstrate improvement unambiguously.
>
> *Option 2*
> Make the limit configurable, for example through a system property.
> *Motivation*:
> The system administrator can enforce a limit that users need to respect,
> in line with whatever the admin has decided is acceptable for them.
> The default can stay the current one.
> This should open the doors for Apache Solr, Elasticsearch, OpenSearch, and
> any sort of plugin development.
>
> *Option 3*
> Move the max dimension limit to a lower level, into an HNSW-specific
> implementation. Once there, this limit would not bind any other potential
> vector engine alternative/evolution.
> *Motivation*:
> There seem to be contradictory performance interpretations about the
> current HNSW implementation. Some consider its performance ok, some not,
> and it depends on the target data set and use case. Increasing the max
> dimension limit where it is currently (in top level FloatVectorValues)
> would not allow potential alternatives (e.g. for other use-cases) to be
> based on a lower limit.
>
> *Option 4*
> Make it configurable and move it to an appropriate place.
> In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
> 1024) should be enough.
> *Motivation*:
> Both changes (making the limit configurable and moving it) are good, not
> mutually exclusive, and could happen in any order.
> Someone suggested refining what the _default_ limit should be, but I've
> not seen an argument _against_ configurability. Especially in this way:
> a toggle that doesn't bind Lucene's APIs in any way.
>
> I'll keep this [VOTE] open for a week and then proceed to the
> implementation.
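
For reference, here is a minimal sketch of what the system-property lookup mentioned in Option 4 could look like. Only the property name "lucene.hnsw.maxDimensions" and the default of 1024 come from the proposal; the class name MaxDimensionsSketch and the helpers maxDimensions() and checkDimension() are hypothetical names used for illustration, and this is not Lucene's actual code.

```java
// Hypothetical sketch, not Lucene's implementation.
final class MaxDimensionsSketch {
    // Property name and default taken from the Option 4 proposal.
    static final String PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    // Reads the limit from the JVM system property; falls back to the
    // default when the property is unset or is not a valid integer.
    static int maxDimensions() {
        return Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
    }

    // Rejects vector dimensions outside the range [1, maxDimensions()].
    static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "], got " + dimension);
        }
    }

    public static void main(String[] args) {
        System.out.println("max dimensions = " + maxDimensions());
        checkDimension(768); // within the default limit, no exception
    }
}
```

An administrator would then raise the limit at JVM startup with something like -Dlucene.hnsw.maxDimensions=2048. The sketch reads the property on every call; a real implementation might instead read it once at class initialization.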