Option 4 also aims to refactor the limit into an appropriate place in the code (so the short answer is yes: the difference is in the implementation details).
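To make the difference concrete, here is a minimal sketch of what Option 4 could look like. Only the property name and the `Integer.getInteger("lucene.hnsw.maxDimensions", 1024)` call come from the proposal. The class `HnswLimits`, its method names, and the handling of non-positive values are hypothetical and not existing Lucene API.

```java
// Hypothetical helper: class and method names are illustrative, not Lucene API.
final class HnswLimits {
    /** Default used when the property is unset or is not a valid integer. */
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    /** Property name taken from the Option 4 proposal. */
    static final String MAX_DIMENSIONS_PROPERTY = "lucene.hnsw.maxDimensions";

    private HnswLimits() {}

    /** Reads the limit from the system property, falling back to the default. */
    static int maxDimensions() {
        int configured = Integer.getInteger(MAX_DIMENSIONS_PROPERTY, DEFAULT_MAX_DIMENSIONS);
        // Assumption: a non-positive value is a misconfiguration and is ignored.
        return configured > 0 ? configured : DEFAULT_MAX_DIMENSIONS;
    }

    /** Rejects vector dimensions outside the range [1, configured limit]. */
    static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }
}
```

With something like this living next to the HNSW code, an administrator could raise the limit with `-Dlucene.hnsw.maxDimensions=2048` on the JVM command line, while the top-level vector APIs stay untouched.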
Cheers

On Tue, 16 May 2023, 10:04 Michael Wechner, <michael.wech...@wyona.com> wrote:

> Hi Alessandro
>
> Thank you very much for summarizing and starting the vote.
>
> I am not sure whether I really understand the difference between Option 2
> and Option 4, or is it just about implementation details?
>
> Thanks
>
> Michael
>
>
> On 16.05.23 at 10:50, Alessandro Benedetti wrote:
>
> Hi all,
> we have finalized all the options proposed by the community and we are
> ready to vote for the preferred one and then proceed with the
> implementation.
>
> *Option 1*
> Keep it as it is (dimension limit hardcoded to 1024)
> *Motivation*:
> We are close to improving on many fronts. Given the criticality of Lucene
> in computing infrastructure and the concerns raised by one of the most
> active stewards of the project, I think we should keep working toward
> improving the feature as is and move to raise the limit after we can
> demonstrate improvement unambiguously.
>
> *Option 2*
> Make the limit configurable, for example through a system property
> *Motivation*:
> The system administrator can enforce a limit that its users need to
> respect and that is in line with whatever the admin decided to be
> acceptable for them.
> The default can stay the current one.
> This should open the doors for Apache Solr, Elasticsearch, OpenSearch, and
> any sort of plugin development
>
> *Option 3*
> Move the max dimension limit to a lower level, into an HNSW-specific
> implementation. Once there, this limit would not bind any other potential
> vector engine alternative/evolution.
> *Motivation:* There seem to be contradictory performance interpretations
> about the current HNSW implementation. Some consider its performance ok,
> some not, and it depends on the target data set and use case. Increasing
> the max dimension limit where it is currently (in top level
> FloatVectorValues) would not allow potential alternatives (e.g. for other
> use-cases) to be based on a lower limit.
>
> *Option 4*
> Make it configurable and move it to an appropriate place.
> In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
> 1024) should be enough.
> *Motivation*:
> Both are good and not mutually exclusive and could happen in any order.
> Someone suggested to perfect what the _default_ limit should be, but I've
> not seen an argument _against_ configurability. Especially in this way --
> a toggle that doesn't bind Lucene's APIs in any way.
>
> I'll keep this [VOTE] open for a week and then proceed to the
> implementation.
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>