> easily be circumvented by a user

This is a revelation to me and others, if true. Michael, please point to a test or code snippet that shows the Lucene user community what they want to see, so they are unblocked from their explorations of vector search.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, May 17, 2023 at 7:51 AM Michael Sokolov <msoko...@gmail.com> wrote:

> I think I've said before on this list that we don't actually enforce the
> limit in any way that can't easily be circumvented by a user. The codec
> already supports any size vector; it doesn't impose any limit. The way the
> API is written, you can *already today* create an index with max-int-sized
> vectors, and we are committed to supporting that going forward by our
> backwards compatibility policy, as Robert points out. This wasn't
> intentional, I think, but those are the facts.
>
> Given that, I think this whole discussion is not really necessary.
>
> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
>
>> Hi all,
>> We have finalized all the options proposed by the community and are
>> ready to vote for the preferred one and then proceed with the
>> implementation.
>>
>> *Option 1*
>> Keep it as it is (dimension limit hardcoded to 1024).
>> *Motivation*:
>> We are close to improving on many fronts. Given the criticality of
>> Lucene in computing infrastructure and the concerns raised by one of the
>> most active stewards of the project, I think we should keep working
>> toward improving the feature as is, and only raise the limit after we can
>> demonstrate improvement unambiguously.
>>
>> *Option 2*
>> Make the limit configurable, for example through a system property.
>> *Motivation*:
>> The system administrator can enforce a limit that their users need to
>> respect, in line with whatever the admin decided is acceptable for them.
>> The default can stay the current one. This should open the doors for
>> Apache Solr, Elasticsearch, OpenSearch, and any sort of plugin
>> development.
>>
>> *Option 3*
>> Move the max dimension limit down to the HNSW-specific implementation.
>> Once there, this limit would not bind any other potential vector engine
>> alternative/evolution.
>> *Motivation*:
>> There seem to be contradictory performance interpretations of the
>> current HNSW implementation: some consider its performance OK, some do
>> not, and it depends on the target data set and use case. Keeping the max
>> dimension limit where it is currently (in the top-level
>> FloatVectorValues) would not allow potential alternatives (e.g. for
>> other use cases) to be based on a lower limit.
>>
>> *Option 4*
>> Make it configurable and move it to an appropriate place. In particular,
>> a simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024) should be
>> enough.
>> *Motivation*:
>> Both are good, not mutually exclusive, and could happen in any order.
>> Someone suggested perfecting what the _default_ limit should be, but
>> I've not seen an argument _against_ configurability -- especially in
>> this form: a toggle that doesn't bind Lucene's APIs in any way.
>>
>> I'll keep this [VOTE] open for a week and then proceed to the
>> implementation.
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
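[Editor's note: Option 4's proposal hinges on a one-line pattern, Integer.getInteger("lucene.hnsw.maxDimensions", 1024), which reads an int system property with a fallback default. The following is a minimal sketch of how such a configurable limit check could look; the class and method names (VectorLimit, checkDimension) are hypothetical, not actual Lucene code — only the property name and the 1024 default come from the thread.]

```java
// Hypothetical sketch of Option 4: a max-dimension check whose ceiling is
// read once from a JVM system property, defaulting to the current hardcoded
// Lucene limit of 1024. Class/method names are illustrative only.
public class VectorLimit {
    // Integer.getInteger returns the property's int value, or the default
    // (1024) if the property is unset or unparseable. Configured at JVM
    // startup, e.g.: java -Dlucene.hnsw.maxDimensions=2048 ...
    static final int MAX_DIMENSIONS =
        Integer.getInteger("lucene.hnsw.maxDimensions", 1024);

    // Reject vectors whose dimension is non-positive or over the limit.
    static void checkDimension(int dimension) {
        if (dimension <= 0 || dimension > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector dimension must be in (0, " + MAX_DIMENSIONS
                    + "], got " + dimension);
        }
    }

    public static void main(String[] args) {
        checkDimension(768); // fine under the default limit
        try {
            checkDimension(2048); // exceeds the default of 1024
            throw new AssertionError("expected rejection");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Because the toggle lives behind a plain property read rather than a method parameter, it changes no public API surface — which is the "doesn't bind Lucene's APIs in any way" point made in the vote.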