My non-binding vote goes to Option 2 or Option 4.
Thanks
Michael Wechner
On 16.05.23 at 10:51, Alessandro Benedetti wrote:
My vote goes to *Option 4*.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
/Apache Lucene/Solr Committer/
/Apache Solr PMC Member/
e-mail: a.benede...@sease.io
*Sease* - Information Retrieval Applied
Consulting | Training | Open Source
Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>
On Tue, 16 May 2023 at 09:50, Alessandro Benedetti
<a.benede...@sease.io> wrote:
Hi all,
we have finalized all the options proposed by the community and we
are ready to vote for the preferred one and then proceed with the
implementation.
*Option 1*
Keep it as it is (dimension limit hardcoded to 1024)
*Motivation*:
We are close to improving on many fronts. Given the criticality of
Lucene in computing infrastructure and the concerns raised by one
of the most active stewards of the project, I think we should keep
working toward improving the feature as is and raise the
limit after we can demonstrate improvement unambiguously.
*Option 2*
Make the limit configurable, for example through a system property.
*Motivation*:
The system administrator can enforce a limit that its users need to
respect and that is in line with whatever the admin has decided is
acceptable for them.
The default can stay the current one.
This should open the doors for Apache Solr, Elasticsearch,
OpenSearch, and any sort of plugin development
*Option 3*
Move the max dimension limit down to an HNSW-specific
implementation. Once there, this limit would not bind any other
potential vector engine alternative/evolution.
*Motivation*:
There seem to be contradictory performance
interpretations about the current HNSW implementation. Some
consider its performance ok, some not, and it depends on the
target data set and use case. Increasing the max dimension limit
where it is currently (in top level FloatVectorValues) would not
allow potential alternatives (e.g. for other use-cases) to be
based on a lower limit.
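To illustrate the idea behind Option 3, here is a minimal sketch in which each vector index implementation declares its own dimension limit instead of inheriting one from a shared top-level class. All type names below (VectorIndexFormat, HnswLikeFormat, OtherFormat) are hypothetical and are not Lucene classes; only the 1024 value comes from this thread.

```java
// Hypothetical types for illustration only; these are NOT Lucene's classes.
interface VectorIndexFormat {
    /** Largest vector dimension this particular implementation accepts. */
    int maxDimensions();

    /** Rejects dimensions outside [1, maxDimensions()]. */
    default void checkDimension(int dimension) {
        if (dimension <= 0 || dimension > maxDimensions()) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + maxDimensions()
                    + "]; got " + dimension);
        }
    }
}

/** Stand-in for an HNSW-specific implementation that keeps the current limit. */
final class HnswLikeFormat implements VectorIndexFormat {
    @Override
    public int maxDimensions() {
        return 1024; // the currently hardcoded limit mentioned in Option 1
    }
}

/** Stand-in for an alternative engine that chooses its own limit. */
final class OtherFormat implements VectorIndexFormat {
    private final int max;

    OtherFormat(int max) {
        this.max = max;
    }

    @Override
    public int maxDimensions() {
        return max;
    }
}
```

With this shape, a 2048-dimension vector would be rejected by HnswLikeFormat but accepted by new OtherFormat(4096), so raising or lowering one implementation's limit does not constrain the others.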
*Option 4*
Make it configurable and move it to an appropriate place.
In particular, a
simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024)
should be enough.
*Motivation*:
Both changes (making the limit configurable and moving it) are good,
not mutually exclusive, and could happen in any order.
Someone suggested refining what the _default_ limit should be,
but I've not seen an argument _against_ configurability.
Especially in this way -- a toggle that doesn't bind Lucene's APIs
in any way.
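As a rough sketch of how the Option 4 toggle could be read and enforced: the property name and the 1024 default are taken from the proposal above, while the MaxDimensions class and its method names are hypothetical and only meant to show the mechanism, not where the check would actually live in Lucene.

```java
// Hypothetical helper; only the property name and default come from the proposal.
public final class MaxDimensions {
    static final String PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT_MAX = 1024;

    private MaxDimensions() {}

    /**
     * Reads the limit from the system property. Integer.getInteger falls back
     * to the default when the property is unset or is not a valid integer.
     */
    public static int maxDimensions() {
        return Integer.getInteger(PROPERTY, DEFAULT_MAX);
    }

    /** Rejects dimensions outside [1, maxDimensions()]. */
    public static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }
}
```

An administrator would then raise the limit at JVM startup with something like -Dlucene.hnsw.maxDimensions=2048, and no public API would need to change.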
I'll keep this [VOTE] open for a week and then proceed to the
implementation.