My non-binding vote goes to Option 2 or, alternatively, Option 4.

Thanks

Michael Wechner


On 16.05.23 at 10:51, Alessandro Benedetti wrote:
My vote goes to *Option 4*.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
/Apache Lucene/Solr Committer/
/Apache Solr PMC Member/

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>


On Tue, 16 May 2023 at 09:50, Alessandro Benedetti <a.benede...@sease.io> wrote:

    Hi all,
    we have finalized all the options proposed by the community and we
    are ready to vote for the preferred one and then proceed with the
    implementation.

    *Option 1*
    Keep it as it is (dimension limit hardcoded to 1024)
    *Motivation*:
    We are close to improving on many fronts. Given the criticality of
    Lucene in computing infrastructure and the concerns raised by one
    of the most active stewards of the project, I think we should keep
    working toward improving the feature as is and raise the
    limit after we can demonstrate improvement unambiguously.

    *Option 2*
    Make the limit configurable, for example through a system property.
    *Motivation*:
    The system administrator can enforce a limit that users need to
    respect and that is in line with whatever the admin has decided is
    acceptable for them.
    The default can stay the current one.
    This should open the doors for Apache Solr, Elasticsearch,
    OpenSearch, and any sort of plugin development.

    *Option 3*
    Move the max dimension limit to a lower level, into an HNSW-specific
    implementation. Once there, this limit would not bind any other
    potential vector engine alternative/evolution.
    *Motivation*:
    There seem to be contradictory performance
    interpretations about the current HNSW implementation. Some
    consider its performance ok, some not, and it depends on the
    target data set and use case. Increasing the max dimension limit
    where it is currently (in top level FloatVectorValues) would not
    allow potential alternatives (e.g. for other use-cases) to be
    based on a lower limit.
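
A minimal sketch of what Option 3 could look like in code, assuming each vector index implementation declares its own limit instead of sharing one top-level constant. All type names and numeric limits below are hypothetical illustrations; none of them are actual Lucene classes or agreed values.

```java
// Hypothetical types for illustration only; not part of Lucene's API.
interface VectorIndexFormat {
    /** Maximum number of dimensions this particular implementation accepts. */
    int maxDimensions();

    /** Rejects vectors whose dimension is outside this implementation's range. */
    default void checkDimension(int dimension) {
        if (dimension <= 0 || dimension > maxDimensions()) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + maxDimensions() + "]; got " + dimension);
        }
    }
}

/** Stand-in for an HNSW-based implementation that keeps the current limit. */
final class HnswLikeFormat implements VectorIndexFormat {
    @Override
    public int maxDimensions() {
        return 1024; // the currently hardcoded limit mentioned in Option 1
    }
}

/** Stand-in for an alternative engine that chooses a lower limit (value is arbitrary). */
final class LowDimFormat implements VectorIndexFormat {
    @Override
    public int maxDimensions() {
        return 256;
    }
}
```

With this shape, raising or lowering the limit of one implementation would not affect the others.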

    *Option 4*
    Make it configurable and move it to an appropriate place.
    In particular, a
    simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024)
    should be enough.
    *Motivation*:
    Both changes (making the limit configurable and moving it) are
    good, are not mutually exclusive, and could happen in any order.
    Someone suggested refining what the _default_ limit should be,
    but I've not seen an argument _against_ configurability.
    Especially in this way -- a toggle that doesn't bind Lucene's APIs
    in any way.
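
A minimal sketch of how the property lookup proposed in Option 4 might be wrapped, assuming the limit is read each time it is needed. Only the property name `lucene.hnsw.maxDimensions` and the default of 1024 come from the proposal; the `VectorLimits` class and its methods are hypothetical names used for illustration.

```java
// Hypothetical holder class; not part of Lucene's API.
final class VectorLimits {
    static final String MAX_DIMS_PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT_MAX_DIMS = 1024;

    private VectorLimits() {}

    /**
     * Reads the limit from the system property at call time. Integer.getInteger
     * returns the default when the property is unset or not a valid integer.
     */
    static int maxDimensions() {
        return Integer.getInteger(MAX_DIMS_PROPERTY, DEFAULT_MAX_DIMS);
    }

    /** Throws if the dimension is not within [1, maxDimensions()]. */
    static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }
}
```

Under this sketch, an administrator could raise the limit by starting the JVM with `-Dlucene.hnsw.maxDimensions=2048`, and leaving the property unset keeps the current default.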

    I'll keep this [VOTE] open for a week and then proceed to the
    implementation.