IIUC, KnnVectorField is deprecated and one is supposed to use KnnFloatVectorField when the vector values are floats, right?

On 17.05.23 at 16:41, Michael Sokolov wrote:
See https://markmail.org/message/kf4nzoqyhwacb7ri

On Wed, May 17, 2023 at 10:09 AM David Smiley <dsmi...@apache.org> wrote:

    > easily be circumvented by a user

    This is a revelation to me and others, if true. Michael, please
    point to a test or code snippet that shows the Lucene user
    community how to do this, so they are unblocked in their
    explorations of vector search.

    ~ David Smiley
    Apache Lucene/Solr Search Developer
    http://www.linkedin.com/in/davidwsmiley


    On Wed, May 17, 2023 at 7:51 AM Michael Sokolov
    <msoko...@gmail.com> wrote:

        I think I've said before on this list that we don't actually
        enforce the limit in any way that can't easily be circumvented
        by a user. The codec already supports vectors of any size; it
        doesn't impose any limit. The way the API is written, you can
        *already today* create an index with max-int-sized vectors,
        and we are committed to supporting that going forward by our
        backwards compatibility policy, as Robert points out. This
        wasn't intentional, I think, but those are the facts.

        Given that, I think this whole discussion is not really necessary.

        On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti
        <a.benede...@sease.io> wrote:

            Hi all,
            we have finalized all the options proposed by the
            community and we are ready to vote for the preferred one
            and then proceed with the implementation.

            *Option 1*
            Keep it as it is (dimension limit hardcoded to 1024)
            *Motivation*:
            We are close to improving on many fronts. Given the
            criticality of Lucene in computing infrastructure and the
            concerns raised by one of the most active stewards of the
            project, I think we should keep working toward improving
            the feature as is, and raise the limit only once we can
            demonstrate improvement unambiguously.

            *Option 2*
            Make the limit configurable, for example through a system
            property.
            *Motivation*:
            The system administrator can enforce a limit that its
            users need to respect, in line with whatever the admin has
            decided is acceptable for them.
            The default can stay the current one.
            This should open the doors for Apache Solr, Elasticsearch,
            OpenSearch, and any sort of plugin development.

            *Option 3*
            Move the max dimension limit to a lower level, into the
            HNSW-specific implementation. Once there, this limit would
            not bind any other potential vector engine
            alternative/evolution.
            *Motivation*:
            There seem to be contradictory performance interpretations
            about the current HNSW implementation. Some consider its
            performance OK, some do not, and it depends on the target
            data set and use case. Increasing the max dimension limit
            where it currently lives (in the top-level
            FloatVectorValues) would not allow potential alternatives
            (e.g. for other use cases) to be based on a lower limit.
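
            A minimal sketch of the layering Option 3 describes,
            assuming each vector format exposes its own maximum
            dimension and the field-level check simply delegates to
            it. VectorFormatLimits, HnswFormatLimits,
            AlternativeFormatLimits and VectorFieldValidator are
            hypothetical names for illustration only; they are not
            Lucene classes.

            ```java
            // Hypothetical sketch (not Lucene API): each vector format declares its own
            // dimension limit instead of one hardcoded top-level constant.
            interface VectorFormatLimits {
                // Largest vector dimension this format accepts.
                int maxDimensions();
            }

            // HNSW-backed format keeping today's limit.
            final class HnswFormatLimits implements VectorFormatLimits {
                @Override
                public int maxDimensions() {
                    return 1024;
                }
            }

            // A hypothetical alternative engine that chooses its own, lower limit.
            final class AlternativeFormatLimits implements VectorFormatLimits {
                private final int max;

                AlternativeFormatLimits(int max) {
                    this.max = max;
                }

                @Override
                public int maxDimensions() {
                    return max;
                }
            }

            final class VectorFieldValidator {
                private VectorFieldValidator() {}

                // Delegates the dimension check to the format the field is indexed with.
                static void validate(String field, int dimension, VectorFormatLimits format) {
                    int limit = format.maxDimensions();
                    if (dimension <= 0 || dimension > limit) {
                        throw new IllegalArgumentException("field '" + field + "': dimension "
                            + dimension + " is outside [1, " + limit + "]");
                    }
                }

                public static void main(String[] args) {
                    VectorFormatLimits hnsw = new HnswFormatLimits();
                    VectorFormatLimits alt = new AlternativeFormatLimits(512);
                    validate("embedding", 768, hnsw); // within the HNSW limit
                    try {
                        validate("embedding", 768, alt); // the alternative enforces its lower limit
                    } catch (IllegalArgumentException e) {
                        System.out.println("rejected: " + e.getMessage());
                    }
                }
            }
            ```

            With the check behind an interface, raising the HNSW limit
            later would not force the same number on other engines.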

            *Option 4*
            Make it configurable and move it to an appropriate place.
            In particular, a
            simple Integer.getInteger("lucene.hnsw.maxDimensions",
            1024) should be enough.
            *Motivation*:
            Both are good, are not mutually exclusive, and could
            happen in any order.
            Someone suggested refining what the _default_ limit
            should be, but I've not seen an argument _against_
            configurability, especially in this way: a toggle that
            doesn't bind Lucene's APIs in any way.
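
            A minimal sketch of what Option 4 could look like, assuming
            the limit is read through the JDK's Integer.getInteger,
            which falls back to the default when the property is unset
            or not a valid integer. Only the property name and the
            default of 1024 come from the proposal; the class
            HnswDimensionLimit and its methods are hypothetical, not
            Lucene API.

            ```java
            // Hypothetical sketch (not Lucene API): an HNSW-specific dimension limit
            // read from a JVM system property, defaulting to the current value of 1024.
            final class HnswDimensionLimit {

                // Property name and default taken from the Option 4 proposal.
                static final String PROPERTY = "lucene.hnsw.maxDimensions";
                static final int DEFAULT_MAX_DIMENSIONS = 1024;

                private HnswDimensionLimit() {}

                // Integer.getInteger returns the default if the property is unset or unparsable.
                static int maxDimensions() {
                    int limit = Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
                    if (limit <= 0) {
                        throw new IllegalStateException(PROPERTY + " must be positive, got " + limit);
                    }
                    return limit;
                }

                // Rejects a vector dimension outside [1, maxDimensions()].
                static void checkDimension(int dimension) {
                    int limit = maxDimensions();
                    if (dimension <= 0 || dimension > limit) {
                        throw new IllegalArgumentException(
                            "vector dimension must be in [1, " + limit + "], got " + dimension);
                    }
                }

                public static void main(String[] args) {
                    checkDimension(768);                  // fine under the default of 1024
                    System.setProperty(PROPERTY, "2048"); // same effect as -Dlucene.hnsw.maxDimensions=2048
                    checkDimension(1536);                 // accepted after the admin raises the limit
                    System.out.println("max dimensions: " + maxDimensions());
                }
            }
            ```

            Reading the property on every call keeps the sketch easy to
            exercise; a real implementation would more likely read it
            once (e.g. into a static final field) so the limit cannot
            change while an index is open.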

            I'll keep this [VOTE] open for a week and then proceed to
            the implementation.
            --------------------------
            *Alessandro Benedetti*
            Director @ Sease Ltd.
            /Apache Lucene/Solr Committer/
            /Apache Solr PMC Member/

            e-mail: a.benede...@sease.io

            *Sease* - Information Retrieval Applied
            Consulting | Training | Open Source

            Website: Sease.io <http://sease.io/>
            LinkedIn <https://linkedin.com/company/sease-ltd> |
            Twitter <https://twitter.com/seaseltd> | Youtube
            <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> |
            Github <https://github.com/seaseltd>
