Not at the moment :-)

I am using Lucene's vector search for https://ukatie.com to detect duplicate questions. I am currently refactoring it so that you can connect Katie to your own similarity search implementation, and I have built a very first prototype of a connector for Weaviate:

https://github.com/wyona/spring-boot-hello-world-rest/blob/master/src/main/java/org/wyona/webapp/controllers/v2/KatieMockupConnectorController.java

Weaviate itself now supports the OpenAI embeddings, and I wanted to see how well this works together with Lucene. I would also like to make the embeddings configurable. So far the Katie Lucene implementation supports the various sbert transformer models (https://www.sbert.net/docs/pretrained_models.html) and OpenAI text-similarity-ada-001.

I will need some more time for the refactoring, but I will make the Lucene connector available under the Apache license.

Thanks

Michael

On 16.02.22 at 19:51, Michael Sokolov wrote:
Fair enough - are you planning to offer such a service? ;) Sounds exciting!

-Mike

On Tue, Feb 15, 2022 at 6:00 PM Michael Wechner <michael.wech...@wyona.com> wrote:

    true :-) when you are the one controlling the input of vectors,
    then a method to disable the maximum limit would be sufficient.

    But I could imagine that when you offer Lucene as a service,
    where people can for example configure their own "sentence
    embedding models", and you would like to offer a different maximum
    limit than the default of 1024, a method to reset the maximum
    limit would make sense. Examples could be a service like OpenAI or
    vector search databases such as Weaviate or Pinecone.
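
A minimal sketch of the two options mentioned here (disabling the limit, or offering a different limit than the default), assuming the limit is carried as an instance value rather than a mutable static field. The class `DimensionLimit` and its methods are hypothetical and not part of Lucene; only the default of 1024 comes from this thread.

```java
/**
 * Hypothetical helper, NOT part of Lucene: holds the maximum number of
 * vector dimensions as an immutable per-instance value.
 */
final class DimensionLimit {
    /** Default quoted in this thread (VectorValues.MAX_DIMENSIONS in Lucene 9.0.0). */
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    private final int maxDimensions;

    private DimensionLimit(int maxDimensions) {
        if (maxDimensions <= 0) {
            throw new IllegalArgumentException("maxDimensions must be > 0; got " + maxDimensions);
        }
        this.maxDimensions = maxDimensions;
    }

    /** The default limit of 1024 dimensions. */
    static DimensionLimit defaultLimit() {
        return new DimensionLimit(DEFAULT_MAX_DIMENSIONS);
    }

    /** A service-specific limit, chosen at the operator's own risk. */
    static DimensionLimit of(int maxDimensions) {
        return new DimensionLimit(maxDimensions);
    }

    /** Effectively disables the limit. */
    static DimensionLimit unlimited() {
        return new DimensionLimit(Integer.MAX_VALUE);
    }

    /** Returns true if a vector with this many dimensions is accepted. */
    boolean allows(int numDimensions) {
        return numDimensions > 0 && numDimensions <= maxDimensions;
    }

    public static void main(String[] args) {
        int davinciDims = 12288; // dimensions of text-similarity-davinci-001, per this thread
        System.out.println(DimensionLimit.defaultLimit().allows(davinciDims));
        System.out.println(DimensionLimit.of(16384).allows(davinciDims));
        System.out.println(DimensionLimit.unlimited().allows(davinciDims));
    }
}
```

With this shape, each service or index configuration would hold its own `DimensionLimit`, so raising the limit for one tenant would not change it globally.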

    Thanks

    Michael




    On 15.02.22 at 23:34, Michael Sokolov wrote:
    I don't think it makes sense to have a static variable maximum
    that you can change by calling a method. What purpose would it
    serve?

    On Tue, Feb 15, 2022, 2:39 PM Michael Wechner
    <michael.wech...@wyona.com> wrote:

        Hi Alessandro

        No, I have not created a Jira ticket, but I would be happy to
        create one, just let me know or please feel free to create one.

        I understand the concerns about the limits in general and I
        think it makes sense to have a default max dimensions limit,
        but I could imagine it needs to be increased eventually and
        being able to increase it programmatically and at your own
        risk will help people using Lucene.

        Thanks

        Michael

        On 15.02.22 at 19:22, Alessandro Benedetti wrote:
        Hi Michael,
        let's create a Jira ticket to use a higher value (if you
        haven't already).
        I would be happy to review the patch or do it myself, but
        only after 10/03.
        Once the pull request is ready (including the Javadoc
        documentation that clearly states that if you go above X
        it's at your own risk), we'll also involve Michael Sokolov
        and the other committers familiar with this area of the code.

        Cheers

        --------------------------
        Alessandro Benedetti
        Apache Lucene/Solr PMC member and Committer
        Director, R&D Software Engineer, Search Consultant

        www.sease.io <http://www.sease.io>


        On Sat, 12 Feb 2022 at 22:53, Michael Wechner
        <michael.wech...@wyona.com> wrote:

            Hi

            I just tried to test the OpenAI model
            "text-similarity-davinci-001" with 12288 dimensions and
            received the following error:

            java.lang.IllegalArgumentException: vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (=1024); got 12288
                    at org.apache.lucene.document.FieldType.setVectorDimensionsAndSimilarityFunction(FieldType.java:381) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
                    at org.apache.lucene.document.KnnVectorField.createFieldType(KnnVectorField.java:69) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]

            IIUC I cannot programmatically increase the max vector
            size, which is set inside
            lucene/core/src/java/org/apache/lucene/index/VectorValues.java


              public static int MAX_DIMENSIONS = 1024;

            right?

            I guess I could rebuild Lucene with a greater size, or
            what are the possibilities to increase the max vector size?
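
For illustration, a minimal self-contained sketch of the kind of check that produces the error above. It does not use Lucene; the class `VectorDimensionCheck` and the method `checkDimensions` are hypothetical stand-ins, and only the limit of 1024 and the message text are taken from the stack trace.

```java
/** Hypothetical stand-in for the dimension check; NOT Lucene code. */
final class VectorDimensionCheck {
    /** Mirrors the value reported in the error message (=1024). */
    static final int MAX_DIMENSIONS = 1024;

    /** Rejects vectors whose dimension count is non-positive or above the limit. */
    static void checkDimensions(int numDimensions) {
        if (numDimensions <= 0) {
            throw new IllegalArgumentException(
                "vector numDimensions must be > 0; got " + numDimensions);
        }
        if (numDimensions > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (="
                    + MAX_DIMENSIONS + "); got " + numDimensions);
        }
    }

    public static void main(String[] args) {
        checkDimensions(768); // within the limit, passes silently
        try {
            checkDimensions(12288); // the dimension count reported in this thread
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the limit is checked when the field type is created, an embedding with more dimensions than the limit is rejected before anything is indexed.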

            Thanks

            Michael



