I assumed that you would wrap Lucene in a minimal REST service, or use Solr or Elasticsearch

On 09.05.23 at 19:07, jim ferenczi wrote:
Lucene is a library. I don’t see how it would be exposed in this plugin, which is about services.


On Tue, 9 May 2023 at 18:00, Jun Luo <luo.jun...@gmail.com> wrote:

    The PR mentioned an Elasticsearch PR
    <https://github.com/elastic/elasticsearch/pull/95257> that
    increased the max dimension to 2048 in Elasticsearch.

    Curious how you use Lucene's KNN search. Lucene's KNN supports one
    vector per document, but usually multiple/many vectors are needed
    to represent a document's content. We would have to split the
    document content into chunks and create one Lucene document per
    chunk.
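    That chunking step could be sketched as below. This is plain Java
    with illustrative names and an arbitrary chunk size, not code from
    the plugin or from Lucene; each resulting chunk would then become
    its own Lucene document, with the chunk's embedding in a KNN vector
    field and the chunk text stored alongside it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the chunking step described above: split a
// document's text into fixed-size, word-aligned chunks. Each chunk would
// then be indexed as its own Lucene document. Names and the chunk size
// are illustrative only.
class Chunker {

    // Split text into chunks of at most maxChars characters, breaking on
    // whitespace so words are not cut in half.
    static List<String> chunk(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.split("\\s+")) {
            if (current.length() > 0
                    && current.length() + 1 + word.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each printed chunk would map to one Lucene document.
        for (String c : chunk("one two three four five six", 10)) {
            System.out.println(c);
        }
    }
}
```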

    The ChatGPT plugin stores the chunk text directly in the underlying
    vector DB. If there are lots of documents, would it be a concern to
    store the full document content in Lucene? In the traditional
    inverted-index use case, is it common to store the full document
    content in Lucene?

    Another question: if you use Lucene as a vector DB, do you still
    need the inverted index? I am wondering what the use case would be
    for using the inverted index together with the vector index. If we
    don't need the inverted index, would it be better to use another
    vector DB? For example, PostgreSQL also added vector support
    recently.
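    For illustration, one common reason to combine the two is to use a
    keyword match (the inverted index's job) as a filter and vector
    similarity (the KNN index's job) for ranking. The sketch below is
    plain Java with no Lucene dependency and purely illustrative names:
    a substring match stands in for the inverted index and cosine
    similarity stands in for KNN search. In Lucene itself,
    KnnFloatVectorQuery accepts an optional filter query for roughly
    this purpose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative-only stand-in for hybrid search: filter candidates by
// keyword, then rank the survivors by cosine similarity to the query
// vector. Not Lucene code; all names are hypothetical.
class HybridSearchSketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Keep only documents containing the keyword, then sort them by
    // similarity to the query vector, most similar first.
    static List<Integer> search(String[] texts, float[][] vectors,
                                String keyword, float[] queryVector) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < texts.length; i++) {
            if (texts[i].contains(keyword)) {
                hits.add(i);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (Integer i) -> -cosine(vectors[i], queryVector)));
        return hits;
    }

    public static void main(String[] args) {
        String[] texts = {"lucene vector search", "postgres vector db",
                "lucene inverted index"};
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.9f, 0.1f}};
        // Doc 1 is filtered out by the keyword; docs 0 and 2 are ranked
        // by vector similarity.
        System.out.println(search(texts, vectors, "lucene",
                new float[]{1f, 0f})); // prints [0, 2]
    }
}
```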

    Thanks,
    Jun

    On Sat, May 6, 2023 at 1:44 PM Michael Wechner
    <michael.wech...@wyona.com> wrote:

        There is already a pull request for Elasticsearch which also
        mentions the max size of 1024:

        https://github.com/openai/chatgpt-retrieval-plugin/pull/83



        On 06.05.23 at 19:00, Michael Wechner wrote:
        > Hi all,
        >
        > I recently set up the ChatGPT retrieval plugin locally
        >
        > https://github.com/openai/chatgpt-retrieval-plugin
        >
        > I think it would be nice to consider submitting a Lucene
        > implementation for this plugin
        >
        >
        > https://github.com/openai/chatgpt-retrieval-plugin#future-directions
        >
        > By default, the plugin uses OpenAI's model
        > "text-embedding-ada-002" with 1536 dimensions
        >
        > https://openai.com/blog/new-and-improved-embedding-model
        >
        > which means one won't be able to use it out-of-the-box with
        > Lucene.
        >
        > Similar request here
        >
        > https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
        >
        >
        > I understand we just recently had a lengthy discussion about
        > increasing the max dimension, and whatever one thinks of
        > OpenAI, the fact is that it has a huge impact, and I think it
        > would be nice for Lucene to be part of this "revolution". All
        > we have to do is increase the limit from 1024 to 1536, or
        > even 2048, for example.
        >
        > Since the performance seems to be linear in the vector
        > dimension, several members have run performance tests
        > successfully, and 1024 seems to have been chosen as the max
        > dimension quite arbitrarily in the first place, I think it
        > should not be a problem to increase the max dimension by a
        > factor of 1.5 or 2.
        >
        > WDYT?
        >
        > Thanks
        >
        > Michael
        >
        >
        >
        >
        > ---------------------------------------------------------------------
        > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
        > For additional commands, e-mail: dev-h...@lucene.apache.org
        >
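        A quick back-of-the-envelope check of the linearity claim
        quoted above (illustrative numbers, not Lucene measurements): a
        float vector costs 4 bytes per dimension to store, and a
        dot-product comparison does one multiply-add per dimension, so
        both storage and per-comparison cost scale linearly with the
        dimension.

```java
// Hypothetical cost model for float vectors: both storage and
// per-comparison work grow linearly with the dimension, so going from
// 1024 to 2048 dimensions doubles each of them.
class VectorCost {

    static long bytesPerVector(int dim) {
        return 4L * dim; // one 32-bit float per component
    }

    static long multiplyAddsPerComparison(int dim) {
        return dim; // one multiply-add per component in a dot product
    }

    public static void main(String[] args) {
        for (int dim : new int[]{1024, 1536, 2048}) {
            System.out.println(dim + " dims: " + bytesPerVector(dim)
                    + " bytes/vector, " + multiplyAddsPerComparison(dim)
                    + " multiply-adds/comparison");
        }
    }
}
```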


