It looks like the framework is designed to support self-hosted plugins.

On Tue, May 9, 2023 at 12:13 PM jim ferenczi <jim.feren...@gmail.com> wrote:
> Lucene is a library. I don't see how it would be exposed in this plugin,
> which is about services.
>
> On Tue, 9 May 2023 at 18:00, Jun Luo <luo.jun...@gmail.com> wrote:
>
>> The PR mentioned an Elasticsearch PR
>> <https://github.com/elastic/elasticsearch/pull/95257> that increased the
>> dim to 2048 in Elasticsearch.
>>
>> Curious how you use Lucene's KNN search. Lucene's KNN supports one vector
>> per document, but usually multiple vectors are needed to represent a
>> document's content, so we would have to split the document content into
>> chunks and create one Lucene document per chunk.
>>
>> The ChatGPT plugin stores the chunk text directly in the underlying
>> vector db. If there are lots of documents, is it a concern to store the
>> full document content in Lucene? In the traditional inverted-index use
>> case, is it common to store the full document content in Lucene?
>>
>> Another question: if you use Lucene as a vector db, do you still need the
>> inverted index? Wondering what the use case would be for combining an
>> inverted index with a vector index. If we don't need the inverted index,
>> would it be better to use another vector db? For example, PostgreSQL also
>> added vector support recently.
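Jun's point about Lucene indexing one KNN vector per document is usually handled exactly as described: split the source text into chunks and index one document per chunk, keeping a parent id so hits can be grouped back per source document. A minimal stand-alone sketch of that workaround (plain Python, not Lucene's API; `embed` is a toy stand-in for a real embedding model such as text-embedding-ada-002):

```python
# Sketch: work around a one-vector-per-document index by splitting a
# source document into chunks and indexing one (chunk, vector) entry
# per chunk.

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy deterministic "embedding" for illustration only; a real
    # system would call an embedding model here.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 100) -> list[str]:
    # Naive fixed-size chunking; real systems split on sentence or
    # paragraph boundaries and often overlap adjacent chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def index_document(doc_id: str, text: str) -> list[dict]:
    # One indexed "document" per chunk, each carrying its own vector
    # plus the parent id so results can be grouped per source document.
    return [
        {"parent": doc_id, "chunk_no": n, "text": c, "vector": embed(c)}
        for n, c in enumerate(chunk(text))
    ]

entries = index_document("doc-1", "some long document content " * 20)
```

Whether to also store the full chunk text alongside the vector (as the ChatGPT plugin does) is a storage trade-off; storing only the parent id and chunk offsets is a common alternative.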
>>
>> Thanks,
>> Jun
>>
>> On Sat, May 6, 2023 at 1:44 PM Michael Wechner <michael.wech...@wyona.com>
>> wrote:
>>
>>> There is already a pull request for Elasticsearch which also mentions
>>> the max size of 1024:
>>>
>>> https://github.com/openai/chatgpt-retrieval-plugin/pull/83
>>>
>>> On 06.05.23 at 19:00, Michael Wechner wrote:
>>> > Hi together
>>> >
>>> > I recently set up the ChatGPT retrieval plugin locally
>>> >
>>> > https://github.com/openai/chatgpt-retrieval-plugin
>>> >
>>> > I think it would be nice to consider submitting a Lucene
>>> > implementation for this plugin
>>> >
>>> > https://github.com/openai/chatgpt-retrieval-plugin#future-directions
>>> >
>>> > The plugin uses OpenAI's model "text-embedding-ada-002" with 1536
>>> > dimensions by default
>>> >
>>> > https://openai.com/blog/new-and-improved-embedding-model
>>> >
>>> > which means one won't be able to use it out of the box with Lucene.
>>> >
>>> > Similar request here:
>>> >
>>> > https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
>>> >
>>> > I understand we just recently had a lengthy discussion about
>>> > increasing the max dimension, and whatever one thinks of OpenAI, the
>>> > fact is that it has a huge impact, and I think it would be nice for
>>> > Lucene to be part of this "revolution". All we have to do is increase
>>> > the limit from 1024 to, for example, 1536 or even 2048.
>>> >
>>> > Since the performance seems to be linear with the vector dimension,
>>> > several members have done performance tests successfully, and 1024
>>> > seems to have been chosen as the max dimension quite arbitrarily in
>>> > the first place, I think it should not be a problem to increase the
>>> > max dimension by a factor of 1.5 or 2.
>>> >
>>> > WDYT?
>>> >
>>> > Thanks
>>> >
>>> > Michael
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
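For context on Michael's "performance seems to be linear with the vector dimension" point: a single similarity comparison (dot product or cosine) touches every component once, so the per-comparison cost is O(d) and going from 1024 to 1536 dimensions costs roughly 1.5x. A rough back-of-the-envelope sketch (plain Python, not Lucene's actual implementation):

```python
# Sketch: the cost of one dot-product comparison grows linearly with
# the vector dimension, which is why raising the limit from 1024 to
# 1536 or 2048 is expected to cost roughly 1.5x or 2x per comparison.

def dot(a: list[float], b: list[float]) -> float:
    # One multiply-add per dimension: O(d) work per comparison.
    return sum(x * y for x, y in zip(a, b))

def ops_per_scan(num_vectors: int, dim: int) -> int:
    # Multiply-adds needed to score every vector in a brute-force scan.
    return num_vectors * dim

baseline = ops_per_scan(1_000_000, 1024)   # current Lucene max dim
ada = ops_per_scan(1_000_000, 1536)        # text-embedding-ada-002
ratio = ada / baseline                     # 1.5
```

Note this covers only the per-comparison arithmetic; with an HNSW-style graph index the number of comparisons per query depends on graph parameters rather than the collection size, but each comparison still scales linearly with `dim`.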