The PR mentioned an Elasticsearch PR
<https://github.com/elastic/elasticsearch/pull/95257> that increased the
max dimension to 2048 in Elasticsearch.

Curious how you use Lucene's KNN search. Lucene's KNN supports one vector
per document, but a document's content usually needs multiple vectors. We
would have to split the document content into chunks and create one Lucene
document per chunk.
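For illustration only, here is roughly what that chunking step could look
like (the function name and parameters are made up, not taken from any
existing implementation):

```python
# Hypothetical sketch: split one source document into overlapping chunks
# so that each chunk becomes its own Lucene document carrying a single
# KNN vector. The field names ("parent_id", "chunk_no", "text") are
# illustrative, not an actual schema.

def chunk_document(doc_id, text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks; one chunk -> one Lucene doc."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({
            "parent_id": doc_id,      # lets results be grouped per source doc
            "chunk_no": len(chunks),
            "text": text[start:end],  # this text would be embedded, and
                                      # optionally stored alongside the vector
        })
        if end == len(text):
            break
        start = end - overlap         # overlap keeps context across boundaries
    return chunks
```

Keeping a parent_id field on each chunk makes it possible to deduplicate or
aggregate search hits back to the original document at query time.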

The ChatGPT plugin stores the chunk text directly in the underlying vector
DB. If there are lots of documents, is it a concern to store the full
document content in Lucene? In the traditional inverted index use case, is
it common to store the full document content in Lucene?

Another question: if you use Lucene as a vector DB, do you still need the
inverted index? I am wondering what the use case would be for using an
inverted index together with a vector index. If we don't need the inverted
index, would it be better to use another vector DB? For example, PostgreSQL
also added vector support recently.
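One common argument for keeping both indexes is hybrid search: run a BM25
query against the inverted index and a KNN query against the vector index,
then fuse the two ranked lists, e.g. with reciprocal rank fusion. A minimal
generic sketch (not Lucene-specific, just the fusion idea):

```python
# Reciprocal rank fusion (RRF): combine ranked result lists from
# different retrievers (e.g. BM25 and vector KNN) without needing
# their raw scores to be comparable.

def rrf(rankings, k=60):
    """Each ranking is a list of doc ids, best first; returns fused order."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # 1 / (k + rank) damps the influence of lower-ranked hits;
            # k = 60 is a commonly used constant.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears near the top of both lists (here a keyword match
and a semantic match) gets boosted above documents found by only one
retriever, which is the usual motivation for combining the two indexes.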

Thanks,
Jun

On Sat, May 6, 2023 at 1:44 PM Michael Wechner <michael.wech...@wyona.com>
wrote:

> there is already a pull request for Elasticsearch which is also
> mentioning the max size 1024
>
> https://github.com/openai/chatgpt-retrieval-plugin/pull/83
>
>
>
> Am 06.05.23 um 19:00 schrieb Michael Wechner:
> > Hi Together
> >
> > I recently set up the ChatGPT retrieval plugin locally
> >
> > https://github.com/openai/chatgpt-retrieval-plugin
> >
> > I think it would be nice to consider submitting a Lucene implementation
> > for this plugin
> >
> > https://github.com/openai/chatgpt-retrieval-plugin#future-directions
> >
> > By default the plugin uses OpenAI's model "text-embedding-ada-002"
> > with 1536 dimensions
> >
> > https://openai.com/blog/new-and-improved-embedding-model
> >
> > which means one won't be able to use it out-of-the-box with Lucene.
> >
> > Similar request here
> >
> >
> https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
> >
> >
> > I understand we just recently had a lengthy discussion about
> > increasing the max dimension, and whatever one thinks of OpenAI, the
> > fact is that it has a huge impact, and I think it would be nice if
> > Lucene could be part of this "revolution". All we have to do is
> > increase the limit from 1024 to 1536 or even 2048, for example.
> >
> > Since the performance seems to be linear with the vector dimension,
> > several members have done performance tests successfully, and 1024
> > seems to have been chosen as the max dimension quite arbitrarily in
> > the first place, I think it should not be a problem to increase the
> > max dimension by a factor of 1.5 or 2.
> >
> > WDYT?
> >
> > Thanks
> >
> > Michael
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>
>
>
