Re: Connecting Lucene with ChatGPT Retrieval Plugin

Bruno Roustant Tue, 09 May 2023 09:49:18 -0700

I agree with Robert Muir that raising the 1024 limit, as it is currently
defined in FloatVectorValues or ByteVectorValues, would bind the API: we
could not decrease it afterwards, even if we needed to change the vector engine.


Would it be possible to move the limit definition to an HNSW-specific
implementation, where it would only bind HNSW?
I don't know this area of the code well. It seems to me that the FloatVectorValues
implementation is unfortunately not HNSW-specific. Is this on purpose? We
should be able to replace the vector engine, no?
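To make the suggestion concrete, here is a minimal sketch of a per-engine limit. It assumes a hypothetical VectorEngine interface with a maxDimensions() method, plus hypothetical HnswEngine, FlatEngine and VectorFieldSpec classes; none of these exist in Lucene. The point is only that the 1024 cap becomes a property of the HNSW engine rather than a constant on FloatVectorValues/ByteVectorValues, so another engine could declare a different limit without touching a shared API constant.

```java
import java.util.Objects;

// Sketch only: every type below is hypothetical and is NOT part of Lucene's API.
class EngineLimitSketch {
    public static void main(String[] args) {
        VectorEngine hnsw = new HnswEngine();

        // 768 dimensions fits under the HNSW engine's own limit.
        VectorFieldSpec small = new VectorFieldSpec("small", 768, hnsw);
        System.out.println(small.field + " accepted with " + small.dimension + " dims");

        // 1536 (text-embedding-ada-002) exceeds the HNSW engine's limit and is rejected.
        try {
            new VectorFieldSpec("ada002", 1536, hnsw);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // A different engine can declare a different limit; no shared constant changes.
        VectorEngine flat = new FlatEngine(2048);
        VectorFieldSpec ada = new VectorFieldSpec("ada002", 1536, flat);
        System.out.println(ada.field + " accepted by " + flat.name());
    }
}

/** Hypothetical: each vector engine reports its own maximum dimension. */
interface VectorEngine {
    String name();
    int maxDimensions();
}

/** Hypothetical HNSW engine; 1024 mirrors the limit discussed in this thread. */
final class HnswEngine implements VectorEngine {
    @Override public String name() { return "hnsw"; }
    @Override public int maxDimensions() { return 1024; }
}

/** Hypothetical alternative engine whose limit is chosen by its implementer. */
final class FlatEngine implements VectorEngine {
    private final int maxDims;
    FlatEngine(int maxDims) { this.maxDims = maxDims; }
    @Override public String name() { return "flat"; }
    @Override public int maxDimensions() { return maxDims; }
}

/** Hypothetical field definition validated against the engine that will index it. */
final class VectorFieldSpec {
    final String field;
    final int dimension;

    VectorFieldSpec(String field, int dimension, VectorEngine engine) {
        Objects.requireNonNull(engine, "engine");
        // The valid range comes from the engine, not from a global constant.
        if (dimension <= 0 || dimension > engine.maxDimensions()) {
            throw new IllegalArgumentException(
                "field '" + field + "': dimension " + dimension
                    + " is outside [1, " + engine.maxDimensions()
                    + "] for engine '" + engine.name() + "'");
        }
        this.field = Objects.requireNonNull(field, "field");
        this.dimension = dimension;
    }
}
```

With this shape, raising or lowering the cap later would only affect the engine that declares it, which is the kind of decoupling asked about above.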

On Sat, 6 May 2023 at 22:44, Michael Wechner <michael.wech...@wyona.com>
wrote:

> There is already a pull request for Elasticsearch, which also
> mentions the maximum size of 1024
>
> https://github.com/openai/chatgpt-retrieval-plugin/pull/83
>
>
>
> On 06.05.23 at 19:00, Michael Wechner wrote:
> > Hi all
> >
> > I recently set up the ChatGPT retrieval plugin locally
> >
> > https://github.com/openai/chatgpt-retrieval-plugin
> >
> > I think it would be worth considering submitting a Lucene implementation
> > for this plugin
> >
> > https://github.com/openai/chatgpt-retrieval-plugin#future-directions
> >
> > By default, the plugin uses OpenAI's model "text-embedding-ada-002",
> > which has 1536 dimensions
> >
> > https://openai.com/blog/new-and-improved-embedding-model
> >
> > which means it cannot be used out of the box with Lucene, whose
> > maximum vector dimension is 1024.
> >
> > There is a similar request here:
> >
> > https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
> >
> >
> > I understand we just recently had a lengthy discussion about
> > increasing the max dimension, and whatever one thinks of OpenAI, the fact
> > is that it has a huge impact, and I think it would be nice if Lucene
> > could be part of this "revolution". All we have to do is increase the
> > limit from 1024 to 1536, or even 2048, for example.
> >
> > Since performance seems to scale linearly with the vector dimension,
> > several members have successfully run performance tests, and 1024
> > seems to have been chosen as the max dimension quite arbitrarily in the
> > first place, I think it should not be a problem to increase the max
> > dimension by a factor of 1.5 or 2.
> >
> > WDYT?
> >
> > Thanks
> >
> > Michael
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>
