There is already a pull request for Elasticsearch that also
mentions the max size of 1024:
https://github.com/openai/chatgpt-retrieval-plugin/pull/83
On 06.05.23 at 19:00, Michael Wechner wrote:
Hi all,
I recently set up the ChatGPT retrieval plugin locally
https://github.com/openai/chatgpt-retrieval-plugin
I think it would be nice to consider submitting a Lucene implementation
for this plugin
https://github.com/openai/chatgpt-retrieval-plugin#future-directions
By default, the plugin uses OpenAI's model "text-embedding-ada-002"
with 1536 dimensions
https://openai.com/blog/new-and-improved-embedding-model
which means that one won't be able to use it out of the box with Lucene,
whose max vector dimension is currently 1024.
There is a similar request here:
https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
I understand we just recently had a lengthy discussion about
increasing the max dimension, and whatever one thinks of OpenAI, the
fact is that it has a huge impact, and I think it would be nice if Lucene
could be part of this "revolution". All we have to do is increase the
limit from 1024 to 1536, or even 2048, for example.
Since the performance seems to be linear in the vector dimension,
several members have run performance tests successfully, and 1024
seems to have been chosen as the max dimension quite arbitrarily in the
first place, I think it should not be a problem to increase the max
dimension by a factor of 1.5 or 2.
WDYT?
Thanks
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org