Re: Connecting Lucene with ChatGPT Retrieval Plugin
There is already a pull request for Elasticsearch which also mentions the max size of 1024:

https://github.com/openai/chatgpt-retrieval-plugin/pull/83

On 06.05.23 at 19:00, Michael Wechner wrote:
> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Connecting Lucene with ChatGPT Retrieval Plugin
Hello Michael,

I agree. I think it makes sense to support OpenAI embeddings.

Best,
Christian

On Sat, May 6, 2023 at 7:03 PM Michael Wechner wrote:
> [...]
Connecting Lucene with ChatGPT Retrieval Plugin
Hi together,

I recently set up the ChatGPT retrieval plugin locally:

https://github.com/openai/chatgpt-retrieval-plugin

I think it would be nice to consider submitting a Lucene implementation for this plugin:

https://github.com/openai/chatgpt-retrieval-plugin#future-directions

By default the plugin uses OpenAI's model "text-embedding-ada-002" with 1536 dimensions (https://openai.com/blog/new-and-improved-embedding-model), which means one won't be able to use it out of the box with Lucene. There is a similar request here:

https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions

I understand we just recently had a lengthy discussion about increasing the max dimension, and whatever one thinks of OpenAI, the fact is that it has a huge impact, and I think it would be nice if Lucene could be part of this "revolution". All we have to do is increase the limit from 1024 to, for example, 1536 or even 2048.

Since performance seems to be linear in the vector dimension, several members have run performance tests successfully, and 1024 seems to have been chosen as the max dimension quite arbitrarily in the first place, I think it should not be a problem to increase the max dimension by a factor of 1.5 or 2.

WDYT?

Thanks

Michael
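The "performance is linear in the vector dimension" argument comes down to the inner loop of every similarity computation: comparing two vectors is a single pass over their components. A minimal stand-alone sketch (plain Java, no Lucene dependency; the class and method names are illustrative, not Lucene API) showing that a 1536-dimensional comparison is simply 1.5x the work of a 1024-dimensional one:

```java
// Illustrative only: the per-comparison cost of vector search is one pass
// over the components, so it scales linearly with the dimension. Going from
// 1024 to 1536 dimensions means 1.5x the multiply-adds per comparison.
public class DotProductSketch {

    // Same loop shape as a float-vector dot-product similarity.
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int dim = 1536; // output size of text-embedding-ada-002
        float[] a = new float[dim];
        float[] b = new float[dim];
        java.util.Arrays.fill(a, 1f);
        java.util.Arrays.fill(b, 2f);
        // 1536 iterations, each 1f * 2f, summing to 3072.
        System.out.println(dotProduct(a, b)); // prints 3072.0
    }
}
```

The same reasoning applies to HNSW graph traversal, where each visited node costs one such comparison, which is why index and query time should grow roughly proportionally with the dimension rather than blow up.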
Re: Seeking Tools and Methods to Measure Lucene's Indexing Performance
Thanks for the pointer! I have added it to the Lucene FAQ:

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-HowisLucene'sindexingandsearchperformancemeasured?

Thanks

Michael

On 06.05.23 at 06:18, Ishan Chattopadhyaya wrote:

Check Lucene bench: https://home.apache.org/~mikemccand/lucenebench/

On Sat, 6 May, 2023, 9:30 am donghai tang wrote:

Hello Lucene Community,

I am in the process of learning about Lucene's indexing capabilities, and I'm keen on conducting experiments to evaluate its performance. However, I haven't come across any official tools specifically designed for measuring Lucene's indexing performance. I would be extremely grateful if any of you could share your experiences with tools you've used in the past or suggest alternative methods for evaluating Lucene's indexing performance.
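Besides the Lucene bench site above, a common ad-hoc approach is to time a bulk-indexing loop yourself and report documents per second. A minimal sketch of that pattern in plain Java; `indexOne` is a hypothetical stand-in for the real work (e.g. an `IndexWriter.addDocument` call), so only the measurement harness is the point here:

```java
// Ad-hoc throughput measurement: time a bulk loop, report docs/sec.
// indexOne is a placeholder for real indexing work such as
// IndexWriter.addDocument(doc).
public class IndexingThroughput {

    // Hypothetical stand-in for indexing one document.
    static void indexOne(StringBuilder sink, int id) {
        sink.append("doc-").append(id).append('\n');
    }

    // Documents indexed per second for a timed run.
    static double docsPerSecond(int numDocs, long elapsedNanos) {
        return numDocs / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        int numDocs = 100_000;
        StringBuilder sink = new StringBuilder();
        long start = System.nanoTime();
        for (int i = 0; i < numDocs; i++) {
            indexOne(sink, i);
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d docs in %.3f ms -> %.0f docs/sec%n",
                numDocs, elapsed / 1e6, docsPerSecond(numDocs, elapsed));
    }
}
```

For numbers that are comparable across runs, repeat the loop several times and discard the first iterations (JIT warmup), and prefer a dedicated harness such as the one behind the Lucene bench site for anything you want to publish.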