Re: Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Michael Wechner
There is already a pull request for Elasticsearch, which also 
mentions the max size of 1024:


https://github.com/openai/chatgpt-retrieval-plugin/pull/83



On 06.05.23 at 19:00, Michael Wechner wrote:

Hi all,

I recently set up the ChatGPT retrieval plugin locally:

https://github.com/openai/chatgpt-retrieval-plugin

I think it would be nice to consider submitting a Lucene implementation 
for this plugin:

https://github.com/openai/chatgpt-retrieval-plugin#future-directions

By default, the plugin uses OpenAI's model "text-embedding-ada-002", 
which has 1536 dimensions:

https://openai.com/blog/new-and-improved-embedding-model

Since Lucene currently limits vectors to 1024 dimensions, this means one 
won't be able to use it out of the box with Lucene.
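
To make the mismatch concrete, here is a minimal Python sketch of the kind of check involved. It is only an illustration: the constant and function names are hypothetical and are not Lucene's API; the numbers 1024 and 1536 are the ones quoted in this thread.

```python
# Illustrative only: names are hypothetical, not part of Lucene or the plugin.
MAX_DIMENSIONS = 1024        # max vector dimension discussed in this thread
ADA_002_DIMENSIONS = 1536    # output size of "text-embedding-ada-002"


def fits_limit(vector, max_dims=MAX_DIMENSIONS):
    """Return True if the vector has at most max_dims components."""
    return len(vector) <= max_dims


# A placeholder embedding with the ada-002 dimensionality.
embedding = [0.0] * ADA_002_DIMENSIONS

print(fits_limit(embedding))                  # False: 1536 > 1024
print(fits_limit(embedding, max_dims=2048))   # True once the limit is raised
```

With the limit at 1024 the ada-002 embedding is rejected; with a limit of 1536 or 2048 it would pass.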

A similar request was made here:

https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions 



I understand we only recently had a lengthy discussion about 
increasing the max dimension. Whatever one thinks of OpenAI, the fact 
is that it has a huge impact, and I think it would be nice if Lucene 
could be part of this "revolution". All we have to do is increase the 
limit from 1024 to 1536, or even 2048, for example.


Since performance seems to be linear in the vector dimension, 
several members have run performance tests successfully, and 1024 
seems to have been chosen as the max dimension quite arbitrarily in 
the first place, I think it should not be a problem to increase the 
max dimension by a factor of 1.5 or 2.
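
The arithmetic behind "a factor of 1.5 or 2" can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not a benchmark: it takes the linear-cost observation above at face value and assumes each vector component is stored as a 4-byte float32. The function names are made up for illustration.

```python
# Rough estimate only; assumes cost is linear in the dimension (as suggested
# in this thread) and float32 storage (4 bytes per component, an assumption).
BASE_DIMS = 1024
BYTES_PER_FLOAT32 = 4


def relative_cost(dims, base=BASE_DIMS):
    """Cost relative to the base dimension under a linear-scaling assumption."""
    return dims / base


def raw_vector_bytes(dims):
    """Raw size of one vector, ignoring any index overhead."""
    return dims * BYTES_PER_FLOAT32


for dims in (1024, 1536, 2048):
    print(dims, relative_cost(dims), raw_vector_bytes(dims))
# 1024 1.0 4096
# 1536 1.5 6144
# 2048 2.0 8192
```

Under these assumptions, moving from 1024 to 1536 dimensions costs about 1.5x per vector, and 2048 costs about 2x.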


WDYT?

Thanks

Michael



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org







Re: Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Christian Moen
Hello Michael,

I agree.  I think it makes sense to support OpenAI embeddings.

Best,
Christian


On Sat, May 6, 2023 at 7:03 PM Michael Wechner wrote:

> Hi all,
>
> I recently set up the ChatGPT retrieval plugin locally:
>
> https://github.com/openai/chatgpt-retrieval-plugin
>
> I think it would be nice to consider submitting a Lucene implementation
> for this plugin:
>
> https://github.com/openai/chatgpt-retrieval-plugin#future-directions
>
> By default, the plugin uses OpenAI's model "text-embedding-ada-002",
> which has 1536 dimensions:
>
> https://openai.com/blog/new-and-improved-embedding-model
>
> Since Lucene currently limits vectors to 1024 dimensions, this means one
> won't be able to use it out of the box with Lucene.
>
> A similar request was made here:
>
>
> https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
>
> I understand we only recently had a lengthy discussion about increasing
> the max dimension. Whatever one thinks of OpenAI, the fact is that it
> has a huge impact, and I think it would be nice if Lucene could be part
> of this "revolution". All we have to do is increase the limit from 1024
> to 1536, or even 2048, for example.
>
> Since performance seems to be linear in the vector dimension, several
> members have run performance tests successfully, and 1024 seems to have
> been chosen as the max dimension quite arbitrarily in the first place,
> I think it should not be a problem to increase the max dimension by a
> factor of 1.5 or 2.
>
> WDYT?
>
> Thanks
>
> Michael


Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Michael Wechner

Hi all,

I recently set up the ChatGPT retrieval plugin locally:

https://github.com/openai/chatgpt-retrieval-plugin

I think it would be nice to consider submitting a Lucene implementation 
for this plugin:

https://github.com/openai/chatgpt-retrieval-plugin#future-directions

By default, the plugin uses OpenAI's model "text-embedding-ada-002", 
which has 1536 dimensions:

https://openai.com/blog/new-and-improved-embedding-model

Since Lucene currently limits vectors to 1024 dimensions, this means one 
won't be able to use it out of the box with Lucene.

A similar request was made here:

https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions

I understand we only recently had a lengthy discussion about increasing 
the max dimension. Whatever one thinks of OpenAI, the fact is that it 
has a huge impact, and I think it would be nice if Lucene could be part 
of this "revolution". All we have to do is increase the limit from 1024 
to 1536, or even 2048, for example.


Since performance seems to be linear in the vector dimension, several 
members have run performance tests successfully, and 1024 seems to have 
been chosen as the max dimension quite arbitrarily in the first place, 
I think it should not be a problem to increase the max dimension by a 
factor of 1.5 or 2.


WDYT?

Thanks

Michael






Re: Seeking Tools and Methods to Measure Lucene's Indexing Performance

2023-05-06 Thread Michael Wechner

Thanks for the pointer!

I have added it to the Lucene FAQ:

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-HowisLucene'sindexingandsearchperformancemeasured?

Thanks

Michael



On 06.05.23 at 06:18, Ishan Chattopadhyaya wrote:

Check Lucene bench: https://home.apache.org/~mikemccand/lucenebench/

On Sat, 6 May, 2023, 9:30 am donghai tang wrote:

Hello Lucene Community,

I am in the process of learning about Lucene's indexing
capabilities, and I'm keen on conducting experiments to evaluate
its performance. However, I haven't come across any official tools
specifically designed for measuring Lucene's indexing performance.

I would be extremely grateful if any of you could share your
experiences with tools you've used in the past or suggest
alternative methods for evaluating Lucene's indexing performance.