Re: Experience re OpenAI embeddings in combination with Lucene vector search

Julie Tibshirani Mon, 14 Feb 2022 12:53:47 -0800

Hello Michael, the max number of dimensions is currently hardcoded and
can't be changed. I could see an argument for increasing the default a bit
and would be happy to discuss if you'd like to file a JIRA issue.
However, 12288 dimensions still seems high to me: this is much larger than
most well-established embedding models and could require a lot of memory.


Julie
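
To put the memory concern above in rough numbers, here is a minimal
back-of-the-envelope sketch. It assumes each vector component is stored as a
32-bit float and counts only the raw vectors, not any index structures built
on top of them. The corpus size of one million vectors is a hypothetical
figure chosen for illustration, not something taken from this thread.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per vector component


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Bytes needed for the raw vectors alone, ignoring index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


if __name__ == "__main__":
    num_vectors = 1_000_000  # hypothetical corpus size
    # 1024 is the limit discussed in the thread; 4096 and 12288 are the
    # larger OpenAI embedding sizes mentioned there.
    for dims in (1024, 4096, 12288):
        gib = raw_vector_bytes(num_vectors, dims) / 2**30
        print(f"{dims:>5} dims: {gib:.1f} GiB")
```

Under these assumptions a single 12288-dimensional vector takes 49152 bytes,
and one million of them take roughly 45.8 GiB, compared with about 3.8 GiB
at 1024 dimensions.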

On Mon, Feb 14, 2022 at 12:08 PM Michael Wechner <michael.wech...@wyona.com>
wrote:

> Hi Julie
>
> Thanks very much for this link, which is very interesting!
>
> Btw, do you have an idea how to increase the default max size of 1024?
>
> https://lists.apache.org/thread/hyb6w5c4x5rjt34k3w7zqn3yp5wvf33o
>
> Thanks
>
> Michael
>
>
>
> On 14.02.22 at 17:45, Julie Tibshirani wrote:
>
> Hello Michael, I don't have personal experience with these models, but I
> found this article insightful:
> https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9.
> It evaluates the OpenAI models against a variety of existing models on
> tasks like sentence similarity and text retrieval. Although the other
> models are cheaper and have fewer dimensions, the OpenAI ones perform
> similarly or worse. This got me thinking that they might not be a good
> cost/effectiveness trade-off, especially the larger ones with 4096
> or 12288 dimensions.
>
> Julie
>
> On Sun, Feb 13, 2022 at 1:55 AM Michael Wechner <michael.wech...@wyona.com>
> wrote:
>
>> Re the OpenAI embeddings, the following recent paper might be of interest:
>>
>> https://arxiv.org/pdf/2201.10005.pdf
>>
>> (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022)
>>
>> Thanks
>>
>> Michael
>>
>> On 13.02.22 at 00:14, Michael Wechner wrote:
>>
>> Here is a concrete example where I combine the OpenAI model
>> "text-similarity-ada-001" with Lucene vector search:
>>
>> INPUT sentence: "What is your age this year?"
>>
>> Result sentences
>>
>> 1) How old are you this year?
>>    score '0.98860765'
>>
>> 2) What was your age last year?
>>    score '0.97811764'
>>
>> 3) What is your age?
>>    score '0.97094905'
>>
>> 4) How old are you?
>>    score '0.9600177'
>>
>>
>> Result 1 is great. Result 2 looks similar but is not correct from an
>> "understanding" point of view. Results 3 and 4 are good again.
>>
>> I understand "similarity" is not the same as "understanding", but I hope
>> it makes it clearer what I am looking for :-)
>>
>> Thanks
>>
>> Michael
>>
>>
>>
>> On 12.02.22 at 22:38, Michael Wechner wrote:
>>
>> Hi Alessandro
>>
>> I am mainly interested in detecting similarity, for example whether the
>> following two sentences are similar, i.e. likely to mean the same thing:
>>
>> "How old are you?"
>> "What is your age?"
>>
>> and that the following two sentences are not similar, i.e. do not mean
>> the same thing:
>>
>> "How old are you this year?"
>> "How old have you been last year?"
>>
>> I am also interested in performance, and in how OpenAI embeddings compare
>> with, for example,
>> SBERT (https://sbert.net/docs/usage/semantic_textual_similarity.html)
>>
>> Thanks
>>
>> Michael
>>
>>
>>
>> On 12.02.22 at 20:41, Alessandro Benedetti wrote:
>>
>> Hi Michael, experience in what respect?
>> We have been exploring the area for a while, given that we contributed the
>> first neural search milestone to Apache Solr.
>> What are you curious about? Performance? Relevance impact? How to
>> integrate it?
>> Regards
>>
>> On Fri, 11 Feb 2022, 22:38 Michael Wechner, <michael.wech...@wyona.com>
>> wrote:
>>
>>> Hi
>>>
>>> Does anyone have experience using OpenAI embeddings in combination with
>>> Lucene vector search?
>>>
>>> https://beta.openai.com/docs/guides/embeddings
>>>
>>> for example comparing performance re vector size
>>>
>>> https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings
>>>
>>> and
>>>
>>> https://api.openai.com/v1/engines/text-similarity-davinci-001/embeddings
>>>
>>> ?
>>>
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>
>>
>>
>>
>
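
The scores in the quoted example are similarity values between embedding
vectors. The thread does not say which similarity function was configured,
so the following is only a sketch of plain cosine similarity, one common
choice. The three-dimensional vectors are made-up toy values, not real
OpenAI embeddings, and a search engine may rescale the raw value before
reporting it as a score.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for sentence embeddings.
    query = [0.9, 0.1, 0.3]
    candidates = {
        "candidate A": [0.8, 0.2, 0.3],
        "candidate B": [0.1, 0.9, 0.2],
    }
    # Rank candidates by similarity to the query, highest first.
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    for name, vector in ranked:
        print(name, cosine_similarity(query, vector))
```

Values close to 1 mean the vectors point in nearly the same direction. As
the example in the thread shows, a high value indicates similar wording and
topic, but it does not guarantee that two sentences mean the same thing.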
