Hello Michael, the max number of dimensions is currently hardcoded and can't be changed. I could see an argument for increasing the default a bit and would be happy to discuss if you'd like to file a JIRA issue. However, 12288 dimensions still seems high to me: it is much larger than what most well-established embedding models produce and could require a lot of memory.
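To put a rough number on the memory concern, here is a back-of-the-envelope sketch in Python. It assumes each vector component is stored as a 4-byte float and counts only the raw vector values, ignoring the graph structure and any other index overhead. The corpus size of one million documents and the helper name raw_vector_bytes are made up for illustration.

```python
BYTES_PER_COMPONENT = 4  # assumption: one 32-bit float per vector component


def raw_vector_bytes(num_docs: int, dims: int) -> int:
    """Bytes needed for the raw vector values alone (no index overhead)."""
    return num_docs * dims * BYTES_PER_COMPONENT


if __name__ == "__main__":
    num_docs = 1_000_000  # hypothetical corpus size
    for dims in (1024, 4096, 12288):
        gib = raw_vector_bytes(num_docs, dims) / 2**30
        print(f"{dims:>5} dims: {gib:.1f} GiB")
```

Under these assumptions, one million vectors take roughly 3.8 GiB at 1024 dimensions, 15.3 GiB at 4096 and 45.8 GiB at 12288, before any indexing overhead.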
Julie

On Mon, Feb 14, 2022 at 12:08 PM Michael Wechner <michael.wech...@wyona.com> wrote:

> Hi Julie
>
> Thanks very much for this link, which is very interesting!
>
> Btw, do you have an idea how to increase the default max size of 1024?
>
> https://lists.apache.org/thread/hyb6w5c4x5rjt34k3w7zqn3yp5wvf33o
>
> Thanks
>
> Michael
>
> On 14.02.22 at 17:45, Julie Tibshirani wrote:
>
> Hello Michael, I don't have personal experience with these models, but I
> found this article insightful:
> https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9.
> It evaluates the OpenAI models against a variety of existing models on
> tasks like sentence similarity and text retrieval. Although the other
> models are cheaper and have fewer dimensions, the OpenAI ones perform
> similarly or worse. This got me thinking that they might not be a good
> cost/effectiveness trade-off, especially the larger ones with 4096
> or 12288 dimensions.
>
> Julie
>
> On Sun, Feb 13, 2022 at 1:55 AM Michael Wechner <michael.wech...@wyona.com>
> wrote:
>
>> Re the OpenAI embedding the following recent paper might be of interest
>>
>> https://arxiv.org/pdf/2201.10005.pdf
>>
>> (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022)
>>
>> Thanks
>>
>> Michael
>>
>> On 13.02.22 at 00:14, Michael Wechner wrote:
>>
>> Here is a concrete example where I combine the OpenAI model
>> "text-similarity-ada-001" with Lucene vector search
>>
>> INPUT sentence: "What is your age this year?"
>>
>> Result sentences
>>
>> 1) How old are you this year?
>> score '0.98860765'
>>
>> 2) What was your age last year?
>> score '0.97811764'
>>
>> 3) What is your age?
>> score '0.97094905'
>>
>> 4) How old are you?
>> score '0.9600177'
>>
>> Result 1 is great and result 2 looks similar, but is not correct from an
>> "understanding" point of view and results 3 and 4 are good again.
>>
>> I understand "similarity" is not the same as "understanding", but I hope
>> it makes it clearer what I am looking for :-)
>>
>> Thanks
>>
>> Michael
>>
>> On 12.02.22 at 22:38, Michael Wechner wrote:
>>
>> Hi Alessandro
>>
>> I am mainly interested in detecting similarity, for example whether the
>> following two sentences are similar, i.e. likely to mean the same thing
>>
>> "How old are you?"
>> "What is your age?"
>>
>> and that the following two sentences are not similar, i.e. do not mean
>> the same thing
>>
>> "How old are you this year?"
>> "How old have you been last year?"
>>
>> But also performance or how OpenAI embeddings compare for example with
>> SBERT (https://sbert.net/docs/usage/semantic_textual_similarity.html)
>>
>> Thanks
>>
>> Michael
>>
>> On 12.02.22 at 20:41, Alessandro Benedetti wrote:
>>
>> Hi Michael, experience to what extent?
>> We have been exploring the area for a while given we contributed the
>> first neural search milestone to Apache Solr.
>> What is your curiosity? Performance? Relevance impact? How to integrate
>> it?
>> Regards
>>
>> On Fri, 11 Feb 2022, 22:38 Michael Wechner, <michael.wech...@wyona.com>
>> wrote:
>>
>>> Hi
>>>
>>> Does anyone have experience using OpenAI embeddings in combination with
>>> Lucene vector search?
>>>
>>> https://beta.openai.com/docs/guides/embeddings
>>>
>>> for example comparing performance re vector size
>>>
>>> https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings
>>>
>>> and
>>>
>>> https://api.openai.com/v1/engines/text-similarity-davinci-001/embeddings
>>>
>>> ?
>>>
>>> Thanks
>>>
>>> Michael