It looks like the framework is designed to support self-hosted plugins.

On Tue, May 9, 2023 at 12:13 PM jim ferenczi <jim.feren...@gmail.com> wrote:
> Lucene is a library. I don't see how it would be exposed in this plugin,
> which is about services.
>
> On Tue, 9 May 2023 at 18:00, Jun Luo <luo.jun...@gmail.com> wrote:
>
>> The PR mentioned an Elasticsearch PR
>> <https://github.com/elastic/elasticsearch/pull/95257> that increased the
>> dim to 2048 in Elasticsearch.
>>
>> Curious how you use Lucene's KNN search. Lucene's KNN supports one vector
>> per document, but usually multiple vectors are needed to represent a
>> document's content, so we would have to split the document content into
>> chunks and create one Lucene document per chunk.
>>
>> The ChatGPT plugin stores the chunk text directly in the underlying
>> vector db. If there are lots of documents, is it a concern to store the
>> full document content in Lucene? In the traditional inverted-index use
>> case, is it common to store the full document content in Lucene?
>>
>> Another question: if you use Lucene as a vector db, do you still need the
>> inverted index? Wondering what the use case would be for combining an
>> inverted index with a vector index. If we don't need the inverted index,
>> would it be better to use another vector db? For example, PostgreSQL also
>> added vector support recently.
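Jun's point about Lucene indexing one KNN vector per document is usually handled exactly as described: split the source text into chunks and index one document per chunk, keeping a parent id so hits can be grouped back per source document. A minimal stand-alone sketch of that workaround (plain Python, not Lucene's API; `embed` is a toy stand-in for a real embedding model such as text-embedding-ada-002):

```python
# Sketch: work around a one-vector-per-document index by splitting a
# source document into chunks and indexing one (chunk, vector) entry
# per chunk.

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy deterministic "embedding" for illustration only; a real
    # system would call an embedding model here.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 100) -> list[str]:
    # Naive fixed-size chunking; real systems split on sentence or
    # paragraph boundaries and often overlap adjacent chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def index_document(doc_id: str, text: str) -> list[dict]:
    # One indexed "document" per chunk, each carrying its own vector
    # plus the parent id so results can be grouped per source document.
    return [
        {"parent": doc_id, "chunk_no": n, "text": c, "vector": embed(c)}
        for n, c in enumerate(chunk(text))
    ]

entries = index_document("doc-1", "some long document content " * 20)
```

Whether to also store the full chunk text alongside the vector (as the ChatGPT plugin does) is a storage trade-off; storing only the parent id and chunk offsets is a common alternative.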
>>
>> Thanks,
>> Jun
>>
>> On Sat, May 6, 2023 at 1:44 PM Michael Wechner <michael.wech...@wyona.com>
>> wrote:
>>
>>> There is already a pull request for Elasticsearch which also mentions
>>> the max size of 1024:
>>>
>>> https://github.com/openai/chatgpt-retrieval-plugin/pull/83
>>>
>>> On 06.05.23 at 19:00, Michael Wechner wrote:
>>> > Hi together
>>> >
>>> > I recently set up the ChatGPT retrieval plugin locally
>>> >
>>> > https://github.com/openai/chatgpt-retrieval-plugin
>>> >
>>> > I think it would be nice to consider submitting a Lucene
>>> > implementation for this plugin
>>> >
>>> > https://github.com/openai/chatgpt-retrieval-plugin#future-directions
>>> >
>>> > The plugin uses OpenAI's model "text-embedding-ada-002" with 1536
>>> > dimensions by default
>>> >
>>> > https://openai.com/blog/new-and-improved-embedding-model
>>> >
>>> > which means one won't be able to use it out of the box with Lucene.
>>> >
>>> > Similar request here:
>>> >
>>> > https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
>>> >
>>> > I understand we just recently had a lengthy discussion about
>>> > increasing the max dimension, and whatever one thinks of OpenAI, the
>>> > fact is that it has a huge impact, and I think it would be nice for
>>> > Lucene to be part of this "revolution". All we have to do is increase
>>> > the limit from 1024 to, for example, 1536 or even 2048.
>>> >
>>> > Since the performance seems to be linear with the vector dimension,
>>> > several members have done performance tests successfully, and 1024
>>> > seems to have been chosen as the max dimension quite arbitrarily in
>>> > the first place, I think it should not be a problem to increase the
>>> > max dimension by a factor of 1.5 or 2.
>>> >
>>> > WDYT?
>>> >
>>> > Thanks
>>> >
>>> > Michael
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
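For context on Michael's "performance seems to be linear with the vector dimension" point: a single similarity comparison (dot product or cosine) touches every component once, so the per-comparison cost is O(d) and going from 1024 to 1536 dimensions costs roughly 1.5x. A rough back-of-the-envelope sketch (plain Python, not Lucene's actual implementation):

```python
# Sketch: the cost of one dot-product comparison grows linearly with
# the vector dimension, which is why raising the limit from 1024 to
# 1536 or 2048 is expected to cost roughly 1.5x or 2x per comparison.

def dot(a: list[float], b: list[float]) -> float:
    # One multiply-add per dimension: O(d) work per comparison.
    return sum(x * y for x, y in zip(a, b))

def ops_per_scan(num_vectors: int, dim: int) -> int:
    # Multiply-adds needed to score every vector in a brute-force scan.
    return num_vectors * dim

baseline = ops_per_scan(1_000_000, 1024)   # current Lucene max dim
ada = ops_per_scan(1_000_000, 1536)        # text-embedding-ada-002
ratio = ada / baseline                     # 1.5
```

Note this covers only the per-comparison arithmetic; with an HNSW-style graph index the number of comparisons per query depends on graph parameters rather than the collection size, but each comparison still scales linearly with `dim`.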