I agree that Lucene should support vector sizes that depend on the model
one chooses. For example, Weaviate seems to do this:
https://weaviate.slack.com/archives/C017EG2SL3H/p1659981294040479
Thanks
Michael
On 07.08.22 at 22:48, Marcus Eagan wrote:
Hi Lucene Team,
In general, I have advised very strongly against our team at MongoDB
modifying the Lucene source, except in scenarios where we have a strong
need for a particular customization. Ultimately, people can do what
they like.
That being said, we have a number of customers preparing to use Lucene
for dense vector search. There are many language models that produce
embeddings with more than 1024 dimensions. I remember Michael Wechner's
email <https://www.mail-archive.com/dev@lucene.apache.org/msg314281.html>
about one instance with OpenAI:
I just tried to test the OpenAI model
"text-similarity-davinci-001" with 12288 dimensions.
It seems that customers who attempt to use these models should not be
turned away; it could be sufficient to explain the issues. The only
ones I have identified are two expected ones, very slow indexing
throughput and high CPU usage, and a less well-defined risk of
increased numerical error.
I opened an issue <https://github.com/apache/lucene/issues/1060> and a
PR <https://github.com/apache/lucene/pull/1061> for the discussion as
well. I would appreciate guidance on where we think the warning should
go; burying it in a Javadoc is a less than ideal experience, and it
would be better to warn at startup. In the PR, I increased the max
limit by a factor of twenty. We should let users use the system based
on their needs, even if it was not designed or optimized for the models
they bring, because we need the feedback and the data from the real
world.
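To make the placement question concrete, here is a hypothetical sketch
of what a raised cap with an explicit warning could look like. This is
my own illustration, not the actual diff in the PR; the constant names,
values, and the use of java.util.logging are assumptions:

    import java.util.logging.Logger;

    // Hypothetical sketch only, not the code from the PR: a raised cap
    // plus a warning surfaced where the field is defined, rather than
    // only in a Javadoc.
    final class VectorDimensionCheck {
      private static final Logger LOG =
          Logger.getLogger(VectorDimensionCheck.class.getName());

      static final int OLD_MAX_DIMENSIONS = 1024;       // cap before the change
      static final int NEW_MAX_DIMENSIONS = 20 * 1024;  // raised twenty-fold

      static void check(int dimension) {
        if (dimension <= 0 || dimension > NEW_MAX_DIMENSIONS) {
          throw new IllegalArgumentException(
              "vector dimension must be in [1, " + NEW_MAX_DIMENSIONS
                  + "], got " + dimension);
        }
        if (dimension > OLD_MAX_DIMENSIONS) {
          // The expected trade-offs from this thread: slower indexing,
          // higher CPU usage, and possibly more numerical error.
          LOG.warning("vector dimension " + dimension + " exceeds "
              + OLD_MAX_DIMENSIONS
              + "; expect slower indexing and higher CPU usage");
        }
      }
    }

Warning at field-definition time keeps the message next to the decision
that triggers it, which seems closer in spirit to a startup warning than
a note buried in Javadoc.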
Is there something I'm overlooking from a risk standpoint?
Best,
--
Marcus Eagan