[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577849#comment-17577849 ]
Marcus Eagan commented on LUCENE-10471:
---------------------------------------

[~michi] You are free to increase the dimension limit, as it is a static variable, and Lucene is your oyster. However, [~ehatcher] has seared into my mind that a long-term fork of Lucene is a bad idea for many reasons.

[~rcmuir] I agree with you on "whatever shitty models." They are here, and more are coming. With respect to the vector API, Oracle is doing interesting work in OpenJDK 17 to improve it. They've added support for Intel's Short Vector Math Library, which should improve performance. The OpenJDK folks exploit the Panama APIs, but there are several hardware accelerations they have yet to exploit, and many operations still fall back to scalar code.

My argument for increasing the dimension limit is not to suggest that there is a better fulcrum in the performance tradeoff, but that more users testing Lucene is good for improving the feature. OpenAI's Da Vinci is one such model, but not the only one. I've had customers ask for 4096 dimensions based on the performance they observe with question answering. I'm waiting on the model and will share when I know more. If customers want to introduce rampant numerical errors into their systems, there is little we can do for them.

Don't take my word on any of this yet. I need to bring data and complete evidence. I'm asking my customers why they cannot do dimensionality reduction.

> Increase the number of dims for KNN vectors to 2048
> ---------------------------------------------------
>
>                 Key: LUCENE-10471
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10471
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Mayya Sharipova
>            Priority: Trivial
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current maximum allowed number of dimensions is equal to 1024.
> But we see
> in practice a couple well-known models that produce vectors with > 1024
> dimensions (e.g.
> [mobilenet_v2|https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/feature_vector/1]
> uses 1280d vectors, OpenAI / GPT-3 Babbage uses 2048d vectors). Increasing
> max dims to `2048` will satisfy these use cases.
> I am wondering if anybody has strong objections against this.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
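The comment above asks why customers cannot do dimensionality reduction instead of indexing 2048d or 4096d vectors directly. As a minimal sketch of what that could look like (this is not a Lucene API; the class and method names here are purely illustrative), a Gaussian random projection in plain Java can map vectors down under the current 1024-dimension cap while approximately preserving distances, per the Johnson-Lindenstrauss lemma:

```java
import java.util.Random;

/**
 * Illustrative sketch, not part of Lucene: reduce high-dimensional
 * embeddings below the 1024-dim cap via Gaussian random projection.
 */
public class RandomProjection {
    // projection matrix: targetDims rows x sourceDims columns
    private final float[][] projection;

    public RandomProjection(int sourceDims, int targetDims, long seed) {
        Random rng = new Random(seed);
        projection = new float[targetDims][sourceDims];
        // 1/sqrt(targetDims) scaling approximately preserves vector norms
        float scale = (float) (1.0 / Math.sqrt(targetDims));
        for (int i = 0; i < targetDims; i++) {
            for (int j = 0; j < sourceDims; j++) {
                projection[i][j] = (float) rng.nextGaussian() * scale;
            }
        }
    }

    /** Multiply the projection matrix by the input vector. */
    public float[] reduce(float[] vector) {
        float[] out = new float[projection.length];
        for (int i = 0; i < out.length; i++) {
            float sum = 0f;
            for (int j = 0; j < vector.length; j++) {
                sum += projection[i][j] * vector[j];
            }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. map a 2048d Babbage-style embedding down to 1024d
        RandomProjection rp = new RandomProjection(2048, 1024, 42L);
        float[] v = new float[2048];
        for (int j = 0; j < v.length; j++) v[j] = 1f;
        float[] reduced = rp.reduce(v);
        System.out.println(reduced.length);
    }
}
```

Whether the recall loss from such a projection is acceptable is exactly the kind of question the customer data mentioned above would need to answer; this only shows that the mechanics are cheap.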