[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577849#comment-17577849 ]

Marcus Eagan commented on LUCENE-10471:
---------------------------------------

[~michi] You are free to increase the dimension limit, as it is a static 
variable and Lucene is your oyster. However, [~ehatcher] has seared into my 
mind that a long-term fork of Lucene is a bad idea for many reasons.
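For context, the limit lives in a static constant in Lucene's vector code (in the 9.x line I believe it is `VectorValues.MAX_DIMENSIONS`, though the exact class may differ by version), and the guard is a simple range check at field-creation time. A minimal self-contained sketch of that pattern, with the names mirroring but not being Lucene's own:

```java
// Sketch of the kind of guard Lucene applies when a KNN vector field is
// created. Names are illustrative; in a fork you would raise the constant
// (e.g. VectorValues.MAX_DIMENSIONS) and rebuild.
public class VectorDimLimit {
    // Lucene 9.x hard-codes 1024 as the ceiling.
    static final int MAX_DIMENSIONS = 1024;

    static int checkDimension(int dim) {
        if (dim <= 0 || dim > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector dimension must be in (0, " + MAX_DIMENSIONS + "], got " + dim);
        }
        return dim;
    }

    public static void main(String[] args) {
        System.out.println(checkDimension(768));   // within the limit, passes through
        try {
            checkDimension(2048);                  // rejected under the 1024 cap
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the comment stands either way: flipping the constant is trivial; carrying the patch forward across Lucene releases is the expensive part.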

[~rcmuir] I agree with you on "whatever shitty models." They are here, and 
more are coming. With respect to the vector API, Oracle is doing interesting 
work in OpenJDK 17 to improve it. They've added support for Intel's Short 
Vector Math Library, which should improve performance. The folks at OpenJDK 
exploit the Panama APIs, but there are several hardware accelerations they 
have yet to exploit, and many operations still fall back to scalar code. 

My argument for increasing the dimension limit is not to suggest that there 
is a better fulcrum in the performance tradeoff, but that more users testing 
Lucene is good for improving the feature.

OpenAI's Da Vinci is one such model, but not the only one.

I've had customers ask for 4096 based on the performance they observe with 
question answering. I'm waiting on the model and will share when I know. If 
customers want to introduce rampant numerical errors in their systems, there is 
little we can do for them. Don't take my word on any of this yet. I need to 
bring data and complete evidence. I'm asking my customers why they cannot do 
dimensionality reduction.
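For readers unfamiliar with the suggestion: dimensionality reduction shrinks an over-limit embedding before indexing, and a Gaussian random projection (Johnson-Lindenstrauss style) is one standard, model-agnostic way to do it. A toy, self-contained sketch; the class and method names are illustrative, not any Lucene or OpenAI API, and a real pipeline would use one fixed projection matrix shared across all vectors and a vetted library:

```java
import java.util.Arrays;
import java.util.Random;

// Toy dimensionality reduction via Gaussian random projection.
// A fixed seed stands in for a shared projection matrix: every vector
// must be projected with the same matrix for distances to stay comparable.
public class ReduceDims {
    static float[] randomProjection(float[] vec, int outDim, long seed) {
        Random rng = new Random(seed);
        float scale = (float) (1.0 / Math.sqrt(outDim)); // preserves expected norms
        float[] out = new float[outDim];
        for (int i = 0; i < outDim; i++) {
            double dot = 0;
            for (float x : vec) {
                dot += rng.nextGaussian() * x; // one Gaussian row per output dim
            }
            out[i] = (float) (scale * dot);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] embedding = new float[4096];   // e.g. a 4096-d model output
        Arrays.fill(embedding, 1.0f);
        float[] reduced = randomProjection(embedding, 1024, 42L);
        System.out.println(reduced.length);    // 1024: fits under the current cap
    }
}
```

Whether the resulting quality loss is acceptable is exactly the question I'm putting to customers.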

> Increase the number of dims for KNN vectors to 2048
> ---------------------------------------------------
>
>                 Key: LUCENE-10471
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10471
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Mayya Sharipova
>            Priority: Trivial
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current maximum allowed number of dimensions is equal to 1024. But we see 
> in practice a couple of well-known models that produce vectors with > 1024 
> dimensions (e.g. 
> [mobilenet_v2|https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/feature_vector/1]
>  uses 1280d vectors, and OpenAI / GPT-3 Babbage uses 2048d vectors). Increasing 
> the max dims to `2048` will satisfy these use cases.
> I am wondering if anybody has strong objections against this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
