mccullocht commented on PR #15903:
URL: https://github.com/apache/lucene/pull/15903#issuecomment-4200627888

   I think that OSQ does a good job of quantizing the same distribution as 
TurboQuant, and the rotation is the real secret sauce here.
   
   I tried naively rotating the entire Cohere dataset used by luceneutil and 
running it through exhaustive recall tests. I also hacked OSQ so that it can 
operate without centering, so we can discuss data-blind performance.
   | format | centered + unrotated (default) | centered + rotated | uncentered + rotated | uncentered + unrotated |
   | - | - | - | - | - |
   | OSQ1 | 0.660770 | **0.692940** | 0.647200 | _0.624450_ |
   | OSQ2 | 0.784210 | **0.811090** | 0.762620 | _0.741080_ |
   | OSQ4 | 0.884820 | **0.938840** | 0.923630 | _0.880600_ |
   | OSQ8 | 0.975710 | **0.993360** | 0.992320 | _0.977370_ |
   | Naive BQ | - | - | **0.671290** | _0.647720_ |
   
   The results suggest that rotation and centering are two great tastes that 
taste great together. I can see how the data-blind property is really desirable, 
though, and it's possible to change OSQ to allow this mode of operation; 
rotation-only performs pretty well at high bit rates (OSQ4 and OSQ8).
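
To make the data-blind distinction concrete, here is a minimal sketch, assuming that "centering" means subtracting the dataset mean from every vector before quantization. The helper names `centroid` and `center` are hypothetical and are not Lucene APIs. The point is only that centering needs a pass over the data to compute the mean, whereas a fixed rotation does not depend on the data at all.

```python
def centroid(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def center(vectors, mean):
    """Subtract the mean from every vector (the data-dependent step)."""
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


# Tiny illustrative dataset, not taken from any benchmark.
data = [[1.0, 2.0], [3.0, 6.0]]
mean = centroid(data)
centered = center(data, mean)
```

After centering, the dataset mean is the zero vector, which is what a centered quantizer would then see as its input.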
   
   I tried a similar exercise with Voyage vectors: rotating showed no 
improvement, but centering still helped. I'm going to follow up with someone 
about distribution/rotation.
   
   @xande @shbhar I suggest you try rotating your vectors first and test recall 
with OSQ. It should be easy enough to perform the rotation outside of Lucene, 
and if there's significant value we can figure out if or how we'd like to 
internalize this.
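
For anyone trying the rotation outside of Lucene, the following is a rough sketch of one way to do it, not the code used for the numbers above. It assumes a random orthogonal matrix built by Gram-Schmidt on Gaussian vectors; the function names are hypothetical, and the result may include a reflection, which should not matter for this purpose. It is pure Python for clarity, so a real dataset would want a vectorized linear algebra library instead. The same matrix must be applied to both indexed vectors and queries.

```python
import math
import random


def random_rotation(dim, seed=0):
    """Return a dim x dim orthonormal matrix as a list of rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Modified Gram-Schmidt: strip components along the rows kept so far.
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vector):
    """Apply the orthonormal matrix to a single vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative usage: rotate a document vector and a query with the same matrix.
R = random_rotation(6, seed=1)
doc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
query = [0.5, -1.0, 2.0, 0.0, 1.5, -2.5]
rotated_doc = rotate(R, doc)
rotated_query = rotate(R, query)
```

Because the matrix is orthonormal, dot products and norms are preserved up to floating-point error, so exact nearest-neighbor results are unchanged and any recall difference comes from how the quantizer handles the rotated coordinates.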


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

