shubhamvishu commented on code in PR #16092:
URL: https://github.com/apache/lucene/pull/16092#discussion_r3358308203


##########
lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQuery.java:
##########
@@ -95,6 +98,26 @@ public KnnFloatVectorQuery(
       String field, float[] target, int k, Query filter, KnnSearchStrategy 
searchStrategy) {
     super(field, k, filter, searchStrategy);
     this.target = VectorUtil.checkFinite(Objects.requireNonNull(target, 
"target"));
+    this.targetPreRotated = false;
+  }
+
+  private boolean targetPreRotated;
+
+  @Override
+  public Query rewrite(IndexSearcher indexSearcher) throws IOException {
+    if (targetPreRotated == false) {
+      FieldInfo fi =
+          
FieldInfos.getMergedFieldInfos(indexSearcher.getIndexReader()).fieldInfo(field);
+      if (fi != null
+          && fi.getVectorDimension() == target.length
+          && 
"true".equals(fi.getAttribute(RotationAwareKnnVectorsFormat.ROTATION_ENABLED_KEY)))
 {
+        float[] rotated = new float[target.length];
+        HadamardRotation.forDimension(fi.getVectorDimension()).rotate(target, 
rotated);

Review Comment:
   @kaivalnp @benwtrent I agree with the concerns here and the challenge of 
such complicated API. I'm thinking if for starter we should keep the rotation 
per segment and keep this change restricted to rotation like a recall vs 
latency tradeoff. We could follow with more ideas on how we could clawback the 
latency. One idea would be to try and see if SIMD could help if some parts of 
rotation logic or we could think of the sort of API as mentioned which could do 
this. But doing this in this PR I feel would make this more complicated(or 
errorprone since we don't have a clear vision for such an API yet). Another 
good thing of doing it per segment would be that way the whole rotation logic 
would be retained in the KNN vector format and not spilling into Query 
implementations etc so we won't be unintentionally breaking any current use 
cases but providing user an way to improve recall at the cost of some latency 
overhead for starters and later optimize (keeping this change focussed on capabi
 lity addition).
   
   I also ran JMH benchmarks(below) on Graviton 3 to see how many dot 
products(byte vectors) a single rotation operation is equivalent to in some 
scenarios (it would change with machine vs machine or different quantization 
etc scenarios). When compared to byte vectors dot product (using Panama based 
vectorization), a single rotation operation is roughly equivalent to ~50-60 dot 
products max and when using optimized native kernels it is roughly equivalent 
to ~200-350 dot products max. This extra rotation overhead might be (or maybe 
not) within the range we could swallow but looking for suggestions if this 
overhead seems larger than the potential overhead we initially thought or less 
than that? (at least for some specific use cases [dim, segment size] the 
overhead might be less though we could easily think of a situation where the 
overhead could be large like with 1000 visited nodes(dot products) visited per 
segment 300 could be <= 20-30% overhead roughly but with large segments it bec
 omes less dominant and even with Panama when compared to other faster ways).
   
   ```
   Graviton 3
   
   Benchmark                                   (size)   Mode  Cnt    Score    
Error   Units
   HadamardRotationBenchmark.inverseRotate        128  thrpt   15    1.493 ±  
0.024  ops/us
   HadamardRotationBenchmark.inverseRotate        256  thrpt   15    0.708 ±  
0.013  ops/us
   HadamardRotationBenchmark.inverseRotate        702  thrpt   15    0.257 ±  
0.002  ops/us
   HadamardRotationBenchmark.inverseRotate       1024  thrpt   15    0.160 ±  
0.001  ops/us
   HadamardRotationBenchmark.inverseRotate       4096  thrpt   15    0.035 ±  
0.001  ops/us
   HadamardRotationBenchmark.rotate               128  thrpt   15    1.470 ±  
0.004  ops/us
   HadamardRotationBenchmark.rotate               256  thrpt   15    0.692 ±  
0.010  ops/us
   HadamardRotationBenchmark.rotate               702  thrpt   15    0.246 ±  
0.001  ops/us
   HadamardRotationBenchmark.rotate              1024  thrpt   15    0.155 ±  
0.001  ops/us
   HadamardRotationBenchmark.rotate              4096  thrpt   15    0.034 ±  
0.001  ops/us
   VectorUtilBenchmark.binaryDotProductScalar     128  thrpt   15   17.796 ±  
0.139  ops/us
   VectorUtilBenchmark.binaryDotProductScalar     256  thrpt   15    9.014 ±  
0.032  ops/us
   VectorUtilBenchmark.binaryDotProductScalar     702  thrpt   15    3.362 ±  
0.017  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15    2.285 ±  
0.021  ops/us
   VectorUtilBenchmark.binaryDotProductScalar    4096  thrpt   15    0.566 ±  
0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector     128  thrpt   15   60.224 ±  
0.010  ops/us
   VectorUtilBenchmark.binaryDotProductVector     256  thrpt   15   31.283 ±  
0.004  ops/us
   VectorUtilBenchmark.binaryDotProductVector     702  thrpt   15   11.609 ±  
0.005  ops/us
   VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15    8.039 ±  
0.001  ops/us
   VectorUtilBenchmark.binaryDotProductVector    4096  thrpt   15    2.024 ±  
0.001  ops/us
   VectorUtilBenchmark.dot8sNative                128  thrpt   15  135.797 ±  
2.069  ops/us
   VectorUtilBenchmark.dot8sNative                256  thrpt   15   90.647 ±  
3.913  ops/us
   VectorUtilBenchmark.dot8sNative                702  thrpt   15   49.021 ±  
3.444  ops/us
   VectorUtilBenchmark.dot8sNative               1024  thrpt   15   42.404 ±  
0.713  ops/us
   VectorUtilBenchmark.dot8sNative               4096  thrpt   15   12.222 ±  
0.227  ops/us
   ```
   
   | Dim | rotate (ops/μs) | binaryDotProductScalar (ops/μs) | 
binaryDotProductVector (ops/μs) | dot8sNative (ops/μs) |
     
|-----|-----------------|-------------------------------|-------------------------------|---------------------|
     | 128 | 1.47 | 17.8 (**12x**) | 60.2 (**41x**) | 135.8 (**92x**) |
     | 256 | 0.69 | 9.0 (**13x**) | 31.3 (**45x**) | 90.6 (**131x**) |
     | 702 | 0.25 | 3.4 (**14x**) | 11.6 (**47x**) | 49.0 (**199x**) |
     | 1024 | 0.155 | 2.3 (**15x**) | 8.0 (**52x**) | 42.4 (**274x**) |
     | 4096 | 0.034 | 0.57 (**17x**) | 2.0 (**60x**) | 12.2 (**359x**) |



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to