Pulkitg64 commented on PR #15549:
URL: https://github.com/apache/lucene/pull/15549#issuecomment-3758270550

   Hi @rmuir ,
   
   > I looked at your commented-out code here and it doesn't seem to use 
Float16Vector class but is instead doing a bunch of conversions and scalar 
operations
   
   Actually, for the default VectorUtilSupport implementation (not using Panama), I tried three different approaches (that's why the other two are commented out); approach 1 gave the best performance in my benchmarks.
   
   * Approach 1 (best performance): convert the short (float16 bit) values to float32, then pass them to the fma function.
   
   ```
   JMH:
   Benchmark                                  (size)   Mode  Cnt  Score   Error   Units
   VectorUtilBenchmark.shortDotProductScalar    1024  thrpt   15  0.753 ± 0.001  ops/us
   
   Code:
   @Override
   public short dotProduct(short[] a, short[] b) {
     assert a.length == b.length : "Vector lengths must match";
   
     float sum = 0f;
     for (int i = 0; i < a.length; i++) {
       // widen both float16 (half) values to float32 and accumulate with a fused multiply-add
       sum = Math.fma(
           Float.float16ToFloat(a[i]),
           Float.float16ToFloat(b[i]),
           sum);
     }
     return Float.floatToFloat16(sum);
   }
   ```
   
   * Approach 2: use Float16 objects to hold the values and the Float16.fma function for the computation. Internally, the Float16 values are converted to float32, which I think is why this is not performant enough. A rough per-element comparison is sketched after the code below.
   
   
   ```
   JMH:
   Benchmark                                  (size)   Mode  Cnt  Score    Error   Units
   VectorUtilBenchmark.shortDotProductScalar    1024  thrpt   15  0.077 ±  0.001  ops/us
   
   Code:
   @Override
   public short dotProduct(short[] a, short[] b) {
     assert a.length == b.length : "Vector lengths must match";
   
     Float16 sum = Float16.valueOf(0);
     for (int i = 0; i < a.length; i++) {
       // every step goes through Float16 objects; the running sum is rounded back to
       // half precision by Float16.fma on each iteration
       sum = Float16.fma(Float16.shortBitsToFloat16(a[i]), Float16.shortBitsToFloat16(b[i]), sum);
     }
     return sum.shortValue();
   }
   ```
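   
   To make the gap concrete, here is a tiny standalone sketch (my own illustration, not code from the PR) of the per-element work the two approaches do. It only uses the Float16 methods already shown above; I'm assuming the class is the incubator jdk.incubator.vector.Float16 from the JDK build this PR targets, so it would need --add-modules jdk.incubator.vector:
   
   ```
   // Sketch only, not from the PR: per-element cost of Approach 2 vs Approach 1.
   // Assumes jdk.incubator.vector.Float16 (run with --add-modules jdk.incubator.vector).
   import jdk.incubator.vector.Float16;
   
   public class Float16PerElementCost {
     public static void main(String[] args) {
       short ha = Float.floatToFloat16(1.5f);
       short hb = Float.floatToFloat16(2.0f);
   
       // Approach 1: two primitive half->float conversions plus one intrinsified Math.fma.
       float s1 = Math.fma(Float.float16ToFloat(ha), Float.float16ToFloat(hb), 0f);
   
       // Approach 2: the same arithmetic routed through Float16 objects; the result is
       // rounded back to half precision by Float16.fma on every call.
       Float16 s2 = Float16.fma(
           Float16.shortBitsToFloat16(ha),
           Float16.shortBitsToFloat16(hb),
           Float16.valueOf(0));
   
       System.out.println(s1 + " vs " + Float.float16ToFloat(s2.shortValue()));
     }
   }
   ```
   
   (A side effect of this is that Approach 2 accumulates the running sum in half precision, so besides being slower it can also return a slightly different result than Approach 1 for long vectors.)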
   
   
   * Approach 3: an extension of Approach 1 that adds loop unrolling, but I am not seeing any difference in performance.
   
   ```
   JMH:
   Benchmark                                  (size)   Mode  Cnt  Score   Error   Units
   VectorUtilBenchmark.shortDotProductScalar    1024  thrpt   15  0.748 ± 0.002  ops/us
   
   Code:
   @Override
   public short dotProduct(short[] a, short[] b) {
     float res = 0f;
     int i = 0;
   
     // if the array is big, unroll it into four independent accumulators
     if (a.length > 32) {
       float acc1 = 0f;
       float acc2 = 0f;
       float acc3 = 0f;
       float acc4 = 0f;
       int upperBound = a.length & ~(4 - 1);
       for (; i < upperBound; i += 4) {
         acc1 = fma(Float.float16ToFloat(a[i]),     Float.float16ToFloat(b[i]),     acc1);
         acc2 = fma(Float.float16ToFloat(a[i + 1]), Float.float16ToFloat(b[i + 1]), acc2);
         acc3 = fma(Float.float16ToFloat(a[i + 2]), Float.float16ToFloat(b[i + 2]), acc3);
         acc4 = fma(Float.float16ToFloat(a[i + 3]), Float.float16ToFloat(b[i + 3]), acc4);
       }
       res += acc1 + acc2 + acc3 + acc4;
     }
   
     // tail loop: handle any remaining elements
     for (; i < a.length; i++) {
       res = fma(Float.float16ToFloat(a[i]), Float.float16ToFloat(b[i]), res);
     }
     return Float.floatToFloat16(res);
   }
   ```
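   
   For anyone who wants to poke at these outside of JMH, here is a minimal standalone check (my own sketch, not part of the PR; the class name and harness are made up) that runs the Approach 1 kernel against a plain double-precision reference on random data:
   
   ```
   import java.util.Random;
   
   // Hypothetical harness, not part of the PR: sanity-checks the Approach 1 kernel.
   public class HalfDotProductCheck {
   
     // Approach 1 from above: widen each float16 value to float32 and accumulate with fma.
     static short dotProduct(short[] a, short[] b) {
       float sum = 0f;
       for (int i = 0; i < a.length; i++) {
         sum = Math.fma(Float.float16ToFloat(a[i]), Float.float16ToFloat(b[i]), sum);
       }
       return Float.floatToFloat16(sum);
     }
   
     public static void main(String[] args) {
       Random r = new Random(42);
       short[] a = new short[1024];
       short[] b = new short[1024];
       double expected = 0;
       for (int i = 0; i < a.length; i++) {
         a[i] = Float.floatToFloat16(r.nextFloat());
         b[i] = Float.floatToFloat16(r.nextFloat());
         // reference sum over the values the half encoding actually stores
         expected += (double) Float.float16ToFloat(a[i]) * Float.float16ToFloat(b[i]);
       }
       float actual = Float.float16ToFloat(dotProduct(a, b));
       System.out.println("expected=" + expected + " actual=" + actual);
     }
   }
   ```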
   
   
   * Note:
   
   Float16Vector is used in the PanamaVectorUtilSupport class, which is where we are seeing very bad performance, as explained in my comment above. (Sorry for the confusion, the PR size makes it difficult to navigate.) But please let me know if you meant something else in your comment.
   

