kaivalnp commented on code in PR #15736:
URL: https://github.com/apache/lucene/pull/15736#discussion_r2834987364
##########
lucene/core/src/java25/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -610,11 +610,10 @@ private static int int4DotProductSinglePackedBody(
prod8a.convertShape(ZERO_EXTEND_B2S, Int4Constants.SHORT_SPECIES, 0);
acc1 = acc1.add(prod16a);
}
- Vector<Integer> intAcc0 = acc0.convert(S2I, 0);
- Vector<Integer> intAcc1 = acc0.convert(S2I, 1);
- Vector<Integer> intAcc2 = acc1.convert(S2I, 0);
- Vector<Integer> intAcc3 = acc1.convert(S2I, 1);
- sum += intAcc0.add(intAcc1).add(intAcc2).add(intAcc3).reinterpretAsInts().reduceLanes(ADD);
+ ShortVector accShort = acc0.add(acc1);
+ Vector<Integer> intAcc0 = accShort.convert(ZERO_EXTEND_S2I, 0);
+ Vector<Integer> intAcc1 = accShort.convert(ZERO_EXTEND_S2I, 1);
+ sum += intAcc0.add(intAcc1).reinterpretAsInts().reduceLanes(ADD);
}
Review Comment:
I'm seeing slightly better performance if we avoid the `convert` calls
entirely and instead split each 32-bit lane into its two unsigned 16-bit
halves with a mask and a shift:
```java
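// View each pair of adjacent 16-bit lanes as one 32-bit lane.
IntVector intAcc0 = acc0.reinterpretAsInts();
IntVector intAcc1 = acc1.reinterpretAsInts();
sum +=
    intAcc0
        .and(0x0000FFFF) // low 16 bits, zero-extended
        .add(intAcc0.lanewise(LSHR, 16)) // high 16 bits, zero-extended
        .add(intAcc1.and(0x0000FFFF))
        .add(intAcc1.lanewise(LSHR, 16))
        .reduceLanes(ADD);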
```
JMH says:
```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  16.288 ± 0.023  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.739 ± 0.036  ops/us
```
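For reference, here is a scalar sketch of why the mask-and-shift form should give the same sum as zero-extending each short lane. It is not code from the PR: the class `PackedShortSum` and its methods are hypothetical names made up for illustration, and `pack` only imitates what reinterpreting two adjacent 16-bit lanes as one 32-bit lane exposes (assuming little-endian lane order, although the sum does not depend on the order).

```java
public class PackedShortSum {
    // Hypothetical helper: place two 16-bit lanes into one 32-bit lane,
    // imitating a short vector viewed as ints (low lane in the low bits).
    public static int pack(short lo, short hi) {
        return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16);
    }

    // Reference path: zero-extend every 16-bit lane to int, then add.
    public static int sumViaZeroExtend(short[] lanes) {
        int sum = 0;
        for (short s : lanes) {
            sum += Short.toUnsignedInt(s);
        }
        return sum;
    }

    // Mask-and-shift path: for each packed 32-bit lane, add the low half
    // (AND with 0x0000FFFF) and the high half (logical shift right by 16).
    // Assumes lanes.length is even.
    public static int sumViaMaskShift(short[] lanes) {
        int sum = 0;
        for (int i = 0; i < lanes.length; i += 2) {
            int packed = pack(lanes[i], lanes[i + 1]);
            sum += (packed & 0x0000FFFF) + (packed >>> 16);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Includes values with the top bit set, where sign extension would differ.
        short[] lanes = {1, 2, (short) 0xFFFF, (short) 0x8000};
        System.out.println(sumViaZeroExtend(lanes)); // 98306
        System.out.println(sumViaMaskShift(lanes));  // 98306
    }
}
```

The logical shift (`>>>`, or `LSHR` in the Vector API) matters here: an arithmetic shift would sign-extend the high half and give a different sum once an accumulator lane exceeds `0x7FFF`.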
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]