ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752723993
@rmuir
Building on your idea, and focusing again on the x64 case, I get a bit of a
boost by converting the bytes directly to int (rather than going via short first).
On my Rocket Lake machine (AVX-512), I get the following results:
```
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  20.675 ± 0.051  ops/us
BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5  22.705 ± 0.015  ops/us
BinaryDotProductBenchmark.dotProductOld       1024  thrpt    5   3.174 ± 0.113  ops/us
```
From ...
```
@Benchmark
public int dotProductNewNew() {
  ..
  if (vectorSize >= 256) {
    // optimized 256/512 bit implementation, processes 8/16 bytes at a time
    // by converting from 8/16 bytes to 8/16 ints
    int upperBound = PREFERRED_BYTE_SPECIES.loopBound(a.length);
    IntVector acc = IntVector.zero(IntVector.SPECIES_PREFERRED);
    for (; i < upperBound; i += PREFERRED_BYTE_SPECIES.length()) {
      ByteVector va8 = ByteVector.fromArray(PREFERRED_BYTE_SPECIES, a, i);
      ByteVector vb8 = ByteVector.fromArray(PREFERRED_BYTE_SPECIES, b, i);
      Vector<Integer> va32 =
          va8.convertShape(VectorOperators.B2I, IntVector.SPECIES_PREFERRED, 0);
      Vector<Integer> vb32 =
          vb8.convertShape(VectorOperators.B2I, IntVector.SPECIES_PREFERRED, 0);
      Vector<Integer> prod32 = va32.mul(vb32);
      acc = acc.add(prod32);
    }
    // reduce
    res += acc.reduceLanes(VectorOperators.ADD);
  } else { ..
```
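For sanity-checking the vectorized loop above, a plain scalar version is a handy reference, since both should return the same int. This is only a minimal sketch: the class and method names (`ScalarDotProduct`, `dotProduct`) are hypothetical and are not taken from the PR, and it is not necessarily what `dotProductOld` does.

```java
final class ScalarDotProduct {
  // Hypothetical helper (not from the PR): scalar dot product of two byte arrays.
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch: " + a.length + " != " + b.length);
    }
    int res = 0;
    for (int i = 0; i < a.length; i++) {
      // Java promotes both bytes to int before multiplying,
      // which is the scalar analogue of the B2I conversion.
      res += a[i] * b[i];
    }
    return res;
  }

  public static void main(String[] args) {
    byte[] a = {1, -2, 127, -128};
    byte[] b = {3, 4, 127, -128};
    System.out.println(dotProduct(a, b)); // prints 32508
  }
}
```

Each product is at most 128 * 128 = 16384, so at size 1024 the sum stays well within int range.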
I also tried a hand-unrolled version (unrolling 4x), but saw no performance
benefit: the numbers stay the same. So I left it out, for the sake of simplicity.