uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927626207
Hi,
I modified the scalar variant like that:
```java
@Override
public int binaryHammingDistance(byte[] a, byte[] b) {
int distance = 0, i = 0;
for (; i < a.length + 1 - Long.BYTES; i += Long.BYTES) {
distance += Long.bitCount(((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^
(long) BitUtil.VH_NATIVE_LONG.get(b, i)) & 0xFFFFFFFFFFFFFFFFL);
}
// int tail
for (; i < a.length + 1 - Integer.BYTES; i += Integer.BYTES) {
distance += Integer.bitCount(((int) BitUtil.VH_NATIVE_INT.get(a, i) ^
(int) BitUtil.VH_NATIVE_INT.get(b, i)) & 0xFFFFFFFF);
}
// byte tail
for (; i < a.length; i++) {
distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
}
return distance;
}
```
This one only uses popcnt CPU instruction. I then ran your benchmakrk to
compare the panama-vectorized one with my new implementation as above:
```
Benchmark (size) Mode Cnt Score
Error Units
VectorUtilBenchmark.binaryHammingDistanceScalar 1 thrpt 15 258,511
± 17,969 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 128 thrpt 15 62,364
± 0,723 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 207 thrpt 15 40,302
± 0,703 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 256 thrpt 15 42,025
± 0,891 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 300 thrpt 15 35,065
± 3,125 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 512 thrpt 15 24,391
± 1,987 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 702 thrpt 15 17,505
± 0,684 ops/us
VectorUtilBenchmark.binaryHammingDistanceScalar 1024 thrpt 15 13,806
± 0,102 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 1 thrpt 15 231,651
± 9,975 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 128 thrpt 15 16,760
± 0,251 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 207 thrpt 15 10,317
± 0,251 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 256 thrpt 15 8,887
± 0,559 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 300 thrpt 15 7,466
± 0,345 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 512 thrpt 15 4,706
± 0,080 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 702 thrpt 15 3,062
± 0,566 ops/us
VectorUtilBenchmark.binaryHammingDistanceVector 1024 thrpt 15 2,404
± 0,024 ops/us
````
Please not the SCALAR one here is the above variant. As this one four (!)
times faster than the Panama variant, there is no need to have a
panama-vectorized one and it looks like it is not working at all. I think the
lookup table is a bad idea.
To make it short:
- Remove the impl and the table from VectorUtilSupport (scalar and
vectorized)
- Implement my above methoid directly in VectorUtil class
This will be a short PR. Make sure to add more tests (there's only one for
the VectorUtil method).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]