On Fri, 14 May 2021 13:18:27 GMT, Laurent Bourgès <lbour...@openjdk.org> wrote:
>> I don't know Laurent, I find the handling of signed order over-complicated. >> Subtracting `Integer.MIN_VALUE` is really cheap... > > I made a JMH test on jdk16 to test count4 (xor) performance: > https://github.com/bourgesl/nearly-optimal-mergesort-code/tree/master/sort-bench/results/count_xor > > > Benchmark (size) Mode Cnt Score Error Units > ArrayXorBenchmark.arrayAndOriginal 1000000 avgt 20 684561,951 ± 2177,120 ns/op > ArrayXorBenchmark.arrayXorOriginal 1000000 avgt 20 777255,465 ± 1815,136 ns/op > ArrayXorBenchmark.arrayXor_Masked 1000000 avgt 20 814163,377 ± 2665,436 ns/op > ArrayXorBenchmark.arrayXor_Unsafe 1000000 avgt 20 707273,922 ± 2994,380 ns/op > > Masked xor does not get optimized by c2 too. > > Using Unsafe is better, see: > https://github.com/bourgesl/nearly-optimal-mergesort-code/blob/master/sort-bench/src/main/java/edu/sorting/bench/ArrayXorBenchmark.java > > If you want, I could make another radixsort() using on or off heap count > buffers and Unsafe, as I did in Marlin to avoid bound checks... I think an uncontroversial and efficient solution would be to replace count4[(a[i] >>> 24) ^ 0x80]--; with count4[(a[i] >>> 24) & 0xFF]--; and after finishing counting, first half of `count4` should be swapped with second half of `count4`. ------------- PR: https://git.openjdk.java.net/jdk/pull/3938