Hi Ivan,
What about a branch-less variant?
public static int highestOneBit(int i) {
    return i & (MIN_VALUE >>> numberOfLeadingZeros(i));
}
Would it be any better for call sites that see a mix of zero and non-zero arguments?
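As a quick sanity check of the variant above, here is a minimal sketch that compares it with the existing Integer.highestOneBit on a few values. The class name and the sample values are made up for illustration; this is not the JDK source.

```java
public class BranchlessHighestOneBit {
    // Hypothetical stand-alone copy of the proposed variant, not the JDK source.
    static int highestOneBit(int i) {
        // For i == 0, numberOfLeadingZeros returns 32. An int shift uses only
        // the low five bits of the distance, so the shift is by 0 and the mask
        // stays MIN_VALUE. ANDing that mask with i == 0 yields 0.
        return i & (Integer.MIN_VALUE >>> Integer.numberOfLeadingZeros(i));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            // Compare against the existing library method.
            if (highestOneBit(v) != Integer.highestOneBit(v)) {
                throw new AssertionError("mismatch for " + v);
            }
        }
        System.out.println("all samples match");
    }
}
```

The extra AND is what makes the zero case come out right without a branch; whether it is actually faster would still need to be measured.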
Regards, Peter
On 03/20/2018 09:58 AM, Ivan Gerasimov wrote:
Hello!
The highestOneBit function doesn't have an intrinsic and is currently
implemented with about a dozen instructions.
Alternatively, it could be implemented as MIN_VALUE >>>
numberOfLeadingZeros(i), which works for all integers but zero.
The numberOfLeadingZeros function gets intrinsified by HotSpot, which results
in +27% throughput (see the JMH results below).
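To illustrate why zero needs separate handling, here is a minimal sketch. The class name, the method names and the explicit zero check are assumptions made for illustration, and are not necessarily what the webrev does.

```java
public class ZeroCaseSketch {
    // Unguarded expression: correct for every non-zero int.
    static int unguarded(int i) {
        return Integer.MIN_VALUE >>> Integer.numberOfLeadingZeros(i);
    }

    // Assumed guard for illustration: zero has no one-bits, so return 0.
    static int guarded(int i) {
        return i == 0 ? 0 : unguarded(i);
    }

    public static void main(String[] args) {
        // numberOfLeadingZeros(0) is 32, and an int shift by 32 is treated as
        // a shift by 0, so the unguarded form returns MIN_VALUE for zero.
        System.out.println(unguarded(0) == Integer.MIN_VALUE); // true
        System.out.println(guarded(0));                        // 0
        System.out.println(guarded(42));                       // 32
    }
}
```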
Would you please help review this simple fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8199843
WEBREV: http://cr.openjdk.java.net/~igerasim/8199843/00/webrev/
Benchmark:
http://cr.openjdk.java.net/~igerasim/8199843/00/MyBenchmark.java
Benchmark results:
Benchmark                        (arg)   Mode  Cnt          Score          Error  Units
MyBenchmark.int_testMethod_new       0  thrpt   35  323430664.593 ±  7492044.171  ops/s
MyBenchmark.int_testMethod_new      42  thrpt   35  298526237.078 ±  5978291.689  ops/s
MyBenchmark.int_testMethod_new     -42  thrpt   35  302903562.073 ±  7984723.721  ops/s
MyBenchmark.int_testMethod_org       0  thrpt   35  236245042.891 ±  3635990.596  ops/s
MyBenchmark.int_testMethod_org      42  thrpt   35  237903410.753 ±  3437684.390  ops/s
MyBenchmark.int_testMethod_org     -42  thrpt   35  238472580.618 ±  2654886.010  ops/s
MyBenchmark.long_testMethod_new      0  thrpt   35  282646114.501 ± 48028366.305  ops/s
MyBenchmark.long_testMethod_new     42  thrpt   35  282382228.405 ±  5781529.307  ops/s
MyBenchmark.long_testMethod_new    -42  thrpt   35  276724858.286 ±  6529561.227  ops/s
MyBenchmark.long_testMethod_org      0  thrpt   35  198500211.972 ± 15096862.367  ops/s
MyBenchmark.long_testMethod_org     42  thrpt   35  215854630.194 ±  3112930.563  ops/s
MyBenchmark.long_testMethod_org    -42  thrpt   35  217992805.521 ±  2622877.082  ops/s
Thanks in advance!