Hi Claes!
On 3/20/18 2:46 AM, Claes Redestad wrote:
Hi,
On 2018-03-20 09:58, Ivan Gerasimov wrote:
Hello!
The hightestOneBit function doesn't have an intrinsic and is
currently implemented with a dozen of instructions.
Alternatively, it could be implemented as MIN_VALUE >>>
numberOfLeadingZeros(i), which works for all integers but zero.
The former function gets intrisified by hotspot, which results in
+27% of throughput (see the jmh results below).
Would you please help review this simple fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8199843
WEBREV: http://cr.openjdk.java.net/~igerasim/8199843/00/webrev/
nice optimization!
Benchmark:
http://cr.openjdk.java.net/~igerasim/8199843/00/MyBenchmark.java
Benchmark results:
Benchmark (arg) Mode Cnt Score Error Units
MyBenchmark.int_testMethod_new 0 thrpt 35 323430664.593 ±
7492044.171 ops/s
MyBenchmark.int_testMethod_new 42 thrpt 35 298526237.078 ±
5978291.689 ops/s
MyBenchmark.int_testMethod_new -42 thrpt 35 302903562.073 ±
7984723.721 ops/s
MyBenchmark.int_testMethod_org 0 thrpt 35 236245042.891 ±
3635990.596 ops/s
MyBenchmark.int_testMethod_org 42 thrpt 35 237903410.753 ±
3437684.390 ops/s
MyBenchmark.int_testMethod_org -42 thrpt 35 238472580.618 ±
2654886.010 ops/s
MyBenchmark.long_testMethod_new 0 thrpt 35 282646114.501 ±
48028366.305 ops/s
MyBenchmark.long_testMethod_new 42 thrpt 35 282382228.405 ±
5781529.307 ops/s
MyBenchmark.long_testMethod_new -42 thrpt 35 276724858.286 ±
6529561.227 ops/s
MyBenchmark.long_testMethod_org 0 thrpt 35 198500211.972 ±
15096862.367 ops/s
MyBenchmark.long_testMethod_org 42 thrpt 35 215854630.194 ±
3112930.563 ops/s
MyBenchmark.long_testMethod_org -42 thrpt 35 217992805.521 ±
2622877.082 ops/s
To nitpick a bit:
Please run with some appropriate time unit, e.g., "-tu us" to make
results more human readable.
And where are the baseline results? :-)
It'd also be nice to verify we don't regress too much in case there's
no intrinsic, i.e., test with the
intrinsic disabled.
Good point!
Here are results for Integer.highestOneBit with the intrinsic of
numberOfLeadingZeros being disabled:
Benchmark (arg) Mode Cnt Score Error Units
MyBenchmark.int_testMethod_00_base 0 thrpt 35 324.369 ± 15.437
ops/us
MyBenchmark.int_testMethod_00_base 42 thrpt 35 307.741 ± 29.623
ops/us
MyBenchmark.int_testMethod_00_base -42 thrpt 35 324.563 ± 25.039
ops/us
MyBenchmark.int_testMethod_01_org 0 thrpt 35 231.276 ± 8.392
ops/us
MyBenchmark.int_testMethod_01_org 42 thrpt 35 230.466 ± 10.557
ops/us
MyBenchmark.int_testMethod_01_org -42 thrpt 35 238.579 ± 8.257
ops/us
MyBenchmark.int_testMethod_02_new 0 thrpt 35 326.752 ± 18.400
ops/us
MyBenchmark.int_testMethod_02_new 42 thrpt 35 200.604 ± 8.139
ops/us
MyBenchmark.int_testMethod_02_new -42 thrpt 35 212.313 ± 21.284
ops/us
Base case just returns the argument, thus shows the maximum possible
upper bound of the throughput.
With non-zero values the new function performs 11-13% worse.
I guess it's acceptable?
With kind regards,
Ivan
Thanks!
/Claes
--
With kind regards,
Ivan Gerasimov