Hi Claes!

On 3/20/18 2:46 AM, Claes Redestad wrote:
Hi,

On 2018-03-20 09:58, Ivan Gerasimov wrote:
Hello!

The highestOneBit function doesn't have an intrinsic and is currently implemented with a dozen instructions. Alternatively, it could be implemented as MIN_VALUE >>> numberOfLeadingZeros(i), which works for all integers but zero. numberOfLeadingZeros gets intrinsified by HotSpot, which results in +27% throughput (see the JMH results below).
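For illustration, here is a minimal sketch of the two approaches being compared. The class and method names are hypothetical, the bit-smearing variant is the well-known textbook form and may not match the current source line for line, and the `i &` mask is just one possible way to cover the zero case, not necessarily what the webrev does.

```java
// Sketch only: names are hypothetical and not taken from the webrev.
final class HighestOneBitSketch {

    // Bit-smearing variant: copy the highest set bit into every lower
    // position, then subtract the lower half to isolate that bit.
    static int bitSmearing(int i) {
        i |= (i >> 1);
        i |= (i >> 2);
        i |= (i >> 4);
        i |= (i >> 8);
        i |= (i >> 16);
        return i - (i >>> 1);
    }

    // numberOfLeadingZeros variant. For i == 0 the shift distance is 32,
    // which Java masks to 0, so the shift alone would yield MIN_VALUE.
    // The "i &" mask (an assumption of this sketch) forces the result to 0.
    static int viaLeadingZeros(int i) {
        return i & (Integer.MIN_VALUE >>> Integer.numberOfLeadingZeros(i));
    }
}
```

Both methods should agree with Integer.highestOneBit for every int, including 0 and negative values.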

Would you please help review this simple fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8199843
WEBREV: http://cr.openjdk.java.net/~igerasim/8199843/00/webrev/

nice optimization!

Benchmark: http://cr.openjdk.java.net/~igerasim/8199843/00/MyBenchmark.java

Benchmark results:

Benchmark                        (arg)   Mode  Cnt          Score          Error  Units
MyBenchmark.int_testMethod_new       0  thrpt   35  323430664.593 ±  7492044.171  ops/s
MyBenchmark.int_testMethod_new      42  thrpt   35  298526237.078 ±  5978291.689  ops/s
MyBenchmark.int_testMethod_new     -42  thrpt   35  302903562.073 ±  7984723.721  ops/s
MyBenchmark.int_testMethod_org       0  thrpt   35  236245042.891 ±  3635990.596  ops/s
MyBenchmark.int_testMethod_org      42  thrpt   35  237903410.753 ±  3437684.390  ops/s
MyBenchmark.int_testMethod_org     -42  thrpt   35  238472580.618 ±  2654886.010  ops/s
MyBenchmark.long_testMethod_new      0  thrpt   35  282646114.501 ± 48028366.305  ops/s
MyBenchmark.long_testMethod_new     42  thrpt   35  282382228.405 ±  5781529.307  ops/s
MyBenchmark.long_testMethod_new    -42  thrpt   35  276724858.286 ±  6529561.227  ops/s
MyBenchmark.long_testMethod_org      0  thrpt   35  198500211.972 ± 15096862.367  ops/s
MyBenchmark.long_testMethod_org     42  thrpt   35  215854630.194 ±  3112930.563  ops/s
MyBenchmark.long_testMethod_org    -42  thrpt   35  217992805.521 ±  2622877.082  ops/s

To nitpick a bit:

Please run with some appropriate time unit, e.g., "-tu us" to make results more human readable.
And where are the baseline results? :-)

It'd also be nice to verify we don't regress too much in case there's no intrinsic, i.e., test with the
intrinsic disabled.

Good point!

Here are the results for Integer.highestOneBit with the numberOfLeadingZeros intrinsic disabled:

Benchmark                           (arg)   Mode  Cnt    Score    Error   Units
MyBenchmark.int_testMethod_00_base      0  thrpt   35  324.369 ± 15.437  ops/us
MyBenchmark.int_testMethod_00_base     42  thrpt   35  307.741 ± 29.623  ops/us
MyBenchmark.int_testMethod_00_base    -42  thrpt   35  324.563 ± 25.039  ops/us
MyBenchmark.int_testMethod_01_org       0  thrpt   35  231.276 ±  8.392  ops/us
MyBenchmark.int_testMethod_01_org      42  thrpt   35  230.466 ± 10.557  ops/us
MyBenchmark.int_testMethod_01_org     -42  thrpt   35  238.579 ±  8.257  ops/us
MyBenchmark.int_testMethod_02_new       0  thrpt   35  326.752 ± 18.400  ops/us
MyBenchmark.int_testMethod_02_new      42  thrpt   35  200.604 ±  8.139  ops/us
MyBenchmark.int_testMethod_02_new     -42  thrpt   35  212.313 ± 21.284  ops/us

The base case just returns its argument, so it shows the upper bound on the achievable throughput.

With non-zero values the new function performs 11-13% worse.
I guess it's acceptable?
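As a quick check of the 11-13% figure, the relative slowdown can be recomputed from the org and new scores for the non-zero arguments in the table above. This is only a sketch; the class and method names are hypothetical.

```java
// Sketch only: recomputes the relative slowdown from the reported scores.
final class SlowdownSketch {

    // Throughput loss of `candidate` relative to `reference`, in percent.
    static double slowdownPercent(double reference, double candidate) {
        return (reference - candidate) / reference * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/us, intrinsic disabled: 01_org versus 02_new.
        System.out.printf("arg=42:  %.1f%%%n", slowdownPercent(230.466, 200.604));
        System.out.printf("arg=-42: %.1f%%%n", slowdownPercent(238.579, 212.313));
    }
}
```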

With kind regards,
Ivan
Thanks!

/Claes


--
With kind regards,
Ivan Gerasimov
