Thank you Claes for your review and the detailed analysis!
On 5/17/18 4:07 AM, Claes Redestad wrote:
Shouldn't this be called "Faster rounding up to nearest power of two"?
Yes, it's more accurate.
Patch looks OK to me, but I'd like to see numbers with the
numberOfLeadingZeros intrinsic
disabled so that we ensure we're not incurring an unreasonable penalty
on platforms who don't
have an intrinsic for this.
Running your benchmark with the intrinsic disabled[1] on my machine I
see a 25-30% penalty
with testNew relative to testOld, which is perhaps a bit toomuch for
comfort..
So I took a look at profiles for numberOfLeadingZeros with the
intrinsic disabled and realized
it might be possible to improve:
diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14
16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17
12:44:53 2018 +0200
@@ -1621,12 +1621,12 @@
// HD, Figure 5-6
if (i <= 0)
return i == 0 ? 32 : 0;
- int n = 1;
- if (i >>> 16 == 0) { n += 16; i <<= 16; }
- if (i >>> 24 == 0) { n += 8; i <<= 8; }
- if (i >>> 28 == 0) { n += 4; i <<= 4; }
- if (i >>> 30 == 0) { n += 2; i <<= 2; }
- n -= i >>> 31;
+ int n = 0;
+ if ( i < 1 << 16) { n += 16; i <<= 16; }
+ if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; }
+ if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; }
+ if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; }
+ if (i >= 0) n++;
return n;
}
Adding a case that uses this version to your benchmark[2] and the new
version is only about
10% slower than the baseline, with the added benefit that other uses
of numberOfLeadingZeros
might see a speed-upif there's no intrinsic (runs with intrinsic
disabled[1]):
Interesting.
It can probably be done with fewer comparisons, if the direction of all
the shifts is inverted:
The following variant showed slightly better performance on my machine:
static final int numberOfLeadingZeros(int i) {
if (i <= 0)
return i == 0 ? 32 : 0;
int n = 31;
if (i >= 1 << 16) { n -= 16; i >>>= 16; }
if (i >= 1 << 8) { n -= 8; i >>>= 8; }
if (i >= 1 << 4) { n -= 4; i >>>= 4; }
if (i >= 1 << 2) { n -= 2; i >>>= 2; }
return n - (i >>> 1);
}
I agree that improving Java implementation of numberOfLeadingZeros() can
be done as a separate RFE.
With kind regards,
Ivan
Benchmark (arg) Mode Cnt Score Error Units
TableFor.testNew 1 avgt 6 28.343 ± 0.534 ns/op
TableFor.testNew 42 avgt 6 26.458 ± 0.064 ns/op
TableFor.testNew2 1 avgt 6 23.969 ± 0.201 ns/op
TableFor.testNew2 42 avgt 6 23.934 ± 0.107 ns/op
TableFor.testOld 1 avgt 6 21.615 ± 0.803 ns/op
TableFor.testOld 42 avgt 6 21.418 ± 0.106 ns/op
So I think with the above patch to Integer.numberOfLeadingZeros we can
get the benefit for
our supported platforms while staying roughly performance neutral on
platforms without
an intrinsic. Not strictly necessary to fold it into this RFE, of course.
Thanks!
/Claes
[1] -XX:+UnlockDiagnosticVMOptions
-XX:DisableIntrinsic=_numberOfLeadingZeros_i
[2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java
On 2018-05-17 05:48, David Holmes wrote:
Do you think it's good to go?
I think I'd rather defer to a more performance oriented reviewer -
paging Claes! ;-)
David
-----
--
With kind regards,
Ivan Gerasimov