Re: RFR 8203279 : Faster calculation of power of two
On 2018-05-17 22:44, Ivan Gerasimov wrote: The following variant showed slightly better performance on my machine: static final int numberOfLeadingZeros(int i) { if (i <= 0) return i == 0 ? 32 : 0; int n = 31; if (i >= 1 << 16) { n -= 16; i >>>= 16; } if (i >= 1 << 8) { n -= 8; i >>>= 8; } if (i >= 1 << 4) { n -= 4; i >>>= 4; } if (i >= 1 << 2) { n -= 2; i >>>= 2; } return n - (i >>> 1); } Nice, this version also wins on all of -Xint and -XX:TieredStopAtLevel=1-3 (my version lost out slightly versus the baseline on -Xint), so is potentially a startup enhancement even on platforms with C2 intrinsics. I agree that improving Java implementation of numberOfLeadingZeros() can be done as a separate RFE. I filed https://bugs.openjdk.java.net/browse/JDK-8203352 for this. /Claes
Re: RFR 8203279 : Faster calculation of power of two
Thank you Claes for your review and the detailed analysis! On 5/17/18 4:07 AM, Claes Redestad wrote: Shouldn't this be called "Faster rounding up to nearest power of two"? Yes, it's more accurate. Patch looks OK to me, but I'd like to see numbers with the numberOfLeadingZeros intrinsic disabled so that we ensure we're not incurring an unreasonable penalty on platforms who don't have an intrinsic for this. Running your benchmark with the intrinsic disabled[1] on my machine I see a 25-30% penalty with testNew relative to testOld, which is perhaps a bit toomuch for comfort.. So I took a look at profiles for numberOfLeadingZeros with the intrinsic disabled and realized it might be possible to improve: diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java --- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 16:21:08 2018 +0200 +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 12:44:53 2018 +0200 @@ -1621,12 +1621,12 @@ // HD, Figure 5-6 if (i <= 0) return i == 0 ? 32 : 0; -int n = 1; -if (i >>> 16 == 0) { n += 16; i <<= 16; } -if (i >>> 24 == 0) { n += 8; i <<= 8; } -if (i >>> 28 == 0) { n += 4; i <<= 4; } -if (i >>> 30 == 0) { n += 2; i <<= 2; } -n -= i >>> 31; +int n = 0; +if ( i < 1 << 16) { n += 16; i <<= 16; } +if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; } +if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; } +if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; } +if (i >= 0) n++; return n; } Adding a case that uses this version to your benchmark[2] and the new version is only about 10% slower than the baseline, with the added benefit that other uses of numberOfLeadingZeros might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]): Interesting. It can probably be done with fewer comparisons, if the direction of all the shifts is inverted: The following variant showed slightly better performance on my machine: static final int numberOfLeadingZeros(int i) { if (i <= 0) return i == 0 ? 32 : 0; int n = 31; if (i >= 1 << 16) { n -= 16; i >>>= 16; } if (i >= 1 << 8) { n -= 8; i >>>= 8; } if (i >= 1 << 4) { n -= 4; i >>>= 4; } if (i >= 1 << 2) { n -= 2; i >>>= 2; } return n - (i >>> 1); } I agree that improving Java implementation of numberOfLeadingZeros() can be done as a separate RFE. With kind regards, Ivan Benchmark (arg) Mode Cnt Score Error Units TableFor.testNew 1 avgt6 28.343 ± 0.534 ns/op TableFor.testNew 42 avgt6 26.458 ± 0.064 ns/op TableFor.testNew2 1 avgt6 23.969 ± 0.201 ns/op TableFor.testNew2 42 avgt6 23.934 ± 0.107 ns/op TableFor.testOld 1 avgt6 21.615 ± 0.803 ns/op TableFor.testOld 42 avgt6 21.418 ± 0.106 ns/op So I think with the above patch to Integer.numberOfLeadingZeros we can get the benefit for our supported platforms while staying roughly performance neutral on platforms without an intrinsic. Not strictly necessary to fold it into this RFE, of course. Thanks! /Claes [1] -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_numberOfLeadingZeros_i [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java On 2018-05-17 05:48, David Holmes wrote: Do you think it's good to go? I think I'd rather defer to a more performance oriented reviewer - paging Claes! ;-) David - -- With kind regards, Ivan Gerasimov
Re: RFR 8203279 : Faster calculation of power of two
On 05/17/2018 07:07 AM, Claes Redestad wrote: > Shouldn't this be called "Faster rounding up to nearest power of two"? > > Patch looks OK to me, but I'd like to see numbers with the > numberOfLeadingZeros intrinsic > disabled so that we ensure we're not incurring an unreasonable penalty > on platforms who don't > have an intrinsic for this. > > Running your benchmark with the intrinsic disabled[1] on my machine I > see a 25-30% penalty > with testNew relative to testOld, which is perhaps a bit toomuch for > comfort.. This was the original reason for using the existing code -- hardware support was uncommon. I like your idea of tweaking non-intrinsic numberOfLeadingZeros to reduce penalty on the now-uncommon platforms without support. -Doug > > So I took a look at profiles for numberOfLeadingZeros with the intrinsic > disabled and realized > it might be possible to improve: > > diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java > --- a/src/java.base/share/classes/java/lang/Integer.java Mon May > 14 16:21:08 2018 +0200 > +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May > 17 12:44:53 2018 +0200 > @@ -1621,12 +1621,12 @@ > // HD, Figure 5-6 > if (i <= 0) > return i == 0 ? 32 : 0; > - int n = 1; > - if (i >>> 16 == 0) { n += 16; i <<= 16; } > - if (i >>> 24 == 0) { n += 8; i <<= 8; } > - if (i >>> 28 == 0) { n += 4; i <<= 4; } > - if (i >>> 30 == 0) { n += 2; i <<= 2; } > - n -= i >>> 31; > + int n = 0; > + if ( i < 1 << 16) { n += 16; i <<= 16; } > + if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; } > + if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; } > + if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; } > + if (i >= 0) n++; > return n; > } > > Adding a case that uses this version to your benchmark[2] and the new > version is only about > 10% slower than the baseline, with the added benefit that other uses of > numberOfLeadingZeros > might see a speed-upif there's no intrinsic (runs with intrinsic > disabled[1]): > > Benchmark (arg) Mode Cnt Score Error Units > TableFor.testNew 1 avgt 6 28.343 ± 0.534 ns/op > TableFor.testNew 42 avgt 6 26.458 ± 0.064 ns/op > TableFor.testNew2 1 avgt 6 23.969 ± 0.201 ns/op > TableFor.testNew2 42 avgt 6 23.934 ± 0.107 ns/op > TableFor.testOld 1 avgt 6 21.615 ± 0.803 ns/op > TableFor.testOld 42 avgt 6 21.418 ± 0.106 ns/op > > So I think with the above patch to Integer.numberOfLeadingZeros we can > get the benefit for > our supported platforms while staying roughly performance neutral on > platforms without > an intrinsic. Not strictly necessary to fold it into this RFE, of course. > > Thanks! > > /Claes > > [1] -XX:+UnlockDiagnosticVMOptions > -XX:DisableIntrinsic=_numberOfLeadingZeros_i > [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java > > On 2018-05-17 05:48, David Holmes wrote: >>> Do you think it's good to go? >> >> I think I'd rather defer to a more performance oriented reviewer - >> paging Claes! ;-) >> >> David >> - >
Re: RFR 8203279 : Faster calculation of power of two
> On May 17, 2018, at 4:07 AM, Claes Redestad wrote: > > Shouldn't this be called "Faster rounding up to nearest power of two"? > > Patch looks OK to me, but I'd like to see numbers with the > numberOfLeadingZeros intrinsic > disabled so that we ensure we're not incurring an unreasonable penalty on > platforms who don't > have an intrinsic for this. > I did a quick check and i believe it is intrinsic for C2 on all hotspot cpu platform code, zero does not count! (did not check Graal but i suspect it supports it too). Extra points for comparing the two approaches with C1 :-) I don’t think that is necessary but i am curious. Paul. > Running your benchmark with the intrinsic disabled[1] on my machine I see a > 25-30% penalty > with testNew relative to testOld, which is perhaps a bit toomuch for comfort.. > > So I took a look at profiles for numberOfLeadingZeros with the intrinsic > disabled and realized > it might be possible to improve: > > diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java > --- a/src/java.base/share/classes/java/lang/Integer.javaMon May 14 > 16:21:08 2018 +0200 > +++ b/src/java.base/share/classes/java/lang/Integer.javaThu May 17 > 12:44:53 2018 +0200 > @@ -1621,12 +1621,12 @@ > // HD, Figure 5-6 > if (i <= 0) > return i == 0 ? 32 : 0; > -int n = 1; > -if (i >>> 16 == 0) { n += 16; i <<= 16; } > -if (i >>> 24 == 0) { n += 8; i <<= 8; } > -if (i >>> 28 == 0) { n += 4; i <<= 4; } > -if (i >>> 30 == 0) { n += 2; i <<= 2; } > -n -= i >>> 31; > +int n = 0; > +if ( i < 1 << 16) { n += 16; i <<= 16; } > +if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; } > +if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; } > +if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; } > +if (i >= 0) n++; > return n; > } > > Adding a case that uses this version to your benchmark[2] and the new version > is only about > 10% slower than the baseline, with the added benefit that other uses of > numberOfLeadingZeros > might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]): > > Benchmark (arg) Mode Cnt Score Error Units > TableFor.testNew 1 avgt6 28.343 ± 0.534 ns/op > TableFor.testNew 42 avgt6 26.458 ± 0.064 ns/op > TableFor.testNew2 1 avgt6 23.969 ± 0.201 ns/op > TableFor.testNew2 42 avgt6 23.934 ± 0.107 ns/op > TableFor.testOld 1 avgt6 21.615 ± 0.803 ns/op > TableFor.testOld 42 avgt6 21.418 ± 0.106 ns/op > > So I think with the above patch to Integer.numberOfLeadingZeros we can get > the benefit for > our supported platforms while staying roughly performance neutral on > platforms without > an intrinsic. Not strictly necessary to fold it into this RFE, of course. > > Thanks! > > /Claes > > [1] -XX:+UnlockDiagnosticVMOptions > -XX:DisableIntrinsic=_numberOfLeadingZeros_i > [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java > > On 2018-05-17 05:48, David Holmes wrote: >>> Do you think it's good to go? >> >> I think I'd rather defer to a more performance oriented reviewer - paging >> Claes! ;-) >> >> David >> - >
Re: RFR 8203279 : Faster calculation of power of two
On 17/05/2018 9:24 PM, Claes Redestad wrote: On 2018-05-17 13:15, David Holmes wrote: diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java --- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 16:21:08 2018 +0200 +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 12:44:53 2018 +0200 @@ -1621,12 +1621,12 @@ // HD, Figure 5-6 The code no longer maps to that from HD so the comment would need fixing. Seems that's a pre-existing issue, but yes. The comment is correct for the first edition of the book. It's 5-11 in the second edition. :) David /Claes
Re: RFR 8203279 : Faster calculation of power of two
On 2018-05-17 13:15, David Holmes wrote: diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java --- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 16:21:08 2018 +0200 +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 12:44:53 2018 +0200 @@ -1621,12 +1621,12 @@ // HD, Figure 5-6 The code no longer maps to that from HD so the comment would need fixing. Seems that's a pre-existing issue, but yes. /Claes
Re: RFR 8203279 : Faster calculation of power of two
On 17/05/2018 9:07 PM, Claes Redestad wrote: Shouldn't this be called "Faster rounding up to nearest power of two"? Patch looks OK to me, but I'd like to see numbers with the numberOfLeadingZeros intrinsic disabled so that we ensure we're not incurring an unreasonable penalty on platforms who don't have an intrinsic for this. Running your benchmark with the intrinsic disabled[1] on my machine I see a 25-30% penalty with testNew relative to testOld, which is perhaps a bit toomuch for comfort.. So I took a look at profiles for numberOfLeadingZeros with the intrinsic disabled and realized it might be possible to improve: diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java --- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 16:21:08 2018 +0200 +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 12:44:53 2018 +0200 @@ -1621,12 +1621,12 @@ // HD, Figure 5-6 The code no longer maps to that from HD so the comment would need fixing. David - if (i <= 0) return i == 0 ? 32 : 0; - int n = 1; - if (i >>> 16 == 0) { n += 16; i <<= 16; } - if (i >>> 24 == 0) { n += 8; i <<= 8; } - if (i >>> 28 == 0) { n += 4; i <<= 4; } - if (i >>> 30 == 0) { n += 2; i <<= 2; } - n -= i >>> 31; + int n = 0; + if ( i < 1 << 16) { n += 16; i <<= 16; } + if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; } + if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; } + if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; } + if (i >= 0) n++; return n; } Adding a case that uses this version to your benchmark[2] and the new version is only about 10% slower than the baseline, with the added benefit that other uses of numberOfLeadingZeros might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]): Benchmark (arg) Mode Cnt Score Error Units TableFor.testNew 1 avgt 6 28.343 ± 0.534 ns/op TableFor.testNew 42 avgt 6 26.458 ± 0.064 ns/op TableFor.testNew2 1 avgt 6 23.969 ± 0.201 ns/op TableFor.testNew2 42 avgt 6 23.934 ± 0.107 ns/op TableFor.testOld 1 avgt 6 21.615 ± 0.803 ns/op TableFor.testOld 42 avgt 6 21.418 ± 0.106 ns/op So I think with the above patch to Integer.numberOfLeadingZeros we can get the benefit for our supported platforms while staying roughly performance neutral on platforms without an intrinsic. Not strictly necessary to fold it into this RFE, of course. Thanks! /Claes [1] -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_numberOfLeadingZeros_i [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java On 2018-05-17 05:48, David Holmes wrote: Do you think it's good to go? I think I'd rather defer to a more performance oriented reviewer - paging Claes! ;-) David -
Re: RFR 8203279 : Faster calculation of power of two
Shouldn't this be called "Faster rounding up to nearest power of two"? Patch looks OK to me, but I'd like to see numbers with the numberOfLeadingZeros intrinsic disabled so that we ensure we're not incurring an unreasonable penalty on platforms who don't have an intrinsic for this. Running your benchmark with the intrinsic disabled[1] on my machine I see a 25-30% penalty with testNew relative to testOld, which is perhaps a bit toomuch for comfort.. So I took a look at profiles for numberOfLeadingZeros with the intrinsic disabled and realized it might be possible to improve: diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java --- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 16:21:08 2018 +0200 +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 12:44:53 2018 +0200 @@ -1621,12 +1621,12 @@ // HD, Figure 5-6 if (i <= 0) return i == 0 ? 32 : 0; - int n = 1; - if (i >>> 16 == 0) { n += 16; i <<= 16; } - if (i >>> 24 == 0) { n += 8; i <<= 8; } - if (i >>> 28 == 0) { n += 4; i <<= 4; } - if (i >>> 30 == 0) { n += 2; i <<= 2; } - n -= i >>> 31; + int n = 0; + if ( i < 1 << 16) { n += 16; i <<= 16; } + if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; } + if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; } + if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; } + if (i >= 0) n++; return n; } Adding a case that uses this version to your benchmark[2] and the new version is only about 10% slower than the baseline, with the added benefit that other uses of numberOfLeadingZeros might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]): Benchmark (arg) Mode Cnt Score Error Units TableFor.testNew 1 avgt 6 28.343 ± 0.534 ns/op TableFor.testNew 42 avgt 6 26.458 ± 0.064 ns/op TableFor.testNew2 1 avgt 6 23.969 ± 0.201 ns/op TableFor.testNew2 42 avgt 6 23.934 ± 0.107 ns/op TableFor.testOld 1 avgt 6 21.615 ± 0.803 ns/op TableFor.testOld 42 avgt 6 21.418 ± 0.106 ns/op So I think with the above patch to Integer.numberOfLeadingZeros we can get the benefit for our supported platforms while staying roughly performance neutral on platforms without an intrinsic. Not strictly necessary to fold it into this RFE, of course. Thanks! /Claes [1] -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_numberOfLeadingZeros_i [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java On 2018-05-17 05:48, David Holmes wrote: Do you think it's good to go? I think I'd rather defer to a more performance oriented reviewer - paging Claes! ;-) David -
Re: RFR 8203279 : Faster calculation of power of two
On 17/05/2018 1:01 PM, Ivan Gerasimov wrote: On 5/16/18 7:44 PM, David Holmes wrote: On 17/05/2018 12:30 PM, Ivan Gerasimov wrote: Hi David! On 5/16/18 6:12 PM, David Holmes wrote: Hi Ivan, Surely you need to back this up with some performance numbers! And verify not assume that numberOfLeadingZeroes is intrinsified! Yes, good point. Please find the benchmark with the results here: http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java It shows that the new version of HashMap. tableSizeFor() was (approximately) two times faster. Impressive. :) Thanks :) Of course it's not performance critical code, but why not optimize it by a tiny bit if possible? Do you think it's good to go? I think I'd rather defer to a more performance oriented reviewer - paging Claes! ;-) David - With kind regards, Ivan Thanks, David With kind regards, Ivan Cheers, David On 17/05/2018 10:32 AM, Ivan Gerasimov wrote: Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity.
Re: RFR 8203279 : Faster calculation of power of two
On 17/05/2018 12:55 PM, Martin Buchholz wrote: Hello Ivan, I've wondered about this myself. Probably the performance is architecture dependent. Probably a new method should be added to Integer and Long with @HotspotIntrinsicCandidate. That gives David another chance to practice his blind aarch64 assembly language coding skills. Ha! :) It's hard to give a good name for this. Hacker's delight calls it dclp2 and we can probably do better than that. http://www.hackersdelight.org/hdcodetxt/clp2.c.txt But I approve either way. Interesting that the existing code is almost, but not quite clp2. And numberOfLeadingZeroes also comes from HD. If both forms get optimally compiled I'm very surprised that we would see such a performance difference. David On Wed, May 16, 2018 at 5:32 PM, Ivan Gerasimov wrote: Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity. -- With kind regards, Ivan Gerasimov
Re: RFR 8203279 : Faster calculation of power of two
On 5/16/18 7:44 PM, David Holmes wrote: On 17/05/2018 12:30 PM, Ivan Gerasimov wrote: Hi David! On 5/16/18 6:12 PM, David Holmes wrote: Hi Ivan, Surely you need to back this up with some performance numbers! And verify not assume that numberOfLeadingZeroes is intrinsified! Yes, good point. Please find the benchmark with the results here: http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java It shows that the new version of HashMap. tableSizeFor() was (approximately) two times faster. Impressive. :) Thanks :) Of course it's not performance critical code, but why not optimize it by a tiny bit if possible? Do you think it's good to go? With kind regards, Ivan Thanks, David With kind regards, Ivan Cheers, David On 17/05/2018 10:32 AM, Ivan Gerasimov wrote: Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity. -- With kind regards, Ivan Gerasimov
Re: RFR 8203279 : Faster calculation of power of two
Hello Ivan, I've wondered about this myself. Probably the performance is architecture dependent. Probably a new method should be added to Integer and Long with @HotspotIntrinsicCandidate. That gives David another chance to practice his blind aarch64 assembly language coding skills. It's hard to give a good name for this. Hacker's delight calls it dclp2 and we can probably do better than that. http://www.hackersdelight.org/hdcodetxt/clp2.c.txt But I approve either way. On Wed, May 16, 2018 at 5:32 PM, Ivan Gerasimov wrote: > Hello! > > In a few places we have code that rounds an integer up to the nearest > power of two. > > It is done with a series of RSHOTFs and ORs, but it can possibly be done > faster with the use of Integer.numberOfLeadingZeros (assuming it is > intrinsified). > > Would you please help review this trivial optimization: > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 > WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ > > For HashMap.tableSizeFor() I created a simple test with a loop from > Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result > is the same. > > For TimSort.ensureCapacity() I checked all positive values of minCapacity. > > -- > With kind regards, > Ivan Gerasimov > >
Re: RFR 8203279 : Faster calculation of power of two
On 17/05/2018 12:30 PM, Ivan Gerasimov wrote: Hi David! On 5/16/18 6:12 PM, David Holmes wrote: Hi Ivan, Surely you need to back this up with some performance numbers! And verify not assume that numberOfLeadingZeroes is intrinsified! Yes, good point. Please find the benchmark with the results here: http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java It shows that the new version of HashMap. tableSizeFor() was (approximately) two times faster. Impressive. :) Thanks, David With kind regards, Ivan Cheers, David On 17/05/2018 10:32 AM, Ivan Gerasimov wrote: Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity.
Re: RFR 8203279 : Faster calculation of power of two
Hi David! On 5/16/18 6:12 PM, David Holmes wrote: Hi Ivan, Surely you need to back this up with some performance numbers! And verify not assume that numberOfLeadingZeroes is intrinsified! Yes, good point. Please find the benchmark with the results here: http://cr.openjdk.java.net/~igerasim/8203279/00/MyBenchmark.java It shows that the new version of HashMap. tableSizeFor() was (approximately) two times faster. With kind regards, Ivan Cheers, David On 17/05/2018 10:32 AM, Ivan Gerasimov wrote: Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity. -- With kind regards, Ivan Gerasimov
Re: RFR 8203279 : Faster calculation of power of two
Hi Ivan, Surely you need to back this up with some performance numbers! And verify not assume that numberOfLeadingZeroes is intrinsified! Cheers, David On 17/05/2018 10:32 AM, Ivan Gerasimov wrote: Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity.
RFR 8203279 : Faster calculation of power of two
Hello! In a few places we have code that rounds an integer up to the nearest power of two. It is done with a series of RSHOTFs and ORs, but it can possibly be done faster with the use of Integer.numberOfLeadingZeros (assuming it is intrinsified). Would you please help review this trivial optimization: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8203279 WEBREV: http://cr.openjdk.java.net/~igerasim/8203279/00/webrev/ For HashMap.tableSizeFor() I created a simple test with a loop from Integer.MIN_VALUE to Integer.MAX_VALUE (including), to make sure the result is the same. For TimSort.ensureCapacity() I checked all positive values of minCapacity. -- With kind regards, Ivan Gerasimov