> On May 17, 2018, at 4:07 AM, Claes Redestad <claes.redes...@oracle.com> wrote:
> 
> Shouldn't this be called "Faster rounding up to nearest power of two"?
> 
> Patch looks OK to me, but I'd like to see numbers with the 
> numberOfLeadingZeros intrinsic
> disabled so that we ensure we're not incurring an unreasonable penalty on 
> platforms who don't
> have an intrinsic for this.
> 

I did a quick check and i believe it is intrinsic for C2 on all hotspot cpu 
platform code, zero does not count! (did not check Graal but i suspect it 
supports it too). 

Extra points for comparing the two approaches with C1 :-) I don’t think that is 
necessary but i am curious.

Paul.
 
> Running your benchmark with the intrinsic disabled[1] on my machine I see a 
> 25-30% penalty
> with testNew relative to testOld, which is perhaps a bit toomuch for comfort..
> 
> So I took a look at profiles for numberOfLeadingZeros with the intrinsic 
> disabled and realized
> it might be possible to improve:
> 
> diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
> --- a/src/java.base/share/classes/java/lang/Integer.java        Mon May 14 
> 16:21:08 2018 +0200
> +++ b/src/java.base/share/classes/java/lang/Integer.java        Thu May 17 
> 12:44:53 2018 +0200
> @@ -1621,12 +1621,12 @@
>          // HD, Figure 5-6
>          if (i <= 0)
>              return i == 0 ? 32 : 0;
> -        int n = 1;
> -        if (i >>> 16 == 0) { n += 16; i <<= 16; }
> -        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
> -        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
> -        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
> -        n -= i >>> 31;
> +        int n = 0;
> +        if (          i < 1 << 16) { n += 16; i <<= 16; }
> +        if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
> +        if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
> +        if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
> +        if (i >= 0) n++;
>          return n;
>      }
> 
> Adding a case that uses this version to your benchmark[2] and the new version 
> is only about
> 10% slower than the baseline, with the added benefit that other uses of 
> numberOfLeadingZeros
> might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]):
> 
> Benchmark          (arg)  Mode  Cnt   Score   Error Units
> TableFor.testNew       1  avgt    6  28.343 ± 0.534  ns/op
> TableFor.testNew      42  avgt    6  26.458 ± 0.064  ns/op
> TableFor.testNew2      1  avgt    6  23.969 ± 0.201  ns/op
> TableFor.testNew2     42  avgt    6  23.934 ± 0.107  ns/op
> TableFor.testOld       1  avgt    6  21.615 ± 0.803  ns/op
> TableFor.testOld      42  avgt    6  21.418 ± 0.106  ns/op
> 
> So I think with the above patch to Integer.numberOfLeadingZeros we can get 
> the benefit for
> our supported platforms while staying roughly performance neutral on 
> platforms without
> an intrinsic. Not strictly necessary to fold it into this RFE, of course.
> 
> Thanks!
> 
> /Claes
> 
> [1] -XX:+UnlockDiagnosticVMOptions 
> -XX:DisableIntrinsic=_numberOfLeadingZeros_i
> [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java
> 
> On 2018-05-17 05:48, David Holmes wrote:
>>> Do you think it's good to go?
>> 
>> I think I'd rather defer to a more performance oriented reviewer - paging 
>> Claes! ;-)
>> 
>> David
>> ----- 
> 

Reply via email to