Unsafe might be similarly tricky dependency-wise, but perhaps less so. Either 
solution might work fine if we extract the code for encoding to a utility class 
that isn’t initialized eagerly with String.class itself.

I’ll file an RFE and get the ”trivial” fix to use StringCoding.countPositives 
in to establish a better baseline. I’m sure there are ways to beat this, for 
example with a fused loop. An unrolled and/or VH-based solution like you’ve 
proposed could at the very least be used to speedup the case of Strings for 
which we need to replace with ’?’.

/Claes

19 feb. 2023 kl. 02:54 skrev Brett Okken 
<brett.okken...@gmail.com<mailto:brett.okken...@gmail.com>>:

Thanks for taking a look at this.
That is a pretty good improvement with a much smaller change.
I recognize the intricacies of bootstrapping, but have no expertise. Would 
using Unsafe directly be easier? Particularly for shorter strings, doing the 
copying and checking together rather than as separate loops seems to speed 
things up considerably, even for happy-path of no failures.

Brett

On Sat, Feb 18, 2023 at 5:49 PM Claes Redestad 
<claes.redes...@oracle.com<mailto:claes.redes...@oracle.com>> wrote:
Hi,

general comment: String might be one of the trickier places to add a VarHandle 
dependency, since String is used very early in the bootstrap and depended upon 
by everything else, so I’d expect such a solution would have to navigate 
potential circularity issues carefully. It’d be good to experiment with changes 
to java.lang.String proper to see that the solution that works nice externally 
is or can be made feasible within String.

Specifically on the performance opportunity then while US-ASCII encoding is 
probably on the way out we shouldn’t neglect it.

One way to go about this without pulling VarHandles into String might be to use 
what other encode methods in String does and leverage 
StringCoding.countPositives:

https://github.com/openjdk/jdk/pull/12640<https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/12640__;!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Zp4JhLWU$>

Testing this on the existing StringEncode microbenchmark, shows a promising 
speed-up when the input is ASCII-encodable:

Baseline
Benchmark                      (charsetName)  Mode  Cnt      Score     Error  
Units
StringEncode.encodeAsciiLong        US-ASCII  avgt    5  26626,025 ± 448,307  
ns/op
StringEncode.encodeAsciiShort       US-ASCII  avgt    5     33,336 ±   2,032  
ns/op

Patch:
Benchmark                      (charsetName)  Mode  Cnt     Score    Error  
Units
StringEncode.encodeAsciiLong        US-ASCII  avgt    5  5492,985 ± 40,066  
ns/op
StringEncode.encodeAsciiShort       US-ASCII  avgt    5    28,545 ±  4,883  
ns/op

(You might see that this will go back into a scalar loop on encoding failures. 
That loop could still benefit from unrolling or byteArrayViewVarHandle, but I 
think you have a bigger problem in an app than raw performance if you have a 
lot of encoding failures...)

WDYT?

/Claes

18 feb. 2023 kl. 19:36 skrev Brett Okken 
<brett.okken...@gmail.com<mailto:brett.okken...@gmail.com>>:

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981<https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java*L976-L981__;Iw!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Z_Ook_Ws$>

For String.encodeASCII, with the LATIN1 coder is there any interest in
exploring the performance impacts of utilizing a
byteArrayViewVarHandle to read/write as longs and utilize a bitmask to
identify if negative values are present?

A simple jmh benchmark covering either 0 or 1 non-ascii (negative)
values shows times cut in half (or more) for most scenarios with
strings ranging in length from 3 - ~2000.
VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10
Windows 10 Intel(R) Core(TM) i7-9850H

Hand unrolling the loops shows noted improvement, but does make for
less aesthetically pleasing code.


Benchmark                                  (nonascii)  (size)  Mode
Cnt    Score     Error  Units
AsciiEncodeBenchmark.jdk                         none       3  avgt
4   15.531 ±   1.122  ns/op
AsciiEncodeBenchmark.jdk                         none      10  avgt
4   17.350 ±   0.473  ns/op
AsciiEncodeBenchmark.jdk                         none      16  avgt
4   18.277 ±   0.421  ns/op
AsciiEncodeBenchmark.jdk                         none      23  avgt
4   20.139 ±   0.350  ns/op
AsciiEncodeBenchmark.jdk                         none      33  avgt
4   22.008 ±   0.656  ns/op
AsciiEncodeBenchmark.jdk                         none      42  avgt
4   24.393 ±   1.155  ns/op
AsciiEncodeBenchmark.jdk                         none     201  avgt
4   55.884 ±   0.645  ns/op
AsciiEncodeBenchmark.jdk                         none     511  avgt
4  120.817 ±   2.917  ns/op
AsciiEncodeBenchmark.jdk                         none    2087  avgt
4  471.039 ±  13.329  ns/op
AsciiEncodeBenchmark.jdk                        first       3  avgt
4   15.794 ±   1.494  ns/op
AsciiEncodeBenchmark.jdk                        first      10  avgt
4   18.446 ±   0.780  ns/op
AsciiEncodeBenchmark.jdk                        first      16  avgt
4   20.458 ±   0.394  ns/op
AsciiEncodeBenchmark.jdk                        first      23  avgt
4   22.934 ±   0.422  ns/op
AsciiEncodeBenchmark.jdk                        first      33  avgt
4   25.367 ±   0.178  ns/op
AsciiEncodeBenchmark.jdk                        first      42  avgt
4   28.620 ±   0.678  ns/op
AsciiEncodeBenchmark.jdk                        first     201  avgt
4   80.250 ±   4.376  ns/op
AsciiEncodeBenchmark.jdk                        first     511  avgt
4  185.518 ±   6.370  ns/op
AsciiEncodeBenchmark.jdk                        first    2087  avgt
4  713.213 ±  13.488  ns/op
AsciiEncodeBenchmark.jdk                         last       3  avgt
4   14.991 ±   0.190  ns/op
AsciiEncodeBenchmark.jdk                         last      10  avgt
4   18.284 ±   0.317  ns/op
AsciiEncodeBenchmark.jdk                         last      16  avgt
4   20.591 ±   1.002  ns/op
AsciiEncodeBenchmark.jdk                         last      23  avgt
4   22.560 ±   0.963  ns/op
AsciiEncodeBenchmark.jdk                         last      33  avgt
4   25.521 ±   0.554  ns/op
AsciiEncodeBenchmark.jdk                         last      42  avgt
4   28.484 ±   0.446  ns/op
AsciiEncodeBenchmark.jdk                         last     201  avgt
4   79.434 ±   2.256  ns/op
AsciiEncodeBenchmark.jdk                         last     511  avgt
4  186.639 ±   4.258  ns/op
AsciiEncodeBenchmark.jdk                         last    2087  avgt
4  725.196 ± 149.416  ns/op
AsciiEncodeBenchmark.longCheckCopy               none       3  avgt
4    7.222 ±   0.428  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      10  avgt
4    8.070 ±   0.171  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      16  avgt
4    6.711 ±   0.409  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      23  avgt
4   12.906 ±   3.633  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      33  avgt
4   10.414 ±   0.447  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      42  avgt
4   11.935 ±   1.235  ns/op
AsciiEncodeBenchmark.longCheckCopy               none     201  avgt
4   29.538 ±   3.265  ns/op
AsciiEncodeBenchmark.longCheckCopy               none     511  avgt
4  106.228 ±  68.475  ns/op
AsciiEncodeBenchmark.longCheckCopy               none    2087  avgt
4  494.845 ± 890.985  ns/op
AsciiEncodeBenchmark.longCheckCopy              first       3  avgt
4    7.775 ±   0.278  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      10  avgt
4   13.396 ±   2.072  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      16  avgt
4   13.528 ±   0.702  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      23  avgt
4   17.376 ±   0.360  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      33  avgt
4   16.251 ±   0.203  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      42  avgt
4   17.932 ±   1.773  ns/op
AsciiEncodeBenchmark.longCheckCopy              first     201  avgt
4   39.028 ±   4.699  ns/op
AsciiEncodeBenchmark.longCheckCopy              first     511  avgt
4   92.599 ±  11.078  ns/op
AsciiEncodeBenchmark.longCheckCopy              first    2087  avgt
4  347.728 ±   7.837  ns/op
AsciiEncodeBenchmark.longCheckCopy               last       3  avgt
4    7.472 ±   0.078  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      10  avgt
4    8.371 ±   0.815  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      16  avgt
4    6.766 ±   0.253  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      23  avgt
4   12.879 ±   0.454  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      33  avgt
4   10.491 ±   0.811  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      42  avgt
4   12.435 ±   1.212  ns/op
AsciiEncodeBenchmark.longCheckCopy               last     201  avgt
4   28.507 ±   1.058  ns/op
AsciiEncodeBenchmark.longCheckCopy               last     511  avgt
4   85.763 ±   1.941  ns/op
AsciiEncodeBenchmark.longCheckCopy               last    2087  avgt
4  411.555 ±   3.595  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none       3  avgt
4    5.858 ±   0.637  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      10  avgt
4    7.031 ±   0.274  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      16  avgt
4    6.768 ±   0.222  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      23  avgt
4   10.084 ±   0.102  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      33  avgt
4    9.876 ±   0.240  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      42  avgt
4   11.061 ±   0.590  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none     201  avgt
4   29.264 ±   1.690  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none     511  avgt
4   61.920 ±   5.482  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none    2087  avgt
4  309.183 ±  42.354  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first       3  avgt
4    5.687 ±   0.249  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      10  avgt
4    9.537 ±   0.337  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      16  avgt
4    9.928 ±   0.329  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      23  avgt
4   12.510 ±   0.519  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      33  avgt
4   13.028 ±   0.335  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      42  avgt
4   13.640 ±   0.219  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first     201  avgt
4   31.046 ±   0.647  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first     511  avgt
4   82.998 ±   5.611  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first    2087  avgt
4  360.294 ±   8.419  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last       3  avgt
4    5.657 ±   0.197  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      10  avgt
4    6.997 ±   0.081  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      16  avgt
4    6.890 ±   1.319  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      23  avgt
4   10.154 ±   0.389  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      33  avgt
4    9.986 ±   0.592  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      42  avgt
4   11.481 ±   0.375  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last     201  avgt
4   29.286 ±   0.723  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last     511  avgt
4   61.056 ±   0.977  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last    2087  avgt
4  303.415 ±  17.326  ns/op



   @Benchmark
   public byte[] jdk() {
       final byte[] val = this.data;
       byte[] dst = Arrays.copyOf(val, val.length);
       for (int i = 0; i < dst.length; i++) {
           if (dst[i] < 0) {
               dst[i] = '?';
           }
       }
       return dst;
   }

   @Benchmark
   public byte[] longCheckCopy() {
       final byte[] val = this.data;
       byte[] dst = new byte[val.length];
       int i = 0;
       long word;
       for (int j=dst.length - 7; i < j; i+=8) {
           word = (long)LONG_BYTES.get(val, i);
           LONG_BYTES.set(dst, i, word);
           if ((word & LONG_NEG_MASK) != 0) {
               for (int x=i, y=i+8; x<y; x++) {
                   if (dst[x] < 0) {
                       dst[x] = '?';
                   }
               }
           }
       }
       byte b;
       for (; i < dst.length; i++) {
           b = val[i];
           dst[i] = b < 0 ? (byte) '?' : b;
       }
       return dst;
   }

   @Benchmark
   public byte[] longCheckCopy_unroll() {
       final byte[] val = this.data;
       byte[] dst = new byte[val.length];
       int i = 0;
       long word;
       for (int j=dst.length - 7; i < j; i+=8) {
           word = (long)LONG_BYTES.get(val, i);
           LONG_BYTES.set(dst, i, word);
           if ((word & LONG_NEG_MASK) != 0) {
               if (dst[i] < 0) {
                   dst[i] = '?';
               }
               if (dst[i + 1] < 0) {
                   dst[i + 1] = '?';
               }
               if (dst[i + 2] < 0) {
                   dst[i + 2] = '?';
               }
               if (dst[i + 3] < 0) {
                   dst[i + 3] = '?';
               }
               if (dst[i + 4] < 0) {
                   dst[i + 4] = '?';
               }
               if (dst[i + 5] < 0) {
                   dst[i + 5] = '?';
               }
               if (dst[i + 6] < 0) {
                   dst[i + 6] = '?';
               }
               if (dst[i + 7] < 0) {
                   dst[i + 7] = '?';
               }
           }
       }
       byte b;
       switch (dst.length & 0x7) {
           case 7:
               b = val[i + 6];
               dst[i + 6] = b < 0 ? (byte) '?' : b;
           case 6:
               b = val[i + 5];
               dst[i + 5] = b < 0 ? (byte) '?' : b;
           case 5:
               b = val[i + 4];
               dst[i + 4] = b < 0 ? (byte) '?' : b;
           case 4:
               b = val[i + 3];
               dst[i + 3] = b < 0 ? (byte) '?' : b;
           case 3:
               b = val[i + 2];
               dst[i + 2] = b < 0 ? (byte) '?' : b;
           case 2:
               b = val[i + 1];
               dst[i + 1] = b < 0 ? (byte) '?' : b;
           case 1:
               b = val[i];
               dst[i] = b < 0 ? (byte) '?' : b;
       }
       return dst;
   }


Thanks,

Brett


Reply via email to