https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981

For String.encodeASCII with the LATIN1 coder, is there any interest in
exploring the performance impact of using a byteArrayViewVarHandle to
read and write the bytes as longs, with a bitmask to detect whether
any negative values are present?
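
As a rough illustration of why a single mask test can cover eight
bytes at once: a byte is negative exactly when its high bit (0x80) is
set, so ANDing a packed long with 0x80 repeated in every byte lane is
non-zero only if at least one byte is non-ASCII. The class and method
names below are made up for this sketch, and the bytes are packed by
hand rather than through a VarHandle to keep the arithmetic visible.

```java
final class NegativeMaskSketch {
    // 0x80 in every byte lane: the sign bit of each of the eight packed bytes.
    static final long LONG_NEG_MASK = 0x8080808080808080L;

    /** True if any of the eight bytes packed into word has its high bit set. */
    static boolean hasNegativeByte(long word) {
        return (word & LONG_NEG_MASK) != 0;
    }

    /** Packs up to eight bytes into a long, lowest index in the lowest lane. */
    static long pack(byte... bytes) {
        long word = 0;
        for (int k = 0; k < bytes.length && k < 8; k++) {
            word |= (bytes[k] & 0xFFL) << (8 * k);
        }
        return word;
    }

    public static void main(String[] args) {
        long ascii = pack((byte) 'A', (byte) 'S', (byte) 'C', (byte) 'I', (byte) 'I');
        // 0xE9 is a non-ASCII Latin-1 byte, negative when viewed as a Java byte.
        long latin1 = pack((byte) 'c', (byte) 'a', (byte) 'f', (byte) 0xE9);
        System.out.println(hasNegativeByte(ascii));   // false
        System.out.println(hasNegativeByte(latin1));  // true
    }
}
```

The mask test does not depend on byte order, since it only asks whether
any lane has its high bit set, not which one.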

A simple JMH benchmark covering inputs with either 0 or 1 non-ASCII
(negative) values shows times cut in half (or more) in most scenarios,
for strings ranging in length from 3 to roughly 2000.

VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10
Windows 10, Intel(R) Core(TM) i7-9850H

Hand-unrolling the loops shows a noticeable improvement, but does make
for less aesthetically pleasing code.


Benchmark                                  (nonascii)  (size)  Mode  Cnt    Score     Error  Units
AsciiEncodeBenchmark.jdk                         none       3  avgt    4   15.531 ±   1.122  ns/op
AsciiEncodeBenchmark.jdk                         none      10  avgt    4   17.350 ±   0.473  ns/op
AsciiEncodeBenchmark.jdk                         none      16  avgt    4   18.277 ±   0.421  ns/op
AsciiEncodeBenchmark.jdk                         none      23  avgt    4   20.139 ±   0.350  ns/op
AsciiEncodeBenchmark.jdk                         none      33  avgt    4   22.008 ±   0.656  ns/op
AsciiEncodeBenchmark.jdk                         none      42  avgt    4   24.393 ±   1.155  ns/op
AsciiEncodeBenchmark.jdk                         none     201  avgt    4   55.884 ±   0.645  ns/op
AsciiEncodeBenchmark.jdk                         none     511  avgt    4  120.817 ±   2.917  ns/op
AsciiEncodeBenchmark.jdk                         none    2087  avgt    4  471.039 ±  13.329  ns/op
AsciiEncodeBenchmark.jdk                        first       3  avgt    4   15.794 ±   1.494  ns/op
AsciiEncodeBenchmark.jdk                        first      10  avgt    4   18.446 ±   0.780  ns/op
AsciiEncodeBenchmark.jdk                        first      16  avgt    4   20.458 ±   0.394  ns/op
AsciiEncodeBenchmark.jdk                        first      23  avgt    4   22.934 ±   0.422  ns/op
AsciiEncodeBenchmark.jdk                        first      33  avgt    4   25.367 ±   0.178  ns/op
AsciiEncodeBenchmark.jdk                        first      42  avgt    4   28.620 ±   0.678  ns/op
AsciiEncodeBenchmark.jdk                        first     201  avgt    4   80.250 ±   4.376  ns/op
AsciiEncodeBenchmark.jdk                        first     511  avgt    4  185.518 ±   6.370  ns/op
AsciiEncodeBenchmark.jdk                        first    2087  avgt    4  713.213 ±  13.488  ns/op
AsciiEncodeBenchmark.jdk                         last       3  avgt    4   14.991 ±   0.190  ns/op
AsciiEncodeBenchmark.jdk                         last      10  avgt    4   18.284 ±   0.317  ns/op
AsciiEncodeBenchmark.jdk                         last      16  avgt    4   20.591 ±   1.002  ns/op
AsciiEncodeBenchmark.jdk                         last      23  avgt    4   22.560 ±   0.963  ns/op
AsciiEncodeBenchmark.jdk                         last      33  avgt    4   25.521 ±   0.554  ns/op
AsciiEncodeBenchmark.jdk                         last      42  avgt    4   28.484 ±   0.446  ns/op
AsciiEncodeBenchmark.jdk                         last     201  avgt    4   79.434 ±   2.256  ns/op
AsciiEncodeBenchmark.jdk                         last     511  avgt    4  186.639 ±   4.258  ns/op
AsciiEncodeBenchmark.jdk                         last    2087  avgt    4  725.196 ± 149.416  ns/op
AsciiEncodeBenchmark.longCheckCopy               none       3  avgt    4    7.222 ±   0.428  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      10  avgt    4    8.070 ±   0.171  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      16  avgt    4    6.711 ±   0.409  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      23  avgt    4   12.906 ±   3.633  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      33  avgt    4   10.414 ±   0.447  ns/op
AsciiEncodeBenchmark.longCheckCopy               none      42  avgt    4   11.935 ±   1.235  ns/op
AsciiEncodeBenchmark.longCheckCopy               none     201  avgt    4   29.538 ±   3.265  ns/op
AsciiEncodeBenchmark.longCheckCopy               none     511  avgt    4  106.228 ±  68.475  ns/op
AsciiEncodeBenchmark.longCheckCopy               none    2087  avgt    4  494.845 ± 890.985  ns/op
AsciiEncodeBenchmark.longCheckCopy              first       3  avgt    4    7.775 ±   0.278  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      10  avgt    4   13.396 ±   2.072  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      16  avgt    4   13.528 ±   0.702  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      23  avgt    4   17.376 ±   0.360  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      33  avgt    4   16.251 ±   0.203  ns/op
AsciiEncodeBenchmark.longCheckCopy              first      42  avgt    4   17.932 ±   1.773  ns/op
AsciiEncodeBenchmark.longCheckCopy              first     201  avgt    4   39.028 ±   4.699  ns/op
AsciiEncodeBenchmark.longCheckCopy              first     511  avgt    4   92.599 ±  11.078  ns/op
AsciiEncodeBenchmark.longCheckCopy              first    2087  avgt    4  347.728 ±   7.837  ns/op
AsciiEncodeBenchmark.longCheckCopy               last       3  avgt    4    7.472 ±   0.078  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      10  avgt    4    8.371 ±   0.815  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      16  avgt    4    6.766 ±   0.253  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      23  avgt    4   12.879 ±   0.454  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      33  avgt    4   10.491 ±   0.811  ns/op
AsciiEncodeBenchmark.longCheckCopy               last      42  avgt    4   12.435 ±   1.212  ns/op
AsciiEncodeBenchmark.longCheckCopy               last     201  avgt    4   28.507 ±   1.058  ns/op
AsciiEncodeBenchmark.longCheckCopy               last     511  avgt    4   85.763 ±   1.941  ns/op
AsciiEncodeBenchmark.longCheckCopy               last    2087  avgt    4  411.555 ±   3.595  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none       3  avgt    4    5.858 ±   0.637  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      10  avgt    4    7.031 ±   0.274  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      16  avgt    4    6.768 ±   0.222  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      23  avgt    4   10.084 ±   0.102  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      33  avgt    4    9.876 ±   0.240  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none      42  avgt    4   11.061 ±   0.590  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none     201  avgt    4   29.264 ±   1.690  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none     511  avgt    4   61.920 ±   5.482  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        none    2087  avgt    4  309.183 ±  42.354  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first       3  avgt    4    5.687 ±   0.249  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      10  avgt    4    9.537 ±   0.337  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      16  avgt    4    9.928 ±   0.329  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      23  avgt    4   12.510 ±   0.519  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      33  avgt    4   13.028 ±   0.335  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first      42  avgt    4   13.640 ±   0.219  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first     201  avgt    4   31.046 ±   0.647  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first     511  avgt    4   82.998 ±   5.611  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll       first    2087  avgt    4  360.294 ±   8.419  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last       3  avgt    4    5.657 ±   0.197  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      10  avgt    4    6.997 ±   0.081  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      16  avgt    4    6.890 ±   1.319  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      23  avgt    4   10.154 ±   0.389  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      33  avgt    4    9.986 ±   0.592  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last      42  avgt    4   11.481 ±   0.375  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last     201  avgt    4   29.286 ±   0.723  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last     511  avgt    4   61.056 ±   0.977  ns/op
AsciiEncodeBenchmark.longCheckCopy_unroll        last    2087  avgt    4  303.415 ±  17.326  ns/op



    @Benchmark
    public byte[] jdk() {
        final byte[] val = this.data;
        byte[] dst = Arrays.copyOf(val, val.length);
        for (int i = 0; i < dst.length; i++) {
            if (dst[i] < 0) {
                dst[i] = '?';
            }
        }
        return dst;
    }

    @Benchmark
    public byte[] longCheckCopy() {
        final byte[] val = this.data;
        byte[] dst = new byte[val.length];
        int i = 0;
        long word;
        for (int j=dst.length - 7; i < j; i+=8) {
            word = (long)LONG_BYTES.get(val, i);
            LONG_BYTES.set(dst, i, word);
            if ((word & LONG_NEG_MASK) != 0) {
                for (int x=i, y=i+8; x<y; x++) {
                    if (dst[x] < 0) {
                        dst[x] = '?';
                    }
                }
            }
        }
        byte b;
        for (; i < dst.length; i++) {
            b = val[i];
            dst[i] = b < 0 ? (byte) '?' : b;
        }
        return dst;
    }

    @Benchmark
    public byte[] longCheckCopy_unroll() {
        final byte[] val = this.data;
        byte[] dst = new byte[val.length];
        int i = 0;
        long word;
        for (int j=dst.length - 7; i < j; i+=8) {
            word = (long)LONG_BYTES.get(val, i);
            LONG_BYTES.set(dst, i, word);
            if ((word & LONG_NEG_MASK) != 0) {
                if (dst[i] < 0) {
                    dst[i] = '?';
                }
                if (dst[i + 1] < 0) {
                    dst[i + 1] = '?';
                }
                if (dst[i + 2] < 0) {
                    dst[i + 2] = '?';
                }
                if (dst[i + 3] < 0) {
                    dst[i + 3] = '?';
                }
                if (dst[i + 4] < 0) {
                    dst[i + 4] = '?';
                }
                if (dst[i + 5] < 0) {
                    dst[i + 5] = '?';
                }
                if (dst[i + 6] < 0) {
                    dst[i + 6] = '?';
                }
                if (dst[i + 7] < 0) {
                    dst[i + 7] = '?';
                }
            }
        }
        byte b;
        // Handle the 0-7 trailing bytes; each case intentionally falls through.
        switch (dst.length & 0x7) {
            case 7:
                b = val[i + 6];
                dst[i + 6] = b < 0 ? (byte) '?' : b;
            case 6:
                b = val[i + 5];
                dst[i + 5] = b < 0 ? (byte) '?' : b;
            case 5:
                b = val[i + 4];
                dst[i + 4] = b < 0 ? (byte) '?' : b;
            case 4:
                b = val[i + 3];
                dst[i + 3] = b < 0 ? (byte) '?' : b;
            case 3:
                b = val[i + 2];
                dst[i + 2] = b < 0 ? (byte) '?' : b;
            case 2:
                b = val[i + 1];
                dst[i + 1] = b < 0 ? (byte) '?' : b;
            case 1:
                b = val[i];
                dst[i] = b < 0 ? (byte) '?' : b;
        }
        return dst;
    }
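
The benchmark methods above reference LONG_BYTES and LONG_NEG_MASK
without showing their declarations. A plausible reading is sketched
below, together with a quick check that the long-at-a-time copy
produces the same bytes as the byte-at-a-time loop for the sizes in the
results table. The field declarations, the class name, the input()
helper, and the use of native byte order are all assumptions for
illustration, not taken from the benchmark source.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

final class AsciiEncodeCheck {
    // Assumed declarations; the post does not show them.
    static final VarHandle LONG_BYTES =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());
    static final long LONG_NEG_MASK = 0x8080808080808080L;

    /** Byte-at-a-time reference, mirroring the jdk() benchmark. */
    static byte[] reference(byte[] val) {
        byte[] dst = Arrays.copyOf(val, val.length);
        for (int i = 0; i < dst.length; i++) {
            if (dst[i] < 0) dst[i] = '?';
        }
        return dst;
    }

    /** Long-at-a-time variant, mirroring longCheckCopy(). */
    static byte[] longCheckCopy(byte[] val) {
        byte[] dst = new byte[val.length];
        int i = 0;
        for (int j = dst.length - 7; i < j; i += 8) {
            long word = (long) LONG_BYTES.get(val, i);
            LONG_BYTES.set(dst, i, word);
            if ((word & LONG_NEG_MASK) != 0) {
                // At least one of these eight bytes is non-ASCII; fix them up.
                for (int x = i, y = i + 8; x < y; x++) {
                    if (dst[x] < 0) dst[x] = '?';
                }
            }
        }
        for (; i < dst.length; i++) {
            byte b = val[i];
            dst[i] = b < 0 ? (byte) '?' : b;
        }
        return dst;
    }

    /** ASCII array of the given size with one 0xE9 byte at pos (pos < 0 for none). */
    static byte[] input(int size, int pos) {
        byte[] data = new byte[size];
        Arrays.fill(data, (byte) 'a');
        if (pos >= 0) data[pos] = (byte) 0xE9;
        return data;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 10, 16, 23, 33, 42, 201, 511, 2087};
        for (int size : sizes) {
            // No non-ASCII byte, one at the start, one at the end.
            for (int pos : new int[] {-1, 0, size - 1}) {
                byte[] data = input(size, pos);
                if (!Arrays.equals(reference(data), longCheckCopy(data))) {
                    throw new AssertionError("mismatch at size " + size + ", pos " + pos);
                }
            }
        }
        System.out.println("all sizes match");
    }
}
```

Plain get and set on a byte-array view VarHandle are permitted at any
index, so the 8-byte reads do not need aligned offsets.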


Thanks,

Brett
