Hello Wenshao and the core libraries mailing list,
First, I want to talk about the roles of Unsafe and BALE
(ByteArrayLittleEndian).
Unsafe itself is a collection of JVM-specific APIs that must be guarded
from direct use by dependent Java code. The set/getXxx methods are one such
set of APIs; on supported platforms they perform unaligned reads and writes
directly. These operations are already exposed to regular Java users via two
public APIs, ByteBuffer and VarHandle (MethodHandles.byteArrayViewVarHandle,
used by BALE). Both of them call into Unsafe, and the JIT can eliminate
their overhead.
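To make this concrete, here is a small sketch (the class and method names
are mine and purely illustrative) of the same unaligned little-endian int
write done three ways: through a byte[] view VarHandle, through a heap
ByteBuffer, and with plain byte stores. It only uses public JDK APIs and does
not touch Unsafe or BALE themselves.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class UnalignedWriteDemo {
    // byte[] view VarHandle that reads/writes int values in little-endian
    // order; this is the public factory that BALE builds on.
    static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Plain get/set access modes accept any index, aligned or not.
    static void putIntVarHandle(byte[] buf, int offset, int value) {
        INT_LE.set(buf, offset, value);
    }

    // The same write through a heap ByteBuffer wrapping the array.
    static void putIntByteBuffer(byte[] buf, int offset, int value) {
        ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).putInt(offset, value);
    }

    // Plain byte stores, least significant byte first.
    static void putIntBytes(byte[] buf, int offset, int value) {
        buf[offset] = (byte) value;
        buf[offset + 1] = (byte) (value >> 8);
        buf[offset + 2] = (byte) (value >> 16);
        buf[offset + 3] = (byte) (value >> 24);
    }

    public static void main(String[] args) {
        int value = 0x64636261; // 'a','b','c','d' when stored little-endian
        byte[] a = new byte[7], b = new byte[7], c = new byte[7];
        putIntVarHandle(a, 1, value); // offset 1 is deliberately unaligned
        putIntByteBuffer(b, 1, value);
        putIntBytes(c, 1, value);
        System.out.println(Arrays.equals(a, b) && Arrays.equals(b, c)); // true
    }
}
```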

Currently, we have the Vector API in incubation, which ensures
vectorization of some operations; our usage of BALE is similar, in that we
wish to achieve SLP reliably.

I took some time to look through the places where you use BALE to speed up
writes. I believe it would be better, where possible, to perform the
optimization at the JIT level, because the JIT knows the best way to group
bytes together for a write at a given offset on an arbitrary platform (such
as a big-endian one). Similar to the Vector API, I think we might add new
internal APIs like:
public static void write(byte[] arr, int offset, int b0, int b1, ....)
where we declare explicitly that we write multiple bytes at once, so that we
know the JIT will reliably optimize our writes (in case the JIT has trouble
optimizing SLP, similar to auto-vectorization).
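A rough sketch of what such an API could look like, assuming a fixed
four-byte arity; MultiByteWrite.write is a hypothetical name and signature,
not an existing JDK method. As written it is just four byte stores behind a
single up-front bounds check (so either all four bytes are written or none
are); the point is that the caller states the intent and the JIT would be
free to choose the grouping. The main method shows two of the possible
groupings, one int store and two short stores, producing the same bytes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical helper illustrating the proposed internal API shape.
public final class MultiByteWrite {
    private MultiByteWrite() {}

    // Writes b0..b3 to arr[offset..offset+3]. A JIT that recognizes this
    // pattern could merge the stores; here they are ordinary byte stores.
    public static void write(byte[] arr, int offset, int b0, int b1, int b2, int b3) {
        // Throws IndexOutOfBoundsException before any byte is written.
        Objects.checkFromIndexSize(offset, 4, arr.length);
        arr[offset] = (byte) b0;
        arr[offset + 1] = (byte) b1;
        arr[offset + 2] = (byte) b2;
        arr[offset + 3] = (byte) b3;
    }

    static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    public static void main(String[] args) {
        byte[] viaHelper = new byte[8], viaInt = new byte[8], viaShorts = new byte[8];
        write(viaHelper, 3, 'n', 'u', 'l', 'l');

        // Grouping 1: a single little-endian int store.
        INT_LE.set(viaInt, 3, 'n' | 'u' << 8 | 'l' << 16 | 'l' << 24);

        // Grouping 2: two little-endian short stores.
        SHORT_LE.set(viaShorts, 3, (short) ('n' | 'u' << 8));
        SHORT_LE.set(viaShorts, 5, (short) ('l' | 'l' << 8));

        System.out.println(Arrays.equals(viaHelper, viaInt)
                && Arrays.equals(viaInt, viaShorts)); // true
    }
}
```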

Another reason the JIT is better than reusing BALE/ByteBuffer is that the
values those APIs produce are "meaningful", i.e. the results are used
directly and the reads/writes are two-way. In our case, however, we are only
interested in faster writes, and there are multiple ways to group them, so I
don't think Java-based APIs will be useful.

For JVM startup, I recommend running a simple Hello World with the
-Xlog:class+init flag to see which classes are initialized before
java/lang/invoke/MethodHandleImpl. In general, you shouldn't initialize
java.lang.invoke classes (lambdas, VarHandle, MethodHandle) in the wrappers,
String, collections, or reflection, for example by keeping them in fields.
Those classes can use lambdas in their methods, but such lambdas cannot be
called before java.lang.invoke is ready.
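One common way to keep java.lang.invoke out of a class's static initializer
is the initialization-on-demand holder idiom, sketched below with a
hypothetical DigitWriter class. Note that it only defers the VarHandle setup
until the first call that needs it, so that call path still must not be
reachable before java.lang.invoke is ready; re-running with the
-Xlog:class+init flag may help confirm when the holder is actually
initialized.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical class used only to illustrate the holder idiom.
public final class DigitWriter {
    private DigitWriter() {}

    // Holder is initialized on its first active use (JLS 12.4.1), not when
    // DigitWriter itself is initialized, so DigitWriter's static init stays
    // free of java.lang.invoke.
    private static final class Holder {
        static final VarHandle SHORT_LE =
                MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
    }

    // Safe for early startup: plain byte stores, no java.lang.invoke involved.
    static void writePairPlain(byte[] buf, int pos, short pair) {
        buf[pos] = (byte) pair;
        buf[pos + 1] = (byte) (pair >> 8);
    }

    // First call initializes Holder (and the VarHandle), so this must not
    // be reached before java.lang.invoke is ready.
    static void writePairVarHandle(byte[] buf, int pos, short pair) {
        Holder.SHORT_LE.set(buf, pos, pair);
    }

    public static void main(String[] args) {
        byte[] a = new byte[4], b = new byte[4];
        short pair = (short) ('2' | '1' << 8); // low byte '2', high byte '1'
        writePairPlain(a, 1, pair);
        writePairVarHandle(b, 1, pair);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```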

Best,
Chen Liang

On Sun, Oct 8, 2023 at 5:14 PM 温绍锦(高铁) <shaojin.we...@alibaba-inc.com>
wrote:

> Should we allow the use of Unsafe or ByteArrayLittleEndian for trivial
> byte[] writes in core-libs?
>
> There is already code that uses ByteArrayLittleEndian to improve
> performance, such as:
> ```java
> package java.util;
>
> class UUID {
>     public String toString() {
>         // ...
>         ByteArrayLittleEndian.setInt(
>                 buf,
>                 9,
>                 HexDigits.packDigits(((int) msb) >> 24, ((int) msb) >> 16));
>         // ...
>     }
> }
> ```
>
> There are also examples where ByteArrayLittleEndian was used and then
> removed because it slowed down JVM startup (Unsafe.putShortUnaligned
> could be used to solve the slow-startup problem):
> ```java
> package java.lang;
> class StringLatin1 {
>     private static void writeDigitPair(byte[] buf, int charPos, int value) {
>         short pair = DecimalDigits.digitPair(value);
>         // UNSAFE.putShortUnaligned(buf, ARRAY_BYTE_BASE_OFFSET + charPos, pair);
>         buf[charPos] = (byte)(pair);
>         buf[charPos + 1] = (byte)(pair >> 8);
>     }
> }
> ```
>
> Here is an example from a PR review that disagreed with the use of
> ByteArrayLittleEndian:
> https://github.com/openjdk/jdk/pull/15768
> ```java
> package java.util;
> class HexFormat {
>     String formatOptDelimiter(byte[] bytes, int fromIndex, int toIndex) {
>         // ...
>         short pair = HexDigits.digitPair(bytes[fromIndex + i], ucase);
>         int pos = i * 2;
>         rep[pos] = (byte)pair;
>         rep[pos + 1] = (byte)(pair >>> 8);
>         // ByteArrayLittleEndian.setShort(rep, pos, pair);
>     }
> }
> ```
>
> This is another example of a PR review disagreeing with the use of
> ByteArrayLittleEndian:
> https://github.com/openjdk/jdk/pull/15990
> ```java
> package java.lang;
> class AbstractStringBuilder {
>     static final class Constants {
>         static final int NULL_LATIN1;
>         static final int NULL_UTF16;
>         static {
>             byte[] bytes4 = new byte[] {'n', 'u', 'l', 'l'};
>             byte[] bytes8 = new byte[8];
>             NULL_LATIN1 = ByteArrayLittleEndian.getInt(bytes4, 0);
>             StringLatin1.inflate(bytes4, 0, bytes8, 0, 4);
>             NULL_UTF16 = ByteArrayLittleEndian.getLong(bytes8, 0);
>         }
>     }
>
>     private AbstractStringBuilder appendNull() {
>         ensureCapacityInternal(count + 4);
>         int count = this.count;
>         byte[] val = this.value;
>         if (isLatin1()) {
>             ByteArrayLittleEndian.setInt(val, count, Constants.NULL_LATIN1);
>         } else {
>             ByteArrayLittleEndian.setLong(val, count << 1, Constants.NULL_UTF16);
>         }
>         this.count = count + 4;
>         return this;
>     }
> }
> ```
>
> In these examples, using Unsafe/ByteArrayLittleEndian significantly
> improves performance. If automatic JIT optimization is the best solution,
> but SuperWord Level Parallelism (SLP) does not currently support this
> optimization, what do we recommend? In which scenarios can Unsafe not be
> used, and in which scenarios can ByteArrayLittleEndian not be used?
>
