Hello Wenshao and the core libraries mailing list,

First, I want to talk about the roles of Unsafe and ByteArrayLittleEndian (BALE). Unsafe itself is a collection of JVM-specific APIs that must be guarded from dependent Java code. The set/getXxx methods are one such group of APIs; they directly use unaligned reads and writes on supported platforms. This group is already exposed to regular Java users via two public APIs: ByteBuffer and VarHandle (MethodHandles.byteArrayViewVarHandle, which BALE uses). Both call into the Unsafe API, and the JIT can eliminate their overhead.
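To make the two public routes concrete, here is a minimal sketch that writes one little-endian short into a byte[] through a byte-array-view VarHandle, through ByteBuffer, and with two plain byte stores for comparison. The class and method names are made up for illustration and are not JDK code; only the ByteBuffer, ByteOrder, MethodHandles, and VarHandle calls are real public APIs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
final class UnalignedWriteSketch {
    // Views a byte[] as little-endian shorts; the coordinate passed to set() is a byte offset.
    private static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // Public route 1: VarHandle (plain set works at any byte offset of a byte[]).
    public static void putShortViaVarHandle(byte[] buf, int offset, short value) {
        SHORT_LE.set(buf, offset, value);
    }

    // Public route 2: ByteBuffer wrapping the same array, absolute put.
    public static void putShortViaByteBuffer(byte[] buf, int offset, short value) {
        ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).putShort(offset, value);
    }

    // Baseline: two single-byte stores, low byte first.
    public static void putShortPlain(byte[] buf, int offset, short value) {
        buf[offset] = (byte) value;
        buf[offset + 1] = (byte) (value >> 8);
    }

    public static void main(String[] args) {
        byte[] a = new byte[4];
        byte[] b = new byte[4];
        byte[] c = new byte[4];
        short pair = 0x3231; // arbitrary test value
        putShortViaVarHandle(a, 1, pair);
        putShortViaByteBuffer(b, 1, pair);
        putShortPlain(c, 1, pair);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.equals(a, b) && Arrays.equals(b, c));
    }
}
```

All three methods leave the same bytes in the array; the difference is only in which API expresses the two-byte store.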
Currently, we have a Vector API in incubation, which ensures that certain operations are vectorized. Our use of BALE is similar: we want SLP to happen reliably. I took some time to look through where you use BALE to speed up writing. I believe that performing the optimization at the JIT level would be better if possible, because the JIT knows the best way to group bytes together to write at a given offset on an arbitrary platform (such as a big-endian one).

Similar to the Vector API, I think we might add new internal APIs like:

    public static void write(byte[] arr, int offset, int b0, int b1, ....)

where we declare explicitly that we write multiple bytes at once, so we know the JIT will reliably optimize our writes (in case the JIT has trouble performing SLP reliably, as with auto-vectorization).

Another reason the JIT is better than reusing BALE/ByteBuffer is that the values those APIs work with are "meaningful"; i.e. the results are used directly and the reads/writes are two-way. In our case, however, we are only interested in faster writing, and there are multiple ways to group the writes, so I don't think Java-based APIs will be useful.

For JVM startup, I recommend running a simple Hello World with the -Xlog:class+init flag to see which classes are initialized before java/lang/invoke/MethodHandleImpl. In general, classes that are initialized that early (the wrappers, String, collections, and reflection) shouldn't initialize java.lang.invoke classes (lambdas, VarHandle, MethodHandle), for example by keeping them in fields. These classes can use lambdas in their methods, but those lambdas cannot be called before java.lang.invoke is ready.

Best,
Chen Liang

On Sun, Oct 8, 2023 at 5:14 PM 温绍锦(高铁) <shaojin.we...@alibaba-inc.com> wrote:

> Should we allow the use of Unsafe or ByteArrayLittleEndian for trivial byte[] writes in core-libs?
>
> There is already code that uses ByteArrayLittleEndian to improve performance, such as:
> ```java
> package java.util;
>
> class UUID {
>     public String toString() {
>         // ...
>         ByteArrayLittleEndian.setInt(
>                 buf,
>                 9,
>                 HexDigits.packDigits(((int) msb) >> 24, ((int) msb) >> 16));
>         // ...
>     }
> }
> ```
>
> There are examples of using ByteArrayLittleEndian and then removing it because it caused the JVM to start slowly (we can use Unsafe.putShortUnaligned to solve the problem of slow JVM startup):
> ```java
> package java.lang;
> class StringLatin1 {
>     private static void writeDigitPair(byte[] buf, int charPos, int value) {
>         short pair = DecimalDigits.digitPair(value);
>         // UNSAFE.putShortUnaligned(buf, ARRAY_BYTE_BASE_OFFSET + charPos, pair);
>         buf[charPos] = (byte)(pair);
>         buf[charPos + 1] = (byte)(pair >> 8);
>     }
> }
> ```
>
> Here is an example where the PR review disagrees with the use of ByteArrayLittleEndian:
> https://github.com/openjdk/jdk/pull/15768
> ```java
> package java.util;
> class HexFormat {
>     String formatOptDelimiter(byte[] bytes, int fromIndex, int toIndex) {
>         // ...
>         short pair = HexDigits.digitPair(bytes[fromIndex + i], ucase);
>         int pos = i * 2;
>         rep[pos] = (byte)pair;
>         rep[pos + 1] = (byte)(pair >>> 8);
>         // ByteArrayLittleEndian.setShort(rep, pos, pair);
>     }
> }
> ```
>
> This is another example of a PR review disagreeing with the use of ByteArrayLittleEndian:
> https://github.com/openjdk/jdk/pull/15990
> ```java
> package java.lang;
> class AbstractStringBuilder {
>     static final class Constants {
>         static final int NULL_LATIN1;
>         static final long NULL_UTF16;
>         static {
>             byte[] bytes4 = new byte[] {'n', 'u', 'l', 'l'};
>             byte[] bytes8 = new byte[8];
>             NULL_LATIN1 = ByteArrayLittleEndian.getInt(bytes4, 0);
>             StringLatin1.inflate(bytes4, 0, bytes8, 0, 4);
>             NULL_UTF16 = ByteArrayLittleEndian.getLong(bytes8, 0);
>         }
>     }
>
>     private AbstractStringBuilder appendNull() {
>         ensureCapacityInternal(count + 4);
>         int count = this.count;
>         byte[] val = this.value;
>         if (isLatin1()) {
>             ByteArrayLittleEndian.setInt(val, count, Constants.NULL_LATIN1);
>         } else {
>             ByteArrayLittleEndian.setLong(val, count << 1, Constants.NULL_UTF16);
>         }
>         this.count = count + 4;
>         return this;
>     }
> }
> ```
>
> In these examples, using Unsafe/ByteArrayLittleEndian significantly improves performance. If JIT automatic optimization is the best solution, but SuperWord Level Parallelism (SLP) does not currently support this optimization, what are our recommendations? In which scenarios can Unsafe not be used, and in which scenarios can ByteArrayLittleEndian not be used?