On Thu, 24 Jul 2025 14:20:48 GMT, Chen Liang <li...@openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/StringUTF16.java line 1490:
>>
>>> 1488: val,
>>> 1489: Unsafe.ARRAY_BYTE_BASE_OFFSET + ((long) index << 1),
>>> 1490: (long) (end - off) << 1);
>>
>> The documentation of `copyMemory()` is not super-clear about endianness.
>> But it seems to imply that in this case it behaves as if it were to copy
>> `short`s, so endianness seems to be preserved.
>>
>> The invocation of `copyMemory()` here implicitly assumes that
>> `ARRAY_CHAR_INDEX_SCALE` and `ARRAY_BYTE_INDEX_SCALE` are 2 and 1, resp.,
>> which seems quite reasonable but is not written in stone.
>
> I recall the runtime requires that a UTF16 byte array and a char array have
> exactly the same layout - it would be nice if we kept this in the design
> notes for the string implementation classes, such as in the class header.
>
> (Useful notes could include that indices are char-based, that UTF16 byte[]
> and char[] have identical layout, etc.)

The StringUTF16.getChar and putChar methods are carefully written to use the
platform endianness to compose and decompose char values from and to byte[]
in terms of shifts of the lower and upper bytes.
The mapping of that onto other APIs that try to optimize between char[] and
the compact string byte[] is less well documented.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24773#discussion_r2228721098
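
For illustration, the shift-based composition described above can be sketched
roughly as follows. This is a simplified stand-alone approximation, not the JDK
source: the class name `Utf16Sketch` and the constant names are hypothetical,
and `ByteOrder.nativeOrder()` stands in for whatever endianness check the JDK
uses internally.

```java
import java.nio.ByteOrder;

// Hypothetical stand-alone sketch; names and structure are illustrative only.
final class Utf16Sketch {
    // Shifts applied to the byte at the lower address and to the byte after it.
    // On a big-endian platform the first byte holds the upper 8 bits of the char;
    // on a little-endian platform it holds the lower 8 bits.
    static final int FIRST_BYTE_SHIFT;
    static final int SECOND_BYTE_SHIFT;

    static {
        if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
            FIRST_BYTE_SHIFT = 8;
            SECOND_BYTE_SHIFT = 0;
        } else {
            FIRST_BYTE_SHIFT = 0;
            SECOND_BYTE_SHIFT = 8;
        }
    }

    // Compose the char at char-based position 'index' from two adjacent bytes.
    static char getChar(byte[] val, int index) {
        int i = index << 1; // char index -> byte index
        return (char) (((val[i] & 0xff) << FIRST_BYTE_SHIFT)
                     | ((val[i + 1] & 0xff) << SECOND_BYTE_SHIFT));
    }

    // Decompose 'c' into two bytes stored at char-based position 'index'.
    static void putChar(byte[] val, int index, int c) {
        int i = index << 1; // char index -> byte index
        val[i] = (byte) (c >> FIRST_BYTE_SHIFT);
        val[i + 1] = (byte) (c >> SECOND_BYTE_SHIFT);
    }

    public static void main(String[] args) {
        byte[] val = new byte[4]; // room for two chars
        putChar(val, 0, 'A');
        putChar(val, 1, '\u4e2d');
        // Round trip: each char read back should equal the char written.
        System.out.println(getChar(val, 0) == 'A' && getChar(val, 1) == '\u4e2d');
    }
}
```

Because the bytes are laid out in native order, a byte[] filled this way has
the same in-memory content as the corresponding char[], which is the property
a raw `copyMemory()` between the two relies on.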