Re: RFR: 8369564: Provide a MemorySegment API to read strings with known lengths

Jorn Vernee Wed, 29 Oct 2025 08:43:57 -0700

On Wed, 29 Oct 2025 10:50:53 GMT, Liam Miller-Cushon <[email protected]> wrote:


> This PR proposes adding a new overload to `MemorySegment::getString` that 
> takes a known byte length of the content.
> 
> This was previously proposed in https://github.com/openjdk/jdk/pull/20725, 
> but the outcome of [JDK-8333843](https://bugs.openjdk.org/browse/JDK-8333843) 
> was to update `MemorySegment#getString` to suggest
> 
> 
>     byte[] bytes = new byte[length];
>     MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
>     return new String(bytes, charset);
> 
> 
> However this is less efficient than what the implementation of getString does 
> after [JDK-8362893](https://bugs.openjdk.org/browse/JDK-8362893), it now uses 
> `JavaLangAccess::uncheckedNewStringNoRepl` to avoid the copy.

Some preliminary comments. I didn't look at the tests yet.

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1338:

> 1336:      *                access operation will occur
> 1337:      * @param length  byte length to be used for string conversion (not 
> including any
> 1338:      *                null termination)

I think 'to be used for string conversion' is a bit too vague (used _how_?). I 
think a more descriptive text could be something like 'length in bytes of the 
string to read' (matching also the pattern of the existing 'offset in bytes').

Also, what happens if:
- The length _does_ include a null terminator
- The length is not a multiple of the byte size of a character in the given 
charset.

On that last note, I wonder if this shouldn't be the length in bytes, but the 
length in characters. Then we can compute the byte length from the charset. 
That will make it impossible to pass a length that is not a multiple of the 
character size.

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1349:

> 1347:      *         <ul>
> 1348:      *             <li>{@code B} is the size, in bytes, of the string 
> encoded using the
> 1349:      *             provided charset (e.g. {@code 
> str.getBytes(charset).length});</li>

Isn't `B` equal to the `length` argument?

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1351:

> 1349:      *             provided charset (e.g. {@code 
> str.getBytes(charset).length});</li>
> 1350:      *             <li>{@code N} is the size (in bytes) of the 
> terminator char according
> 1351:      *             to the provided charset. For instance, this is 1 for

Why is the terminator char important? The segment doesn't necessarily need to 
have a terminator char, right? I don't see this invariant being checked in the 
code either.

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1363:

> 1361:      */
> 1362:     String getString(long offset, int length, Charset charset);
> 1363: 

I'd suggest putting the `length` parameter at the end, so that this becomes a 
telescoping overload of the length-less variant.

src/java.base/share/classes/jdk/internal/foreign/StringSupport.java line 66:

> 64:             case SINGLE_BYTE -> readByte(segment, offset, len, charset);
> 65:             case DOUBLE_BYTE -> readShort(segment, offset, len, charset);
> 66:             case QUAD_BYTE -> readInt(segment, offset, len, charset);

These 3 methods appear to be identical

-------------

PR Review: https://git.openjdk.org/jdk/pull/28043#pullrequestreview-3394314370
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2473855397
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2473837599
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2473832085
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2473823655
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2473819485

Re: RFR: 8369564: Provide a MemorySegment API to read strings with known lengths

Reply via email to