On Wed, 19 Nov 2025 14:52:04 GMT, Maurizio Cimadamore <[email protected]> 
wrote:

>> Liam Miller-Cushon has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Add a dstOffset parameter, stop using 
>> StringCharBuffer/CharsetEncoder::encode
>
> src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2649:
> 
>> 2647:      *
>> 2648:      * @param src      the Java string to be written into this segment
>> 2649:      * @param dstEncoding the charset used to {@linkplain 
>> Charset#newEncoder() encode}
> 
> I'm not sure we have a dependency on the charset being standard?

We do not, thanks, fixed.

Although I think the existing `allocateFrom(String, Charset)` method does have 
an undocumented dependency: it uses `CharsetKind` to get the terminator char 
length, and `CharsetKind` only supports standard Charsets. If we add a fast 
path for UTF-16, it may need to depend on a standard Charset (or on a standard 
way to get the code unit size of a charset, if it has one).

> src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 194:
> 
>> 192:         MemorySegment segment;
>> 193:         if (StringSupport.bytesCompatible(str, charset, srcIndex, 
>> numChars)) {
>> 194:             segment = allocateNoInit(numChars);
> 
> This also seems to rely on the fact that we end up here only for latin1 
> strings. Again, I don't think this is correct, but if it's deliberate, we 
> should add an assertion check.

Good point. I think we make a similar assumption in the existing 
`allocateFrom(String, Charset)`: it does `length + termCharSize`, and that 
should perhaps be `(length + 1) * codeUnitSize`.
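
To spell out the arithmetic (with illustrative names, not the actual fields in 
`SegmentAllocator`): for a latin1-compatible string and a single-byte charset 
the two formulas agree, but they diverge as soon as a code unit is wider than 
one byte:

```java
class SizeSketch {
    // What the fast path effectively computes today, where `length` is the
    // char count of a bytes-compatible string.
    static long currentFormula(int length, int termCharSize) {
        return (long) length + termCharSize;
    }

    // Room for every char plus one terminator, each codeUnitSize bytes wide.
    static long proposedFormula(int length, int codeUnitSize) {
        return ((long) length + 1) * codeUnitSize;
    }

    public static void main(String[] args) {
        int length = 5; // e.g. "hello"

        // Single-byte code units (latin1-compatible): both give 6.
        System.out.println(currentFormula(length, 1));  // 6
        System.out.println(proposedFormula(length, 1)); // 6

        // Two-byte code units (e.g. UTF-16LE): 7 bytes would be too small.
        System.out.println(currentFormula(length, 2));  // 7
        System.out.println(proposedFormula(length, 2)); // 12
    }
}
```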

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544835957
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544862879
