On Sun, 18 Jan 2026 09:06:31 GMT, Alan Bateman <[email protected]> wrote:

> > Question: Have you considered the handling of replacement characters? They 
> > currently are counted into the returned length, but I wonder whether users 
> > actually want to print those characters as-is.
> 
> That is a good point. As `getBytes(Charset)` is specified to replace 
> malformed-input and unmappable-character sequences, and the proposed method 
> is specified to return the equivalent of `getBytes(Charset).length` then the 
> returned length has to include them.

The motivating use cases I've seen for this method are to compute the length of 
encoded data that contains strings, where the strings would be encoded with 
`getBytes`. The CSR gives the example of encoding multiple large strings into a 
single array. Specifying the output in terms of `getBytes(cs).length` is 
necessary for that use case, and requires the two methods to handle 
replacement characters and unpaired surrogates identically. Do 
you see alternatives that should be considered?
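
For illustration, here is a minimal sketch of that use case. The class and 
the `encodedLength` helper are hypothetical names, and the helper is only a 
stand-in defined as `getBytes(cs).length`, not the proposed implementation. 
It shows why the computed length has to count replacement bytes: the array is 
sized up front and then filled with the output of `getBytes`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class EncodedLengthSketch {
    // Hypothetical stand-in for the proposed method; by definition it is
    // equivalent to getBytes(cs).length, replacements included.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Encodes several strings into one array that is sized before encoding.
    static byte[] pack(List<String> strings, Charset cs) {
        int total = 0;
        for (String s : strings) {
            total += encodedLength(s, cs);
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (String s : strings) {
            byte[] encoded = s.getBytes(cs);
            // This copy would overflow (or leave a gap) if encodedLength
            // treated replaced input differently from getBytes.
            System.arraycopy(encoded, 0, out, pos, encoded.length);
            pos += encoded.length;
        }
        if (pos != total) {
            throw new IllegalStateException("length mismatch: " + pos + " != " + total);
        }
        return out;
    }

    public static void main(String[] args) {
        // Unpaired surrogate: malformed input, replaced by getBytes.
        String malformed = "a\uD800b";
        // U+00E9 is unmappable in US-ASCII, so getBytes replaces it.
        String unmappable = "\u00E9";

        System.out.println(encodedLength(malformed, StandardCharsets.UTF_8));
        System.out.println(encodedLength(unmappable, StandardCharsets.US_ASCII));
        System.out.println(pack(List.of(malformed, unmappable), StandardCharsets.UTF_8).length);
    }
}
```

If the length excluded replaced sequences, the sum computed in `pack` would 
no longer match what `getBytes` writes, and the caller would have to encode 
every string twice or resize the array.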

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3767013988
