Re: RFR: 8364365: HKSCS encoder does not properly set the replacement character

Xueming Shen Tue, 05 Aug 2025 18:54:36 -0700

On Tue, 5 Aug 2025 08:31:31 GMT, Volkan Yazici wrote:


>> Fix `HKSCS` encoder to correctly set the replacement character, and add 
>> tests to verify the `CodingErrorAction.REPLACE` behavior of all available 
>> encoders.
>
> test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 140:
> 
>> 138:      * Finds an {@linkplain CoderResult#isUnmappable() unmappable} non-Latin-1 {@code char[]} for the given encoder.
>> 139:      */
>> 140:     private static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {
> 
> I'd appreciate it if you can double-check this method.

I assume that by "double char" you actually mean a "surrogate pair"?

I believe that in the first scanning pass you might want to skip surrogates, 
since a single dangling surrogate char should trigger a "malformed" error 
rather than an "unmappable" one if the charset is implemented to handle 
supplementary characters.

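        // First pass: BMP chars from U+0100 (the first char outside Latin-1),
        // skipping surrogates, which may be reported as malformed instead
        for (char c = 0x100; c < 0xFFFF; c++) {
            if (Character.isSurrogate(c))
                continue;
            if (!encoder.canEncode(c))
                return new char[]{c};
        }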
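To illustrate why the first pass skips surrogates, here is a small sketch (the class `LoneSurrogateDemo` and its `classify` helper are hypothetical, not part of the PR). With the UTF-8 encoder, `canEncode('\uD800')` returns false, so a scan without the surrogate check would stop at the first surrogate; yet actually encoding that lone char reports a malformed-input error, not an unmappable-character one. ISO-8859-1 with a CJK char is shown for contrast.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

// Hypothetical demo class, not part of the PR under review.
final class LoneSurrogateDemo {

    /** Encodes s and reports which kind of coding error, if any, occurred. */
    static String classify(CharsetEncoder encoder, String s) {
        try {
            // encode(CharBuffer) resets the encoder, encodes, and flushes.
            // A fresh encoder from newEncoder() uses CodingErrorAction.REPORT,
            // so coding errors surface as exceptions.
            encoder.encode(CharBuffer.wrap(s));
            return "ok";
        } catch (MalformedInputException e) {
            return "malformed";
        } catch (UnmappableCharacterException e) {
            return "unmappable";
        } catch (CharacterCodingException e) {
            return "other";
        }
    }

    public static void main(String[] args) {
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        // canEncode() rejects a lone high surrogate...
        System.out.println(utf8.canEncode('\uD800'));   // false
        // ...but encoding it is a malformed-input error, not an unmappable one.
        System.out.println(classify(utf8, "\uD800"));   // malformed

        // For contrast: a CJK char that ISO-8859-1 cannot represent.
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(classify(latin1, "\u4E2D")); // unmappable
    }
}
```

A test that expects `CoderResult.isUnmappable()` would therefore be fed the wrong kind of input if the scan returned a lone surrogate.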

And for the second pass, covering surrogates, I think we can just pick any 
non-BMP plane, whose code points are always translated into a surrogate pair, 
and check whether the charset can map/encode each one; if not, it's our 
candidate.

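        // Second pass: supplementary code points in plane 1, each of which
        // Character.toChars() returns as a surrogate pair
        for (int i = 0x10000; i < 0x1FFFF; i++) {
            char[] cc = Character.toChars(i);
            if (!encoder.canEncode(new String(cc)))
                return cc;
        }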
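Putting the two passes together, a self-contained version of the method might look like the sketch below. The wrapper class `UnmappableFinder` is hypothetical, the first pass starts at U+0100 (the first char outside Latin-1), and the `null` return when no candidate exists (for example, for UTF-8, which can encode every char tried) is an assumption; the PR's actual test may handle that case differently.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class; the PR's test defines its own method.
final class UnmappableFinder {

    /**
     * Returns an unmappable non-Latin-1 char[] for the encoder: either a
     * single BMP char or a surrogate pair, or null if none is found.
     */
    static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {
        // Pass 1: BMP chars from U+0100, skipping surrogates, since a lone
        // surrogate may be reported as malformed rather than unmappable.
        for (char c = 0x100; c < 0xFFFF; c++) {
            if (Character.isSurrogate(c))
                continue;
            if (!encoder.canEncode(c))
                return new char[]{c};
        }
        // Pass 2: supplementary code points in plane 1; toChars() always
        // yields a surrogate pair for these.
        for (int cp = 0x10000; cp < 0x1FFFF; cp++) {
            char[] cc = Character.toChars(cp);
            if (!encoder.canEncode(new String(cc)))
                return cc;
        }
        // Assumed fallback: the encoder can encode every candidate tried.
        return null;
    }

    public static void main(String[] args) {
        char[] ascii = findUnmappableNonLatin1(StandardCharsets.US_ASCII.newEncoder());
        System.out.println(Integer.toHexString(ascii[0])); // 100
        System.out.println(
            findUnmappableNonLatin1(StandardCharsets.UTF_8.newEncoder()) == null); // true
    }
}
```

With US-ASCII the first pass already stops at U+0100; the second pass only runs for encoders that can map every non-surrogate BMP char above Latin-1.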

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2255682596
