Hello,
IBM would like to contribute an NIO converter patch to the OpenJDK project.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8211382
Change: https://cr.openjdk.java.net/~itakiguchi/8211382/webrev.00/

Issue:
The ISO2022JP decoder and the GB18030 decoder (in decodeBufferLoop()) have code range definition issues.

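In ISO2022JP, the byte sequence 0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42 is decoded to \uFFA0. ISO2022JP is a Japanese encoding, but \uFFA0 (HALFWIDTH HANGUL FILLER) is part of the Korean Hangul range. ESC ( I selects JIS X 0201 Katakana, whose valid bytes are 0x21-0x5F, so 0x60 should be treated as malformed input.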
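
To illustrate the expected range check, here is a minimal standalone sketch. It is not the JDK's ISO2022_JP decoder, and the class and method names are hypothetical. It maps a JIS X 0201 Katakana byte with the usual 0xFF40 offset, so 0x21-0x5F become U+FF61-U+FF9F; without the upper bound, 0x60 would land on U+FFA0, the same wrong character reported above. For simplicity it returns U+FFFD where a real CharsetDecoder would report malformed input.

```java
// Hypothetical standalone sketch; not the JDK's ISO2022_JP decoder.
public class KanaRangeSketch {

    // Decodes one byte in the JIS X 0201 Katakana set (selected by ESC ( I).
    // Valid bytes are 0x21-0x5F, mapping to U+FF61-U+FF9F. Anything else is
    // out of range; this sketch returns U+FFFD instead of reporting malformed input.
    static char decodeJisX0201Kana(int b) {
        if (b < 0x21 || b > 0x5F) {
            return (char) 0xFFFD;
        }
        return (char) (b + 0xFF40);
    }

    public static void main(String[] args) {
        int[] samples = {0x21, 0x5F, 0x60};
        for (int b : samples) {
            System.out.printf("0x%02X -> \\u%04X%n", b, (int) decodeJisX0201Kana(b));
        }
    }
}
```

Running it prints 0x21 -> \uFF61, 0x5F -> \uFF9F and 0x60 -> \uFFFD, one per line.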

In GB18030, \uFFFE is encoded as 0x84, 0x31, 0xA4, 0x38, but decoding those bytes from a direct ByteBuffer (decodeBufferLoop()) returns the replacement character \uFFFD instead of \uFFFE.
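
The position of 0x84, 0x31, 0xA4, 0x38 inside the GB18030 four-byte area can be sketched as follows. This is a minimal standalone illustration, not the JDK's GB18030 decoder, and the class, method, and constant names are hypothetical. It assumes the standard four-byte layout (first and third bytes 0x81-0xFE, second and fourth bytes 0x30-0x39) and that the BMP part of that area ends at 0x84, 0x31, 0xA4, 0x39 (U+FFFF). The sequence for U+FFFE is then the second-to-last index in the area, so a range check with too low an upper bound is one way a decoder could end up rejecting it.

```java
// Hypothetical standalone sketch; not the JDK's GB18030 decoder.
public class Gb18030RangeSketch {

    // Linear index of a four-byte GB18030 sequence b1 b2 b3 b4, assuming
    // b1 and b3 are in 0x81-0xFE and b2 and b4 are in 0x30-0x39.
    static int linearIndex(int b1, int b2, int b3, int b4) {
        return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30);
    }

    // The four-byte BMP area runs from 0x81 0x30 0x81 0x30 (index 0)
    // up to 0x84 0x31 0xA4 0x39, which is U+FFFF.
    static final int BMP_LAST_INDEX = linearIndex(0x84, 0x31, 0xA4, 0x39);

    // Inclusive upper bound, so sequences at the very top of the area
    // (such as the one for U+FFFE) are still accepted.
    static boolean inBmpFourByteArea(int b1, int b2, int b3, int b4) {
        int idx = linearIndex(b1, b2, b3, b4);
        return idx >= 0 && idx <= BMP_LAST_INDEX;
    }

    public static void main(String[] args) {
        // Bytes that GB18030 uses for U+FFFE.
        System.out.println(linearIndex(0x84, 0x31, 0xA4, 0x38));       // 39418
        System.out.println(BMP_LAST_INDEX);                             // 39419
        System.out.println(inBmpFourByteArea(0x84, 0x31, 0xA4, 0x38)); // true
    }
}
```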

Actual result:
$ java Test1
\uFFA0
\uFFFD

Expected result:
$ java Test1
\uFFFD
\uFFFE

The testcase is as follows:
========================
$ cat Test1.java
import java.nio.*;
import java.nio.charset.*;

public class Test1 {
  public static void main(String[] args) throws Exception {
    {
      // ESC ( I selects JIS X 0201 Katakana (valid bytes 0x21-0x5F);
      // 0x60 is outside that range. ESC ( B switches back to ASCII.
      byte[] ba = new byte[] {0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42};
      for (char ch : new String(ba, "ISO2022JP").toCharArray()) {
        System.out.printf("\\u%04X", (int) ch);
      }
      System.out.println();
    }
    {
      Charset cs = Charset.forName("GB18030");
      CharsetDecoder cd = cs.newDecoder();
      cd.onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
      // U+FFFE encodes to 0x84, 0x31, 0xA4, 0x38.
      byte[] ba = "\uFFFE".getBytes(cs);
      // Use a direct buffer (no backing array) so the decoder takes its
      // buffer-based path rather than the array-based one.
      ByteBuffer bb = ByteBuffer.allocateDirect(ba.length);
      bb.put(ba);
      bb.flip();
      CharBuffer cb = cd.decode(bb);
      for (int i = 0; i < cb.limit(); i++) {
        System.out.printf("\\u%04X", (int) cb.get(i));
      }
      System.out.println();
    }
  }
}
========================

I'd like to obtain a sponsor for this issue.

Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.
