RFR: 8211382 ISO2022JP and GB18030 NIO converter issues

Ichiroh Takiguchi Tue, 02 Oct 2018 02:22:35 -0700

Hello,
IBM would like to contribute an NIO converter patch to the OpenJDK project.


Bug:    https://bugs.openjdk.java.net/browse/JDK-8211382
Change: https://cr.openjdk.java.net/~itakiguchi/8211382/webrev.00/

Issue:

The ISO2022JP decoder and the GB18030 decoder (in its decodeBufferLoop()) have code range definition issues.

In ISO2022JP, the byte sequence 0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42 is converted to \uFFA0. ISO2022JP is for Japanese, but \uFFA0 is a Korean Hangul character (HALFWIDTH HANGUL FILLER).
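ESC ( I (0x1B, 0x28, 0x49) designates JIS X 0201 Katakana, whose 7-bit bytes run from 0x21 to 0x5F and map to U+FF61 through U+FF9F by adding 0xFF40. The byte 0x60 is outside that range, and 0x60 + 0xFF40 = 0xFFA0, which suggests a missing upper-bound check. The following minimal sketch shows such a check; KatakanaRangeCheck and decode() are hypothetical names, not the JDK's ISO2022_JP decoder code.

```java
// Hypothetical helper (not the JDK implementation): range check for single
// bytes received after the JIS X 0201 Katakana designation ESC ( I.
class KatakanaRangeCheck {
    // JIS X 0201 Katakana occupies 0x21..0x5F in 7-bit form and maps to
    // U+FF61..U+FF9F (halfwidth Katakana) by adding 0xFF40.
    static final int MIN = 0x21;
    static final int MAX = 0x5F;
    static final int OFFSET = 0xFF40;

    /** Returns the decoded UTF-16 code unit, or -1 if the byte is malformed. */
    static int decode(int b) {
        if (b < MIN || b > MAX) {
            return -1; // caller reports malformed input (U+FFFD under REPLACE)
        }
        return b + OFFSET;
    }

    public static void main(String[] args) {
        for (int b : new int[] {0x21, 0x5F, 0x60}) {
            int c = decode(b);
            System.out.printf("0x%02X -> %s%n", b,
                c < 0 ? "malformed" : String.format("\\u%04X", c));
        }
        // Without the upper-bound check, 0x60 would map to U+FFA0.
        System.out.printf("unchecked 0x60 -> \\u%04X%n", 0x60 + OFFSET);
    }
}
```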


In GB18030, \uFFFE is encoded as 0x84, 0x31, 0xA4, 0x38, but when those bytes are decoded from a direct ByteBuffer, the result is the replacement character \uFFFD.
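Four-byte GB18030 sequences are commonly range-checked by converting them to a linear index. For reference, here is a minimal sketch of that computation; Gb18030Index and linearIndex() are hypothetical names, not the JDK's GB18030 code. The bytes 0x84, 0x31, 0xA4, 0x38 give index 39418, just below 39419 for 0x84, 0x31, 0xA4, 0x39 (U+FFFF), the last four-byte sequence in the BMP range. A buffer-loop bound that stops short of these indices would be one plausible way to end up with \uFFFD here; that is an assumption, not a description of the patch.

```java
// Hypothetical helper (not the JDK implementation): linear index of a
// four-byte GB18030 sequence b1 b2 b3 b4, where
// b1, b3 are in 0x81..0xFE and b2, b4 are in 0x30..0x39.
class Gb18030Index {
    static int linearIndex(int b1, int b2, int b3, int b4) {
        return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10
                + (b4 - 0x30);
    }

    public static void main(String[] args) {
        // First four-byte sequence.
        System.out.println(linearIndex(0x81, 0x30, 0x81, 0x30));
        // Sequence that Java's GB18030 encoder produces for U+FFFE.
        System.out.println(linearIndex(0x84, 0x31, 0xA4, 0x38));
        // U+FFFF, the last BMP code point encoded in four bytes.
        System.out.println(linearIndex(0x84, 0x31, 0xA4, 0x39));
    }
}
```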

Current result:
$ java Test1
\uFFA0
\uFFFD

Expected result:
$ java Test1
\uFFFD
\uFFFE

The test case is as follows:
========================
$ cat Test1.java
import java.nio.*;
import java.nio.charset.*;

public class Test1 {
  public static void main(String[] args) throws Exception {
    {

      byte[] ba = new byte[] {0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42};

      for(char ch : (new String(ba, "ISO2022JP")).toCharArray()) {
        System.out.printf("\\u%04X",(int)ch);
      }
      System.out.println();
    }
    {
      Charset cs = Charset.forName("GB18030");
      CharsetDecoder cd = cs.newDecoder();
      cd.onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
      byte[] ba = "\uFFFE".getBytes(cs);
      ByteBuffer bb = ByteBuffer.allocateDirect(ba.length);
      bb.put(ByteBuffer.wrap(ba));
      bb.position(0);
      CharBuffer cb = cd.decode(bb);
      for(int i=0; i<cb.limit(); i++) {
        System.out.printf("\\u%04X",(int)cb.get(i));
      }
      System.out.println();
    }
  }
}
========================
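Because the GB18030 issue is in decodeBufferLoop(), which the test case reaches through a direct ByteBuffer, decoding the same bytes from a heap buffer and from a direct buffer and comparing the results is a quick way to spot this kind of divergence. A minimal sketch, assuming the usual sun.nio.cs structure where array-backed buffers go through decodeArrayLoop() and other buffers through decodeBufferLoop(); HeapVsDirect, decode() and hex() are hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: decodes the same bytes from a heap buffer and from a
// direct buffer so the two decoder code paths can be compared.
class HeapVsDirect {
    static String decode(Charset cs, byte[] bytes, boolean direct)
            throws CharacterCodingException {
        CharsetDecoder cd = cs.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bb;
        if (direct) {
            bb = ByteBuffer.allocateDirect(bytes.length); // no backing array
            bb.put(bytes);
            bb.flip(); // switch from writing to reading
        } else {
            bb = ByteBuffer.wrap(bytes); // heap buffer backed by the array
        }
        CharBuffer cb = cd.decode(bb);
        return cb.toString();
    }

    // Formats each char as a \\uXXXX escape, like Test1 does.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Charset cs = Charset.forName("GB18030");
        byte[] ba = "\uFFFE".getBytes(cs);
        String heap = decode(cs, ba, false);
        String direct = decode(cs, ba, true);
        System.out.println("heap:   " + hex(heap));
        System.out.println("direct: " + hex(direct));
        System.out.println(heap.equals(direct) ? "same" : "DIFFERENT");
    }
}
```

On a JDK with the issue, the heap and direct lines would be expected to differ; on one without it, both should show \uFFFE.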

I'd like to obtain a sponsor for this issue.

Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.
