On Fri, 26 Aug 2022 09:25:55 GMT, Alan Bateman <al...@openjdk.org> wrote:
>> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and
>> DBCS Only charsets.
>> |Charset|Mix|SBCS|DBCS|
>> | -- | -- | -- | -- |
>> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
>> | Korean | Cp933 | Cp833 | Cp834 |
>>
>> But OpenJDK does not supports some of "Japanese EBCDIC - English" /
>> "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS
>> Only charsets.
>>
>> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
>> |Charset|Mix|SBCS|DBCS|
>> | ------------- | ------------- | ------------- | ------------- |
>> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
>> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
>> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** |
>>
>> *1: Cp037 compatible
>
>> Use following options, like
>> OpenJDK:
>> `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 20000 1 1`
>> ICU4J
>> `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 20000 1 1`
>>
>> Actually, I'm confused by this result. Previously, I was just comparing A/A
>> with B/B on OpenJDK's charset. I didn't think ICU4J's result would make a
>> difference.
>
> My initial reaction is one of relief that the icu4j provider can be used with
> current JDK builds. This means there is an option should we decide to stop
> adding more EBCDIC charsets to the JDK.
>
> The test uses IBM-1047 and I can't tell if the icu4j provider is used or not.
> Charset doesn't define a provider method but I think would be useful to print
> cs.getClass() or cs.getClass().getModule() so we know which Charset
> implementation is used. Also I think any discussion on performance would be
> better served with a JMH benchmark rather than a standalone test.

Hello @AlanBateman.
Sorry I'm late.

I created Charset SPI JAR `x-IBM1047_SPI` (`custom-charsets.jar`) which was ported from `sun.nio.cs.SingleByte.java` and `IBM1047.java` (generated one).
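To confirm which implementation serves each charset name, as suggested above, something like the following can be run with `custom-charsets.jar` on the class path. This is only a minimal sketch and not part of the benchmark: the class name `CharsetOrigin` and the helper `describe` are made up for illustration, and it assumes JDK 9 or later because it calls `Class.getModule()`.

```java
import java.nio.charset.Charset;

public class CharsetOrigin {
    // Hypothetical helper: returns one line naming the class and module
    // that implement the requested charset in the running JVM.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not supported in this runtime";
        }
        Charset cs = Charset.forName(name);
        Module m = cs.getClass().getModule();
        // Charsets loaded from a JAR on the class path live in an unnamed module.
        String module = m.isNamed() ? m.getName() : "unnamed module (class path)";
        return name + " -> " + cs.name()
                + " implemented by " + cs.getClass().getName()
                + " in " + module;
    }

    public static void main(String[] args) {
        // The two names used by the benchmark; the second one is only
        // available when the SPI JAR is on the class path.
        for (String name : new String[] {"IBM1047", "x-IBM1047_SPI"}) {
            System.out.println(describe(name));
        }
    }
}
```

A JDK-provided charset should report a named JDK module, while a charset supplied through the SPI from the class path should report the unnamed module.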
Test code:

    package com.example;

    import java.nio.charset.Charset;
    import org.openjdk.jmh.annotations.Benchmark;

    public class MyBenchmark {

        final static String s;

        static {
            char[] ca = new char[0x2000];
            for (int i = 0; i < ca.length; i++) {
                ca[i] = (char) (i & 0xFF);
            }
            s = new String(ca);
        }

        @Benchmark
        public void testIBM1047() throws Exception {
            byte[] ba = s.getBytes("IBM1047");
        }

        @Benchmark
        public void testIBM1047_SPI() throws Exception {
            byte[] ba = s.getBytes("x-IBM1047_SPI");
        }
    }

All test-related files are in [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834).

Test results on RHEL 8.6 x86_64 (Intel Core i7 3520M) are as follows:

1.8.0_345-b01

    Benchmark                     Mode  Cnt       Score      Error  Units
    MyBenchmark.testIBM1047      thrpt   25   53213.092 ±  126.962  ops/s
    MyBenchmark.testIBM1047_SPI  thrpt   25   47442.669 ±  349.003  ops/s

20-ea+17-1181

    Benchmark                     Mode  Cnt       Score      Error  Units
    MyBenchmark.testIBM1047      thrpt   25  136331.141 ± 1078.481  ops/s
    MyBenchmark.testIBM1047_SPI  thrpt   25   51563.213 ±  843.238  ops/s

On JDK 20, IBM1047 is about 2.6 times faster than the SPI version.
I think these results are related to **JEP 254: Compact Strings**.

As I requested before, we'd like to use the `sun.nio.cs.SingleByte*` and `sun.nio.cs.DoubleByte*` classes as public API.

-------------

PR: https://git.openjdk.org/jdk/pull/9399