On Mon, 8 Aug 2022 09:22:32 GMT, Alan Bateman <al...@openjdk.org> wrote:
>> Hello @AlanBateman .
>> Sorry I'm late.
>> I got some responses from ICU.
>> [ICU-22091](https://unicode-org.atlassian.net/browse/ICU-22091)
>> I'm not sure if they're interested in the new charset...
>>
>> As you know `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder`interface
>> have performance advantage.
>> And some other performance advantages are there on built-in charset
>> decoder/encoder.
>> Is it possible to create simple public API by using `sun.nio.cs.SingleByte`
>> and `sun.nio.cs.DoubleByte*` classes?
>> We'd like to use stable conversion loop.
>
>> As you know `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder`interface
>> have performance advantage. And some other performance advantages are there
>> on built-in charset decoder/encoder. Is it possible to create simple public
>> API by using `sun.nio.cs.SingleByte` and `sun.nio.cs.DoubleByte*` classes?
>> We'd like to use stable conversion loop.
>
> If they have ASCII compatible regions then that may be so but I haven't see
> any performance data published on that. Do you know if any experiments that
> have deployed a CharsetProvider for the EBCDIC charsets and compared the
> performance with the charsets that in the JDK? There may be merit in
> exploring adding base abstracts implementations of
> CharsetEncoder/CharsetDecoder to java.nio.charsets.spi to support single and
> double byte charsets to see how such base implementations might look, how
> they would help performance, and if there are any security downsides.

Hello @AlanBateman.
Sorry for the late reply.
The test results are below (no guarantee of their accuracy).
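As background for the four buffer combinations measured below (A/A, A/B, B/A, B/B), here is a minimal sketch of how each combination reports `hasArray()`. The class name `BufferKinds` and the method `describe` are hypothetical helpers for illustration only; they are not part of the test program or of the JDK. The sketch assumes the usual OpenJDK behaviour that a `CharBuffer` wrapping a `String` and a direct `ByteBuffer` are not array-backed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical illustration class; not part of the test program in this message.
public class BufferKinds {
    // Labels a buffer pair as "A" (array-backed) or "B" (not array-backed),
    // CharBuffer first, ByteBuffer second.
    static String describe(CharBuffer cb, ByteBuffer bb) {
        return (cb.hasArray() ? "A" : "B") + "/" + (bb.hasArray() ? "A" : "B");
    }

    public static void main(String[] args) {
        String s = "Hello";
        char[] ca = s.toCharArray();

        // CharBuffer over char[] + heap ByteBuffer
        System.out.println(describe(CharBuffer.wrap(ca), ByteBuffer.allocate(8)));
        // CharBuffer over char[] + direct ByteBuffer
        System.out.println(describe(CharBuffer.wrap(ca), ByteBuffer.allocateDirect(8)));
        // CharBuffer over String (read-only view) + heap ByteBuffer
        System.out.println(describe(CharBuffer.wrap(s), ByteBuffer.allocate(8)));
        // CharBuffer over String (read-only view) + direct ByteBuffer
        System.out.println(describe(CharBuffer.wrap(s), ByteBuffer.allocateDirect(8)));
    }
}
```

An encoder can only take an array-based loop when both buffers expose a backing array, which is why the combinations are worth measuring separately.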
I wrote the small test program below; I'm not sure whether it is a good benchmark.

    import java.nio.*;
    import java.nio.charset.*;

    public class tc {
      public static void main(String[] args) throws Exception {
        // args: <charset name> <iteration count> <useCA: 1|0> <useBA: 1|0>
        Charset cs = Charset.forName(args[0]);
        int cnt = Integer.parseInt(args[1]);
        boolean useCA = "1".equals(args[2]);  // CharBuffer backed by char[]
        boolean useBA = "1".equals(args[3]);  // ByteBuffer backed by byte[]
        CharsetEncoder ce = cs.newEncoder();
        // Build the input text by decoding every byte value repeatedly
        byte[] ba = new byte[0x4000];
        for (int i = 0; i < ba.length; i++) {
          ba[i] = (byte) i;
        }
        String s = new String(ba, cs);
        char[] ca = s.toCharArray();
        ByteBuffer bb = useBA ? ByteBuffer.allocate(ca.length)
                              : ByteBuffer.allocateDirect(ca.length);
        CharBuffer cb = useCA ? CharBuffer.wrap(ca) : CharBuffer.wrap(s);
        System.out.println("CharBuffer.hasArray() = " + cb.hasArray());
        System.out.println("ByteBuffer.hasArray() = " + bb.hasArray());
        // Warm-up loop
        long start_t = System.currentTimeMillis();
        for (int i = 0; i < 200; i++) {
          ce.reset();
          bb.position(0);
          cb.position(0);
          ce.encode(cb, bb, true);
        }
        System.out.println("Warmup: " + (System.currentTimeMillis() - start_t));
        // Measured loop
        start_t = System.currentTimeMillis();
        for (int i = 0; i < cnt; i++) {
          ce.reset();
          bb.position(0);
          cb.position(0);
          ce.encode(cb, bb, true);
        }
        System.out.println("Test: " + (System.currentTimeMillis() - start_t));
      }
    }

The following results apply only to my test environment:

* CPU: Intel (on-premises environment); not the same machine across platforms
* Each case was executed 5 times; the values are the averages (elapsed time in ms, as printed by the program)

I ran the program with options like the following:

OpenJDK:
`java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 20000 1 1`
ICU4J:
`java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 20000 1 1`

I used jdk-20 b12.
Only the A/A case with OpenJDK uses the ArrayEncoder (ArrayDecoder) interface.

| | A/A | A/B | B/A | B/B |
| -- | --: | --: | --: | --: |
| Linux (OpenJDK) | 862 | 1265 | 1838 | 1843 |
| Linux (ICU4J) | 1450 | 1410 | 1152 | 1138 |
| Windows (OpenJDK) | 921 | 1231 | 1959 | 1850 |
| Windows (ICU4J) | 1431 | 1446 | 2227 | 2265 |
| Mac (OpenJDK) | 820 | 1163 | 1799 | 1774 |
| Mac (ICU4J) | 1282 | 1242 | 994 | 1049 |

Notes:

* A/A means the CharBuffer is created from a char[] and the ByteBuffer is created by allocate()
* A/B means the CharBuffer is created from a char[] and the ByteBuffer is created by allocateDirect()
* B/A means the CharBuffer is created from a String and the ByteBuffer is created by allocate()
* B/B means the CharBuffer is created from a String and the ByteBuffer is created by allocateDirect()

Actually, I'm confused by these results.
Previously, I had only compared A/A with B/B using OpenJDK's charset.
I did not expect ICU4J's results to show such a difference.

Anyway, please evaluate these results, and let me know if more investigation is needed.

-------------

PR: https://git.openjdk.org/jdk/pull/9399