Integrated: 8308046: Move Solaris related charsets from java.base to jdk.charsets module
On Mon, 15 May 2023 00:28:41 GMT, Ichiroh Takiguchi wrote: > According to the "JDK 20 Internationalization Guide" > https://docs.oracle.com/en/java/javase/20/intl/supported-encodings.html > the following Solaris-related charsets are in the "contained in jdk.charsets module" > list. > > - PCK (x-PCK) > - EUC_JP_Solaris (x-eucJP-Open) > - Big5_Solaris (x-Big5-Solaris) > > These are not supported on the Linux platform, so they should not be in the java.base > module. > > Note: > GHA Linux x86 builds failed. > I think the failure is not related to my code change. > I opened [JDK-8308051](https://bugs.openjdk.org/browse/JDK-8308051) GHA: > Linux x86 builds failure This pull request has now been integrated. Changeset: 5d8ba938 Author: Ichiroh Takiguchi URL: https://git.openjdk.org/jdk/commit/5d8ba938bef162b74816147eb1002a0620a419ba Stats: 21 lines in 4 files changed: 0 ins; 6 del; 15 mod 8308046: Move Solaris related charsets from java.base to jdk.charsets module Reviewed-by: naoto - PR: https://git.openjdk.org/jdk/pull/13973
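To check which module actually provides these charsets on a given JDK build, the same technique as the `cstest1` program in this thread can be used: look the charset up and print the module of its implementation class. This is a minimal sketch that relies only on the public `java.nio.charset.Charset` API; the class and method names are hypothetical. On a build that includes this change one would expect `jdk.charsets` for all three names, but the output depends on the runtime image.

```java
import java.nio.charset.Charset;
import java.util.Optional;

public class CharsetModuleCheck {
    // Returns the name of the module whose class implements the charset, or
    // Optional.empty() if this runtime does not support the charset at all
    // (for example, a runtime image built without the jdk.charsets module).
    static Optional<String> providingModule(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        Charset cs = Charset.forName(charsetName);
        return Optional.ofNullable(cs.getClass().getModule().getName());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"x-PCK", "x-eucJP-Open", "x-Big5-Solaris"}) {
            System.out.println(name + " -> " + providingModule(name).orElse("(not supported)"));
        }
    }
}
```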
Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v2]
On Sat, 20 May 2023 17:26:53 GMT, Naoto Sato wrote: >> Hello @naotoj. >> I'd like to confirm something about DoubleByte-X.java.template and >> EUC_JP.java.template. >> I think the values are package-private. >> Even if the class is changed to `public`, the classes in the `sun.nio.cs.ext` package >> could not access these values in the `sun.nio.cs` package... >> I may be misunderstanding your suggestion; could you tell me more? > >> I think the values are package-private. Even if the class is changed to >> `public`, the classes in the `sun.nio.cs.ext` package could not access these >> values in the `sun.nio.cs` package... > > I meant making those package-private fields public. I believe it's OK because > the java.base/sun.nio.cs package is only exported to the jdk.charsets module. Hello @naotoj. I appreciate your attention to the JBS side. I changed the title and description, and added the noreg-cleanup label. - PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1558228901
Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v2]
On Sat, 20 May 2023 17:26:53 GMT, Naoto Sato wrote: >> Hello @naotoj. >> I'd like to confirm something about DoubleByte-X.java.template and >> EUC_JP.java.template. >> I think the values are package-private. >> Even if the class is changed to `public`, the classes in the `sun.nio.cs.ext` package >> could not access these values in the `sun.nio.cs` package... >> I may be misunderstanding your suggestion; could you tell me more? > >> I think the values are package-private. Even if the class is changed to >> `public`, the classes in the `sun.nio.cs.ext` package could not access these >> values in the `sun.nio.cs` package... > > I meant making those package-private fields public. I believe it's OK because > the java.base/sun.nio.cs package is only exported to the jdk.charsets module. Thanks @naotoj. I changed the related fields to `public`. - PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1557308396
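Naoto's reasoning depends on a qualified export. Public members of `sun.nio.cs` are reachable only from modules that java.base exports the package to, so making the fields `public` does not expose them to ordinary application code. Below is a minimal sketch that uses only the public `java.lang.Module` and `ModuleLayer` APIs to report whether java.base exports a package to every module or to one specific module. The class and method names are hypothetical. The result for `jdk.charsets` depends on the JDK build and on that module being resolved in the boot layer.

```java
import java.util.Optional;

public class QualifiedExportCheck {
    private static final Module BASE = Object.class.getModule(); // java.base

    // True only if java.base exports the package to every module (unqualified export).
    static boolean exportedToAll(String pkg) {
        return BASE.isExported(pkg);
    }

    // Whether java.base exports the package to the named module; empty if that
    // module is not present in the boot layer.
    static Optional<Boolean> exportedTo(String pkg, String moduleName) {
        return ModuleLayer.boot().findModule(moduleName).map(m -> BASE.isExported(pkg, m));
    }

    public static void main(String[] args) {
        System.out.println("sun.nio.cs exported to all modules: " + exportedToAll("sun.nio.cs"));
        System.out.println("sun.nio.cs exported to jdk.charsets: "
                + exportedTo("sun.nio.cs", "jdk.charsets"));
    }
}
```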
Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v3]
> According to the "JDK 20 Internationalization Guide" > https://docs.oracle.com/en/java/javase/20/intl/supported-encodings.html > the following Solaris-related charsets are in the "contained in jdk.charsets module" > list. > > - PCK (x-PCK) > - EUC_JP_Solaris (x-eucJP-Open) > - Big5_Solaris (x-Big5-Solaris) > > These are not supported on the Linux platform, so they should not be in the java.base > module. > > Note: > GHA Linux x86 builds failed. > I think the failure is not related to my code change. > I opened [JDK-8308051](https://bugs.openjdk.org/browse/JDK-8308051) GHA: > Linux x86 builds failure Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: 8308046: Move Solaris related charsets from java.base to jdk.charsets module - Changes: - all: https://git.openjdk.org/jdk/pull/13973/files - new: https://git.openjdk.org/jdk/pull/13973/files/6fd12fcd..1c10b107 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=13973=02 - incr: https://webrevs.openjdk.org/?repo=jdk=13973=01-02 Stats: 43 lines in 4 files changed: 0 ins; 29 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13973/head:pull/13973 PR: https://git.openjdk.org/jdk/pull/13973
Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v2]
On Thu, 18 May 2023 00:50:22 GMT, Naoto Sato wrote: >>> Hello @naotoj. I'm not sure we can remove the Solaris-related charsets. >>> Somebody may use them for text communication with Solaris systems. >> >> OK, maybe not now. >> >> I think the fix may be simplified by changing access for those >> `DoubleByte-X.java.template` internals, such as `DecodeHolder`, to `public`, >> instead of introducing those access methods. You can import classes in >> `java.base/sun.nio.cs` with the wild card so that it would work on all >> platforms (`Big5` either in `java.base` or `jdk.charsets`) >> >> Also, please drop `Japanese` from the issue/PR title > >>You can import classes in `java.base/sun.nio.cs` with the wild card so that >>it would work on all platforms (`Big5` either in `java.base` or >>`jdk.charsets`) > > Scratch that, you've already done it. Then you can remove these: > > import sun.nio.cs.DoubleByte; > import sun.nio.cs.HistoricallyNamedCharset; Hello @naotoj. I'd like to confirm something about DoubleByte-X.java.template and EUC_JP.java.template. I think the values are package-private. Even if the class is changed to `public`, the classes in the `sun.nio.cs.ext` package could not access these values in the `sun.nio.cs` package... I may be misunderstanding your suggestion; could you tell me more? - PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1555405480
Re: RFR: 8308046: Move Solaris related Japanese charsets from java.base to jdk.charsets module [v2]
On Tue, 16 May 2023 17:13:02 GMT, Naoto Sato wrote: >> Ichiroh Takiguchi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> 8308046: Move Solaris related Japanese charsets from java.base to >> jdk.charsets module > > I now think it is better to simply remove the Solaris-related charsets, as moving > them from java.base to jdk.charsets would require unnecessary code changes in > non-Solaris code. Hello @naotoj. I'm not sure we can remove the Solaris-related charsets. Somebody may use them for text communication with Solaris systems. The latest change makes it possible to move Big5_Solaris from java.base to the jdk.charsets module as well. - PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1551244863
Re: RFR: 8308046: Move Solaris related Japanese charsets from java.base to jdk.charsets module [v2]
> According to the "JDK 20 Internationalization Guide" > https://docs.oracle.com/en/java/javase/20/intl/supported-encodings.html > the following Solaris-related Japanese charsets are in the "contained in jdk.charsets > module" list. > > - PCK (x-PCK) > - EUC_JP_Solaris (x-eucJP-Open) > > These are not supported on the Linux platform, so they should not be in the java.base > module. > > Note: > GHA Linux x86 builds failed. > I think the failure is not related to my code change. > I opened [JDK-8308051](https://bugs.openjdk.org/browse/JDK-8308051) GHA: > Linux x86 builds failure Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: 8308046: Move Solaris related Japanese charsets from java.base to jdk.charsets module - Changes: - all: https://git.openjdk.org/jdk/pull/13973/files - new: https://git.openjdk.org/jdk/pull/13973/files/192db59c..6fd12fcd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=13973=01 - incr: https://webrevs.openjdk.org/?repo=jdk=13973=00-01 Stats: 29 lines in 3 files changed: 22 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13973/head:pull/13973 PR: https://git.openjdk.org/jdk/pull/13973
Re: RFR: 8301119: Support for GB18030-2022 [v3]
On Fri, 24 Feb 2023 17:19:22 GMT, Naoto Sato wrote: >> Hello @naotoj. >> Sorry for bothering you. >> >> I have the following question: >> - Why is GB18030.java.template in the >> src/jdk.charsets/share/classes/sun/nio/cs/ext/ directory even though the >> generated code is always stored into sun/nio/cs? >> I think the file should be moved to src/java.base/share/classes/sun/nio/cs >> and the file name should be GB18030.java instead of GB18030.java.template. >> Is there a specific reason? > >> Hello @naotoj. Sorry for bothering you. >> >> I have the following question: >> >> * Why is GB18030.java.template in the >> src/jdk.charsets/share/classes/sun/nio/cs/ext/ directory even though the >> generated code is always stored into sun/nio/cs? >> I think the file should be moved to src/java.base/share/classes/sun/nio/cs >> and the file name should be GB18030.java instead of GB18030.java.template. >> Is there a specific reason? > > No, there is not. Thanks for pointing it out. Fixed. Thanks @naotoj. That's what I expected. - PR: https://git.openjdk.org/jdk/pull/12518
Re: RFR: 8301119: Support for GB18030-2022 [v3]
On Thu, 23 Feb 2023 19:34:44 GMT, Naoto Sato wrote: >> Upgrading the GB18030 charset in the JDK to the latest 2022 standard. Since >> this is not a compatible upgrade to the existing mapping, a new system >> property `jdk.charset.GB18030` is introduced. If it is set to "2000", the >> mapping falls back to the existing mapping based on the 2000 standard; >> otherwise, it defaults to the 2022 mapping. Refer to the corresponding CSR for >> more detail. > > Naoto Sato has updated the pull request incrementally with one additional > commit since the last revision: > > Moved the 2000 flag into GB18030 Hello @naotoj. Sorry for bothering you. I have the following question: - Why is GB18030.java.template in the src/jdk.charsets/share/classes/sun/nio/cs/ext/ directory even though the generated code is always stored into sun/nio/cs? I think the file should be moved to src/java.base/share/classes/sun/nio/cs and the file name should be GB18030.java instead of GB18030.java.template. Is there a specific reason? - PR: https://git.openjdk.org/jdk/pull/12518
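The `jdk.charset.GB18030` property described above is consumed by the JDK itself. Application code only sees its effect on the `GB18030` charset. This minimal sketch prints the property and the GB18030 bytes for a string. It assumes a JDK 17+ runtime, because it uses `java.util.HexFormat`, and the class name is hypothetical. To compare the two mappings, run it once normally and once with `-Djdk.charset.GB18030=2000` on the command line, since setting the property from inside a running program is presumably too late to affect the charset. The sample ideographs encode the same way under both standards, so they only illustrate the mechanics.

```java
import java.nio.charset.Charset;
import java.util.HexFormat;

public class GB18030Check {
    // Encodes the text with the runtime's GB18030 charset and returns lowercase hex.
    static String encodeHex(String text) {
        byte[] bytes = text.getBytes(Charset.forName("GB18030"));
        return HexFormat.of().formatHex(bytes);
    }

    public static void main(String[] args) {
        // null unless the JVM was started with -Djdk.charset.GB18030=...
        System.out.println("jdk.charset.GB18030 = " + System.getProperty("jdk.charset.GB18030"));
        System.out.println(encodeHex("\u4e2d\u6587"));
    }
}
```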
Re: RFR: 8300916: Re-examine the initialization of JNU Charset in StaticProperty [v4]
On Wed, 25 Jan 2023 18:58:40 GMT, Alan Bateman wrote: >> `Charset.defaultCharset()` now uses >> `standardProvider.charsetForName()` charset. > >> `Charset.defaultCharset()` now uses >> `standardProvider.charsetForName()` charset. > > I think this is the right thing to do. It can also be changed to use > StaticProperty.fileEncoding() and maybe the field can be changed to be a > `@Stable` field. > > It might be that we will need to create a CSR and Release Note for this > change. The scenario in PR 12132 is unfortunate but does not show that some > deployments may have been relying on this from JDK 9 to JDK 17. With the > change here, we are doubling down on ensuring that the default charset is > loaded from java.base. Hello @AlanBateman, you said: > The change to StaticProperty to avoid calling out to Charset.defaultCharset > from the initializer is good. However, the other part to that is the scenario > in PR 12132 where the default Charset was accidentally located via the > provider mechanism in JDK 9-17. If I read the changes correctly, that fragile > scenario will come back. We have a couple of ways to avoid that, one being to > ensure that defaultCharset is called before the boot layer is set. A simpler, > and more reliable, approach would be to change Charset.defaultCharset to use > standardProvider.charsetForName with the value of "file.encoding", and avoid > the provider lookup completely. Could you explain the fragile scenario? - PR: https://git.openjdk.org/jdk/pull/12171
Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]
On Tue, 24 Jan 2023 12:51:31 GMT, Alan Bateman wrote: > Do you know if there is any configuration on AIX that would derive Cp943C as > the default charset? That is, are they running with -Dfile.encoding=Cp943C on > the AIX systems or is it chosen by default. This goes to the question as to > whether they are just moving these applications to Linux and expecting the > default charset to be the same. As I understand it, my client uses the `-Dfile.encoding=Cp943C` option in the Japanese IBM-943 locale on AIX. The default charset in the Japanese IBM-943 locale with IBM Java 8 and OpenJDK JDK 11+ is x-IBM943C (Cp943C). (We need to use `-Dfile.encoding=Cp943C` with OpenJDK JDK 8.) Because of JEP 400, we never thought we could simply move to Linux as-is. But we are not moving all the apps to Linux at once. We expected that we could change the default charset with `-Dfile.encoding=Cp943C`, at least until Java 8 EOS. > Do you know which APIs they are using? We filled in the gaps many releases > ago so that all APIs that do encoding/decoding allow the charset to be > specified and I'm wondering why they don't use those. We checked String.getBytes()/new String(...)/Reader/Writer/ByteArrayOutputStream.toString()... Is there a good way to find which parts need to be fixed? - PR: https://git.openjdk.org/jdk/pull/12132
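Alan suggests passing the charset explicitly to every API that encodes or decodes, instead of relying on the default charset. Here is a minimal sketch of that pattern covering the APIs listed above (`String.getBytes`, `new String`, `Writer`, `ByteArrayOutputStream.toString`). It assumes the legacy data should be handled as `x-IBM943C` when the runtime provides it and falls back to UTF-8 otherwise. That fallback and all names are illustrative assumptions, not part of this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Resolve the legacy charset once, explicitly, instead of using Charset.defaultCharset().
    // x-IBM943C may be absent from an image without jdk.charsets; falling back to
    // UTF-8 is an assumption made for this sketch.
    static Charset legacyCharset() {
        String name = "x-IBM943C";
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
    }

    // Encode through a Writer and decode via ByteArrayOutputStream.toString(Charset),
    // passing the charset at every step.
    static String roundTrip(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) { // not new OutputStreamWriter(out)
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString(cs); // not out.toString(); this overload exists since JDK 10
    }

    public static void main(String[] args) {
        Charset cs = legacyCharset();
        String text = "\u65e5\u672c\u8a9e";
        byte[] bytes = text.getBytes(cs);       // not text.getBytes()
        String decoded = new String(bytes, cs); // not new String(bytes)
        System.out.println(cs + ": " + text.equals(decoded) + ", " + text.equals(roundTrip(text, cs)));
    }
}
```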
Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]
On Mon, 23 Jan 2023 13:46:15 GMT, Alan Bateman wrote: > It's never been supported to run with -Dfile.encoding=Cp943C. It may have > worked in JDK 8 but I doubt it could have worked consistently since JDK 9 > because the default charset is derived before it's possible to locate charset > implementations outside of java.base. As described before, JDK 17 worked with `-Dfile.encoding=Cp943C`, and JDK 18 changed the behavior. I have heard that some apps have already been ported to JDK 17 with this option and work. > I think it would be useful to know a bit more about the environment. It > sounds like it might be AIX -> Linux migration but I'm curious if you have > any insight into why these applications depend on default charset being > Cp943C. Is it text files that are opened without specifying the charset or is > it something else? One of my clients has many legacy Java apps on AIX. Their apps use the default charset to communicate with other apps via cipher communication, and validate data by using Cp943C. I hope IBM943C can be moved to the java.base module, like #11908. - PR: https://git.openjdk.org/jdk/pull/12132
Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]
On Mon, 23 Jan 2023 07:48:41 GMT, Alan Bateman wrote: >> Ichiroh Takiguchi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> 8300819: -Dfile.encoding=Cp943C option does not work as expected since >> jdk18 > > I'm trying to understand what the real issue is. The java.base module on > Linux builds includes SJIS, MS932, and PCK. Is there a Linux configuration in > Japanese environments where the default charset in any JDK release is > IBM943C? Same question for AIX builds that is the only build that includes > IBM943C in java.base. Hello @AlanBateman. Sorry for the confusion. The `-Dfile.encoding=Cp943C` option works with Java 8 on Linux. Since many users are migrating from Java 8, I'm getting similar requests from my clients. Cp943C is not supported natively on Linux, but some clients want to use the same encoding on Linux and AIX. The Japanese AIX environment supports the IBM-943 (Cp943C), IBM-eucJP (Cp29626C), and UTF-8 encodings. Cp943C and Cp29626C are in the java.base module on the AIX platform. - PR: https://git.openjdk.org/jdk/pull/12132
Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]
On Sun, 22 Jan 2023 23:17:10 GMT, Ichiroh Takiguchi wrote: >> On JDK 17, the following testcase works fine on the Linux platform. >> >> Testcase >> >> $ cat cstest1.java >> import java.nio.charset.*; >> >> public class cstest1 { >> public static void main(String[] args) throws Exception { >> Charset cs = Charset.defaultCharset(); >> System.out.println(cs + ", " + cs.getClass() + ", " + >> cs.getClass().getModule()); >> } >> } >> >> >> $ ~/jdk-17.0.6+10/bin/java -Dfile.encoding=Cp943C -showversion cstest1 >> openjdk version "17.0.6" 2023-01-17 >> OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10) >> OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, >> sharing) >> x-IBM943C, class sun.nio.cs.ext.IBM943C, module jdk.charsets >> >> >> But it does not work as expected on JDK 18 and JDK 21 b06 >> >> $ ~/jdk-18.0.2.1+1/bin/java -Dfile.encoding=Cp943C -showversion cstest1 >> openjdk version "18.0.2.1" 2022-08-18 >> OpenJDK Runtime Environment Temurin-18.0.2.1+1 (build 18.0.2.1+1) >> OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1 (build 18.0.2.1+1, mixed mode, >> sharing) >> UTF-8, class sun.nio.cs.UTF_8, module java.base >> $ ~/jdk-21/bin/java -Dfile.encoding=Cp943C -showversion cstest1 >> openjdk version "21-ea" 2023-09-19 >> OpenJDK Runtime Environment (build 21-ea+6-365) >> OpenJDK 64-Bit Server VM (build 21-ea+6-365, mixed mode, sharing) >> UTF-8, class sun.nio.cs.UTF_8, module java.base >> >> >> The result with the fix is as follows: >> >> $ java -Dfile.encoding=Cp943C -showversion PrintDefaultCharset >> openjdk version "21-internal" 2023-09-19 >> OpenJDK Runtime Environment (build 21-internal-adhoc.jdktest.jdk) >> OpenJDK 64-Bit Server VM (build 21-internal-adhoc.jdktest.jdk, mixed mode, >> sharing) >> x-IBM943C > > Ichiroh Takiguchi has updated the pull request incrementally with one > additional commit since the last revision: > > 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 First, sorry, I forgot to update the copyright date.
@AlanBateman, I appreciate your reply. As I understand it: - the I/O stream side can use the native.encoding system property; - the file.encoding system property is now used for everything other than I/O streams. This issue is related to #11908. I need a solution that lets me use the Cp943C charset as the default charset. Please give me some suggestions. - PR: https://git.openjdk.org/jdk/pull/12132
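The distinction between `file.encoding` and `native.encoding` can be inspected directly. JEP 400 (JDK 18) made UTF-8 the default charset and documents only `COMPAT` and `UTF-8` as values for `file.encoding`. As the `cstest1` output in this thread shows, `Cp943C` ends up as UTF-8 on JDK 18+ Linux builds. `native.encoding` (JDK 17+) reports the encoding derived from the host environment. Below is a minimal sketch that prints these values and, for code that must match the platform, picks the native charset explicitly. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingProperties {
    // Charset named by native.encoding (JDK 17+) if this runtime supports it,
    // otherwise the default charset.
    static Charset nativeCharsetOrDefault() {
        String nativeName = System.getProperty("native.encoding");
        if (nativeName != null && Charset.isSupported(nativeName)) {
            return Charset.forName(nativeName);
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
        System.out.println("defaultCharset  = " + Charset.defaultCharset());
        System.out.println("native charset  = " + nativeCharsetOrDefault());
    }
}
```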
Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]
> On JDK 17, the following testcase works fine on the Linux platform. > > Testcase > > $ cat cstest1.java > import java.nio.charset.*; > > public class cstest1 { > public static void main(String[] args) throws Exception { > Charset cs = Charset.defaultCharset(); > System.out.println(cs + ", " + cs.getClass() + ", " + > cs.getClass().getModule()); > } > } > > > $ ~/jdk-17.0.6+10/bin/java -Dfile.encoding=Cp943C -showversion cstest1 > openjdk version "17.0.6" 2023-01-17 > OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10) > OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, > sharing) > x-IBM943C, class sun.nio.cs.ext.IBM943C, module jdk.charsets > > > But it does not work as expected on JDK 18 and JDK 21 b06 > > $ ~/jdk-18.0.2.1+1/bin/java -Dfile.encoding=Cp943C -showversion cstest1 > openjdk version "18.0.2.1" 2022-08-18 > OpenJDK Runtime Environment Temurin-18.0.2.1+1 (build 18.0.2.1+1) > OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1 (build 18.0.2.1+1, mixed mode, > sharing) > UTF-8, class sun.nio.cs.UTF_8, module java.base > $ ~/jdk-21/bin/java -Dfile.encoding=Cp943C -showversion cstest1 > openjdk version "21-ea" 2023-09-19 > OpenJDK Runtime Environment (build 21-ea+6-365) > OpenJDK 64-Bit Server VM (build 21-ea+6-365, mixed mode, sharing) > UTF-8, class sun.nio.cs.UTF_8, module java.base > > > The result with the fix is as follows: > > $ java -Dfile.encoding=Cp943C -showversion PrintDefaultCharset > openjdk version "21-internal" 2023-09-19 > OpenJDK Runtime Environment (build 21-internal-adhoc.jdktest.jdk) > OpenJDK 64-Bit Server VM (build 21-internal-adhoc.jdktest.jdk, mixed mode, > sharing) > x-IBM943C Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 - Changes: - all: https://git.openjdk.org/jdk/pull/12132/files - new: https://git.openjdk.org/jdk/pull/12132/files/9e400d60..5e7db0e0 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk=12132=01 - incr: https://webrevs.openjdk.org/?repo=jdk=12132=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12132.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12132/head:pull/12132 PR: https://git.openjdk.org/jdk/pull/12132
RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18
On JDK 17, the following testcase works fine on the Linux platform.

Testcase

    $ cat cstest1.java
    import java.nio.charset.*;

    public class cstest1 {
        public static void main(String[] args) throws Exception {
            Charset cs = Charset.defaultCharset();
            System.out.println(cs + ", " + cs.getClass() + ", " + cs.getClass().getModule());
        }
    }

    $ ~/jdk-17.0.6+10/bin/java -Dfile.encoding=Cp943C -showversion cstest1
    openjdk version "17.0.6" 2023-01-17
    OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
    OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)
    x-IBM943C, class sun.nio.cs.ext.IBM943C, module jdk.charsets

But it does not work as expected on JDK 18 and JDK 21 b06:

    $ ~/jdk-18.0.2.1+1/bin/java -Dfile.encoding=Cp943C -showversion cstest1
    openjdk version "18.0.2.1" 2022-08-18
    OpenJDK Runtime Environment Temurin-18.0.2.1+1 (build 18.0.2.1+1)
    OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1 (build 18.0.2.1+1, mixed mode, sharing)
    UTF-8, class sun.nio.cs.UTF_8, module java.base
    $ ~/jdk-21/bin/java -Dfile.encoding=Cp943C -showversion cstest1
    openjdk version "21-ea" 2023-09-19
    OpenJDK Runtime Environment (build 21-ea+6-365)
    OpenJDK 64-Bit Server VM (build 21-ea+6-365, mixed mode, sharing)
    UTF-8, class sun.nio.cs.UTF_8, module java.base

The result with the fix is as follows:

    $ java -Dfile.encoding=Cp943C -showversion PrintDefaultCharset
    openjdk version "21-internal" 2023-09-19
    OpenJDK Runtime Environment (build 21-internal-adhoc.jdktest.jdk)
    OpenJDK 64-Bit Server VM (build 21-internal-adhoc.jdktest.jdk, mixed mode, sharing)
    x-IBM943C

- Commit messages:
- -Dfile.encoding=Cp943C option does not work as expected since jdk18

Changes: https://git.openjdk.org/jdk/pull/12132/files
Webrev: https://webrevs.openjdk.org/?repo=jdk=12132=00
Issue: https://bugs.openjdk.org/browse/JDK-8300819
Stats: 45 lines in 2 files changed: 44 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/12132.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12132/head:pull/12132

PR: https://git.openjdk.org/jdk/pull/12132
Integrated: 8299194: CustomTzIDCheckDST.java may fail at future date
On Wed, 21 Dec 2022 15:57:29 GMT, Ichiroh Takiguchi wrote: > test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java may fail at a future date. > I used the following standalone testcase: > > import java.util.Calendar; > import java.util.Date; > import java.util.SimpleTimeZone; > > public class CheckDST { > private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0"; > public static void main(String args[]) throws Throwable { > runTZTest(); > } > > /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0". > * This ensures the transition periods for Daylights Savings should be at > March's last > * Sunday and October's last Sunday. > */ > private static void runTZTest() { > Date time = new Date(); > if (new SimpleTimeZone(360, "MEZ-1MESZ", Calendar.MARCH, -1, > Calendar.SUNDAY, 0, > Calendar.OCTOBER, -1, Calendar.SUNDAY, > 0).inDaylightTime(time)) { > // We are in Daylight savings period. > if (time.toString().endsWith("GMT+02:00 " + > Integer.toString(time.getYear() + 1900))) > return; > } else { > if (time.toString().endsWith("GMT+01:00 " + > Integer.toString(time.getYear() + 1900))) > return; > } > > // Reaching here means time zone did not match up as expected. > throw new RuntimeException("Got unexpected timezone information: " + > time); > } > } > > > I tested CheckDST with faketime and got the following results: > > $ TZ=GMT faketime -m "2023-03-25 22:59:59" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" > $HOME/jdk-21-b02/bin/java CheckDST > $ TZ=GMT faketime -m "2023-03-25 23:00:00" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" > $HOME/jdk-21-b02/bin/java CheckDST > Exception in thread "main" java.lang.RuntimeException: Got unexpected > timezone information: Sun Mar 26 00:00:00 GMT+01:00 2023 > at CheckDST.runTZTest(CheckDST.java:28) > at CheckDST.main(CheckDST.java:8) > > > I assume `TZ=MEZ-1MESZ` refers to the Europe/Berlin time zone.
> In this case, the `TZ` environment variable should be > `MEZ-1MESZ,M3.5.0,M10.5.0/3` (`/3` is missing in the testcase). > > CustomTzIDCheckDST should be tested with daylight saving time. > Also simulate the Southern Hemisphere with `MEZ-1MESZ,M10.5.0,M3.5.0/3`. > > Tested with a standalone testcase: > > $ cat CheckDST1.java > import java.util.Calendar; > import java.util.Date; > import java.util.List; > import java.util.SimpleTimeZone; > import java.util.TimeZone; > import java.time.DayOfWeek; > import java.time.ZonedDateTime; > import java.time.temporal.TemporalAdjusters; > public class CheckDST1 { > // Northern Hemisphere > private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0/3"; > // Simulate Southern Hemisphere > private static String CUSTOM_TZ2 = "MEZ-1MESZ,M10.5.0,M3.5.0/3"; > public static void main(String args[]) throws Throwable { > runTZTest(); > } > > /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0/3". > * This ensures the transition periods for Daylights Savings should be at > March's last > * Sunday and October's last Sunday. > */ > private static void runTZTest() { > Date time = new Date(); > String tzStr = System.getenv("TZ"); > if (tzStr == null) > throw new RuntimeException("Got unexpected timezone information: > TZ is null"); > boolean nor = tzStr.matches(".*,M3\\..*,M10\\..*"); > TimeZone tz = new SimpleTimeZone(360, tzStr, > nor ? Calendar.MARCH : Calendar.OCTOBER, -1, > Calendar.SUNDAY, 360, SimpleTimeZone.UTC_TIME, > nor ? Calendar.OCTOBER : Calendar.MARCH, -1, > Calendar.SUNDAY, 360, SimpleTimeZone.UTC_TIME, > 360); > System.out.println(time); > if (tz.inDaylightTime(time)) { > // We are in Daylight savings period. > if (time.toString().endsWith("GMT+02:00 " + > Integer.toString(time.getYear() + 1900))) > return; > } else { > if (time.toString().endsWith("GMT+01:00 " + > Integer.toString(time.getYear() + 1900))) > return; > } > > // Reaching here means time zone did not match up as expected.
> throw new RuntimeException("Got unexpected timezone information: " + > tzStr + " " + time); > } > > private static ZonedDateTi
Re: RFR: 8299194: CustomTzIDCheckDST.java may fail at future date [v2]
On Wed, 21 Dec 2022 20:54:25 GMT, Naoto Sato wrote: >> Ichiroh Takiguchi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> 8299194: CustomTzIDCheckDST.java may fail at future date > > Thanks for the fix. Looks good overall. A couple of minor comments/questions. Thanks @naotoj. I appreciate your suggestions. Please review it again. - PR: https://git.openjdk.org/jdk/pull/11756
Re: RFR: 8299194: CustomTzIDCheckDST.java may fail at future date [v2]
turn date.with(TemporalAdjusters.lastInMonth(DayOfWeek.SUNDAY)); > } > } > > > Check Europe/Berlin timezone settings > > $ zdump -v Europe/Berlin | grep 2023 > Europe/Berlin Sun Mar 26 00:59:59 2023 UTC = Sun Mar 26 01:59:59 2023 CET > isdst=0 gmtoff=3600 > Europe/Berlin Sun Mar 26 01:00:00 2023 UTC = Sun Mar 26 03:00:00 2023 CEST > isdst=1 gmtoff=7200 > Europe/Berlin Sun Oct 29 00:59:59 2023 UTC = Sun Oct 29 02:59:59 2023 CEST > isdst=1 gmtoff=7200 > Europe/Berlin Sun Oct 29 01:00:00 2023 UTC = Sun Oct 29 02:00:00 2023 CET > isdst=0 gmtoff=3600 > > > Test results are as follows: > > Northern Hemisphere side > > $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > date > Sun Mar 26 01:59:59 MEZ 2023 > $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > java CheckDST1 > Sun Mar 26 01:59:59 GMT+01:00 2023 > > $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > date > Sun Mar 26 03:00:00 MESZ 2023 > $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > java CheckDST1 > Sun Mar 26 03:00:00 GMT+02:00 2023 > > $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > date > Sun Oct 29 02:59:59 MESZ 2023 > $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > java CheckDST1 > Sun Oct 29 02:59:59 GMT+02:00 2023 > > $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > date > Sun Oct 29 02:00:00 MEZ 2023 > $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 > java CheckDST1 > Sun Oct 29 02:00:00 GMT+01:00 2023 > > > Southern Hemisphere side > > $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > date > Sun Mar 26 02:59:59 MESZ 2023 > $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > java CheckDST1 > Sun Mar 26 02:59:59 GMT+02:00 2023 > > $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > date > 
Sun Mar 26 02:00:00 MEZ 2023 > $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > java CheckDST1 > Sun Mar 26 02:00:00 GMT+01:00 2023 > > $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > date > Sun Oct 29 01:59:59 MEZ 2023 > $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > java CheckDST1 > Sun Oct 29 01:59:59 GMT+01:00 2023 > > $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > date > Sun Oct 29 03:00:00 MESZ 2023 > $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 > java CheckDST1 > Sun Oct 29 03:00:00 GMT+02:00 2023 Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: 8299194: CustomTzIDCheckDST.java may fail at future date - Changes: - all: https://git.openjdk.org/jdk/pull/11756/files - new: https://git.openjdk.org/jdk/pull/11756/files/a17d83d0..df2e8a86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=11756=01 - incr: https://webrevs.openjdk.org/?repo=jdk=11756=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11756.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11756/head:pull/11756 PR: https://git.openjdk.org/jdk/pull/11756
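The `zdump` output for Europe/Berlin quoted above can be cross-checked from Java against the JDK's own tz data, using `java.time.zone.ZoneRules`. This is a minimal sketch that assumes the runtime's tzdata contains the usual EU rules for Europe/Berlin. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.List;

public class BerlinTransitions {
    // Collects the offset transitions of the zone that fall within the given UTC year.
    static List<ZoneOffsetTransition> transitionsInYear(String zone, int year) {
        ZoneRules rules = ZoneId.of(zone).getRules();
        Instant start = Instant.parse(year + "-01-01T00:00:00Z");
        Instant end = Instant.parse((year + 1) + "-01-01T00:00:00Z");
        List<ZoneOffsetTransition> result = new ArrayList<>();
        // nextTransition returns the first transition strictly after the given instant.
        ZoneOffsetTransition t = rules.nextTransition(start.minusSeconds(1));
        while (t != null && t.getInstant().isBefore(end)) {
            result.add(t);
            t = rules.nextTransition(t.getInstant());
        }
        return result;
    }

    public static void main(String[] args) {
        for (ZoneOffsetTransition t : transitionsInYear("Europe/Berlin", 2023)) {
            System.out.println(t.getInstant() + " " + t.getOffsetBefore() + " -> " + t.getOffsetAfter());
        }
    }
}
```

For 2023 this prints `2023-03-26T01:00:00Z +01:00 -> +02:00` and `2023-10-29T01:00:00Z +02:00 -> +01:00`, the same UTC instants that `zdump` reports.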
RFR: 8299194: CustomTzIDCheckDST.java may fail at future date
test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java may fail at a future date. I used the following standalone testcase:

    import java.util.Calendar;
    import java.util.Date;
    import java.util.SimpleTimeZone;

    public class CheckDST {
        private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0";
        public static void main(String args[]) throws Throwable {
            runTZTest();
        }

        /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0".
         * This ensures the transition periods for Daylights Savings should be at March's last
         * Sunday and October's last Sunday.
         */
        private static void runTZTest() {
            Date time = new Date();
            if (new SimpleTimeZone(360, "MEZ-1MESZ", Calendar.MARCH, -1, Calendar.SUNDAY, 0,
                    Calendar.OCTOBER, -1, Calendar.SUNDAY, 0).inDaylightTime(time)) {
                // We are in Daylight savings period.
                if (time.toString().endsWith("GMT+02:00 " + Integer.toString(time.getYear() + 1900)))
                    return;
            } else {
                if (time.toString().endsWith("GMT+01:00 " + Integer.toString(time.getYear() + 1900)))
                    return;
            }

            // Reaching here means time zone did not match up as expected.
            throw new RuntimeException("Got unexpected timezone information: " + time);
        }
    }

I tested CheckDST with faketime and got the following results:

    $ TZ=GMT faketime -m "2023-03-25 22:59:59" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" $HOME/jdk-21-b02/bin/java CheckDST
    $ TZ=GMT faketime -m "2023-03-25 23:00:00" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" $HOME/jdk-21-b02/bin/java CheckDST
    Exception in thread "main" java.lang.RuntimeException: Got unexpected timezone information: Sun Mar 26 00:00:00 GMT+01:00 2023
            at CheckDST.runTZTest(CheckDST.java:28)
            at CheckDST.main(CheckDST.java:8)

I assume `TZ=MEZ-1MESZ` refers to the Europe/Berlin time zone. In this case, the `TZ` environment variable should be `MEZ-1MESZ,M3.5.0,M10.5.0/3` (`/3` is missing in the testcase). CustomTzIDCheckDST should be tested with daylight saving time.
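The effect of the `/3` suffix can be worked out by hand. In a POSIX `TZ` rule, `Mm.5.0` means the last Sunday of month m. When `/time` is omitted, the transition time defaults to 02:00 local time, read in the offset that is in effect before the change. Below is a minimal sketch of that arithmetic with `java.time`, in the same spirit as the `getLastSundayOfMonth` helper in CheckDST1. The names are hypothetical, and it covers only this `M…5.0` shape of rule, not a general POSIX TZ parser.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.Month;
import java.time.ZoneOffset;
import java.time.temporal.TemporalAdjusters;

public class PosixRuleSketch {
    // UTC instant of a POSIX "Mm.5.0/hh" rule: the last Sunday of the month at
    // local hour hh, interpreted in the offset in effect just before the change.
    static Instant lastSundayAt(int year, Month month, int localHour, ZoneOffset offsetBefore) {
        LocalDate lastSunday = LocalDate.of(year, month, 1)
                .with(TemporalAdjusters.lastInMonth(DayOfWeek.SUNDAY));
        return lastSunday.atTime(localHour, 0).toInstant(offsetBefore);
    }

    public static void main(String[] args) {
        ZoneOffset mez = ZoneOffset.ofHours(1);  // "MEZ-1": UTC+1 (the POSIX sign is inverted)
        ZoneOffset mesz = ZoneOffset.ofHours(2); // MESZ: default DST offset, one hour ahead
        // M3.5.0 with the default 02:00, read in standard time
        System.out.println("DST start         : " + lastSundayAt(2023, Month.MARCH, 2, mez));
        // M10.5.0 without "/3": 02:00, read in DST
        System.out.println("DST end (no /3)   : " + lastSundayAt(2023, Month.OCTOBER, 2, mesz));
        // M10.5.0/3: 03:00, read in DST
        System.out.println("DST end (with /3) : " + lastSundayAt(2023, Month.OCTOBER, 3, mesz));
    }
}
```

For 2023 the DST end with `/3` falls at 2023-10-29T01:00:00Z, which matches the Europe/Berlin `zdump` output in this thread. Without `/3` it moves one hour earlier, to 2023-10-29T00:00:00Z.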
Added a simulated Southern Hemisphere case using `MEZ-1MESZ,M10.5.0,M3.5.0/3`.
Tested with a standalone test case:

    $ cat CheckDST1.java
    import java.util.Calendar;
    import java.util.Date;
    import java.util.List;
    import java.util.SimpleTimeZone;
    import java.util.TimeZone;
    import java.time.DayOfWeek;
    import java.time.ZonedDateTime;
    import java.time.temporal.TemporalAdjusters;

    public class CheckDST1 {
        // Northern Hemisphere
        private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0/3";
        // Simulate Southern Hemisphere
        private static String CUSTOM_TZ2 = "MEZ-1MESZ,M10.5.0,M3.5.0/3";

        public static void main(String args[]) throws Throwable {
            runTZTest();
        }

        /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0/3".
         * This ensures the transition periods for Daylights Savings should be at March's last
         * Sunday and October's last Sunday.
         */
        private static void runTZTest() {
            Date time = new Date();
            String tzStr = System.getenv("TZ");
            if (tzStr == null)
                throw new RuntimeException("Got unexpected timezone information: TZ is null");
            boolean nor = tzStr.matches(".*,M3\\..*,M10\\..*");
            TimeZone tz = new SimpleTimeZone(360, tzStr,
                    nor ? Calendar.MARCH : Calendar.OCTOBER, -1,
                    Calendar.SUNDAY, 360, SimpleTimeZone.UTC_TIME,
                    nor ? Calendar.OCTOBER : Calendar.MARCH, -1,
                    Calendar.SUNDAY, 360, SimpleTimeZone.UTC_TIME,
                    360);
            System.out.println(time);
            if (tz.inDaylightTime(time)) {
                // We are in Daylight savings period.
                if (time.toString().endsWith("GMT+02:00 " + Integer.toString(time.getYear() + 1900)))
                    return;
            } else {
                if (time.toString().endsWith("GMT+01:00 " + Integer.toString(time.getYear() + 1900)))
                    return;
            }

            // Reaching here means time zone did not match up as expected.
            throw new RuntimeException("Got unexpected timezone information: " + tzStr + " " + time);
        }

        private static ZonedDateTime getLastSundayOfMonth(ZonedDateTime date) {
            return date.with(TemporalAdjusters.lastInMonth(DayOfWeek.SUNDAY));
        }
    }

Check the Europe/Berlin time zone settings:

    $ zdump -v Europe/Berlin | grep 2023
    Europe/Berlin  Sun Mar 26 00:59:59 2023 UTC = Sun Mar 26 01:59:59 2023 CET isdst=0 gmtoff=3600
    Europe/Berlin  Sun Mar 26 01:00:00 2023 UTC = Sun Mar 26 03:00:00 2023 CEST isdst=1 gmtoff=7200
    Europe/Berlin  Sun Oct 29 00:59:59 2023 UTC = Sun Oct 29 02:59:59 2023 CEST isdst=1 gmtoff=7200
    Europe/Berlin  Sun Oct 29 01:00:00 2023 UTC = Sun Oct 29 02:00:00 2023 CET isdst=0 gmtoff=3600

Test results are as follows:

Northern Hemisphere side

    $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 date
    Sun Mar 26 01:59:59 MEZ 2023
    $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 java
Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
On Wed, 6 Jul 2022 14:05:39 GMT, Ichiroh Takiguchi wrote:

> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and DBCS Only charsets.
>
> |Charset|Mix|SBCS|DBCS|
> | -- | -- | -- | -- |
> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
> | Korean | Cp933 | Cp833 | Cp834 |
>
> But OpenJDK does not support some of the "Japanese EBCDIC - English" / "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only charsets.
>
> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency.
>
> |Charset|Mix|SBCS|DBCS|
> | - | - | - | - |
> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** |
>
> *1: Cp037 compatible

I'm still working on this one.

-------------

PR: https://git.openjdk.org/jdk/pull/9399
Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
On Fri, 26 Aug 2022 09:25:55 GMT, Alan Bateman wrote:

>> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and DBCS Only charsets.
>>
>> |Charset|Mix|SBCS|DBCS|
>> | -- | -- | -- | -- |
>> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
>> | Korean | Cp933 | Cp833 | Cp834 |
>>
>> But OpenJDK does not support some of the "Japanese EBCDIC - English" / "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only charsets.
>>
>> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency.
>>
>> |Charset|Mix|SBCS|DBCS|
>> | - | - | - | - |
>> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
>> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
>> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** |
>>
>> *1: Cp037 compatible
>
>> Use the following options:
>> OpenJDK: `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 2 1 1`
>> ICU4J: `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 2 1 1`
>>
>> Actually, I'm confused by this result. Previously, I was just comparing A/A with B/B on OpenJDK's charset. I didn't think ICU4J's result would make a difference.
>
> My initial reaction is one of relief that the icu4j provider can be used with current JDK builds. This means there is an option should we decide to stop adding more EBCDIC charsets to the JDK.
>
> The test uses IBM-1047 and I can't tell if the icu4j provider is used or not. Charset doesn't define a provider method but I think it would be useful to print cs.getClass() or cs.getClass().getModule() so we know which Charset implementation is used. Also I think any discussion on performance would be better served with a JMH benchmark rather than a standalone test.

Hello @AlanBateman.
Sorry I'm late.
I created a Charset SPI JAR, `x-IBM1047_SPI` (`custom-charsets.jar`), which was ported from `sun.nio.cs.SingleByte.java` and the generated `IBM1047.java`.
Test code:

    package com.example;

    import java.nio.charset.Charset;
    import org.openjdk.jmh.annotations.Benchmark;

    public class MyBenchmark {
        final static String s;
        static {
            char[] ca = new char[0x2000];
            for (int i = 0; i < ca.length; i++) {
                ca[i] = (char) (i & 0xFF);
            }
            s = new String(ca);
        }

        @Benchmark
        public void testIBM1047() throws Exception {
            byte[] ba = s.getBytes("IBM1047");
        }

        @Benchmark
        public void testIBM1047_SPI() throws Exception {
            byte[] ba = s.getBytes("x-IBM1047_SPI");
        }
    }

All test-related files are in [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834).

Test results on RHEL8.6 x86_64 (Intel Core i7 3520M) are as follows:

1.8.0_345-b01

    Benchmark                     Mode  Cnt       Score      Error  Units
    MyBenchmark.testIBM1047      thrpt   25   53213.092 ±  126.962  ops/s
    MyBenchmark.testIBM1047_SPI  thrpt   25   47442.669 ±  349.003  ops/s

20-ea+17-1181

    Benchmark                     Mode  Cnt       Score      Error  Units
    MyBenchmark.testIBM1047      thrpt   25  136331.141 ± 1078.481  ops/s
    MyBenchmark.testIBM1047_SPI  thrpt   25   51563.213 ±  843.238  ops/s

On JDK 20, IBM1047 is about 2.6 times faster than the SPI version. I think these results are related to **JEP 254: Compact Strings**.
As I requested before, we'd like to use the `sun.nio.cs.SingleByte*` and `sun.nio.cs.DoubleByte*` classes as a public API.

-------------

PR: https://git.openjdk.org/jdk/pull/9399
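Alan suggested printing `cs.getClass()` or `cs.getClass().getModule()` to confirm which implementation a charset name resolves to. A minimal sketch of that check follows; the class and method names are hypothetical, and what it reports for a given name depends on the runtime image (for example, whether `jdk.charsets` is present) and on any providers on the class path or module path.

```java
import java.nio.charset.Charset;

public class CharsetOrigin {
    // Describes which Charset implementation (class and module) a name resolves to.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not supported";
        }
        Charset cs = Charset.forName(name);
        Class<?> impl = cs.getClass();
        return name + " -> " + cs.name() + " [" + impl.getName() + ", " + impl.getModule() + "]";
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {"UTF-8", "IBM1047", "x-IBM1047_SPI"};
        for (String n : names) {
            System.out.println(describe(n));
        }
    }
}
```

A charset supplied by a JAR on the class path would report an unnamed module, while the built-in ones report `java.base` or `jdk.charsets`.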
Re: RFR: 8291916: Unexpected output on Windows command prompt
On Tue, 9 Aug 2022 20:38:25 GMT, Naoto Sato wrote:

>> To support the Windows command prompt's code pages, the following charsets should be moved from the jdk.charsets module to the java.base module.
>>
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> I looked at this issue a bit more. It looks to me that the issue is caused by the fact that the encoding of `System.out` falls back to the default encoding, as `IBM864` is not in `java.base`. This issue seems not new and reproducible with the releases since JDK9 where modularization has been introduced. Also, I think other encodings than those `IBM*` listed here, can possibly cause this issue. In order to fix this completely, those obscure encodings also have to be in `java.base` which I don't think we would want to do.

Hello @naotoj.
Sorry for my bad reaction.
I checked these charsets against the IBM CDRA definitions. They match, but some round-trip definitions differ, as in #9661.
I think they come from the files under https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/.
As you know, `CP860/CP861/CP863/CP864/CP865/CP869` are registered as aliases in [IANA Character Sets](https://www.iana.org/assignments/character-sets/character-sets.xhtml).
Although the registered names are `IBM*`, these charset implementations come from Microsoft.
I think these charsets should be usable as the default charset in the Windows command prompt.
Please reconsider the current Java implementation.

-------------

PR: https://git.openjdk.org/jdk/pull/9761
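Naoto's explanation is that the encoding requested for `System.out` falls back to the default when the named charset is not available to the runtime. The sketch below only illustrates that kind of fallback; `resolveOrDefault` is hypothetical and is not the JDK's internal logic. It reads the `stdout.encoding` property used by recent JDKs and, failing that, `sun.stdout.encoding` (the same order as the `termdump` program in this thread).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class StdoutEncodingSketch {
    // Hypothetical helper: returns the named charset if the running JDK can
    // load it, otherwise the default charset.
    static Charset resolveOrDefault(String name) {
        if (name != null) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalCharsetNameException e) {
                // Malformed name: fall through to the default charset.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String requested = System.getProperty("stdout.encoding");
        if (requested == null) {
            requested = System.getProperty("sun.stdout.encoding");
        }
        Charset effective = resolveOrDefault(requested);
        System.out.println("requested=" + requested + ", effective=" + effective.name());
    }
}
```

On a runtime image without `jdk.charsets`, `resolveOrDefault("IBM864")` would return the default charset, which corresponds to the situation described above.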
Integrated: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform
On Fri, 26 Aug 2022 07:26:46 GMT, Ichiroh Takiguchi wrote:

> After the `test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java` testcase was integrated, it failed on the AIX platform.
>
> Error output
>
>     STDERR:
>     stdout: [];
>     stderr: [Exception in thread "main" java.lang.RuntimeException: Got unexpected timezone information: Thu Aug 25 09:29:10 CEST 2022
>         at CustomTzIDCheckDST.runTZTest(CustomTzIDCheckDST.java:71)
>         at CustomTzIDCheckDST.main(CustomTzIDCheckDST.java:50)
>     ]
>
> According to my investigation, the `TZ=MEZ-1MESZ,M3.5.0,M10.5.0` time zone was changed to the `Europe/Berlin` time zone on the AIX platform.
> This seems to happen because older AIX versions did not support the `MEZ-1MESZ,M3.5.0,M10.5.0` time zone in the TZ environment variable.
> https://www.ibm.com/support/pages/managing-time-zone-variable-posix
> AIX-specific code was implemented in `src/java.base/unix/native/libjava/TimeZone_md.c`.
> Current AIX supports the `TZ=EST5EDT,M3.2.0/2:00:00,M11.1.0/2:00:00` style.
> I think an implementation change is required.
>
> Some pre-submit tests failed, but I think they are not related to this change, since the modified parts are only for the AIX platform.

This pull request has now been integrated.

Changeset: 3464019d
Author:    Ichiroh Takiguchi
URL:       https://git.openjdk.org/jdk/commit/3464019d7e8fe57adc910339c00ba79884c77852
Stats:     17 lines in 1 file changed: 11 ins; 1 del; 5 mod

8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

Reviewed-by: naoto

-------------

PR: https://git.openjdk.org/jdk/pull/10036
Re: RFR: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform
On Fri, 26 Aug 2022 18:56:31 GMT, Naoto Sato wrote:

>> After the `test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java` testcase was integrated, it failed on the AIX platform.
>>
>> Error output
>>
>>     STDERR:
>>     stdout: [];
>>     stderr: [Exception in thread "main" java.lang.RuntimeException: Got unexpected timezone information: Thu Aug 25 09:29:10 CEST 2022
>>         at CustomTzIDCheckDST.runTZTest(CustomTzIDCheckDST.java:71)
>>         at CustomTzIDCheckDST.main(CustomTzIDCheckDST.java:50)
>>     ]
>>
>> According to my investigation, the `TZ=MEZ-1MESZ,M3.5.0,M10.5.0` time zone was changed to the `Europe/Berlin` time zone on the AIX platform.
>> This seems to happen because older AIX versions did not support the `MEZ-1MESZ,M3.5.0,M10.5.0` time zone in the TZ environment variable.
>> https://www.ibm.com/support/pages/managing-time-zone-variable-posix
>> AIX-specific code was implemented in `src/java.base/unix/native/libjava/TimeZone_md.c`.
>> Current AIX supports the `TZ=EST5EDT,M3.2.0/2:00:00,M11.1.0/2:00:00` style.
>> I think an implementation change is required.
>>
>> Some pre-submit tests failed, but I think they are not related to this change, since the modified parts are only for the AIX platform.
>
> src/java.base/unix/native/libjava/TimeZone_md.c line 589:
>
>> 587:         // But Hotspot does not support XPG_SUS_ENV=ON.
>> 588:         // Ignore daylight saving settings to calculate current time difference
>> 589:         localtm.tm_isdst = 0;
>
> Is it OK to reset it always? Could this defy the original purpose of the fix to https://bugs.openjdk.org/browse/JDK-8285838?

I executed the test program from [JDK-8285838](https://bugs.openjdk.org/browse/JDK-8285838) on RHEL8 x86_64.
With TZ="MEZ-1MESZ,M3.5.0,M10.5.0", daylight saving time is in effect today.
With JDK 18:

    $ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" jdk-18/bin/java -showversion TimeTest.java
    openjdk version "18" 2022-03-22
    OpenJDK Runtime Environment (build 18+36-2087)
    OpenJDK 64-Bit Server VM (build 18+36-2087, mixed mode, sharing)
    Calendar.getInstance().getTime() = Thu Sep 01 11:52:09 GMT+01:00 2022
    SimpleDateFormat = 01.09.2022 11:52:09.747

With JDK 20:

    $ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" jdk-20-b12/bin/java -showversion TimeTest.java
    openjdk version "20-ea" 2023-03-21
    OpenJDK Runtime Environment (build 20-ea+12-790)
    OpenJDK 64-Bit Server VM (build 20-ea+12-790, mixed mode, sharing)
    Calendar.getInstance().getTime() = Thu Sep 01 12:52:21 GMT+02:00 2022
    SimpleDateFormat = 01.09.2022 12:52:21.269

The expected result is GMT+02:00, i.e. the current time difference between GMT and MESZ.

On the modified build:

    $ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" jdk/build/aix-ppc64-server-release/images/jdk/bin/java TimeTest.java
    Calendar.getInstance().getTime() = Thu Sep 01 12:53:12 GMT+02:00 2022
    SimpleDateFormat = 01.09.2022 12:53:12.930

According to the AIX documentation for mktime()
https://www.ibm.com/docs/en/aix/7.2?topic=c-ctime-localtime-gmtime-mktime-difftime-asctime-tzset-subroutine

> The value of the ```tm_isdst``` field determines the following actions of the **mktime** subroutine:
> "0" means "Initially presumes that Daylight Saving Time (DST) is not in effect."
If daylight saving time ends in August, the time zone should be GMT+01:00:

    $ TZ="MEZ-1MESZ,M3.5.0,M8.5.0" jdk/build/aix-ppc64-server-release/images/jdk/bin/java TimeTest.java
    Calendar.getInstance().getTime() = Thu Sep 01 11:53:36 GMT+01:00 2022
    SimpleDateFormat = 01.09.2022 11:53:36.189

Here is a simple C test case:

    $ cat sf.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        char buf[100];
        struct tm localtm;
        struct tm gmt;
        time_t clock = time(NULL);
        int gmt_off;
    #if defined(_AIX)
        putenv("XPG_SUS_ENV=ON");
    #endif
        if (localtime_r(&clock, &localtm) == NULL) {
            return 1;
        }
        if (gmtime_r(&clock, &gmt) == NULL) {
            return 1;
        }
        strftime(buf, sizeof(buf), "%z", &localtm);
        printf("strftime: %s\n", buf);
        localtm.tm_isdst = 0;
        gmt_off = (int)(difftime(mktime(&localtm), mktime(&gmt)) / 60.0);
        sprintf(buf, (const char *)"%c%02.2d%02.2d",
                gmt_off < 0 ? '-' : '+', abs(gmt_off / 60), gmt_off % 60);
        printf("difftime: %s\n", buf);
        return 0;
    }

On RHEL8:

    $ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" ./sf
    strftime: +0200
    difftime: +0200
    $ TZ="ZZZ-1-3,M3.5.0,M10.5.0" ./sf
    strftime: +0300
    difftime: +0300
    $ TZ="ZZZ-1-3,M3.5.0,M8.5.0" ./sf
    strftime: +0100
    difftime: +0100

On AIX:

    $ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" ./sf
    strftime: +0200
    difftime: +0200
    $ TZ="ZZZ-1-3,M3.5.0,M10.5.0" ./sf
    strftime: +0300
    difftime: +0300
    $ TZ="ZZZ-1-3,M3.5.0,M8.5.0" ./sf
    strftime: +0100
    difftime: +0100

I assume the modified code should be fine.

-------------

PR: https://git.openjdk.org/jdk/pull/10036
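The C test case formats the minute difference from `difftime()` as a `+HHMM` string. For cross-checking such values from Java, the sketch below derives the same kind of string from `java.time`; the class and method names are hypothetical. Unlike the C sample, it also takes the absolute value of the minutes part, so an offset such as -03:30 prints as `-0330` (the C sample's `gmt_off % 60` would print `-03-30`).

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class OffsetSketch {
    // Formats an offset in minutes as +HHMM or -HHMM, like strftime's %z.
    static String formatMinutes(int offMinutes) {
        int abs = Math.abs(offMinutes);
        return String.format("%c%02d%02d", offMinutes < 0 ? '-' : '+', abs / 60, abs % 60);
    }

    // UTC offset of the given zone at the given instant, in minutes.
    static int offsetMinutes(ZoneId zone, Instant t) {
        ZoneOffset off = zone.getRules().getOffset(t);
        return off.getTotalSeconds() / 60;
    }

    public static void main(String[] args) {
        ZoneId berlin = ZoneId.of("Europe/Berlin");
        Instant sep1 = Instant.parse("2022-09-01T10:52:09Z");
        System.out.println("Europe/Berlin: " + formatMinutes(offsetMinutes(berlin, sep1)));
    }
}
```

Using a tz database zone such as Europe/Berlin, rather than a POSIX `TZ` string, keeps the check independent of how the platform's C library parses `TZ`.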
RFR: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform
After the `test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java` testcase was integrated, it failed on the AIX platform.

Error output

    STDERR:
    stdout: [];
    stderr: [Exception in thread "main" java.lang.RuntimeException: Got unexpected timezone information: Thu Aug 25 09:29:10 CEST 2022
        at CustomTzIDCheckDST.runTZTest(CustomTzIDCheckDST.java:71)
        at CustomTzIDCheckDST.main(CustomTzIDCheckDST.java:50)
    ]

According to my investigation, the `TZ=MEZ-1MESZ,M3.5.0,M10.5.0` time zone was changed to the `Europe/Berlin` time zone on the AIX platform.
This seems to happen because older AIX versions did not support the `MEZ-1MESZ,M3.5.0,M10.5.0` time zone in the TZ environment variable.
https://www.ibm.com/support/pages/managing-time-zone-variable-posix
AIX-specific code was implemented in `src/java.base/unix/native/libjava/TimeZone_md.c`.
Current AIX supports the `TZ=EST5EDT,M3.2.0/2:00:00,M11.1.0/2:00:00` style.
I think an implementation change is required.

Some pre-submit tests failed, but I think they are not related to this change, since the modified parts are only for the AIX platform.

-------------

Commit messages:
 - 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

Changes: https://git.openjdk.org/jdk/pull/10036/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=10036=00
  Issue: https://bugs.openjdk.org/browse/JDK-8292899
  Stats: 17 lines in 1 file changed: 11 ins; 1 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/10036.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10036/head:pull/10036

PR: https://git.openjdk.org/jdk/pull/10036
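The TZ strings in this thread use the POSIX `Mm.w.d[/time]` rule form: month `m`, week `w` (5 means the last such weekday of the month), weekday `d` (0 = Sunday), and an optional local transition time that defaults to 02:00. A minimal sketch of evaluating one such field with `java.time` follows. The parser is hypothetical, handles only non-negative `hh[:mm[:ss]]` times below 24 hours, and ignores the other POSIX rule forms (`Jn` and `n`).

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

public class PosixRuleSketch {
    // Resolves a POSIX "Mm.w.d[/time]" rule field to a local date-time in the given year.
    static LocalDateTime resolve(String field, int year) {
        if (!field.startsWith("M")) {
            throw new IllegalArgumentException("only Mm.w.d rules are handled: " + field);
        }
        String[] ruleAndTime = field.substring(1).split("/", 2);
        String[] mwd = ruleAndTime[0].split("\\.");
        int month = Integer.parseInt(mwd[0]);
        int week = Integer.parseInt(mwd[1]);
        int d = Integer.parseInt(mwd[2]);
        // POSIX day 0 is Sunday; java.time numbers Monday=1 .. Sunday=7.
        DayOfWeek dow = (d == 0) ? DayOfWeek.SUNDAY : DayOfWeek.of(d);
        LocalDate first = LocalDate.of(year, month, 1);
        LocalDate date = (week == 5)
                ? first.with(TemporalAdjusters.lastInMonth(dow))
                : first.with(TemporalAdjusters.dayOfWeekInMonth(week, dow));
        LocalTime time = (ruleAndTime.length == 2) ? parseTime(ruleAndTime[1]) : LocalTime.of(2, 0);
        return date.atTime(time);
    }

    // Parses "hh", "hh:mm" or "hh:mm:ss".
    static LocalTime parseTime(String s) {
        String[] p = s.split(":");
        int h = Integer.parseInt(p[0]);
        int m = p.length > 1 ? Integer.parseInt(p[1]) : 0;
        int sec = p.length > 2 ? Integer.parseInt(p[2]) : 0;
        return LocalTime.of(h, m, sec);
    }

    public static void main(String[] args) {
        System.out.println(resolve("M3.5.0", 2023));         // DST start in MEZ-1MESZ,M3.5.0,M10.5.0/3
        System.out.println(resolve("M10.5.0/3", 2023));      // DST end in the same TZ (local DST time)
        System.out.println(resolve("M10.5.0", 2023));        // same rule without /3: defaults to 02:00
        System.out.println(resolve("M3.2.0/2:00:00", 2022)); // EST5EDT-style rule accepted by AIX
    }
}
```

The difference between `M10.5.0` and `M10.5.0/3` is exactly the missing `/3` discussed for CustomTzIDCheckDST: without it, the end transition is taken at 02:00 local daylight time instead of 03:00.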
Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
On Mon, 8 Aug 2022 09:22:32 GMT, Alan Bateman wrote:

>> Hello @AlanBateman.
>> Sorry I'm late.
>> I got some responses from ICU:
>> [ICU-22091](https://unicode-org.atlassian.net/browse/ICU-22091)
>> I'm not sure if they're interested in the new charset...
>>
>> As you know, the `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder` interfaces have a performance advantage, and the built-in charset decoders/encoders have some other performance advantages as well.
>> Is it possible to create a simple public API based on the `sun.nio.cs.SingleByte` and `sun.nio.cs.DoubleByte*` classes?
>> We'd like to use a stable conversion loop.
>
>> As you know, the `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder` interfaces have a performance advantage, and the built-in charset decoders/encoders have some other performance advantages as well. Is it possible to create a simple public API based on the `sun.nio.cs.SingleByte` and `sun.nio.cs.DoubleByte*` classes? We'd like to use a stable conversion loop.
>
> If they have ASCII compatible regions then that may be so but I haven't seen any performance data published on that. Do you know if any experiments that have deployed a CharsetProvider for the EBCDIC charsets and compared the performance with the charsets that in the JDK? There may be merit in exploring adding base abstract implementations of CharsetEncoder/CharsetDecoder to java.nio.charset.spi to support single and double byte charsets to see how such base implementations might look, how they would help performance, and if there are any security downsides.

Hello @AlanBateman.
Sorry, I'm late.
The test result is attached (not guaranteed).
I created the attached small test program; I'm not sure whether it's good or not.

    import java.nio.*;
    import java.nio.charset.*;

    public class tc {
        public static void main(String[] args) throws Exception {
            Charset cs = Charset.forName(args[0]);
            int cnt = Integer.parseInt(args[1]);
            boolean useCA = "1".equals(args[2]);
            boolean useBA = "1".equals(args[3]);
            CharsetEncoder ce = cs.newEncoder();
            byte[] ba = new byte[0x4000];
            for (int i = 0; i < ba.length; i++) {
                ba[i] = (byte) i;
            }
            String s = new String(ba, cs);
            char[] ca = s.toCharArray();
            ByteBuffer bb = useBA ? ByteBuffer.allocate(ca.length) : ByteBuffer.allocateDirect(ca.length);
            CharBuffer cb = useCA ? CharBuffer.wrap(ca) : CharBuffer.wrap(s);
            System.out.println("CharBuffer.hasArray() = " + cb.hasArray());
            System.out.println("ByteBuffer.hasArray() = " + bb.hasArray());
            long start_t = System.currentTimeMillis();
            for (int i = 0; i < 200; i++) {
                ce.reset();
                bb.position(0);
                cb.position(0);
                ce.encode(cb, bb, true);
            }
            System.out.println("Warmup: " + (System.currentTimeMillis() - start_t));
            start_t = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                ce.reset();
                bb.position(0);
                cb.position(0);
                ce.encode(cb, bb, true);
            }
            System.out.println("Test: " + (System.currentTimeMillis() - start_t));
        }
    }

The following test results apply only to my test environment:
* CPU: Intel (on-premises environment), not the same machine for each OS
* Each case was executed 5 times; the values are the averages

Use the following options:
OpenJDK: `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 2 1 1`
ICU4J: `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 2 1 1`

I used jdk-20 b12.
Only the A/A case with OpenJDK uses the ArrayEncoder (ArrayDecoder) interface.

| | A/A | A/B | B/A | B/B |
| -- | --: | --: | --: | --: |
| Linux (OpenJDK) | 862 | 1265 | 1838 | 1843 |
| Linux (ICU4J) | 1450 | 1410 | 1152 | 1138 |
| Windows (OpenJDK) | 921 | 1231 | 1959 | 1850 |
| Windows (ICU4J) | 1431 | 1446 | 2227 | 2265 |
| Mac (OpenJDK) | 820 | 1163 | 1799 | 1774 |
| Mac (ICU4J) | 1282 | 1242 | 994 | 1049 |

Notes:
* A/A means the CharBuffer is created from a char[] and the ByteBuffer by allocate()
* A/B means the CharBuffer is created from a char[] and the ByteBuffer by allocateDirect()
* B/A means the CharBuffer is created from a String and the ByteBuffer by allocate()
* B/B means the CharBuffer is created from a String and the ByteBuffer by allocateDirect()

Actually, I'm confused by this result. Previously, I was just comparing A/A with B/B on OpenJDK's charset. I didn't think ICU4J's result would make a difference.
Anyway, please evaluate this result, and please let me know if more investigation is needed.

-------------

PR: https://git.openjdk.org/jdk/pull/9399
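The A/A, A/B, B/A, and B/B cases differ only in how the buffers are created, which determines whether `hasArray()` returns true. A compact sketch of setting up each case and running one encode pass follows. Unlike `tc`, it uses `rewind()`/`clear()`, flushes the encoder, and checks the `CoderResult`; the class and method names are hypothetical, and it is not a benchmark (timings are better taken with JMH, as Alan suggested).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public class BufferCaseSketch {
    // 'A' = char[]-backed CharBuffer / heap ByteBuffer; 'B' = String-backed / direct.
    static CharBuffer charSource(String s, char mode) {
        return mode == 'A' ? CharBuffer.wrap(s.toCharArray()) : CharBuffer.wrap(s);
    }

    static ByteBuffer byteTarget(int capacity, char mode) {
        return mode == 'A' ? ByteBuffer.allocate(capacity) : ByteBuffer.allocateDirect(capacity);
    }

    // Encodes the whole source once and returns the number of bytes produced.
    static int encodeOnce(CharsetEncoder ce, CharBuffer cb, ByteBuffer bb) {
        ce.reset();
        cb.rewind();
        bb.clear();
        CoderResult cr = ce.encode(cb, bb, true);
        if (cr.isUnderflow()) {
            cr = ce.flush(bb);
        }
        if (!cr.isUnderflow()) {
            throw new IllegalStateException("encoding stopped: " + cr);
        }
        return bb.position();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "ISO-8859-1");
        char[] chars = new char[256];
        for (int i = 0; i < chars.length; i++) chars[i] = (char) i;
        String s = new String(chars);
        for (String c : new String[] {"AA", "AB", "BA", "BB"}) {
            CharsetEncoder ce = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            CharBuffer cb = charSource(s, c.charAt(0));
            ByteBuffer bb = byteTarget((int) Math.ceil(s.length() * ce.maxBytesPerChar()), c.charAt(1));
            int n = encodeOnce(ce, cb, bb);
            System.out.printf("%c/%c cb.hasArray=%b bb.hasArray=%b bytes=%d%n",
                    c.charAt(0), c.charAt(1), cb.hasArray(), bb.hasArray(), n);
        }
    }
}
```

Sizing the target with `maxBytesPerChar()` avoids an OVERFLOW result for charsets that need more than one byte per char.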
Re: RFR: 8291916: Unexpected output on Arabic Windows command prompt
On Fri, 5 Aug 2022 16:44:37 GMT, Naoto Sato wrote:

>> To support the Windows command prompt's code pages, the following charsets should be moved from the jdk.charsets module to the java.base module.
>>
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> Hi @takiguc,
> I am not quite sure what is the rationale for moving those charsets into `java.base` module. IIUC, we typically did such a fix when the java runtime cannot boot in a supported configuration (https://bugs.openjdk.org/browse/JDK-8187910), but it seems that this issue does not warrant such a requirement. Will you elaborate more?

Hello @naotoj.
As Alan described, the Windows code page mapping is as follows:

- 860 - Portuguese (DOS) - IBM860
- 861 - Icelandic (DOS) - IBM861
- 863 - French Canadian (DOS) - IBM863
- 864 - Arabic (864) - IBM864
- 865 - Nordic (DOS) - IBM865
- 869 - Greek, Modern (DOS) - IBM869

The Java 8 behavior is as follows.

Windows command prompt setting (the following sample uses code page 864):

    >chcp 864
    Active code page: 864

Test program:

    >type termdump.java
    import java.nio.charset.*;

    public class termdump {
        public static void main(String[] args) throws Exception {
            String csname = System.getProperty("sun.stdout.encoding");
            if (csname == null) csname = System.getProperty("stdout.encoding");
            System.out.println(csname);
            Charset cs = Charset.forName(csname);
            for (int i0 = 0; i0 < 0x100; i0 += 0x10) {
                StringBuilder sb = new StringBuilder();
                for (int i1 = 0; i1 < 0x10; i1++) {
                    byte[] ba = new byte[1];
                    ba[0] = (byte) (i0 | i1);
                    String s = new String(ba, csname);
                    if (s.length() == 1) {
                        char ch = s.charAt(0);
                        if (ch < 0x7F) continue;
                        if (Character.isISOControl(ch)) continue;
                        if (ch == '\uFFFD') continue;
                        sb.append(ch);
                    }
                }
                if (sb.length() > 0) {
                    System.out.printf("0x%02X %s%n", i0, sb.toString());
                    System.out.print("");
                    for (char ch : sb.toString().toCharArray()) {
                        System.out.printf(" %04X", (int) ch);
                    }
                    System.out.println();
                }
            }
        }
    }

Java 8 output:

    >jdk8u345-b01\jre\bin\java termdump
    cp864
0x20 % 066A 0x80 °·∙√▒─│┼┤┬├┴┐┌└┘ 00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 2518 0x90 β∞φ±½¼≈«»ﻷﻸﻻﻼ 03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC 0xA0 ﺂ£¤ﺄﺎﺏﺕﺙ،ﺝﺡﺥ 00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5 0xB0 ٠١٢٣٤٥٦٧٨٩ﻑ؛ﺱﺵﺹ؟ 0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 061F 0xC0 ¢ﺀﺁﺃﺅﻊﺋﺍﺑﺓﺗﺛﺟﺣﺧﺩ 00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 FEA9 0xD0 ﺫﺭﺯﺳﺷﺻﺿﻁﻅﻋﻏ¦¬÷×ﻉ FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 FEC9 0xE0 ـﻓﻗﻛﻟﻣﻧﻫﻭﻯﻳﺽﻌﻎﻍﻡ 0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD FEE1 0xF0 ﹽّﻥﻩﻬﻰﻲﻐﻕﻵﻶﻝﻙﻱ■ FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0 Java20 output >jdk-20\bin\java termdump cp864 0x20 ﻋﺕ 066A 0x80 ﺁ٠ﺁ٧ﻗ┤ﻷﻗ┤ﻸﻗ≈φﻗ½°ﻗ½∙ﻗ½ﺱﻗ½¤ﻗ½،ﻗ½ﻗ½٤ﻗ½βﻗ½┐ﻗ½½ﻗ½» 00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 2518 0x90 ﺧ٢ﻗ┤ﻼﺩ│ﺁ١ﺁﺵﺁﺱﻗ┬┤ﺁﺙﺁ؛ﻡ؛٧ﻡ؛٨ﻡ؛؛ﻡ؛ﺱ 03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC 0xA0 ﺁ ﺁﺝﻡﻑ∙ﺁ£ﺁ¤ﻡﻑ▒ﻡﻑ└ﻡﻑ┘ﻡﻑ¼ﻡﻑﻷﻅ┐ﻡﻑﻻﻡﻑﻡﻑﺄ 00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5 0xB0 ﻋ ﻋﻋﺂﻋ£ﻋ¤ﻋﺄﻋﻋﻋﺎﻋﺏﻡ؛∞ﻅﻡﻑ١ﻡﻑ٥ﻡﻑ٩ﻅ 0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 061F 0xC0 ﺁﺂﻡﻑ°ﻡﻑ·ﻡﻑ√ﻡﻑ─ﻡ؛├ﻡﻑ┴ﻡﻑ┌ﻡﻑ∞ﻡﻑ±ﻡﻑ«ﻡﻑﻡﻑﻡﻑ£ﻡﻑﻡﻑﺏ 00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 FEA9 0xD0 ﻡﻑﺙﻡﻑﺝﻡﻑﺥﻡﻑ٣ﻡﻑ٧ﻡﻑ؛ﻡﻑ؟ﻡ؛·ﻡ؛─ﻡ؛┴ﻡ؛┘ﺁﺁ،ﺃ٧ﺃ«ﻡ؛┬ FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 FEC9 0xE0 ﻋ°ﻡ؛±ﻡ؛«ﻡ؛ﻡ؛ﻡ؛£ﻡ؛ﻡ؛ﺙﻡ؛ﺝﻡ؛ﺥﻡ؛٣ﻡﻑﺵﻡ؛┐ﻡ؛└ﻡ؛┌ﻡ؛ 0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD FEE1 0xF0 ﻡ٩ﺵﻋ∞ﻡ؛ﺄﻡ؛ﺏﻡ؛،ﻡ؛٠ﻡ؛٢ﻡ؛βﻡ؛¼ﻡ؛٥ﻡ؛٦ﻡ؛ﻻﻡ؛ﻷﻡ؛١ﻗ≈ FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0 Fixed output >java -showversion termdump openjdk version "20-internal" 2023-03-21 OpenJDK Runtime Environment (build 20-internal-adhoc.Administrator.jdk) OpenJDK 64-Bit Server 
VM (build 20-internal-adhoc.Administrator.jdk, mixed mode, sharing) cp864 0x20 % 066A 0x80 °·∙√▒─│┼┤┬├┴┐┌└┘ 00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 2518 0x90 β∞φ±½¼≈«»ﻷﻸﻻﻼ 03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC 0xA0 ﺂ£¤ﺄﺎﺏﺕﺙ،ﺝﺡﺥ 00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5 0xB0 ٠١٢٣٤٥٦٧٨٩ﻑ؛ﺱﺵﺹ؟ 0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 061F 0xC0 ¢ﺀﺁﺃﺅﻊﺋﺍﺑﺓﺗﺛﺟﺣﺧﺩ 00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 FEA9 0xD0 ﺫﺭﺯﺳﺷﺻﺿﻁﻅﻋﻏ¦¬÷×ﻉ FEAB FEAD FEAF FEB3 FEB7 FEBB
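The Java 20 output is consistent with UTF-8 bytes being displayed by a code page 864 console: U+00B0 encodes in UTF-8 as 0xC2 0xB0, and in the Java 8 table 0xC2 is U+FE81 (ﺁ) and 0xB0 is U+0660 (٠), which is the `ﺁ٠` printed for 00B0. The sketch below reproduces that mechanism; the class and method names are hypothetical, and ISO-8859-1 stands in for the console code page because it is always available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {
    // Bytes written in 'writer' but displayed by a console that assumes 'terminal'.
    static String misread(String text, Charset writer, Charset terminal) {
        return new String(text.getBytes(writer), terminal);
    }

    // Space-separated UTF-16 code units in hex, like the termdump output.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String degree = "\u00B0";
        System.out.println(hex(misread(degree, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
        // IBM864 lives in jdk.charsets, so it may be missing from a trimmed runtime.
        if (Charset.isSupported("IBM864")) {
            System.out.println(hex(misread(degree, StandardCharsets.UTF_8, Charset.forName("IBM864"))));
        }
    }
}
```

With IBM864 available, the second line should show the same pair of code points as the first entry of the Java 20 0x80 row.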
Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
On Thu, 7 Jul 2022 09:47:25 GMT, Alan Bateman wrote:

>> And also there is no reason why db drivers or host connectors should not ship their own charset support (Oracle JDBC, for example, had nls_charset addons. My employer also ships a custom EBCDIC encoding which includes some compatibility hacks, and it took some effort to adapt it to the missing ext mechanism).
>>
>> Having said that, with JPMS a "legacy ebcdic" encoding module would be possible while still being optional. Maybe in the future a mechanism for modules which can be added to (instead of removed from) the standard distribution would make that nicer?
>>
>> Is there a performance restriction for charsets if they are not part of a platform module (optimized string access)?
>>
>> Regards,
>> Bernd
>>
>> --
>> http://bernd.eckenfels.net
>>
>> From: core-libs-dev on behalf of Alan Bateman
>> Sent: Thursday, July 7, 2022 11:50:39 AM
>> To: build-dev at openjdk.org; core-libs-dev at openjdk.org; i18n-dev at openjdk.org
>> Subject: Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
>>
>> On Wed, 6 Jul 2022 16:18:08 GMT, Ichiroh Takiguchi wrote:
>>
>>> Discussions are available on:
>>> [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834): Add SBCS and DBCS Only EBCDIC charsets
>>
>> Yes, I think this needs discussion on whether the JDK really needs to keep including and adding more EBCDIC charsets. I understand they can be useful for someone using JDBC to connect to a database on z/OS but this scenario would work equally well if the EBCDIC charsets were deployed on the class path or module path. Do you know if the icu4j project is still alive? I've often wondered why there wasn't more use of the provider mechanism.
>>
>> -------------
>>
>> PR: https://git.openjdk.org/jdk/pull/9399
>
>> Discussions are available on:
>> [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834): Add SBCS and DBCS Only EBCDIC charsets
>
> Yes, I think this needs discussion on whether the JDK really needs to keep including and adding more EBCDIC charsets. I understand they can be useful for someone using JDBC to connect to a database on z/OS but this scenario would work equally well if the EBCDIC charsets were deployed on the class path or module path. Do you know if the icu4j project is still alive? I've often wondered why there wasn't more use of the provider mechanism.

Hello @AlanBateman.
Sorry I'm late.
I got some responses from ICU:
[ICU-22091](https://unicode-org.atlassian.net/browse/ICU-22091)
I'm not sure if they're interested in the new charset...

As you know, the `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder` interfaces have a performance advantage, and the built-in charset decoders/encoders have some other performance advantages as well.
Is it possible to create a simple public API based on the `sun.nio.cs.SingleByte` and `sun.nio.cs.DoubleByte*` classes?
We'd like to use a stable conversion loop.

-------------

PR: https://git.openjdk.org/jdk/pull/9399
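Alan's point is that such charsets could be deployed on the class path or module path through the `java.nio.charset.spi.CharsetProvider` mechanism. The following is a minimal sketch of a table-driven single-byte charset exposed by a provider, assuming a made-up charset name `x-Demo-SBCS` whose table is Latin-1 except that 0x25 decodes to U+066A (chosen only to echo the IBM864 discussion). It is far simpler than `sun.nio.cs.SingleByte` and has none of its array fast paths. To be found by `Charset.forName`, the provider would also have to be registered as a service, for example in `META-INF/services/java.nio.charset.spi.CharsetProvider` on the class path.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class DemoSbcsProvider extends CharsetProvider {
    // Hypothetical byte-to-char table: Latin-1, except 0x25 -> U+066A.
    static final char[] B2C = new char[256];
    // Reverse table indexed by char; -1 marks an unmappable char.
    static final int[] C2B = new int[0x10000];
    static {
        for (int b = 0; b < 256; b++) B2C[b] = (char) b;
        B2C[0x25] = (char) 0x066A;
        Arrays.fill(C2B, -1);
        for (int b = 0; b < 256; b++) C2B[B2C[b]] = b;
    }

    static final class DemoCharset extends Charset {
        DemoCharset() { super("x-Demo-SBCS", new String[0]); }
        @Override public boolean contains(Charset cs) { return cs instanceof DemoCharset; }
        @Override public CharsetDecoder newDecoder() { return new Decoder(this); }
        @Override public CharsetEncoder newEncoder() { return new Encoder(this); }
    }

    static final class Decoder extends CharsetDecoder {
        Decoder(Charset cs) { super(cs, 1.0f, 1.0f); }
        @Override protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                out.put(B2C[in.get() & 0xFF]);
            }
            return CoderResult.UNDERFLOW;
        }
    }

    static final class Encoder extends CharsetEncoder {
        Encoder(Charset cs) { super(cs, 1.0f, 1.0f); }
        @Override protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            while (in.hasRemaining()) {
                // Simplification: surrogates are reported as unmappable one char at a time.
                int b = C2B[in.get(in.position())];
                if (b < 0) return CoderResult.unmappableForLength(1);
                if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                out.put((byte) b);
                in.position(in.position() + 1);
            }
            return CoderResult.UNDERFLOW;
        }
    }

    private final Charset demo = new DemoCharset();

    @Override public Iterator<Charset> charsets() {
        return Collections.singletonList(demo).iterator();
    }

    @Override public Charset charsetForName(String name) {
        return demo.name().equalsIgnoreCase(name) ? demo : null;
    }
}
```

Because the encoder consumes input only after a byte has been written, an OVERFLOW result leaves the buffers in a state where the caller can drain the output and resume.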
Re: RFR: 8290488: IBM864 character encoding implementation bug
On Wed, 27 Jul 2022 17:47:36 GMT, Naoto Sato wrote:

> Adding an extra c2b mapping for the `%` in `IBM864` charset. The discrepancy came from the mapping difference between MS and IBM.

I think you can ignore my comments. I'm not sure whether this change will solve the reporter's issue...

-------------

PR: https://git.openjdk.org/jdk/pull/9661
Re: RFR: 8290488: IBM864 character encoding implementation bug
On Thu, 28 Jul 2022 16:18:51 GMT, Naoto Sato wrote:

>> Many thanks @naotoj.
>>
>> I checked the latest IBM-864 mapping table.
>> (I assume OpenJDK's current IBM864 may be based on an older mapping table.)
>> https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/mappings/ibm-864_X110-1999.ucm
>> The .ucm file format is described here:
>> https://unicode-org.github.io/icu/userguide/conversion/data.html#ucm-file-format
>>
>> I checked the round-trip mappings
>> (round-trip entries have `|0` at the end of the line):
>>
>> | IBM864.map | ibm-864_X110-1999.ucm |
>> | --- | --- |
>> | 0x1a U+001a | 0x1a U+001c |
>> | 0x1c U+001c | 0x1c U+007f |
>> | **0x25 U+066a** | **0x25 U+0025** |
>> | 0x7f U+007f | 0x7f U+001a |
>> | 0x9f U+fffd | 0x9f U+200b |
>> | 0xd7 U+fec1 | 0xd7 U+fec3 |
>> | 0xd8 U+fec5 | 0xd8 U+fec7 |
>> | 0xf1 U+0651 | 0xf1 U+fe7c |
>>
>> **Note**: The 0x1a <-> U+001c / 0x1c <-> U+007f / 0x7f <-> U+001a entries are the control character rotation for DOS. I think they should be ignored.
>>
>> I think the round-trip side should be changed:
>> - The 0x25 entry should be U+0025 in IBM864.map
>> - Add `0x25 U+066a` to IBM864.c2b
>> - Modify test/jdk/sun/nio/cs/mapping/Cp864.b2c for `0025 0025`
>> - Add `0025 066a` to test/jdk/sun/nio/cs/mapping/Cp864.c2b-irreversible
>>
>> This issue is only about U+0025, but if possible, please also add the `0x9f, 0xd7, 0xd8, 0xf1` entries.
>
> Thanks for trying it out @takiguc. However, I am not planning to change any existing mappings because of the obvious compatibility issues. The fix I proposed is safe because it is additional, which used to be unmappable (thus turned into a replacement '?').

Hello @naotoj.
I checked [JDK-8290488](https://bugs.openjdk.org/browse/JDK-8290488).
This issue was tested on Windows 10.
I think we need to confirm the expected b2c result with the reporter.
I checked MS's 864 using the following test program on my Windows 10 machine.
    >type b2c_1.ps1
    param($code, $hex)
    $h = [string]$hex
    $enc_r = [Text.Encoding]::GetEncoding([int]$code)
    [byte[]]$ba = @()
    for($i = 0; $i -lt $h.length; $i+=2) {
        $ba += ([System.Convert]::ToInt32($h.SubString($i,2), 16))
    }
    $s = ""
    $enc_r.GetChars($ba) | foreach {$s += [System.Convert]::ToInt32($_).ToString("X4")}
    $s

    >powershell -NoProfile -ExecutionPolicy Unrestricted .\b2c_1.ps1 864 25
    0025

Please ignore 0xD7, 0xD8, and 0xF1 if the target platform is Windows.

Note: Test result for the c2b side.

    >type c2b_1.ps1
    param($code, $hex)
    $enc_r = [Text.Encoding]::GetEncoding([int]$code)
    [char[]]$ca = @()
    $ca += ([System.Convert]::ToInt32([string]$hex, 16))
    $s = ""
    $enc_r.GetBytes($ca) | foreach {$s += [System.Convert]::ToInt32($_).ToString("X2")}
    $s

    >powershell -NoProfile -ExecutionPolicy Unrestricted .\c2b_1.ps1 864 0025
    25
    >powershell -NoProfile -ExecutionPolicy Unrestricted .\c2b_1.ps1 864 066A
    25

-------------

PR: https://git.openjdk.org/jdk/pull/9661
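For comparison on the Java side, the following sketch mirrors `b2c_1.ps1` and `c2b_1.ps1`: it decodes a hex byte string, or encodes a single hex code point, with a named `Charset` and prints hex. The class and method names are hypothetical, and the results for `IBM864` depend on the JDK version and on `jdk.charsets` being present (otherwise `Charset.forName` throws `UnsupportedCharsetException`).

```java
import java.nio.charset.Charset;

public class HexRoundTrip {
    // Decodes a hex byte string such as "25" and returns the chars as "XXXX" hex.
    static String b2c(String charsetName, String hexBytes) {
        byte[] ba = new byte[hexBytes.length() / 2];
        for (int i = 0; i < ba.length; i++) {
            ba[i] = (byte) Integer.parseInt(hexBytes.substring(2 * i, 2 * i + 2), 16);
        }
        StringBuilder sb = new StringBuilder();
        for (char ch : new String(ba, Charset.forName(charsetName)).toCharArray()) {
            sb.append(String.format("%04X", (int) ch));
        }
        return sb.toString();
    }

    // Encodes one BMP code point given in hex, such as "066A", and returns the bytes as hex.
    static String c2b(String charsetName, String hexChar) {
        String s = String.valueOf((char) Integer.parseInt(hexChar, 16));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cs = args.length > 0 ? args[0] : "IBM864";
        System.out.println("b2c 25   -> " + b2c(cs, "25"));
        System.out.println("c2b 0025 -> " + c2b(cs, "0025"));
        System.out.println("c2b 066A -> " + c2b(cs, "066A"));
    }
}
```

`String.getBytes(Charset)` replaces unmappable chars, so a c2b result of `3F` means the char was turned into the replacement '?', which is how Naoto described the pre-fix behavior for `%`.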
Re: RFR: 8290488: IBM864 character encoding implementation bug
On Thu, 28 Jul 2022 01:46:26 GMT, Naoto Sato wrote:

>> Hello @naotoj.
>> I'm not a Reviewer, but I'd like to test this change.
>> Could you wait a moment?
>> Thanks.
>
> @takiguc Sure. Appreciate it.

Many thanks @naotoj.

I checked the latest IBM-864 mapping table.
(I assume OpenJDK's current IBM864 may be based on an older mapping table.)
https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/mappings/ibm-864_X110-1999.ucm
The .ucm file format is described here:
https://unicode-org.github.io/icu/userguide/conversion/data.html#ucm-file-format

I checked the round-trip mappings:

| IBM864.map | ibm-864_X110-1999.ucm |
| --- | --- |
| 0x1a U+001a | 0x1a U+001c |
| 0x1c U+001c | 0x1c U+007f |
| **0x25 U+066a** | **0x25 U+0025** |
| 0x7f U+007f | 0x7f U+001a |
| 0x9f U+fffd | 0x9f U+200b |
| 0xd7 U+fec1 | 0xd7 U+fec3 |
| 0xd8 U+fec5 | 0xd8 U+fec7 |
| 0xf1 U+0651 | 0xf1 U+fe7c |

**Note**: The 0x1a <-> U+001c / 0x1c <-> U+007f / 0x7f <-> U+001a entries are the control character rotation for DOS. I think they should be ignored.

I think the round-trip side should be changed:
- The 0x25 entry should be U+0025 in IBM864.map
- Add `0x25 U+066a` to IBM864.c2b
- Modify test/jdk/sun/nio/cs/mapping/Cp864.b2c for `0025 0025`
- Add `0025 066a` to test/jdk/sun/nio/cs/mapping/Cp864.c2b-irreversible

This issue is only about U+0025, but if possible, please also add the `0x9f, 0xd7, 0xd8, 0xf1` entries.

-------------

PR: https://git.openjdk.org/jdk/pull/9661
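The table distinguishes round-trip entries from one-way ones. One way to probe a mapping from Java is to encode a character and decode the bytes back, with both coders in REPORT mode; the sketch below does that. The class, enum, and method names are hypothetical. Going by the mapping table above, an added one-way c2b entry such as `U+0025 -> 0x25` in IBM864 would make `%` classify as ONE_WAY, since 0x25 decodes back to U+066A.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class RoundTripCheck {
    enum Kind { ROUND_TRIP, ONE_WAY, UNMAPPABLE }

    // Classifies how a single BMP char maps in the given charset.
    static Kind classify(Charset cs, char c) {
        ByteBuffer bytes;
        try {
            bytes = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(new char[] {c}));
        } catch (CharacterCodingException e) {
            return Kind.UNMAPPABLE;
        }
        try {
            CharBuffer back = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes);
            return (back.length() == 1 && back.get(0) == c) ? Kind.ROUND_TRIP : Kind.ONE_WAY;
        } catch (CharacterCodingException e) {
            return Kind.ONE_WAY; // encodes, but the bytes do not decode cleanly
        }
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "ISO-8859-1");
        for (char c : new char[] {'%', (char) 0x066A}) {
            System.out.printf("U+%04X %s%n", (int) c, classify(cs, c));
        }
    }
}
```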
Re: RFR: 8290488: IBM864 character encoding implementation bug
On Wed, 27 Jul 2022 17:47:36 GMT, Naoto Sato wrote: > Adding an extra c2b mapping for the `%` in `IBM864` charset. The discrepancy > came from the mapping difference between MS and IBM. Hello @naotoj. I'm not a reviewer, but I'd like to test this change. Could you wait a moment? Thanks. - PR: https://git.openjdk.org/jdk/pull/9661
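To make "extra c2b mapping" concrete: it is an encode-only entry sitting next to the roundtrip table, so encoding gains a second source character while decoding is unchanged. The toy model below is hypothetical (it is not the JDK's generated single-byte charset code). It is populated with the layout proposed in the review, 0x25 <-> U+0025 roundtrip plus U+066A -> 0x25 encode-only; swapping the two characters models the opposite arrangement, and the mapping that was eventually committed may differ from either.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a single-byte charset table with roundtrip and encode-only entries.
public class SingleByteTable {
    private final char[] b2c = new char[256];                     // byte -> char
    private final Map<Character, Integer> c2b = new HashMap<>();  // char -> byte

    SingleByteTable() {
        Arrays.fill(b2c, '\uFFFD'); // unmapped bytes decode to the replacement character
    }

    // Roundtrip entry: b decodes to c, and c encodes back to b.
    void roundtrip(int b, char c) {
        b2c[b & 0xFF] = c;
        c2b.put(c, b & 0xFF);
    }

    // Encode-only ("c2b-irreversible") entry: c encodes to b, but b keeps decoding to its roundtrip char.
    void c2bOnly(char c, int b) {
        c2b.put(c, b & 0xFF);
    }

    char decode(int b) {
        return b2c[b & 0xFF];
    }

    // Returns the byte value, or -1 if c is unmappable.
    int encode(char c) {
        return c2b.getOrDefault(c, -1);
    }

    public static void main(String[] args) {
        SingleByteTable t = new SingleByteTable();
        t.roundtrip(0x25, '%');     // 0x25 <-> U+0025 PERCENT SIGN
        t.c2bOnly('\u066A', 0x25);  // U+066A ARABIC PERCENT SIGN -> 0x25, one way only
        System.out.printf("decode 0x25   -> U+%04X%n", (int) t.decode(0x25));
        System.out.printf("encode U+0025 -> 0x%02X%n", t.encode('%'));
        System.out.printf("encode U+066A -> 0x%02X%n", t.encode('\u066A'));
    }
}
```

Running it prints `U+0025` for the decode and `0x25` for both encodes, which is the asymmetry the `c2b-irreversible` test file is meant to capture.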
Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
On Wed, 6 Jul 2022 14:05:39 GMT, Ichiroh Takiguchi wrote:

> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and
> DBCS Only charsets.
>
> |Charset|Mix|SBCS|DBCS|
> | -- | -- | -- | -- |
> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
> | Korean | Cp933 | Cp833 | Cp834 |
>
> But OpenJDK does not support some of the "Japanese EBCDIC - English" /
> "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only
> charsets.
>
> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency:
>
> |Charset|Mix|SBCS|DBCS|
> | - | - | - | - |
> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** |
>
> *1: Cp037 compatible

The discussion is available at [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834): Add SBCS and DBCS Only EBCDIC charsets

- PR: https://git.openjdk.org/jdk/pull/9399
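Whether a given JDK build ships the charsets in these tables can be checked at run time with `Charset.isSupported`. A minimal sketch; the class name is hypothetical, and the output depends on the JDK version and on which charset modules are included in the runtime image, so no expected output is given here.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check: which EBCDIC charsets from the tables does this runtime provide?
public class EbcdicAvailability {

    // Maps each name to Charset.isSupported(name), preserving input order.
    static Map<String, Boolean> check(List<String> names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "Cp930", "Cp290", "Cp300",   // Japanese EBCDIC - Katakana
                "Cp933", "Cp833", "Cp834",   // Korean
                "Cp939", "Cp1027",           // Japanese EBCDIC - English (DBCS: Cp300)
                "Cp935", "Cp836", "Cp837",   // Simplified Chinese EBCDIC
                "Cp937", "Cp037", "Cp835");  // Traditional Chinese EBCDIC
        check(names).forEach((name, ok) -> System.out.println(
                name + ": " + (ok ? "supported as " + Charset.forName(name).name() : "not supported")));
    }
}
```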
RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and DBCS Only charsets.

|Charset|Mix|SBCS|DBCS|
| -- | -- | -- | -- |
| Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
| Korean | Cp933 | Cp833 | Cp834 |

But OpenJDK does not support some of the "Japanese EBCDIC - English" / "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only charsets.

I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency:

|Charset|Mix|SBCS|DBCS|
| - | - | - | - |
| Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
| Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
| Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** |

*1: Cp037 compatible

Commit messages:
 - 8289834: Missing SBCS and DBCS Only EBCDIC charsets

Changes: https://git.openjdk.org/jdk/pull/9399/files
Webrev: https://webrevs.openjdk.org/?repo=jdk=9399=00
Issue: https://bugs.openjdk.org/browse/JDK-8289834
Stats: 369 lines in 6 files changed: 367 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/9399.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9399/head:pull/9399

PR: https://git.openjdk.org/jdk/pull/9399
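The "Mix" column refers to stateful charsets that carry a single-byte and a double-byte code page in one stream, switching with the EBCDIC Shift Out (0x0E) and Shift In (0x0F) control bytes. The sketch below only illustrates that relationship. It is a hypothetical helper, not how `Cp939` and the other mixed charsets are implemented in the JDK, and it assumes a DBCS-only charset decodes the shifted-out segments the same way the mixed charset would. To stay runnable on any JDK, the demo uses ISO-8859-1 and UTF-16BE as stand-ins for the SBCS and DBCS halves.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decode a mixed SBCS/DBCS stream by delegating each segment.
public class MixedEbcdicSketch {
    static final int SO = 0x0E; // Shift Out: following bytes are double-byte
    static final int SI = 0x0F; // Shift In: following bytes are single-byte

    static String decodeMixed(byte[] in, Charset sbcs, Charset dbcs) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream segment = new ByteArrayOutputStream();
        boolean doubleByte = false; // the stream starts in single-byte mode
        for (byte b : in) {
            int v = b & 0xFF;
            if (v == SO || v == SI) {
                flush(segment, doubleByte ? dbcs : sbcs, out); // decode the segment that just ended
                doubleByte = (v == SO);
            } else {
                segment.write(v);
            }
        }
        flush(segment, doubleByte ? dbcs : sbcs, out);
        return out.toString();
    }

    private static void flush(ByteArrayOutputStream segment, Charset cs, StringBuilder out) {
        if (segment.size() > 0) {
            out.append(new String(segment.toByteArray(), cs));
            segment.reset();
        }
    }

    public static void main(String[] args) {
        // Stand-ins: ISO-8859-1 for the SBCS half, UTF-16BE for the DBCS half.
        byte[] mixed = {0x41, 0x0E, 0x30, 0x42, 0x0F, 0x43};
        String s = decodeMixed(mixed, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16BE);
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        // On a JDK that ships the requested charsets one might try (assumption, untested here):
        // decodeMixed(bytes, Charset.forName("Cp1027"), Charset.forName("Cp300"));
    }
}
```

With the stand-ins it prints `U+0041`, `U+3042` and `U+0043`: the bytes between SO and SI were handed to the double-byte decoder as one segment.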