Integrated: 8308046: Move Solaris related charsets from java.base to jdk.charsets module

2023-05-22 Thread Ichiroh Takiguchi
On Mon, 15 May 2023 00:28:41 GMT, Ichiroh Takiguchi  
wrote:

> According to the "JDK 20 Internationalization Guide"
> https://docs.oracle.com/en/java/javase/20/intl/supported-encodings.html
> the following Solaris-related charsets are in the "contained in jdk.charsets 
> module" list:
> 
> - PCK (x-PCK)
> - EUC_JP_Solaris (x-eucJP-Open)
> - Big5_Solaris (x-Big5-Solaris)
> 
> These are not supported on the Linux platform, so they should not be in the 
> java.base module.
> 
> Note:
> The GHA Linux x86 builds failed.
> I think the failure is not related to my changes.
> I opened [JDK-8308051](https://bugs.openjdk.org/browse/JDK-8308051) GHA: 
> Linux x86 builds failure
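
A minimal sketch for checking, on a given JDK build, whether the three charsets named in the description are available and which module provides them. The class and method names are hypothetical, and the output depends on the platform and JDK version, so none is shown. It assumes JDK 9 or later, where `Class.getModule()` exists.

```java
import java.nio.charset.Charset;

final class CharsetModuleProbe {
    // Returns "<canonical name> -> <module>" when the charset is available,
    // otherwise "<requested name> -> not available".
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + " -> not available";
        }
        Charset cs = Charset.forName(name);
        Module module = cs.getClass().getModule();
        return cs.name() + " -> " + module.getName();
    }

    public static void main(String[] args) {
        // Charset names taken from the pull request description.
        for (String name : new String[] {"x-PCK", "x-eucJP-Open", "x-Big5-Solaris"}) {
            System.out.println(describe(name));
        }
    }
}
```

Running it before and after the change on a Linux build should show whether the providing module moved from `java.base` to `jdk.charsets`.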

This pull request has now been integrated.

Changeset: 5d8ba938
Author:Ichiroh Takiguchi 
URL:   
https://git.openjdk.org/jdk/commit/5d8ba938bef162b74816147eb1002a0620a419ba
Stats: 21 lines in 4 files changed: 0 ins; 6 del; 15 mod

8308046: Move Solaris related charsets from java.base to jdk.charsets module

Reviewed-by: naoto

-

PR: https://git.openjdk.org/jdk/pull/13973


Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v2]

2023-05-22 Thread Ichiroh Takiguchi
On Sat, 20 May 2023 17:26:53 GMT, Naoto Sato  wrote:

>> Hello @naotoj .
>> I'd like to confirm something about DoubleByte-X.java.template and 
>> EUC_JP.java.template.
>> I think the values are package-private.
>> Even if the class is changed to `public`, the classes in the `sun.nio.cs.ext` 
>> package could not access these values in the `sun.nio.cs` package...
>> I may be misunderstanding your suggestion. Could you tell me more?
>
>> I think the values are package-private. Even if the class is changed to 
>> `public`, the classes in the `sun.nio.cs.ext` package could not access these 
>> values in the `sun.nio.cs` package... 
> 
> I meant making those package-private fields public. I believe it's OK because 
> the java.base/sun.nio.cs package is only exported to the jdk.charsets module.

Hello @naotoj .
Thank you for taking care of the JBS side.
I changed the title and description, and added the noreg-cleanup label.

-

PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1558228901


Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v2]

2023-05-22 Thread Ichiroh Takiguchi
On Sat, 20 May 2023 17:26:53 GMT, Naoto Sato  wrote:

>> Hello @naotoj .
>> I'd like to confirm something about DoubleByte-X.java.template and 
>> EUC_JP.java.template.
>> I think the values are package-private.
>> Even if the class is changed to `public`, the classes in the `sun.nio.cs.ext` 
>> package could not access these values in the `sun.nio.cs` package...
>> I may be misunderstanding your suggestion. Could you tell me more?
>
>> I think the values are package-private. Even if the class is changed to 
>> `public`, the classes in the `sun.nio.cs.ext` package could not access these 
>> values in the `sun.nio.cs` package... 
> 
> I meant making those package-private fields public. I believe it's OK because 
> the java.base/sun.nio.cs package is only exported to the jdk.charsets module.

Thanks @naotoj .
I changed the related fields to `public`.
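
The reasoning quoted above relies on `sun.nio.cs` being a qualified export of `java.base`. As a rough sketch (hypothetical class and method names, assuming JDK 9 or later), the export targets can be read from the module descriptor at run time. The set it prints depends on the JDK build, so no output is shown.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Optional;
import java.util.Set;

final class QualifiedExportCheck {
    // Returns the modules that java.base exports the given package to.
    // The Optional is empty when the package is not exported at all
    // or is exported unconditionally to every module.
    static Optional<Set<String>> qualifiedTargets(String packageName) {
        ModuleDescriptor base = Object.class.getModule().getDescriptor();
        return base.exports().stream()
                .filter(e -> e.source().equals(packageName))
                .filter(ModuleDescriptor.Exports::isQualified)
                .map(ModuleDescriptor.Exports::targets)
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println("sun.nio.cs is exported to: "
                + qualifiedTargets("sun.nio.cs").orElse(Set.of()));
    }
}
```

If the printed set contains only `jdk.charsets`, public fields in `sun.nio.cs` classes remain invisible to application code, which is the point of the suggestion.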

-

PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1557308396


Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v3]

2023-05-22 Thread Ichiroh Takiguchi
> According to the "JDK 20 Internationalization Guide"
> https://docs.oracle.com/en/java/javase/20/intl/supported-encodings.html
> the following Solaris-related charsets are in the "contained in jdk.charsets 
> module" list:
> 
> - PCK (x-PCK)
> - EUC_JP_Solaris (x-eucJP-Open)
> - Big5_Solaris (x-Big5-Solaris)
> 
> These are not supported on the Linux platform, so they should not be in the 
> java.base module.
> 
> Note:
> The GHA Linux x86 builds failed.
> I think the failure is not related to my changes.
> I opened [JDK-8308051](https://bugs.openjdk.org/browse/JDK-8308051) GHA: 
> Linux x86 builds failure

Ichiroh Takiguchi has updated the pull request incrementally with one 
additional commit since the last revision:

  8308046: Move Solaris related charsets from java.base to jdk.charsets module

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13973/files
  - new: https://git.openjdk.org/jdk/pull/13973/files/6fd12fcd..1c10b107

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13973&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13973&range=01-02

  Stats: 43 lines in 4 files changed: 0 ins; 29 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/13973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13973/head:pull/13973

PR: https://git.openjdk.org/jdk/pull/13973


Re: RFR: 8308046: Move Solaris related charsets from java.base to jdk.charsets module [v2]

2023-05-19 Thread Ichiroh Takiguchi
On Thu, 18 May 2023 00:50:22 GMT, Naoto Sato  wrote:

>>> Hello @naotoj . I'm not sure we can remove the Solaris-related charsets. 
>>> Somebody may use them for text communication with Solaris systems.
>> 
>> OK, maybe not now.
>> 
>> I think the fix may be simplified by changing access for those 
>> `DoubleByte-X.java.template` internals, such as `DecodeHolder` to `public`, 
>> instead of introducing those access methods. You can import classes in 
>> `java.base/sun.nio.cs` with the wild card so that it would work on all 
>> platforms (`Big5` either in `java.base` or `jdk.charsets`)
>> 
>> Also, please drop `Japanese` from the issue/PR title
>
>>You can import classes in `java.base/sun.nio.cs` with the wild card so that 
>>it would work on all platforms (`Big5` either in `java.base` or 
>>`jdk.charsets`)
> 
> Scratch that, you've already done it. Then you can remove these: 
> 
> import sun.nio.cs.DoubleByte;
> import sun.nio.cs.HistoricallyNamedCharset;

Hello @naotoj .
I'd like to confirm something about DoubleByte-X.java.template and EUC_JP.java.template.
I think the values are package-private.
Even if the class is changed to `public`, the classes in the `sun.nio.cs.ext` package 
could not access these values in the `sun.nio.cs` package...
I may be misunderstanding your suggestion. Could you tell me more?

-

PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1555405480


Re: RFR: 8308046: Move Solaris related Japanese charsets from java.base to jdk.charsets module [v2]

2023-05-17 Thread Ichiroh Takiguchi
On Tue, 16 May 2023 17:13:02 GMT, Naoto Sato  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8308046: Move Solaris related Japanese charsets from java.base to 
>> jdk.charsets module
>
> I now think it is better to simply remove the Solaris-related charsets, as moving 
> them from java.base to jdk.charsets would require unnecessary code changes in 
> non-Solaris code.

Hello @naotoj .
I'm not sure we can remove the Solaris-related charsets.
Somebody may use them for text communication with Solaris systems.
With the latest change, Big5_Solaris can also be moved from java.base to the jdk.charsets module.

-

PR Comment: https://git.openjdk.org/jdk/pull/13973#issuecomment-1551244863


Re: RFR: 8308046: Move Solaris related Japanese charsets from java.base to jdk.charsets module [v2]

2023-05-16 Thread Ichiroh Takiguchi
> According to the "JDK 20 Internationalization Guide"
> https://docs.oracle.com/en/java/javase/20/intl/supported-encodings.html
> the following Solaris-related Japanese charsets are in the "contained in 
> jdk.charsets module" list:
> 
> - PCK (x-PCK)
> - EUC_JP_Solaris (x-eucJP-Open)
> 
> These are not supported on the Linux platform, so they should not be in the 
> java.base module.
> 
> Note:
> The GHA Linux x86 builds failed.
> I think the failure is not related to my changes.
> I opened [JDK-8308051](https://bugs.openjdk.org/browse/JDK-8308051) GHA: 
> Linux x86 builds failure

Ichiroh Takiguchi has updated the pull request incrementally with one 
additional commit since the last revision:

  8308046: Move Solaris related Japanese charsets from java.base to 
jdk.charsets module

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13973/files
  - new: https://git.openjdk.org/jdk/pull/13973/files/192db59c..6fd12fcd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13973&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13973&range=00-01

  Stats: 29 lines in 3 files changed: 22 ins; 1 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/13973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13973/head:pull/13973

PR: https://git.openjdk.org/jdk/pull/13973


Re: RFR: 8301119: Support for GB18030-2022 [v3]

2023-02-25 Thread Ichiroh Takiguchi
On Fri, 24 Feb 2023 17:19:22 GMT, Naoto Sato  wrote:

>> Hello @naotoj .
>> Sorry for bothering you.
>> 
>> I have the following question:
>> - Why is GB18030.java.template in the 
>> src/jdk.charsets/share/classes/sun/nio/cs/ext/ directory even though the 
>> generated code is always stored in sun/nio/cs?
>> I think the file should be moved to src/java.base/share/classes/sun/nio/cs 
>> and renamed from GB18030.java.template to GB18030.java.
>> Is there a specific reason?
>
>> Hello @naotoj . Sorry for bothering you.
>> 
>> I have the following question:
>> 
>> * Why is GB18030.java.template in the 
>> src/jdk.charsets/share/classes/sun/nio/cs/ext/ directory even though the 
>> generated code is always stored in sun/nio/cs?
>>   I think the file should be moved to src/java.base/share/classes/sun/nio/cs 
>> and renamed from GB18030.java.template to GB18030.java.
>>   Is there a specific reason?
> 
> No, there is not. Thanks for pointing it out. Fixed.

Thanks @naotoj .
That's what I expected.

-

PR: https://git.openjdk.org/jdk/pull/12518


Re: RFR: 8301119: Support for GB18030-2022 [v3]

2023-02-24 Thread Ichiroh Takiguchi
On Thu, 23 Feb 2023 19:34:44 GMT, Naoto Sato  wrote:

>> Upgrading the GB18030 charset in the JDK to the latest 2022 standard. Since 
>> this is not a compatible upgrade to the existing mapping, a new system 
>> property `jdk.charset.GB18030` is introduced. If it is set to "2000", the 
>> mapping falls back to the existing mapping based on the 2000 standard, 
>> otherwise, it defaults to 2022 mapping. Refer to the corresponding CSR for 
>> more detail.
>
> Naoto Sato has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Moved the 2000 flag into GB18030

Hello @naotoj .
Sorry for bothering you.

I have the following question:
- Why is GB18030.java.template in the 
src/jdk.charsets/share/classes/sun/nio/cs/ext/ directory even though the generated 
code is always stored in sun/nio/cs?
I think the file should be moved to src/java.base/share/classes/sun/nio/cs and 
renamed from GB18030.java.template to GB18030.java.
Is there a specific reason?
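
For readers unfamiliar with the `jdk.charset.GB18030` property mentioned in the quoted pull request summary, here is a small sketch of the selection rule as that summary states it: the value "2000" selects the older mapping and anything else means 2022. The class and method names are hypothetical. The code only mirrors the described rule and does not inspect the charset implementation itself.

```java
final class Gb18030Mapping {
    // Mirrors the rule in the quoted summary: only the exact value "2000"
    // falls back to the 2000 mapping; any other value (or none) means 2022.
    static String expectedMapping(String propertyValue) {
        return "2000".equals(propertyValue) ? "2000" : "2022";
    }

    public static void main(String[] args) {
        String value = System.getProperty("jdk.charset.GB18030");
        System.out.println("jdk.charset.GB18030=" + value
                + " -> expected mapping: GB18030-" + expectedMapping(value));
    }
}
```

It could be run as `java -Djdk.charset.GB18030=2000 Gb18030Mapping` on a JDK that includes this change.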

-

PR: https://git.openjdk.org/jdk/pull/12518


Re: RFR: 8300916: Re-examine the initialization of JNU Charset in StaticProperty [v4]

2023-01-25 Thread Ichiroh Takiguchi
On Wed, 25 Jan 2023 18:58:40 GMT, Alan Bateman  wrote:

>> `Charset.defaultCharset()` now uses 
>> `standardProvider.charsetForName()` charset.
>
>> `Charset.defaultCharset()` now uses 
>> `standardProvider.charsetForName()` charset.
> 
> I think this is the right thing to do. It can also be changed to use 
> StaticProperty.fileEncoding() and maybe the field can be changed to be a 
> `@Stable` field. 
> 
> It might be that we will need to create a CSR and Release Note for this 
> change.  The scenario in PR 12132 is unfortunate but does not show that some 
> deployments may have been relying on this from JDK 9 to JDK 17. With the 
> change here, we are doubling down on ensuring that the default charset is 
> loaded from java.base.

Hello @AlanBateman 
You said

> The change to StaticProperty to avoid calling out to Charset.defaultCharset 
> from the initializer is good. However, the other part to that is the scenario 
> in PR 12132 where the default Charset was accidentally located via the 
> provider mechanism in JDK 9-17. If I read the changes correctly, that fragile 
> scenario will come back. We have a couple of ways to avoid that, one being to 
> ensure that defaultCharset is called before the boot layer is set. A simpler, 
> and more reliable, approach would be to change Charset.defaultCharset to use 
> standardProvider.charsetForName with the value of "file.encoding", and avoid 
> the provider lookup completely.

Could you explain the fragile scenario?

-

PR: https://git.openjdk.org/jdk/pull/12171


Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]

2023-01-25 Thread Ichiroh Takiguchi
On Tue, 24 Jan 2023 12:51:31 GMT, Alan Bateman  wrote:

> Do you know if there is any configuration on AIX that would derive Cp943C as 
> the default charset? That is, are they running with -Dfile.encoding=Cp943C on 
> the AIX systems or is it chosen by default. This goes to the question as to 
> whether they just moving these applications to Linux and expecting the 
> default charset to be the same.

In my understanding, my client uses the `-Dfile.encoding=Cp943C` option in the Japanese 
IBM-943 locale on AIX.
The default charset in the Japanese IBM-943 locale with IBM Java 8 and OpenJDK JDK 11+ is 
x-IBM943C (Cp943C).
(We need to use `-Dfile.encoding=Cp943C` for OpenJDK JDK 8.)
Because of JEP 400, we never thought we could simply move to Linux, and we are not 
moving all the apps to Linux at once.
We expected that we could change the default charset with `-Dfile.encoding=Cp943C`, 
at least until Java 8 EOS.

> Do you know which APIs they are using? We filled in the gaps many releases 
> ago so that all APIs that do encoding/decoding allow the charset to be 
> specified and I'm wondering why they don't use those.

We checked String.getBytes()/new 
String(...)/Reader/Writer/ByteArrayOutputStream.toString()...
Is there a good way to find which parts need to be fixed?
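
As an illustration of the charset-taking overloads the quoted question refers to, the sketch below passes the charset explicitly instead of relying on the default. The class and method names are hypothetical. `ByteArrayOutputStream.toString(Charset)` requires JDK 10 or later, and falling back to UTF-8 when Cp943C is unavailable is only an assumption made so the example runs anywhere.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetRoundTrip {
    // Encodes and decodes with an explicit charset instead of the default one.
    static String viaString(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    // Same idea for ByteArrayOutputStream: toString(Charset) avoids the default charset.
    static String viaStream(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(cs);
        out.write(bytes, 0, bytes.length);
        return out.toString(cs);
    }

    public static void main(String[] args) {
        // Assumption for illustration: use Cp943C when this runtime provides it.
        Charset cs = Charset.isSupported("Cp943C")
                ? Charset.forName("Cp943C")
                : StandardCharsets.UTF_8;
        String sample = "\u65e5\u672c\u8a9e";
        System.out.println(cs + ": " + viaString(sample, cs).equals(sample)
                + ", " + viaStream(sample, cs).equals(sample));
    }
}
```

Code written this way behaves the same regardless of `file.encoding`, so call sites that still use the no-argument overloads are the ones worth reviewing.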

-

PR: https://git.openjdk.org/jdk/pull/12132


Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]

2023-01-24 Thread Ichiroh Takiguchi
On Mon, 23 Jan 2023 13:46:15 GMT, Alan Bateman  wrote:

> It's never been supported to run with -Dfile.encoding=Cp943C. It may have 
> worked in JDK 8 but I doubt it could have worked consistently since JDK 9 
> because the default charset is derived before it's possible to locate charset 
> implementations outside of java.base.

As described before, JDK 17 worked with `-Dfile.encoding=Cp943C`, and JDK 18 
changed the behavior. I heard that some apps had already been ported to JDK 17 with the 
option, and they work.

> I think it would be useful to know a bit more about the environment. It 
> sounds like it might be AIX -> Linux migration but I'm curious if you have 
> any insight into why these applications depend on default charset being 
> Cp943C. Is it text files that are opened without specifying the charset or is 
> is something else?

One of my clients has many legacy Java apps on AIX. Their apps use the default 
charset to communicate with other apps over encrypted communication, and validate 
data by using Cp943C.

I hope IBM943C is moved to the java.base module, like #11908 .

-

PR: https://git.openjdk.org/jdk/pull/12132


Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]

2023-01-23 Thread Ichiroh Takiguchi
On Mon, 23 Jan 2023 07:48:41 GMT, Alan Bateman  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8300819: -Dfile.encoding=Cp943C option does not work as expected since 
>> jdk18
>
> I'm trying to understand what the real issue is. The java.base module on 
> Linux builds includes SJIS, MS932, and PCK. Is there a Linux configuration in 
> Japanese environments where the default charset in any JDK release is 
> IBM943C? Same question for AIX builds that is the only build that includes 
> IBM943C in java.base.

Hello @AlanBateman .
Sorry for the confusion.

Java 8 works with the `-Dfile.encoding=Cp943C` option on Linux. Since many users are 
migrating from Java 8, I'm getting similar requests from my clients. Cp943C is 
not supported by Linux natively, but some clients want to use the same encoding 
on Linux and AIX.

The Japanese AIX environment supports the IBM-943 (Cp943C), IBM-eucJP (Cp29626C), and UTF-8 
encodings.
Cp943C and Cp29626C are in the java.base module on the AIX platform.

-

PR: https://git.openjdk.org/jdk/pull/12132


Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]

2023-01-22 Thread Ichiroh Takiguchi
On Sun, 22 Jan 2023 23:17:10 GMT, Ichiroh Takiguchi  
wrote:

>> On jdk17, following testcase works fine on Linux platform.
>> 
>> Testcase
>> 
>> $ cat cstest1.java
>> import java.nio.charset.*;
>> 
>> public class cstest1 {
>>   public static void main(String[] args) throws Exception {
>> Charset cs = Charset.defaultCharset();
>> System.out.println(cs + ", " + cs.getClass() + ", " + 
>> cs.getClass().getModule());
>>   }
>> }
>> 
>> 
>> $ ~/jdk-17.0.6+10/bin/java -Dfile.encoding=Cp943C -showversion cstest1
>> openjdk version "17.0.6" 2023-01-17
>> OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
>> OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, 
>> sharing)
>> x-IBM943C, class sun.nio.cs.ext.IBM943C, module jdk.charsets
>> 
>> 
>> But it does not work as expected on jdk18 and jdk21b06
>> 
>> $ ~/jdk-18.0.2.1+1/bin/java -Dfile.encoding=Cp943C -showversion cstest1
>> openjdk version "18.0.2.1" 2022-08-18
>> OpenJDK Runtime Environment Temurin-18.0.2.1+1 (build 18.0.2.1+1)
>> OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1 (build 18.0.2.1+1, mixed mode, 
>> sharing)
>> UTF-8, class sun.nio.cs.UTF_8, module java.base
>> $ ~/jdk-21/bin/java -Dfile.encoding=Cp943C -showversion cstest1
>> openjdk version "21-ea" 2023-09-19
>> OpenJDK Runtime Environment (build 21-ea+6-365)
>> OpenJDK 64-Bit Server VM (build 21-ea+6-365, mixed mode, sharing)
>> UTF-8, class sun.nio.cs.UTF_8, module java.base
>> 
>> 
>> Fixed result is as follows:
>> 
>> $ java -Dfile.encoding=Cp943C -showversion PrintDefaultCharset
>> openjdk version "21-internal" 2023-09-19
>> OpenJDK Runtime Environment (build 21-internal-adhoc.jdktest.jdk)
>> OpenJDK 64-Bit Server VM (build 21-internal-adhoc.jdktest.jdk, mixed mode, 
>> sharing)
>> x-IBM943C
>
> Ichiroh Takiguchi has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18

First, sorry, I forgot to update the copyright year.

@AlanBateman , I appreciate your reply.
In my understanding,

- the I/O stream side can use the native.encoding system property.
- the file.encoding system property is now used for non-I/O-stream code.

This issue is related to #11908 .
I need a way to use the Cp943C charset as the default charset.
Please give me some suggestions.
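
To make the two properties mentioned above concrete, here is a hedged sketch that reads `native.encoding` (available since JDK 17) and compares it with the default charset. The class and method names are hypothetical, and the fallback to `Charset.defaultCharset()` is an illustrative choice, not something the thread prescribes.

```java
import java.nio.charset.Charset;

final class NativeEncodingProbe {
    // Resolves the charset named by the native.encoding property, falling back
    // to the default charset when the property is absent or not supported.
    static Charset nativeCharset() {
        String name = System.getProperty("native.encoding");
        if (name != null) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: fall through to the default charset.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("native charset  = " + nativeCharset());
    }
}
```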

-

PR: https://git.openjdk.org/jdk/pull/12132


Re: RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18 [v2]

2023-01-22 Thread Ichiroh Takiguchi
> On jdk17, following testcase works fine on Linux platform.
> 
> Testcase
> 
> $ cat cstest1.java
> import java.nio.charset.*;
> 
> public class cstest1 {
>   public static void main(String[] args) throws Exception {
> Charset cs = Charset.defaultCharset();
> System.out.println(cs + ", " + cs.getClass() + ", " + 
> cs.getClass().getModule());
>   }
> }
> 
> 
> $ ~/jdk-17.0.6+10/bin/java -Dfile.encoding=Cp943C -showversion cstest1
> openjdk version "17.0.6" 2023-01-17
> OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
> OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, 
> sharing)
> x-IBM943C, class sun.nio.cs.ext.IBM943C, module jdk.charsets
> 
> 
> But it does not work as expected on jdk18 and jdk21b06
> 
> $ ~/jdk-18.0.2.1+1/bin/java -Dfile.encoding=Cp943C -showversion cstest1
> openjdk version "18.0.2.1" 2022-08-18
> OpenJDK Runtime Environment Temurin-18.0.2.1+1 (build 18.0.2.1+1)
> OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1 (build 18.0.2.1+1, mixed mode, 
> sharing)
> UTF-8, class sun.nio.cs.UTF_8, module java.base
> $ ~/jdk-21/bin/java -Dfile.encoding=Cp943C -showversion cstest1
> openjdk version "21-ea" 2023-09-19
> OpenJDK Runtime Environment (build 21-ea+6-365)
> OpenJDK 64-Bit Server VM (build 21-ea+6-365, mixed mode, sharing)
> UTF-8, class sun.nio.cs.UTF_8, module java.base
> 
> 
> Fixed result is as follows:
> 
> $ java -Dfile.encoding=Cp943C -showversion PrintDefaultCharset
> openjdk version "21-internal" 2023-09-19
> OpenJDK Runtime Environment (build 21-internal-adhoc.jdktest.jdk)
> OpenJDK 64-Bit Server VM (build 21-internal-adhoc.jdktest.jdk, mixed mode, 
> sharing)
> x-IBM943C

Ichiroh Takiguchi has updated the pull request incrementally with one 
additional commit since the last revision:

  8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12132/files
  - new: https://git.openjdk.org/jdk/pull/12132/files/9e400d60..5e7db0e0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12132&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12132&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12132.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12132/head:pull/12132

PR: https://git.openjdk.org/jdk/pull/12132


RFR: 8300819: -Dfile.encoding=Cp943C option does not work as expected since jdk18

2023-01-22 Thread Ichiroh Takiguchi
On jdk17, the following testcase works fine on the Linux platform.

Testcase

$ cat cstest1.java
import java.nio.charset.*;

public class cstest1 {
  public static void main(String[] args) throws Exception {
    Charset cs = Charset.defaultCharset();
    System.out.println(cs + ", " + cs.getClass() + ", " +
        cs.getClass().getModule());
  }
}


$ ~/jdk-17.0.6+10/bin/java -Dfile.encoding=Cp943C -showversion cstest1
openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, 
sharing)
x-IBM943C, class sun.nio.cs.ext.IBM943C, module jdk.charsets


But it does not work as expected on jdk18 and jdk21b06

$ ~/jdk-18.0.2.1+1/bin/java -Dfile.encoding=Cp943C -showversion cstest1
openjdk version "18.0.2.1" 2022-08-18
OpenJDK Runtime Environment Temurin-18.0.2.1+1 (build 18.0.2.1+1)
OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1 (build 18.0.2.1+1, mixed mode, 
sharing)
UTF-8, class sun.nio.cs.UTF_8, module java.base
$ ~/jdk-21/bin/java -Dfile.encoding=Cp943C -showversion cstest1
openjdk version "21-ea" 2023-09-19
OpenJDK Runtime Environment (build 21-ea+6-365)
OpenJDK 64-Bit Server VM (build 21-ea+6-365, mixed mode, sharing)
UTF-8, class sun.nio.cs.UTF_8, module java.base


Fixed result is as follows:

$ java -Dfile.encoding=Cp943C -showversion PrintDefaultCharset
openjdk version "21-internal" 2023-09-19
OpenJDK Runtime Environment (build 21-internal-adhoc.jdktest.jdk)
OpenJDK 64-Bit Server VM (build 21-internal-adhoc.jdktest.jdk, mixed mode, 
sharing)
x-IBM943C

-

Commit messages:
 - -Dfile.encoding=Cp943C option does not work as expected since jdk18

Changes: https://git.openjdk.org/jdk/pull/12132/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12132&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300819
  Stats: 45 lines in 2 files changed: 44 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12132.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12132/head:pull/12132

PR: https://git.openjdk.org/jdk/pull/12132


Integrated: 8299194: CustomTzIDCheckDST.java may fail at future date

2022-12-22 Thread Ichiroh Takiguchi
On Wed, 21 Dec 2022 15:57:29 GMT, Ichiroh Takiguchi  
wrote:

> test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java may fail at future date.
> I used following standalone testcase
> 
> import java.util.Calendar;
> import java.util.Date;
> import java.util.SimpleTimeZone;
> 
> public class CheckDST {
> private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0";
> public static void main(String args[]) throws Throwable {
> runTZTest();
> }
> 
> /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0".
>  * This ensures the transition periods for Daylights Savings should be at 
> March's last
>  * Sunday and October's last Sunday.
>  */
> private static void runTZTest() {
> Date time = new Date();
> if (new SimpleTimeZone(360, "MEZ-1MESZ", Calendar.MARCH, -1, 
> Calendar.SUNDAY, 0,
> Calendar.OCTOBER, -1, Calendar.SUNDAY, 
> 0).inDaylightTime(time)) {
> // We are in Daylight savings period.
> if (time.toString().endsWith("GMT+02:00 " + 
> Integer.toString(time.getYear() + 1900)))
> return;
> } else {
> if (time.toString().endsWith("GMT+01:00 " + 
> Integer.toString(time.getYear() + 1900)))
> return;
> }
> 
> // Reaching here means time zone did not match up as expected.
> throw new RuntimeException("Got unexpected timezone information: " + 
> time);
> }
> }
> 
> 
> I tested CheckDST with faketime, then I got following results
> 
> $ TZ=GMT faketime -m "2023-03-25 22:59:59" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" 
> $HOME/jdk-21-b02/bin/java CheckDST
> $ TZ=GMT faketime -m "2023-03-25 23:00:00" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" 
> $HOME/jdk-21-b02/bin/java CheckDST
> Exception in thread "main" java.lang.RuntimeException: Got unexpected 
> timezone information: Sun Mar 26 00:00:00 GMT+01:00 2023
> at CheckDST.runTZTest(CheckDST.java:28)
> at CheckDST.main(CheckDST.java:8)
> 
> 
> I assume `TZ=MEZ-1MESZ` refers to the Europe/Berlin timezone.
> In this case, the `TZ` environment variable should be 
> `MEZ-1MESZ,M3.5.0,M10.5.0/3` (`/3` is missing in the testcase).
> 
> CustomTzIDCheckDST should be exercised while daylight saving time is in effect,
> so this change adds a simulated Southern Hemisphere case using `MEZ-1MESZ,M10.5.0,M3.5.0/3`.
> 
> Tested by standalone testcase
> 
> $ cat CheckDST1.java
> import java.util.Calendar;
> import java.util.Date;
> import java.util.List;
> import java.util.SimpleTimeZone;
> import java.util.TimeZone;
> import java.time.DayOfWeek;
> import java.time.ZonedDateTime;
> import java.time.temporal.TemporalAdjusters;
> public class CheckDST1 {
> // Northern Hemisphere
> private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0/3";
> // Simulate Southern Hemisphere
> private static String CUSTOM_TZ2 = "MEZ-1MESZ,M10.5.0,M3.5.0/3";
> public static void main(String args[]) throws Throwable {
> runTZTest();
> }
> 
> /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0/3".
>  * This ensures the transition periods for Daylights Savings should be at 
> March's last
>  * Sunday and October's last Sunday.
>  */
> private static void runTZTest() {
> Date time = new Date();
> String tzStr = System.getenv("TZ");
> if (tzStr == null)
> throw new RuntimeException("Got unexpected timezone information: 
> TZ is null");
> boolean nor = tzStr.matches(".*,M3\\..*,M10\\..*");
> TimeZone tz = new SimpleTimeZone(3600000, tzStr,
> nor ? Calendar.MARCH : Calendar.OCTOBER, -1,
> Calendar.SUNDAY, 3600000, SimpleTimeZone.UTC_TIME,
> nor ? Calendar.OCTOBER : Calendar.MARCH, -1,
> Calendar.SUNDAY, 3600000, SimpleTimeZone.UTC_TIME,
> 3600000);
> System.out.println(time);
> if (tz.inDaylightTime(time)) {
> // We are in Daylight savings period.
> if (time.toString().endsWith("GMT+02:00 " + 
> Integer.toString(time.getYear() + 1900)))
> return;
> } else {
> if (time.toString().endsWith("GMT+01:00 " + 
> Integer.toString(time.getYear() + 1900)))
> return;
> }
> 
> // Reaching here means time zone did not match up as expected.
> throw new RuntimeException("Got unexpected timezone information: " + 
> tzStr + " " + time);
> }
> 
> private static ZonedDateTi

Re: RFR: 8299194: CustomTzIDCheckDST.java may fail at future date [v2]

2022-12-21 Thread Ichiroh Takiguchi
On Wed, 21 Dec 2022 20:54:25 GMT, Naoto Sato  wrote:

>> Ichiroh Takiguchi has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8299194: CustomTzIDCheckDST.java may fail at future date
>
> Thanks for the fix. Looks good overall. A couple of minor comments/questions.

Thanks @naotoj .
I appreciate your suggestions.
Please review it again.

-

PR: https://git.openjdk.org/jdk/pull/11756


Re: RFR: 8299194: CustomTzIDCheckDST.java may fail at future date [v2]

2022-12-21 Thread Ichiroh Takiguchi
turn date.with(TemporalAdjusters.lastInMonth(DayOfWeek.SUNDAY));
> }
> }
> 
> 
> Check Europe/Berlin timezone settings
> 
> $ zdump -v Europe/Berlin | grep 2023
> Europe/Berlin  Sun Mar 26 00:59:59 2023 UTC = Sun Mar 26 01:59:59 2023 CET 
> isdst=0 gmtoff=3600
> Europe/Berlin  Sun Mar 26 01:00:00 2023 UTC = Sun Mar 26 03:00:00 2023 CEST 
> isdst=1 gmtoff=7200
> Europe/Berlin  Sun Oct 29 00:59:59 2023 UTC = Sun Oct 29 02:59:59 2023 CEST 
> isdst=1 gmtoff=7200
> Europe/Berlin  Sun Oct 29 01:00:00 2023 UTC = Sun Oct 29 02:00:00 2023 CET 
> isdst=0 gmtoff=3600
> 
> 
> Test results are as follows:
> 
> Northern Hemisphere side
> 
> $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> date
> Sun Mar 26 01:59:59 MEZ 2023
> $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> java CheckDST1
> Sun Mar 26 01:59:59 GMT+01:00 2023
> 
> $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> date
> Sun Mar 26 03:00:00 MESZ 2023
> $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> java CheckDST1
> Sun Mar 26 03:00:00 GMT+02:00 2023
> 
> $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> date
> Sun Oct 29 02:59:59 MESZ 2023
> $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> java CheckDST1
> Sun Oct 29 02:59:59 GMT+02:00 2023
> 
> $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> date
> Sun Oct 29 02:00:00 MEZ 2023
> $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
> java CheckDST1
> Sun Oct 29 02:00:00 GMT+01:00 2023
> 
> 
> Southern Hemisphere side
> 
> $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> date
> Sun Mar 26 02:59:59 MESZ 2023
> $ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> java CheckDST1
> Sun Mar 26 02:59:59 GMT+02:00 2023
> 
> $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> date
> Sun Mar 26 02:00:00 MEZ 2023
> $ TZ=GMT faketime -m '2023-03-26 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> java CheckDST1
> Sun Mar 26 02:00:00 GMT+01:00 2023
> 
> $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> date
> Sun Oct 29 01:59:59 MEZ 2023
> $ TZ=GMT faketime -m '2023-10-29 00:59:59' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> java CheckDST1
> Sun Oct 29 01:59:59 GMT+01:00 2023
> 
> $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> date
> Sun Oct 29 03:00:00 MESZ 2023
> $ TZ=GMT faketime -m '2023-10-29 01:00:00' env TZ=MEZ-1MESZ,M10.5.0,M3.5.0/3 
> java CheckDST1
> Sun Oct 29 03:00:00 GMT+02:00 2023

Ichiroh Takiguchi has updated the pull request incrementally with one 
additional commit since the last revision:

  8299194: CustomTzIDCheckDST.java may fail at future date

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/11756/files
  - new: https://git.openjdk.org/jdk/pull/11756/files/a17d83d0..df2e8a86

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11756&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11756&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/11756.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11756/head:pull/11756

PR: https://git.openjdk.org/jdk/pull/11756


RFR: 8299194: CustomTzIDCheckDST.java may fail at future date

2022-12-21 Thread Ichiroh Takiguchi
test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java may fail at a future date.
I used the following standalone testcase:

import java.util.Calendar;
import java.util.Date;
import java.util.SimpleTimeZone;

public class CheckDST {
    private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0";
    public static void main(String args[]) throws Throwable {
        runTZTest();
    }

    /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0".
     * This ensures the transition periods for Daylights Savings should be at March's last
     * Sunday and October's last Sunday.
     */
    private static void runTZTest() {
        Date time = new Date();
        if (new SimpleTimeZone(3600000, "MEZ-1MESZ", Calendar.MARCH, -1, Calendar.SUNDAY, 0,
                Calendar.OCTOBER, -1, Calendar.SUNDAY, 0).inDaylightTime(time)) {
            // We are in Daylight savings period.
            if (time.toString().endsWith("GMT+02:00 " + Integer.toString(time.getYear() + 1900)))
                return;
        } else {
            if (time.toString().endsWith("GMT+01:00 " + Integer.toString(time.getYear() + 1900)))
                return;
        }

        // Reaching here means time zone did not match up as expected.
        throw new RuntimeException("Got unexpected timezone information: " + time);
    }
}


I tested CheckDST with faketime and got the following results:

$ TZ=GMT faketime -m "2023-03-25 22:59:59" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" $HOME/jdk-21-b02/bin/java CheckDST
$ TZ=GMT faketime -m "2023-03-25 23:00:00" env TZ="MEZ-1MESZ,M3.5.0,M10.5.0" $HOME/jdk-21-b02/bin/java CheckDST
Exception in thread "main" java.lang.RuntimeException: Got unexpected timezone information: Sun Mar 26 00:00:00 GMT+01:00 2023
        at CheckDST.runTZTest(CheckDST.java:28)
        at CheckDST.main(CheckDST.java:8)


I assume `TZ=MEZ-1MESZ` refers to the Europe/Berlin timezone.
In that case, the `TZ` environment variable should be `MEZ-1MESZ,M3.5.0,M10.5.0/3` 
(`/3` is missing in the testcase).

CustomTzIDCheckDST should also be exercised while daylight saving time is in effect.
For that, I added a simulated Southern Hemisphere case: `MEZ-1MESZ,M10.5.0,M3.5.0/3`
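
As a rough illustration of the two rule sets, both can be written as a `SimpleTimeZone` whose transitions are given in UTC. This is only a sketch under stated assumptions: the class `TzRuleSketch` and its helpers `zone` and `inDst` are hypothetical names, and the 01:00 UTC transition time is one reading of the rule (02:00 MEZ and 03:00 MESZ both correspond to 01:00 UTC).

```java
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;
import java.util.SimpleTimeZone;

public class TzRuleSketch {
    static final int HOUR = 3_600_000; // one hour in milliseconds

    // northern == true:  DST runs from the last Sunday of March to the last Sunday of October.
    // northern == false: the two rules are swapped (simulated Southern Hemisphere).
    static SimpleTimeZone zone(boolean northern) {
        return new SimpleTimeZone(HOUR, "MEZ-1MESZ",
                northern ? Calendar.MARCH : Calendar.OCTOBER, -1,
                Calendar.SUNDAY, HOUR, SimpleTimeZone.UTC_TIME,
                northern ? Calendar.OCTOBER : Calendar.MARCH, -1,
                Calendar.SUNDAY, HOUR, SimpleTimeZone.UTC_TIME,
                HOUR);
    }

    // Returns whether the given UTC instant (ISO-8601 text) falls into DST for the chosen rule set.
    static boolean inDst(boolean northern, String utcInstant) {
        return zone(northern).inDaylightTime(Date.from(Instant.parse(utcInstant)));
    }

    public static void main(String[] args) {
        String[] instants = {
            "2023-03-26T00:59:59Z", "2023-03-26T01:00:00Z",
            "2023-10-29T00:59:59Z", "2023-10-29T01:00:00Z"
        };
        for (String s : instants) {
            System.out.println(s + " north=" + inDst(true, s) + " south=" + inDst(false, s));
        }
    }
}
```

Because one of the two rule sets is in daylight saving time on any given date, running the check with both values should cover the DST branch regardless of when the test runs.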

I tested this with the following standalone testcase:

$ cat CheckDST1.java
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import java.util.SimpleTimeZone;
import java.util.TimeZone;
import java.time.DayOfWeek;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;
public class CheckDST1 {
    // Northern Hemisphere
    private static String CUSTOM_TZ = "MEZ-1MESZ,M3.5.0,M10.5.0/3";
    // Simulate Southern Hemisphere
    private static String CUSTOM_TZ2 = "MEZ-1MESZ,M10.5.0,M3.5.0/3";
    public static void main(String args[]) throws Throwable {
        runTZTest();
    }

    /* TZ code will always be set to "MEZ-1MESZ,M3.5.0,M10.5.0/3".
     * This ensures the transition periods for Daylights Savings should be at March's last
     * Sunday and October's last Sunday.
     */
    private static void runTZTest() {
        Date time = new Date();
        String tzStr = System.getenv("TZ");
        if (tzStr == null)
            throw new RuntimeException("Got unexpected timezone information: TZ is null");
        // true for the Northern Hemisphere rule order (March first, then October)
        boolean nor = tzStr.matches(".*,M3\\..*,M10\\..*");
        TimeZone tz = new SimpleTimeZone(3600000, tzStr,
                nor ? Calendar.MARCH : Calendar.OCTOBER, -1,
                Calendar.SUNDAY, 3600000, SimpleTimeZone.UTC_TIME,
                nor ? Calendar.OCTOBER : Calendar.MARCH, -1,
                Calendar.SUNDAY, 3600000, SimpleTimeZone.UTC_TIME,
                3600000);
        System.out.println(time);
        if (tz.inDaylightTime(time)) {
            // We are in Daylight savings period.
            if (time.toString().endsWith("GMT+02:00 " + Integer.toString(time.getYear() + 1900)))
                return;
        } else {
            if (time.toString().endsWith("GMT+01:00 " + Integer.toString(time.getYear() + 1900)))
                return;
        }

        // Reaching here means time zone did not match up as expected.
        throw new RuntimeException("Got unexpected timezone information: " + tzStr + " " + time);
    }

    private static ZonedDateTime getLastSundayOfMonth(ZonedDateTime date) {
        return date.with(TemporalAdjusters.lastInMonth(DayOfWeek.SUNDAY));
    }
}


Check Europe/Berlin timezone settings

$ zdump -v Europe/Berlin | grep 2023
Europe/Berlin  Sun Mar 26 00:59:59 2023 UTC = Sun Mar 26 01:59:59 2023 CET isdst=0 gmtoff=3600
Europe/Berlin  Sun Mar 26 01:00:00 2023 UTC = Sun Mar 26 03:00:00 2023 CEST isdst=1 gmtoff=7200
Europe/Berlin  Sun Oct 29 00:59:59 2023 UTC = Sun Oct 29 02:59:59 2023 CEST isdst=1 gmtoff=7200
Europe/Berlin  Sun Oct 29 01:00:00 2023 UTC = Sun Oct 29 02:00:00 2023 CET isdst=0 gmtoff=3600
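
The same transitions can be cross-checked from Java with the `java.time` zone rules. This is a minimal sketch; the class name `BerlinTransitions` and the helper `transitionsIn2023` are hypothetical, and the result depends on the tzdata bundled with the JDK in use.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.List;

public class BerlinTransitions {
    // Collects every Europe/Berlin offset transition that happens during 2023 (UTC).
    static List<ZoneOffsetTransition> transitionsIn2023() {
        ZoneRules rules = ZoneId.of("Europe/Berlin").getRules();
        Instant cursor = Instant.parse("2023-01-01T00:00:00Z");
        Instant end = Instant.parse("2024-01-01T00:00:00Z");
        List<ZoneOffsetTransition> result = new ArrayList<>();
        ZoneOffsetTransition t;
        // nextTransition returns the first transition strictly after the given instant.
        while ((t = rules.nextTransition(cursor)) != null && t.getInstant().isBefore(end)) {
            result.add(t);
            cursor = t.getInstant();
        }
        return result;
    }

    public static void main(String[] args) {
        for (ZoneOffsetTransition t : transitionsIn2023()) {
            System.out.println(t.getInstant() + " : "
                    + t.getOffsetBefore() + " -> " + t.getOffsetAfter());
        }
    }
}
```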


Test results are as follows:

Northern Hemisphere side

$ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
date
Sun Mar 26 01:59:59 MEZ 2023
$ TZ=GMT faketime -m '2023-03-26 00:59:59' env TZ=MEZ-1MESZ,M3.5.0,M10.5.0/3 
java 

Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

2022-11-28 Thread Ichiroh Takiguchi
On Wed, 6 Jul 2022 14:05:39 GMT, Ichiroh Takiguchi  
wrote:

> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and 
> DBCS Only charsets.
> |Charset|Mix|SBCS|DBCS|
> | -- | -- | -- | -- |
> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
> | Korean | Cp933 | Cp833 | Cp834 |
> 
> But OpenJDK does not support some of the "Japanese EBCDIC - English" / 
> "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only 
> charsets.
> 
> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
> |Charset|Mix|SBCS|DBCS|
> | - | - | - | - |
> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** | 
> 
> *1: Cp037 compatible

I'm still working on this one.

-

PR: https://git.openjdk.org/jdk/pull/9399


Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

2022-10-03 Thread Ichiroh Takiguchi
On Fri, 26 Aug 2022 09:25:55 GMT, Alan Bateman  wrote:

>> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and 
>> DBCS Only charsets.
>> |Charset|Mix|SBCS|DBCS|
>> | -- | -- | -- | -- |
>> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
>> | Korean | Cp933 | Cp833 | Cp834 |
>> 
>> But OpenJDK does not support some of the "Japanese EBCDIC - English" / 
>> "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS 
>> Only charsets.
>> 
>> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
>> |Charset|Mix|SBCS|DBCS|
>> | - | - | - | - |
>> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
>> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
>> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** | 
>> 
>> *1: Cp037 compatible
>
>> Use following options, like OpenJDK: `java -cp 
>> icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 2 1 1` ICU4J `java 
>> -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 2 1 1`
>> 
>> Actually, I'm confused by this result. Previously, I was just comparing A/A 
>> with B/B on OpenJDK's charset. I didn't think ICU4J's result would make a 
>> difference.
> 
> My initial reaction is one of relief that the icu4j provider can be used with 
> current JDK builds. This means there is an option should we decide to stop 
> adding more EBCDIC charsets to the JDK.
> 
> The test uses IBM-1047 and I can't tell if the icu4j provider is used or not. 
> Charset doesn't define a provider method but I think would be useful to print 
> cs.getClass() or cs.getClass().getModule() so we know which Charset 
> implementation is used. Also I think any discussion on performance would be 
> better served with a JMH benchmark rather than a standalone test.
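
The suggestion to print `cs.getClass()` and `cs.getClass().getModule()` could look like the following sketch. The class name `CharsetOrigin` and the helper `describe` are hypothetical; the charset names on the command line are whatever the reader wants to inspect.

```java
import java.nio.charset.Charset;

public class CharsetOrigin {
    // Reports which class implements the charset and which module that class lives in.
    static String describe(String name) {
        Charset cs = Charset.forName(name);
        Module m = cs.getClass().getModule();
        String module = m.isNamed() ? m.getName() : "unnamed module";
        return cs.name() + " -> " + cs.getClass().getName() + " in " + module;
    }

    public static void main(String[] args) {
        // Default to a charset that every Java runtime must provide.
        String[] names = args.length > 0 ? args : new String[] {"UTF-8"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

A charset loaded from a provider on the class path would be expected to report the unnamed module, while a built-in one reports `java.base` or `jdk.charsets`.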

Hello @AlanBateman .
Sorry I'm late.

I created Charset SPI JAR `x-IBM1047_SPI` (`custom-charsets.jar`) which was 
ported from `sun.nio.cs.SingleByte.java` and `IBM1047.java` (generated one).

Test code:

package com.example;

import java.nio.charset.Charset;
import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    final static String s;

    static {
        char[] ca = new char[0x2000];
        for (int i = 0; i < ca.length; i++) {
            ca[i] = (char) (i & 0xFF);
        }
        s = new String(ca);
    }

    @Benchmark
    public void testIBM1047() throws Exception {
        byte[] ba = s.getBytes("IBM1047");
    }

    @Benchmark
    public void testIBM1047_SPI() throws Exception {
        byte[] ba = s.getBytes("x-IBM1047_SPI");
    }

}

All test related files are in 
[JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834).

Test results are as follows on RHEL8.6 x86_64 (Intel Core i7 3520M) :

1.8.0_345-b01
Benchmark                     Mode  Cnt      Score     Error  Units
MyBenchmark.testIBM1047      thrpt   25  53213.092 ± 126.962  ops/s
MyBenchmark.testIBM1047_SPI  thrpt   25  47442.669 ± 349.003  ops/s


20-ea+17-1181
Benchmark                     Mode  Cnt       Score      Error  Units
MyBenchmark.testIBM1047      thrpt   25  136331.141 ± 1078.481  ops/s
MyBenchmark.testIBM1047_SPI  thrpt   25   51563.213 ±  843.238  ops/s

IBM1047 is about 2.6 times faster than the SPI version on JDK 20.
I think these results are related to **JEP 254: Compact Strings**.
As I requested before, we'd like to be able to use the `sun.nio.cs.SingleByte*` and 
`sun.nio.cs.DoubleByte*` classes through a public API.

-

PR: https://git.openjdk.org/jdk/pull/9399


Re: RFR: 8291916: Unexpected output on Windows command prompt

2022-09-09 Thread Ichiroh Takiguchi
On Tue, 9 Aug 2022 20:38:25 GMT, Naoto Sato  wrote:

>> To support Windows command prompt's codepage, following charsets should be 
>> moved from jdk.charsets module to java.base module.
>> 
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> I looked at this issue a bit more. It looks to me that the issue is caused by 
> the fact that the encoding of `System.out` falls back to the default 
> encoding, as `IBM864` is not in `java.base`. This issue seems not new and 
> reproducible with the releases since JDK9 where modularization has been 
> introduced. Also, I think other encodings than those `IBM*` listed here, can 
> possibly cause this issue. In order to fix this completely, those obscure 
> encodings also have to be in `java.base` which I don't think we would want to 
> do.

Hello @naotoj .
Sorry for my bad reaction.

I checked these charsets against the IBM CDRA definitions.
They are largely the same, but some round-trip definitions differ, as in #9661.
I think they come from the files under 
https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/ .
As you know, `CP860/CP861/CP863/CP864/CP865/CP869` are defined in [IANA 
Character 
Sets](https://www.iana.org/assignments/character-sets/character-sets.xhtml) as 
aliases.
Even though the registered names are `IBM*`, these charset implementations come from 
Microsoft.
I think these charsets should be usable as the default charset on the Windows command 
prompt.
Please reconsider the current Java implementation.

-

PR: https://git.openjdk.org/jdk/pull/9761


Integrated: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

2022-09-04 Thread Ichiroh Takiguchi
On Fri, 26 Aug 2022 07:26:46 GMT, Ichiroh Takiguchi  
wrote:

> After `test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java` testcase was 
> integrated, it failed on the AIX platform.
> 
> Error output
> 
> STDERR:
>  stdout: [];
>  stderr: [Exception in thread "main" java.lang.RuntimeException: Got 
> unexpected timezone information: Thu Aug 25 09:29:10 CEST 2022
> at CustomTzIDCheckDST.runTZTest(CustomTzIDCheckDST.java:71)
> at CustomTzIDCheckDST.main(CustomTzIDCheckDST.java:50)
> ]
> 
> 
> By my investigation, the `TZ=MEZ-1MESZ,M3.5.0,M10.5.0` timezone was changed to the 
> `Europe/Berlin` timezone on the AIX platform.
> This seems to happen because older AIX versions did not support the 
> `MEZ-1MESZ,M3.5.0,M10.5.0` style in the TZ environment variable.
> https://www.ibm.com/support/pages/managing-time-zone-variable-posix
> AIX-specific code was implemented in 
> `src/java.base/unix/native/libjava/TimeZone_md.c`.
> Current AIX supports the `TZ=EST5EDT,M3.2.0/2:00:00,M11.1.0/2:00:00` style.
> I think an implementation change is required. 
> 
> Some pre-submit tests failed, but I think they are not related to this 
> change since the modified parts are only for the AIX platform.

This pull request has now been integrated.

Changeset: 3464019d
Author:Ichiroh Takiguchi 
URL:   
https://git.openjdk.org/jdk/commit/3464019d7e8fe57adc910339c00ba79884c77852
Stats: 17 lines in 1 file changed: 11 ins; 1 del; 5 mod

8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

Reviewed-by: naoto

-

PR: https://git.openjdk.org/jdk/pull/10036


Re: RFR: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

2022-09-01 Thread Ichiroh Takiguchi
On Fri, 26 Aug 2022 18:56:31 GMT, Naoto Sato  wrote:

>> After `test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java` testcase was 
>> integrated, it failed on the AIX platform.
>> 
>> Error output
>> 
>> STDERR:
>>  stdout: [];
>>  stderr: [Exception in thread "main" java.lang.RuntimeException: Got 
>> unexpected timezone information: Thu Aug 25 09:29:10 CEST 2022
>> at CustomTzIDCheckDST.runTZTest(CustomTzIDCheckDST.java:71)
>> at CustomTzIDCheckDST.main(CustomTzIDCheckDST.java:50)
>> ]
>> 
>> 
>> By my investigation, the `TZ=MEZ-1MESZ,M3.5.0,M10.5.0` timezone was changed to the 
>> `Europe/Berlin` timezone on the AIX platform.
>> This seems to happen because older AIX versions did not support the 
>> `MEZ-1MESZ,M3.5.0,M10.5.0` style in the TZ environment variable.
>> https://www.ibm.com/support/pages/managing-time-zone-variable-posix
>> AIX-specific code was implemented in 
>> `src/java.base/unix/native/libjava/TimeZone_md.c`.
>> Current AIX supports the `TZ=EST5EDT,M3.2.0/2:00:00,M11.1.0/2:00:00` style.
>> I think an implementation change is required. 
>> 
>> Some pre-submit tests failed, but I think they are not related to this 
>> change since the modified parts are only for the AIX platform.
>
> src/java.base/unix/native/libjava/TimeZone_md.c line 589:
> 
>> 587: // But Hotspot does not support XPG_SUS_ENV=ON.
>> 588: // Ignore daylight saving settings to calculate current time 
>> difference
>> 589: localtm.tm_isdst = 0;
> 
> Is it OK to reset it always? Could this defy the original purpose of the fix 
> to https://bugs.openjdk.org/browse/JDK-8285838?

I executed the test program from 
[JDK-8285838](https://bugs.openjdk.org/browse/JDK-8285838) on RHEL8 x86_64.
With TZ="MEZ-1MESZ,M3.5.0,M10.5.0", daylight saving time is in effect today. 
By JDK18

$ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" jdk-18/bin/java -showversion TimeTest.java
openjdk version "18" 2022-03-22
OpenJDK Runtime Environment (build 18+36-2087)
OpenJDK 64-Bit Server VM (build 18+36-2087, mixed mode, sharing)
Calendar.getInstance().getTime() = Thu Sep 01 11:52:09 GMT+01:00 2022
SimpleDateFormat = 01.09.2022 11:52:09.747

By JDK20

$ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" jdk-20-b12/bin/java -showversion TimeTest.java
openjdk version "20-ea" 2023-03-21
OpenJDK Runtime Environment (build 20-ea+12-790)
OpenJDK 64-Bit Server VM (build 20-ea+12-790, mixed mode, sharing)
Calendar.getInstance().getTime() = Thu Sep 01 12:52:21 GMT+02:00 2022
SimpleDateFormat = 01.09.2022 12:52:21.269

Expected result is GMT+02:00.
It means the output is the current time difference between GMT and MESZ.

On modified build

$ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" 
jdk/build/aix-ppc64-server-release/images/jdk/bin/java TimeTest.java
Calendar.getInstance().getTime() = Thu Sep 01 12:53:12 GMT+02:00 2022
SimpleDateFormat = 01.09.2022 12:53:12.930


According to AIX docs for mktime()
https://www.ibm.com/docs/en/aix/7.2?topic=c-ctime-localtime-gmtime-mktime-difftime-asctime-tzset-subroutine

> The value of the ```tm_isdst``` field determines the following actions of the 
> **mktime** subroutine:
> "0" means "Initially presumes that Daylight Saving Time (DST) is not in 
> effect."

If daylight saving time ends in August, the timezone should be GMT+01:00

$ TZ="MEZ-1MESZ,M3.5.0,M8.5.0" 
jdk/build/aix-ppc64-server-release/images/jdk/bin/java TimeTest.java  
Calendar.getInstance().getTime() = Thu Sep 01 11:53:36 GMT+01:00 2022
SimpleDateFormat = 01.09.2022 11:53:36.189


According to simple C testcase

$ cat sf.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
int main(void)
{
    char buf[100];
    struct tm localtm;
    struct tm gmt;
    time_t clock = time(NULL);
    int gmt_off;

#if defined(_AIX)
    putenv("XPG_SUS_ENV=ON");
#endif
    if (localtime_r(&clock, &localtm) == NULL) {
        return 1;
    }
    if (gmtime_r(&clock, &gmt) == NULL) {
        return 1;
    }
    strftime(buf, sizeof(buf),"%z", &localtm);
    printf("strftime: %s\n",buf);
    /* Ignore DST so that mktime() yields the current difference from GMT */
    localtm.tm_isdst = 0;
    gmt_off = (int)(difftime(mktime(&localtm), mktime(&gmt)) / 60.0);
    sprintf(buf, (const char *)"%c%02.2d%02.2d",
        gmt_off < 0 ? '-' : '+' , abs(gmt_off / 60), gmt_off % 60);
    printf("difftime: %s\n",buf);
    return 0;
}

On RHEL8:

$ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" ./sf
strftime: +0200
difftime: +0200
$ TZ="ZZZ-1-3,M3.5.0,M10.5.0" ./sf
strftime: +0300
difftime: +0300
$ TZ="ZZZ-1-3,M3.5.0,M8.5.0" ./sf
strftime: +0100
difftime: +0100

On AIX:

$ TZ="MEZ-1MESZ,M3.5.0,M10.5.0" ./sf
strftime: +0200
difftime: +0200
$ TZ="ZZZ-1-3,M3.5.0,M10.5.0" ./sf
strftime: +0300
difftime: +0300
$ TZ="ZZZ-1-3,M3.5.0,M8.5.0" ./sf
strftime: +0100
difftime: +0100

I assume the modified code should be fine.
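
For comparison, the offset formatting step of the C testcase can be sketched in Java. This is an illustration only: the class `OffsetSketch` and the helper `formatOffset` are hypothetical names, and the sketch reads the offset from `TimeZone.getDefault()` instead of calling `mktime()`, so it does not reproduce the AIX-specific behavior discussed here.

```java
import java.util.TimeZone;

public class OffsetSketch {
    // Formats an offset given in minutes east of GMT as +hhmm or -hhmm.
    static String formatOffset(int gmtOffMinutes) {
        int abs = Math.abs(gmtOffMinutes);
        char sign = gmtOffMinutes < 0 ? '-' : '+';
        return String.format("%c%02d%02d", sign, abs / 60, abs % 60);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // getOffset(long) includes the daylight saving amount when DST is in effect.
        int minutes = TimeZone.getDefault().getOffset(now) / 60_000;
        System.out.println("offset: " + formatOffset(minutes));
    }
}
```

Taking the absolute value before splitting into hours and minutes keeps the minutes field non-negative for zones west of GMT.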

-

PR: https://git.openjdk.org/jdk/pull/10036


RFR: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

2022-08-26 Thread Ichiroh Takiguchi
After `test/jdk/java/util/TimeZone/CustomTzIDCheckDST.java` testcase was 
integrated, it failed on the AIX platform.

Error output

STDERR:
 stdout: [];
 stderr: [Exception in thread "main" java.lang.RuntimeException: Got unexpected 
timezone information: Thu Aug 25 09:29:10 CEST 2022
at CustomTzIDCheckDST.runTZTest(CustomTzIDCheckDST.java:71)
at CustomTzIDCheckDST.main(CustomTzIDCheckDST.java:50)
]


By my investigation, the `TZ=MEZ-1MESZ,M3.5.0,M10.5.0` timezone was changed to the 
`Europe/Berlin` timezone on the AIX platform.
This seems to happen because older AIX versions did not support the 
`MEZ-1MESZ,M3.5.0,M10.5.0` style in the TZ environment variable.
https://www.ibm.com/support/pages/managing-time-zone-variable-posix
AIX-specific code was implemented in 
`src/java.base/unix/native/libjava/TimeZone_md.c`.
Current AIX supports the `TZ=EST5EDT,M3.2.0/2:00:00,M11.1.0/2:00:00` style.
I think an implementation change is required. 

Some pre-submit tests failed, but I think they are not related to this change 
since the modified parts are only for the AIX platform.

-

Commit messages:
 - 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform

Changes: https://git.openjdk.org/jdk/pull/10036/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10036&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8292899
  Stats: 17 lines in 1 file changed: 11 ins; 1 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/10036.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10036/head:pull/10036

PR: https://git.openjdk.org/jdk/pull/10036


Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

2022-08-26 Thread Ichiroh Takiguchi
On Mon, 8 Aug 2022 09:22:32 GMT, Alan Bateman  wrote:

>> Hello @AlanBateman .
>> Sorry I'm late.
>> I got some responses from ICU. 
>> [ICU-22091](https://unicode-org.atlassian.net/browse/ICU-22091)
>> I'm not sure if they're interested in the new charset...
>> 
>> As you know `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder`interface 
>> have performance advantage.
>> And some other performance advantages are there on built-in charset 
>> decoder/encoder.
>> Is it possible to create simple public API by using `sun.nio.cs.SingleByte` 
>> and `sun.nio.cs.DoubleByte*` classes?
>> We'd like to use stable conversion loop.
>
>> As you know `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder`interface 
>> have performance advantage. And some other performance advantages are there 
>> on built-in charset decoder/encoder. Is it possible to create simple public 
>> API by using `sun.nio.cs.SingleByte` and `sun.nio.cs.DoubleByte*` classes? 
>> We'd like to use stable conversion loop.
> 
> If they have ASCII compatible regions then that may be so but I haven't see 
> any performance data published on that. Do you know if any experiments that 
> have deployed a CharsetProvider for the EBCDIC charsets and compared the 
> performance with the charsets that in the JDK? There may be merit in 
> exploring adding base abstracts implementations of 
> CharsetEncoder/CharsetDecoder to java.nio.charsets.spi to support single and 
> double byte charsets to see how such base implementations might look, how 
> they would help performance, and if there are any security downsides.

Hello @AlanBateman .
Sorry, I'm late.
The test results are included below (no guarantees about their accuracy).

I wrote the small test program below; I'm not sure whether it is a good test.

import java.nio.*;
import java.nio.charset.*;

public class tc {
  public static void main(String[] args) throws Exception {
    Charset cs = Charset.forName(args[0]);
    int cnt = Integer.parseInt(args[1]);
    boolean useCA = "1".equals(args[2]);
    boolean useBA = "1".equals(args[3]);
    CharsetEncoder ce = cs.newEncoder();
    byte[] ba = new byte[0x4000];
    for(int i = 0; i < ba.length; i++) {
      ba[i] = (byte) i;
    }
    String s = new String(ba, cs);
    char[] ca = s.toCharArray();
    ByteBuffer bb = useBA ? ByteBuffer.allocate(ca.length) :
                            ByteBuffer.allocateDirect(ca.length);
    CharBuffer cb = useCA ? CharBuffer.wrap(ca) : CharBuffer.wrap(s);
    System.out.println("CharBuffer.hasArray() = " + cb.hasArray());
    System.out.println("ByteBuffer.hasArray() = " + bb.hasArray());
    long start_t = System.currentTimeMillis();
    for(int i = 0; i < 200; i++) {
      ce.reset();
      bb.position(0);
      cb.position(0);
      ce.encode(cb, bb, true);
    }
    System.out.println("Warmup: "+(System.currentTimeMillis() - start_t));
    start_t = System.currentTimeMillis();
    for(int i = 0; i < cnt; i++) {
      ce.reset();
      bb.position(0);
      cb.position(0);
      ce.encode(cb, bb, true);
    }
    System.out.println("Test: "+(System.currentTimeMillis() - start_t));
  }
}


The following test results apply only to my test environment:
* CPU: Intel (on-premises environment); the platforms are not the same machine
* Each case was executed 5 times; the values are the averages

I used the following options, for example:
OpenJDK:
`java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 2 1 1`
ICU4J:
`java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 2 1 1`

I used jdk-20 b12.
Only A/A with OpenJDK uses the ArrayEncoder (ArrayDecoder) interface.

| | A/A | A/B | B/A | B/B |
| -- | --: | --: | --: | --: |
| Linux (OpenJDK) | 862 | 1265 | 1838 | 1843 |
| Linux (ICU4J) | 1450 | 1410 | 1152 | 1138 |
| Windows (OpenJDK) | 921 | 1231 | 1959 | 1850 |
| Windows (ICU4J) | 1431 | 1446 | 2227 | 2265 |
| Mac (OpenJDK) | 820 | 1163 | 1799 | 1774 |
| Mac (ICU4J) | 1282 | 1242 | 994 | 1049 |

Notes:
* A/A means CharBuffer is created via char[], ByteBuffer is generated by 
allocate()
* A/B means CharBuffer is created via char[], ByteBuffer is generated by 
allocateDirect()
* B/A means CharBuffer is created via String, ByteBuffer is generated by 
allocate()
* B/B means CharBuffer is created via String, ByteBuffer is generated by 
allocateDirect()
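
The four combinations in the notes differ in whether each buffer exposes a backing array. A minimal sketch of that difference follows; the class `BufferKinds` and the helper `describe` are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

public class BufferKinds {
    // charArrayBacked: wrap a char[] (case "A") or a String (case "B") for the CharBuffer.
    // heapByteBuffer:  use allocate() (case "A") or allocateDirect() (case "B") for the ByteBuffer.
    static String describe(boolean charArrayBacked, boolean heapByteBuffer) {
        char[] ca = "ABC".toCharArray();
        CharBuffer cb = charArrayBacked ? CharBuffer.wrap(ca) : CharBuffer.wrap("ABC");
        ByteBuffer bb = heapByteBuffer ? ByteBuffer.allocate(16) : ByteBuffer.allocateDirect(16);
        return "CharBuffer.hasArray()=" + cb.hasArray()
                + ", ByteBuffer.hasArray()=" + bb.hasArray();
    }

    public static void main(String[] args) {
        System.out.println("A/A: " + describe(true, true));
        System.out.println("A/B: " + describe(true, false));
        System.out.println("B/A: " + describe(false, true));
        System.out.println("B/B: " + describe(false, false));
    }
}
```

Only the A/A case reports an accessible array on both sides, which matches the remark that only A/A can take the array-based encoder path.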

Actually, I'm confused by this result.
Previously, I was just comparing A/A with B/B on OpenJDK's charset.
I didn't think ICU4J's result would make a difference.

Anyway, please evaluate these results,
and let me know if more investigation is needed.

-

PR: https://git.openjdk.org/jdk/pull/9399


Re: RFR: 8291916: Unexpected output on Arabic Windows command prompt

2022-08-07 Thread Ichiroh Takiguchi
On Fri, 5 Aug 2022 16:44:37 GMT, Naoto Sato  wrote:

>> To support Windows command prompt's codepage, following charsets should be 
>> moved from jdk.charsets module to java.base module.
>> 
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> Hi @takiguc,
> I am not quite sure what is the rationale for moving those charsets into 
> `java.base` module. IIUC, we typically did such a fix when the java runtime 
> cannot boot in a supported configuration 
> (https://bugs.openjdk.org/browse/JDK-8187910), but it seems that this issue 
> does not warrant such a requirement. Will you elaborate more?

Hello @naotoj .
As Alan described, the Windows code page mapping table is as follows:

- 860 - Portuguese (DOS) - IBM860
- 861 - Icelandic (DOS) - IBM861
- 863 - French Canadian (DOS) - IBM863
- 864 - Arabic (864) - IBM864
- 865 - Nordic (DOS) - IBM865
- 869 - Greek, Modern (DOS) - IBM869
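
Whether a given runtime can resolve these charsets, and from which module, can be checked with a short sketch like the one below. The class `CodePageLookup` and the helper `charsetNameFor` are hypothetical; the "IBM" + number naming simply follows the list above.

```java
import java.nio.charset.Charset;

public class CodePageLookup {
    // Hypothetical helper: builds the charset name used in the list, e.g. 864 -> "IBM864".
    static String charsetNameFor(int codePage) {
        return "IBM" + codePage;
    }

    public static void main(String[] args) {
        int[] codePages = {860, 861, 863, 864, 865, 869};
        for (int cp : codePages) {
            String name = charsetNameFor(cp);
            if (Charset.isSupported(name)) {
                Charset cs = Charset.forName(name);
                // The module shows whether the charset comes from java.base or jdk.charsets.
                System.out.println(cp + " -> " + name + " (" + cs.getClass().getModule() + ")");
            } else {
                System.out.println(cp + " -> " + name + " (not available in this runtime)");
            }
        }
    }
}
```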

The Java 8 behavior is as follows.
Windows command prompt setting (this sample uses code page 864):

>chcp 864
Active code page: 864

Test program

>type termdump.java
import java.nio.charset.*;

public class termdump {
  public static void main(String[] args) throws Exception {
    String csname = System.getProperty("sun.stdout.encoding");
    if (csname == null) csname = System.getProperty("stdout.encoding");
    System.out.println(csname);
    Charset cs = Charset.forName(csname);
    for (int i0 = 0; i0 < 0x100; i0 += 0x10) {
      StringBuilder sb = new StringBuilder();
      for (int i1 = 0; i1 < 0x10; i1++) {
        byte[] ba = new byte[1];
        ba[0] = (byte) (i0 | i1);
        String s = new String(ba, csname);
        if (s.length() == 1) {
          char ch = s.charAt(0);
          if (ch < 0x7F) continue;
          if (Character.isISOControl(ch)) continue;
          if (ch == '\uFFFD') continue;
          sb.append(ch);
        }
      }
      if (sb.length() > 0) {
        System.out.printf("0x%02X %s%n", i0, sb.toString());
        System.out.print("");
        for (char ch : sb.toString().toCharArray()) {
          System.out.printf(" %04X", (int)ch);
        }
        System.out.println();
      }
    }
  }
}

Java8 output

>jdk8u345-b01\jre\bin\java termdump
cp864
0x20 %
 066A
0x80 °·∙√▒─│┼┤┬├┴┐┌└┘
 00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 
2518
0x90 β∞φ±½¼≈«»ﻷﻸﻻﻼ
 03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC
0xA0  ­ﺂ£¤ﺄﺎﺏﺕﺙ،ﺝﺡﺥ
 00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5
0xB0 ٠١٢٣٤٥٦٧٨٩ﻑ؛ﺱﺵﺹ؟
 0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 
061F
0xC0 ¢ﺀﺁﺃﺅﻊﺋﺍﺑﺓﺗﺛﺟﺣﺧﺩ
 00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 
FEA9
0xD0 ﺫﺭﺯﺳﺷﺻﺿﻁﻅﻋﻏ¦¬÷×ﻉ
 FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 
FEC9
0xE0 ـﻓﻗﻛﻟﻣﻧﻫﻭﻯﻳﺽﻌﻎﻍﻡ
 0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD 
FEE1
0xF0 ﹽّﻥﻩﻬﻰﻲﻐﻕﻵﻶﻝﻙﻱ■
 FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0

Java20 output

>jdk-20\bin\java termdump
cp864
0x20 ﻋﺕ
 066A
0x80 ﺁ٠ﺁ٧ﻗ┤ﻷﻗ┤ﻸﻗ≈φﻗ½°ﻗ½∙ﻗ½ﺱﻗ½¤ﻗ½،ﻗ½œﻗ½٤ﻗ½βﻗ½┐ﻗ½½ﻗ½»
 00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 
2518
0x90 ﺧ٢ﻗ┤ﻼﺩ│ﺁ١ﺁﺵﺁﺱﻗ┬┤ﺁﺙﺁ؛ﻡ؛٧ﻡ؛٨ﻡ؛؛ﻡ؛ﺱ
 03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC
0xA0 ﺁ ﺁﺝﻡﻑ∙ﺁ£ﺁ¤ﻡﻑ▒ﻡﻑ└ﻡﻑ┘ﻡﻑ¼ﻡﻑﻷﻅ┐ﻡﻑﻻﻡﻑ­ﻡﻑﺄ
 00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5
0xB0 ﻋ ﻋ­ﻋﺂﻋ£ﻋ¤ﻋﺄﻋﻋﻋﺎﻋﺏﻡ؛∞ﻅ›ﻡﻑ١ﻡﻑ٥ﻡﻑ٩ﻅŸ
 0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 
061F
0xC0 ﺁﺂﻡﻑ°ﻡﻑ·ﻡﻑ√ﻡﻑ─ﻡ؛├ﻡﻑ┴ﻡﻑ┌ﻡﻑ∞ﻡﻑ±ﻡﻑ«ﻡﻑ›ﻡﻑŸﻡﻑ£ﻡﻑﻡﻑﺏ
 00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 
FEA9
0xD0 ﻡﻑﺙﻡﻑﺝﻡﻑﺥﻡﻑ٣ﻡﻑ٧ﻡﻑ؛ﻡﻑ؟ﻡ؛·ﻡ؛─ﻡ؛┴ﻡ؛┘ﺁﺁ،ﺃ٧ﺃ«ﻡ؛┬
 FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 
FEC9
0xE0 ﻋ°ﻡ؛±ﻡ؛«ﻡ؛›ﻡ؛Ÿﻡ؛£ﻡ؛ﻡ؛ﺙﻡ؛ﺝﻡ؛ﺥﻡ؛٣ﻡﻑﺵﻡ؛┐ﻡ؛└ﻡ؛┌ﻡ؛­
 0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD 
FEE1
0xF0 ﻡ٩ﺵﻋ∞ﻡ؛ﺄﻡ؛ﺏﻡ؛،ﻡ؛٠ﻡ؛٢ﻡ؛βﻡ؛¼ﻡ؛٥ﻡ؛٦ﻡ؛ﻻﻡ؛ﻷﻡ؛١ﻗ≈ 
 FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0

Fixed output

>java -showversion termdump
openjdk version "20-internal" 2023-03-21
OpenJDK Runtime Environment (build 20-internal-adhoc.Administrator.jdk)
OpenJDK 64-Bit Server VM (build 20-internal-adhoc.Administrator.jdk, mixed 
mode, sharing)
cp864
0x20 %
 066A
0x80 °·∙√▒─│┼┤┬├┴┐┌└┘
 00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 
2518
0x90 β∞φ±½¼≈«»ﻷﻸﻻﻼ
 03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC
0xA0  ­ﺂ£¤ﺄﺎﺏﺕﺙ،ﺝﺡﺥ
 00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5
0xB0 ٠١٢٣٤٥٦٧٨٩ﻑ؛ﺱﺵﺹ؟
 0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 
061F
0xC0 ¢ﺀﺁﺃﺅﻊﺋﺍﺑﺓﺗﺛﺟﺣﺧﺩ
 00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 
FEA9
0xD0 ﺫﺭﺯﺳﺷﺻﺿﻁﻅﻋﻏ¦¬÷×ﻉ
 FEAB FEAD FEAF FEB3 FEB7 FEBB 

Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

2022-08-07 Thread Ichiroh Takiguchi
On Thu, 7 Jul 2022 09:47:25 GMT, Alan Bateman  wrote:

>> And also there is no reason why db drivers or host connectors should not 
>> ship their own charset support (Oracle JDBC for example had nls_charset 
>> addons. My employer also ships a custom EBCDIC encoding which includes some 
>> compatibility hacks, and that took some effort to adapt it to the missing 
>> ext mechanism).
>> 
>> Having said that, with JPMS a "legacy ebcdic" encoding module would be 
>> possible while still being optional. Maybe in the future a mechanism for 
>> modules which can be added (instead of removed) from standard distribution 
>> would make that nicer?
>> 
>> Is there a performance restriction for charset if they are not part of a 
>> platform module (optimized string access)?
>> 
>> Regards
>> Bernd
>> 
>> 
>> --
>> http://bernd.eckenfels.net
>> ________________________________
>> From: core-libs-dev on behalf of Alan Bateman
>> Sent: Thursday, July 7, 2022 11:50:39 AM
>> To: build-dev at openjdk.org; core-libs-dev at openjdk.org; 
>> i18n-dev at openjdk.org
>> Subject: Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets
>> 
>> On Wed, 6 Jul 2022 16:18:08 GMT, Ichiroh Takiguchi wrote:
>> 
>>> Discussions are available on :
>>> [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834): 
>>> Add SBCS and DBCS Only EBCDIC charsets
>> 
>> Yes, I think this need discussion on whether the JDK really needs to keep 
>> including and adding more EBCDIC charsets. I understand they can be useful 
>> for someone using JDBC to connect to a database on z/OS but this scenario 
>> would work equally well if the EBCDIC charsets were deployed on the class 
>> path or module path. Do you know if the icu4j project is still alive? 
>> I've often wondered why there wasn't more use of the provider mechanism.
>> 
>> -
>> 
>> PR: https://git.openjdk.org/jdk/pull/9399
>
>> Discussions are available on :
>> [JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834): Add SBCS and 
>> DBCS Only EBCDIC charsets
> 
> Yes, I think this need discussion on whether the JDK really needs to keep 
> including and adding more EBCDIC charsets. I understand they can be useful 
> for someone using JDBC to connect to a database on z/OS but this scenario 
> would work equally well if the EBCDIC charsets were deployed on the class 
> path or module path. Do you know if the icu4j project is still alive? I've 
> often wondered why there wasn't more use of the provider mechanism.

Hello @AlanBateman .
Sorry I'm late.
I got some responses from ICU. 
[ICU-22091](https://unicode-org.atlassian.net/browse/ICU-22091)
I'm not sure if they're interested in the new charset...

As you know, the `sun.nio.cs.ArrayDecoder` and `sun.nio.cs.ArrayEncoder` interfaces 
have a performance advantage.
The built-in charset decoders/encoders also have some other performance 
advantages.
Is it possible to create a simple public API based on the `sun.nio.cs.SingleByte` and 
`sun.nio.cs.DoubleByte*` classes?
We'd like to use a stable conversion loop.

-

PR: https://git.openjdk.org/jdk/pull/9399


Re: RFR: 8290488: IBM864 character encoding implementation bug

2022-08-03 Thread Ichiroh Takiguchi
On Wed, 27 Jul 2022 17:47:36 GMT, Naoto Sato  wrote:

> Adding an extra c2b mapping for the `%` in `IBM864` charset. The discrepancy 
> came from the mapping difference between MS and IBM.

I think you can ignore my comments.
I'm not sure if this change will solve the reporter's issue...

-

PR: https://git.openjdk.org/jdk/pull/9661


Re: RFR: 8290488: IBM864 character encoding implementation bug

2022-07-28 Thread Ichiroh Takiguchi
On Thu, 28 Jul 2022 16:18:51 GMT, Naoto Sato  wrote:

>> Many thanks @naotoj .
>> 
>> I checked the latest IBM-864 mapping table.
>> (I assume current OpenJDK's IBM864 may refer older mapping table)
>> https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/mappings/ibm-864_X110-1999.ucm
>> .ucm file format is as follows:
>> https://unicode-org.github.io/icu/userguide/conversion/data.html#ucm-file-format
>> 
>> I checked roundtrip mapping
>> (Roundtrip entries have `|0` at the end of line)
>> | IBM864.map | ibm-864_X110-1999.ucm |
>> | --- | --- |
>> | 0x1a U+001a | 0x1a U+001c |
>> | 0x1c U+001c | 0x1c U+007f |
>> | **0x25 U+066a** | **0x25 U+0025** |
>> | 0x7f U+007f | 0x7f U+001a |
>> | 0x9f U+fffd | 0x9f U+200b |
>> | 0xd7 U+fec1 | 0xd7 U+fec3 |
>> | 0xd8 U+fec5 | 0xd8 U+fec7 |
>> | 0xf1 U+0651 | 0xf1 U+fe7c |
>> 
>> **Note**: The 0x1a <-> U+001c / 0x1c <-> U+007f / 0x7f <-> U+001a entries are 
>> a control character rotation for DOS.
>> I think they should be ignored.
>> 
>> I think the roundtrip side should be changed:
>> the 0x25 entry should be U+0025 in IBM864.map.
>> Add `0x25 U+066a` to IBM864.c2b.
>> 
>> Modify test/jdk/sun/nio/cs/mapping/Cp864.b2c for `0025 0025`.
>> Add `0025 066a` to test/jdk/sun/nio/cs/mapping/Cp864.c2b-irreversible.
>> 
>> This issue is only about U+0025, but if possible, please add the `0x9f, 0xd7, 0xd8, 
>> 0xf1` entries as well.
>
> Thanks for trying it out @takiguc. However, I am not planning to change any 
> existing mappings because of the obvious compatibility issues. The fix I 
> proposed is safe because it is additional, which used to be unmappable (thus 
> turned into a replacement '?').

Hello @naotoj .

I checked [JDK-8290488](https://bugs.openjdk.org/browse/JDK-8290488).
This issue was tested on Windows 10.
I think we need to confirm the expected result for the b2c side with the reporter.

I checked MS's code page 864 with the following test program on my Windows 10 machine.

>type b2c_1.ps1
param($code, $hex)
$h = [string]$hex
$enc_r = [Text.Encoding]::GetEncoding([int]$code)
[byte[]]$ba = @()
for($i = 0; $i -lt $h.length; $i+=2) {
  $ba += ([System.Convert]::ToInt32($h.SubString($i,2), 16))
}
$s = ""
$enc_r.GetChars($ba) | foreach {$s += [System.Convert]::ToInt32($_).ToString("X4")}
$s
>powershell -NoProfile -ExecutionPolicy Unrestricted .\b2c_1.ps1 864 25
0025


Please ignore my comment about 0xD7, 0xD8, 0xF1 if the target platform is Windows.

Note: Test results for the c2b side:

>type c2b_1.ps1
param($code, $hex)
$enc_r = [Text.Encoding]::GetEncoding([int]$code)
[char[]]$ca = @()
$ca += ([System.Convert]::ToInt32([string]$hex, 16))
$s = ""
$enc_r.GetBytes($ca) | foreach {$s += [System.Convert]::ToInt32($_).ToString("X2")}
$s
>powershell -NoProfile -ExecutionPolicy Unrestricted .\c2b_1.ps1 864 0025
25

>powershell -NoProfile -ExecutionPolicy Unrestricted .\c2b_1.ps1 864 066A
25
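
A Java counterpart to the two PowerShell probes above may help when comparing the JDK's `IBM864` against the Windows results. This is only a sketch: the class name `MappingProbe` and its helper methods are hypothetical, the `IBM864` charset is assumed to be available (it lives in the `jdk.charsets` module), and the values printed depend on the mapping tables of the JDK in use.

```java
import java.nio.charset.Charset;

/** Hypothetical probe mirroring b2c_1.ps1 and c2b_1.ps1. */
public class MappingProbe {

    /** b2c: decode hex-encoded bytes, return each UTF-16 unit as 4 hex digits. */
    static String b2c(Charset cs, String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : new String(bytes, cs).toCharArray()) {
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    /** c2b: encode one UTF-16 unit given as hex, return the bytes as 2 hex digits each. */
    static String c2b(Charset cs, String hex) {
        char c = (char) Integer.parseInt(hex, 16);
        StringBuilder sb = new StringBuilder();
        // Unmappable characters are replaced by the charset's replacement bytes.
        for (byte b : String.valueOf(c).getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Throws UnsupportedCharsetException if the charset is not available.
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "IBM864");
        System.out.println("b2c 25   -> " + b2c(cs, "25"));
        System.out.println("c2b 0025 -> " + c2b(cs, "0025"));
        System.out.println("c2b 066A -> " + c2b(cs, "066A"));
    }
}
```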

-

PR: https://git.openjdk.org/jdk/pull/9661


Re: RFR: 8290488: IBM864 character encoding implementation bug

2022-07-28 Thread Ichiroh Takiguchi
On Thu, 28 Jul 2022 01:46:26 GMT, Naoto Sato  wrote:

>> Hello @naotoj.
>> I'm not a reviewer, but I'd like to test this change.
>> Could you wait a moment?
>> Thanks.
>
> @takiguc Sure. Appreciate it.

Many thanks @naotoj .

I checked the latest IBM-864 mapping table.
(I assume current OpenJDK's IBM864 may refer older mapping table)
https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/mappings/ibm-864_X110-1999.ucm
.ucm file format is as follows:
https://unicode-org.github.io/icu/userguide/conversion/data.html#ucm-file-format

I checked roundtrip mapping
| IBM864.map | ibm-864_X110-1999.ucm |
| --- | --- |
| 0x1a U+001a | 0x1a U+001c |
| 0x1c U+001c | 0x1c U+007f |
| **0x25 U+066a** | **0x25 U+0025** |
| 0x7f U+007f | 0x7f U+001a |
| 0x9f U+fffd | 0x9f U+200b |
| 0xd7 U+fec1 | 0xd7 U+fec3 |
| 0xd8 U+fec5 | 0xd8 U+fec7 |
| 0xf1 U+0651 | 0xf1 U+fe7c |

**Note**: The 0x1a <-> U+001c / 0x1c <-> U+007f / 0x7f <-> U+001a entries are 
a control character rotation for DOS.
I think they should be ignored.

I think the roundtrip side should be changed:
the 0x25 entry should be U+0025 in IBM864.map.
Add `0x25 U+066a` to IBM864.c2b.

Modify test/jdk/sun/nio/cs/mapping/Cp864.b2c for `0025 0025`.
Add `0025 066a` to test/jdk/sun/nio/cs/mapping/Cp864.c2b-irreversible.

This issue is only about U+0025, but if possible, please add the `0x9f, 0xd7, 0xd8, 0xf1` 
entries as well.
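
Whether a byte is a roundtrip entry in a given runtime can be checked mechanically: decode the byte, encode the result, and compare. The sketch below assumes a single-byte charset; the class name `RoundTripScan` and its method are hypothetical, and what it reports for `IBM864` depends on the mapping tables of the JDK in use.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing bytes that do not survive decode-then-encode. */
public class RoundTripScan {

    /** Returns the byte values (0x00-0xFF) whose decode/encode result differs from the input. */
    static List<Integer> nonRoundTripBytes(Charset cs) {
        List<Integer> result = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            byte[] in = {(byte) b};
            // Undecodable bytes become U+FFFD; unencodable chars become the replacement byte.
            byte[] out = new String(in, cs).getBytes(cs);
            if (out.length != 1 || out[0] != in[0]) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Throws UnsupportedCharsetException if the charset is not available.
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "IBM864");
        for (int b : nonRoundTripBytes(cs)) {
            System.out.printf("0x%02x does not roundtrip%n", b);
        }
    }
}
```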

-

PR: https://git.openjdk.org/jdk/pull/9661


Re: RFR: 8290488: IBM864 character encoding implementation bug

2022-07-27 Thread Ichiroh Takiguchi
On Wed, 27 Jul 2022 17:47:36 GMT, Naoto Sato  wrote:

> Adding an extra c2b mapping for the `%` in `IBM864` charset. The discrepancy 
> came from the mapping difference between MS and IBM.

Hello @naotoj.
I'm not a reviewer, but I'd like to test this change.
Could you wait a moment?
Thanks.

-

PR: https://git.openjdk.org/jdk/pull/9661


Re: RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

2022-07-07 Thread Ichiroh Takiguchi
On Wed, 6 Jul 2022 14:05:39 GMT, Ichiroh Takiguchi  
wrote:

> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and 
> DBCS Only charsets.
> |Charset|Mix|SBCS|DBCS|
> | -- | -- | -- | -- |
> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
> | Korean | Cp933 | Cp833 | Cp834 |
> 
> But OpenJDK does not support some of the "Japanese EBCDIC - English" / 
> "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only 
> charsets.
> 
> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
> |Charset|Mix|SBCS|DBCS|
> | - | - | - | - |
> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** | 
> 
> *1: Cp037 compatible

The discussion is available at:
[JDK-8289834](https://bugs.openjdk.org/browse/JDK-8289834): Add SBCS and DBCS 
Only EBCDIC charsets

-

PR: https://git.openjdk.org/jdk/pull/9399


RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

2022-07-07 Thread Ichiroh Takiguchi
OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and DBCS 
Only charsets.
|Charset|Mix|SBCS|DBCS|
| -- | -- | -- | -- |
| Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
| Korean | Cp933 | Cp833 | Cp834 |

But OpenJDK does not support some of the "Japanese EBCDIC - English" / "Simplified 
Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only charsets.

I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
|Charset|Mix|SBCS|DBCS|
| - | - | - | - |
| Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
| Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
| Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** | 

*1: Cp037 compatible
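
Which of the code pages in the tables above a given runtime already provides can be probed with `Charset.isSupported`. This is a minimal sketch: the class name `EbcdicAvailability` is hypothetical, and the `CpNNN` spellings are assumed to be the names or aliases under which the JDK registers these charsets.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical probe reporting which charset names the running JDK supports. */
public class EbcdicAvailability {

    /** Maps each name to whether Charset.isSupported accepts it, in input order. */
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Code pages from the two tables: Mix, SBCS, and DBCS columns.
        probe("Cp930", "Cp290", "Cp300", "Cp933", "Cp833", "Cp834",
              "Cp939", "Cp1027", "Cp935", "Cp836", "Cp837", "Cp937", "Cp835")
            .forEach((name, ok) -> System.out.println(name + ": " + ok));
    }
}
```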

-

Commit messages:
 - 8289834: Missing SBCS and DBCS Only EBCDIC charsets

Changes: https://git.openjdk.org/jdk/pull/9399/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9399&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8289834
  Stats: 369 lines in 6 files changed: 367 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/9399.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9399/head:pull/9399

PR: https://git.openjdk.org/jdk/pull/9399