Re: JDK-8226810: An other case and a small change suggestion

2020-05-11 Thread Brent Christian

FYI, I've filed a new issue[1] for this fix, and requested a code review:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066384.html

I don't believe I need anything else, Johannes.  Thank you!

-Brent

1. https://bugs.openjdk.java.net/browse/JDK-8244767

On 5/10/20 7:47 AM, Johannes Kuhn wrote:

On 09-May-20 1:27, Brent Christian wrote:

On 5/7/20 5:50 AM, Alan Bateman wrote:

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate 
it.

I think I did find a likely suspect.

>
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


I agree.  I can file a bug and sponsor this small change.

-Brent

Thank you.

After the discussion with Naoto, I would like to change the one line to 
strcpy(ret + 2, "1252")


diff --git a/src/java.base/windows/native/libjava/java_props_md.c 
b/src/java.base/windows/native/libjava/java_props_md.c

--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -73,6 +73,7 @@
    LOCALE_IDEFAULTANSICODEPAGE,
    ret+2, 14) == 0) {
  codepage = 1252;
+    strcpy(ret + 2, "1252");
  } else {
  codepage = atoi(ret+2);
  }

I have already signed the OCA.
This would be my first patch. Is there anything else you need?

I will take a further look into GetLocaleInfo, and try to find a model 
that works for me (currently thinking about blackbox, can return 
anything from [1]),

and then discussing how to handle all those cases.

- Johannes

[1]: 
https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers


Re: JDK-8226810: An other case and a small change suggestion

2020-05-11 Thread Johannes Kuhn

On 11-May-20 10:59, Alan Bateman wrote:



On 10/05/2020 15:47, Johannes Kuhn wrote:

:

After the discussion with Naoto, I would like to change the one line 
to strcpy(ret + 2, "1252")


diff --git a/src/java.base/windows/native/libjava/java_props_md.c 
b/src/java.base/windows/native/libjava/java_props_md.c

--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -73,6 +73,7 @@
   LOCALE_IDEFAULTANSICODEPAGE,
   ret+2, 14) == 0) {
 codepage = 1252;
+    strcpy(ret + 2, "1252");
 } else {
 codepage = atoi(ret+2);
 }

I have already signed the OCA.
This would be my first patch. Is there anything else you need?

I will take a further look into GetLocaleInfo, and try to find a 
model that works for me (currently thinking about blackbox, can 
return anything from [1]),

and then discussing how to handle all those cases.
It's not clear how conditions are necessary for GetLocaleInfo to fail 
but good to fix this code path. We should probably create a new issue 
in JIRA for this as it's not clear that it is the same issue as 
JDK-8226810. We've been chasing JDK-8226810 for some time but have not 
been able to find a system where it duplicates. The reports so far 
suggests is Windows Chinese but most of the people that report the 
issue to not respond to questions so we can't be sure. One of the 
reports (see my comment in the bug from June 2019) suggests the code 
page is 54936, useful information as GB18030 is not in java.base. Once 
your fix is in then I think we should try to improve the exception 
message rather than NPE. There may be some of these cases where it 
should default to UTF-8.


-Alan.


I agree.
Looking through the JDK code, the IMHO best way to create a hook for 
this is Charset.defaultCharset().
Instead of using GetPropertyAction, it could retrieve 
System.getProperties and do a null check there. This has probably the 
least impact on performance as this code path is only executed once.
Indeed, it could (theoretically) emit a warning/debug message there with 
debug information and return sun.nio.cs.UTF_8.INSTANCE as fallback 
without setting defaultCharset.


But that is a hard problem. I don't really like adding a warning to java 
startup, even if it would not work before just to get debug information.
On the other hand, just continuing as if nothing did happen could hide 
further bugs. (Also, how would I get the debug info from the Java side, 
as properties are not yet initialized.)


I wrote a small C program which just outputs the result of 
GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...).

I was not able to influence that result - on my machine it's always 1252.

My current understanding of codepages is basically: They are like encodings.
They come in two variants (?): ANSI and non-ANSI. The ansi codepages use 
ANSI for bytes between 0 and 127.
Each system, application or thread (wtf? see CP_THREAD_ACP) can have a 
different current codepage.
Some windows API functions (the *A variants) use the current codepage 
(as returned by GetACP) to translate the string from/to unicode.
GetACP returns the codepage identifier for the operating system. (No 
mention of thread in the doc for that, so why is there even CP_THREAD_ACP?)


It's a rabbit hole.
If someone knows about a way to influence the result of 
GetLocaleInfo(GetSystemDefaultLCID(), LOCALE_IDEFAULTANSICODEPAGE, ...), 
then at least reproducing the error would become possible.


- Johannes



Re: JDK-8226810: An other case and a small change suggestion

2020-05-11 Thread Alan Bateman




On 10/05/2020 15:47, Johannes Kuhn wrote:

:

After the discussion with Naoto, I would like to change the one line 
to strcpy(ret + 2, "1252")


diff --git a/src/java.base/windows/native/libjava/java_props_md.c 
b/src/java.base/windows/native/libjava/java_props_md.c

--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -73,6 +73,7 @@
   LOCALE_IDEFAULTANSICODEPAGE,
   ret+2, 14) == 0) {
 codepage = 1252;
+    strcpy(ret + 2, "1252");
 } else {
 codepage = atoi(ret+2);
 }

I have already signed the OCA.
This would be my first patch. Is there anything else you need?

I will take a further look into GetLocaleInfo, and try to find a model 
that works for me (currently thinking about blackbox, can return 
anything from [1]),

and then discussing how to handle all those cases.
It's not clear how conditions are necessary for GetLocaleInfo to fail 
but good to fix this code path. We should probably create a new issue in 
JIRA for this as it's not clear that it is the same issue as 
JDK-8226810. We've been chasing JDK-8226810 for some time but have not 
been able to find a system where it duplicates. The reports so far 
suggests is Windows Chinese but most of the people that report the issue 
to not respond to questions so we can't be sure. One of the reports (see 
my comment in the bug from June 2019) suggests the code page is 54936, 
useful information as GB18030 is not in java.base. Once your fix is in 
then I think we should try to improve the exception message rather than 
NPE. There may be some of these cases where it should default to UTF-8.


-Alan.



Re: JDK-8226810: An other case and a small change suggestion

2020-05-10 Thread Johannes Kuhn

On 09-May-20 1:27, Brent Christian wrote:

On 5/7/20 5:50 AM, Alan Bateman wrote:

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate 
it.

I think I did find a likely suspect.

>
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


I agree.  I can file a bug and sponsor this small change.

-Brent

Thank you.

After the discussion with Naoto, I would like to change the one line to 
strcpy(ret + 2, "1252")


diff --git a/src/java.base/windows/native/libjava/java_props_md.c 
b/src/java.base/windows/native/libjava/java_props_md.c

--- a/src/java.base/windows/native/libjava/java_props_md.c
+++ b/src/java.base/windows/native/libjava/java_props_md.c
@@ -73,6 +73,7 @@
   LOCALE_IDEFAULTANSICODEPAGE,
   ret+2, 14) == 0) {
 codepage = 1252;
+    strcpy(ret + 2, "1252");
 } else {
 codepage = atoi(ret+2);
 }

I have already signed the OCA.
This would be my first patch. Is there anything else you need?

I will take a further look into GetLocaleInfo, and try to find a model 
that works for me (currently thinking about blackbox, can return 
anything from [1]),

and then discussing how to handle all those cases.

- Johannes

[1]: 
https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers


Re: JDK-8226810: An other case and a small change suggestion

2020-05-08 Thread Johannes Kuhn

Thanks.

I think strcpy(ret+2, "1252") vs. strcpy(ret, "Cp1252")  is a just 
matter of style. I would prefer the later, as it makes the intent clear.

But not my call.

Do you have any info how I can change the detected codepage there? I 
wrote a small C program that basically just does that part and printf it.
In my limited tests (windows likes to require a restart after each 
configuration change) I did not find a way to influence that.


An other thing to consider is if Cp65001 should be treated as UTF-8 in 
that function?
(As said before, locale is not my expertise. Can that function with that 
LCSID even return 65001?)
I can see how things go wrong if it returns 65001 as locale, so... could 
be a safe change? (I'm sure that things break if that function returns 
65001.)


Then there is the other part:
The mismatch between the comment in jni_util.c/newSizedStringJava and 
the implementation on the Java side.
There is no fallback to iso-8859-1. If new String(byte[]) is called 
before the system properties are installed, then this will lead to a 
NullPointerException.
And there is a code path that leads to exactly that - newPlatformString 
is called from the initialization of the properties. [1]
And if the encoding returned by the windows function is not supported, 
then it will call new String(byte[]) - during system property 
initialization.


- Johannes

[1]: 
https://hg.openjdk.java.net/jdk/jdk/file/d40d865753fb/src/java.base/share/native/libjava/System.c#l207


On 08-May-20 18:27, naoto.s...@oracle.com wrote:

Ditto. Good catch!

I am not sure the fix would address the issue in 8226810 (cannot 
confirm it either, as my Windows box is at my office where I cannot 
enter at the moment :-), but this definitely looks like a bug. I would 
change the additional line to "strcpy(ret+2, "1252");" as Cp is copied 
in the following switch.


Naoto



On 5/7/20 5:50 AM, Alan Bateman wrote:

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate 
it.

I think I did find a likely suspect.
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


The issue in JDK-8226810 might be something else. One of the 
submitters to that issue did engage and provided enough information 
to learn that the locale is zh_CN and also reported that it was 
failing for GB18030. GB18030 is not in java.base so that at least 
explained that report.


-Alan





Re: JDK-8226810: An other case and a small change suggestion

2020-05-08 Thread Brent Christian

On 5/7/20 5:50 AM, Alan Bateman wrote:

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate it.
I think I did find a likely suspect.

>
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


I agree.  I can file a bug and sponsor this small change.

-Brent


Re: JDK-8226810: An other case and a small change suggestion

2020-05-08 Thread naoto . sato

Hi Johannes,

On 5/8/20 1:37 PM, Johannes Kuhn wrote:

Thanks.

I think strcpy(ret+2, "1252") vs. strcpy(ret, "Cp1252")  is a just 
matter of style. I would prefer the later, as it makes the intent clear.

But not my call.


I thought the former was clearer, as at that point, Cp/MS/GB part is not 
initialized in normal cases. It follows that pattern.




Do you have any info how I can change the detected codepage there? I 
wrote a small C program that basically just does that part and printf it.
In my limited tests (windows likes to require a restart after each 
configuration change) I did not find a way to influence that.


Have you tried changing the Windows System Locale to Japanese? I am 
pretty sure the code will return MS932.




An other thing to consider is if Cp65001 should be treated as UTF-8 in 
that function?
(As said before, locale is not my expertise. Can that function with that 
LCSID even return 65001?)
I can see how things go wrong if it returns 65001 as locale, so... could 
be a safe change? (I'm sure that things break if that function returns 
65001.)


Yes it should return UTF-8, which is not implemented. If the code page 
is 65001, then the following switch should put UTF-8 as the default charset.




Then there is the other part:
The mismatch between the comment in jni_util.c/newSizedStringJava and 
the implementation on the Java side.
There is no fallback to iso-8859-1. If new String(byte[]) is called 
before the system properties are installed, then this will lead to a 
NullPointerException.
And there is a code path that leads to exactly that - newPlatformString 
is called from the initialization of the properties. [1]
And if the encoding returned by the windows function is not supported, 
then it will call new String(byte[]) - during system property 
initialization.


I would expect there shouldn't be a mismatch, i.e., all the default 
system locale in Windows should return *known* default charset. 
Returning UTF-8 in java_props_md.c should resolve this.


Naoto



- Johannes

[1]: 
https://hg.openjdk.java.net/jdk/jdk/file/d40d865753fb/src/java.base/share/native/libjava/System.c#l207


On 08-May-20 18:27, naoto.s...@oracle.com wrote:

Ditto. Good catch!

I am not sure the fix would address the issue in 8226810 (cannot 
confirm it either, as my Windows box is at my office where I cannot 
enter at the moment :-), but this definitely looks like a bug. I would 
change the additional line to "strcpy(ret+2, "1252");" as Cp is copied 
in the following switch.


Naoto



On 5/7/20 5:50 AM, Alan Bateman wrote:

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate 
it.

I think I did find a likely suspect.
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


The issue in JDK-8226810 might be something else. One of the 
submitters to that issue did engage and provided enough information 
to learn that the locale is zh_CN and also reported that it was 
failing for GB18030. GB18030 is not in java.base so that at least 
explained that report.


-Alan





Re: JDK-8226810: An other case and a small change suggestion

2020-05-08 Thread naoto . sato

Ditto. Good catch!

I am not sure the fix would address the issue in 8226810 (cannot confirm 
it either, as my Windows box is at my office where I cannot enter at the 
moment :-), but this definitely looks like a bug. I would change the 
additional line to "strcpy(ret+2, "1252");" as Cp is copied in the 
following switch.


Naoto



On 5/7/20 5:50 AM, Alan Bateman wrote:

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate it.
I think I did find a likely suspect.
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


The issue in JDK-8226810 might be something else. One of the submitters 
to that issue did engage and provided enough information to learn that 
the locale is zh_CN and also reported that it was failing for GB18030. 
GB18030 is not in java.base so that at least explained that report.


-Alan


Re: JDK-8226810: An other case and a small change suggestion

2020-05-07 Thread Alan Bateman

On 07/05/2020 12:37, Johannes Kuhn wrote:

:

In the end, I don't know what causes the bug, or how I can replicate it.
I think I did find a likely suspect.
Good sleuthing. I don't what the conditions are for GetLocaleInfo to 
fail but it does look like that would return possibly non-terminated 
garbage starting with "CP" so we should at least fix that.


The issue in JDK-8226810 might be something else. One of the submitters 
to that issue did engage and provided enough information to learn that 
the locale is zh_CN and also reported that it was failing for GB18030. 
GB18030 is not in java.base so that at least explained that report.


-Alan