On Thu, 17 Feb 2022 22:50:37 GMT, Tyler Steele <d...@openjdk.java.net> wrote:

> FileEncodingTest expects all non-Windows platforms will have 
> `Charset.defaultCharset().name()` set to US-ASCII when file.encoding is set 
> to COMPAT. This assumption does not hold for AIX where it is ISO-8859-1.
> 
> According to [JEP-400](https://openjdk.java.net/jeps/400), we should expect  
> `Charset.defaultCharset().name()` to equal 
> `System.getProperty("native.encoding")` whenever the COMPAT flag is set. From 
> JEP-400: "... if file.encoding is set to COMPAT on the command line, then the 
> run-time value of file.encoding will be the same as the run-time value of 
> native.encoding...". So one way to resolve this is to choose the value for 
> each system from the native.encoding property.
> 
> With these changes however, my test systems continue to fail. 
> 
> - AIX reports: Default Charset: ISO-8859-1, expected: ISO8859-1
> - Linux/Z reports: Default Charset: US-ASCII, expected: ANSI_X3.4-1968
> - Linux/PowerLE reports: Default Charset: US-ASCII, expected: ANSI_X3.4-1968
> 
> Note that the expected value is populated from native.encoding.
> 
> This implies more work to be done. It looks to me that some modification to 
> java_props_md.c may be needed to ensure that the System properties for 
> native.encoding return [canonical 
> names](http://www.iana.org/assignments/character-sets). 
> 
> ---
> 
> A tempting alternative is to set the expected value for AIX to "ISO-8859-1" 
> in the test explicitly, as was done for the Windows expected encoding prior 
> to this proposed change. The main advantage to this alternative is that it is 
> quick and easy, but the disadvantages are that it fails to test that COMPAT 
> behaves as specified in JEP-400, and the approach does not scale well if it 
> happens that other systems require other cases. I wonder if this is the 
> reason non-English locales are excluded by the test.
> 
> Proceeding with this change and the work implied by the new failures it 
> highlights goes beyond the scope of what I thought was a simple testbug. So 
> I'm opening this up for some comments before proceeding down the rabbit hole 
> of further changes. If there is generally positive support for this direction 
> I'm happy to make the modifications necessary to populate native.encoding 
> with canonical names. As I am new to OpenJDK, I am especially looking to 
> ensure that changing the value returned by native.encoding will not have 
> unintended consequences elsewhere in the project.

The purpose of this test is to check the default encoding in environments 
where the correct value is known, which is why the expected values are 
hardcoded. Replacing them with `System.getProperty("native.encoding")` would 
introduce some uncertainty because the test may not be run under the C locale.
As to the suggestion to canonicalize `native.encoding`: the property was 
introduced so that users can obtain the encoding name that used to be the 
value of `file.encoding` prior to JEP 400. Normalizing it to the canonical 
name, for example `ANSI_X3.4-1968` to `US-ASCII`, would somewhat defeat that 
purpose.
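
To illustrate the difference between the raw platform name and the canonical 
Java name, here is a minimal sketch. The class `CharsetNames` and the helper 
`canonicalName` are hypothetical names made up for this example and are not 
part of the test or the JDK. It assumes the reported name is a charset alias 
that the running JDK recognizes.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only, not part of FileEncodingTest.
final class CharsetNames {

    // Resolve a platform-reported name (possibly an alias) to the canonical
    // name of the matching java.nio.charset.Charset.
    // Throws UnsupportedCharsetException if the JDK does not know the name.
    static String canonicalName(String reportedName) {
        return Charset.forName(reportedName).name();
    }

    public static void main(String[] args) {
        // Raw names taken from the failure reports quoted above.
        for (String raw : new String[] {"ANSI_X3.4-1968", "ISO8859-1"}) {
            System.out.println(raw + " -> " + canonicalName(raw));
        }
        // The property itself is left as the platform reported it.
        System.out.println("native.encoding = "
                + System.getProperty("native.encoding"));
    }
}
```

A caller who needs the canonical form can resolve it this way on their side, 
while `native.encoding` keeps the name the platform reported.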
Now, back to the test case: the test blindly assumes that the C locale's 
default code set is `US-ASCII`, which is not correct (as this issue shows). 
The C locale only requires the Portable Character Set, which US-ASCII, 
ISO-8859-1, and UTF-8 all satisfy. I would change the test to check whether 
the platform is AIX and, if so, expect the charset for COMPAT to be 
ISO-8859-1.
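
A rough sketch of what I have in mind follows. The class and method names 
are hypothetical, the existing Windows handling is left out, and this is not 
the actual test code. It assumes the test runs under the C locale with 
`-Dfile.encoding=COMPAT`.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the suggested check, not the actual test code.
final class ExpectedCompatCharset {

    // Expected default charset name under the C locale for a given os.name.
    // Windows is handled separately in the existing test and is omitted here.
    static String expectedForCLocale(String osName) {
        if (osName.startsWith("AIX")) {
            return "ISO-8859-1";   // AIX uses ISO-8859-1 for the C locale
        }
        return "US-ASCII";         // value the test already hardcodes
    }

    public static void main(String[] args) {
        String expected = expectedForCLocale(System.getProperty("os.name"));
        String actual = Charset.defaultCharset().name();
        // Only meaningful under the C locale with -Dfile.encoding=COMPAT.
        System.out.println("Default Charset: " + actual
                + ", expected: " + expected);
    }
}
```

This keeps the expected values hardcoded, as the test intends, and adds only 
the one platform known to differ.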

-------------

PR: https://git.openjdk.java.net/jdk/pull/7525
