Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-09-13 Thread Justin Lu
On Wed, 13 Sep 2023 18:12:15 GMT, Naoto Sato  wrote:

> Looks good to me, although I did not look at each l10n file, but sampled 
> some. Thanks for tackling this conversion.

Thanks for the review. In addition to testing, I ran a script to verify that 
only whitespace escape sequences remain in the JDK .properties files (excluding 
escape sequences in test files, which must stay as they are for the purposes of 
those tests).
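The verification script itself is not part of the thread. The Java sketch below shows one way such a check could work, assuming the rule is "every backslash-u escape must decode to a whitespace character". The class and method names are hypothetical. Note that the sketch scans the whole file, including comment lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EscapeChecker {

    /** Returns every backslash-u escape in the text whose decoded char is not whitespace. */
    static List<String> nonWhitespaceEscapes(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length() - 1; i++) {
            if (text.charAt(i) != '\\') {
                continue;
            }
            if (text.charAt(i + 1) == 'u' && i + 6 <= text.length() && isHex(text, i + 2, i + 6)) {
                String hex = text.substring(i + 2, i + 6);
                char decoded = (char) Integer.parseInt(hex, 16);
                if (!Character.isWhitespace(decoded)) {
                    found.add("\\u" + hex);
                }
                i += 5; // skip the rest of the escape
            } else {
                i++; // skip the escaped char, so an escaped backslash is not misread
            }
        }
        return found;
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char c = s.charAt(k);
            if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            // Files.readString decodes UTF-8, matching the converted files.
            List<String> bad = nonWhitespaceEscapes(Files.readString(Path.of(arg)));
            if (!bad.isEmpty()) {
                System.out.println(arg + ": " + bad);
            }
        }
    }
}
```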

-

PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-1718139807


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-09-13 Thread Naoto Sato
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu  wrote:

>> JDK .properties files still use ISO-8859-1 encoding with escape sequences. 
>> It would improve readability to see the native characters instead of escape 
>> sequences (especially for the L10n process). The majority of files changed 
>> are localized resource files.
>> 
>> This change converts the Unicode escape sequences in the JDK .properties 
>> files (both in src and test) to UTF-8 native characters. Additionally, the 
>> build logic is adjusted to read the .properties files in UTF-8 while 
>> generating the ListResourceBundle files.
>> 
>> The only escape sequence not converted was `\u0020` as this is used to 
>> denote intentional trailing white space. (E.g. `key=This is the 
>> value:\u0020`)
>> 
>> The conversion was done using native2ascii with options `-reverse -encoding 
>> UTF-8`.
>> 
>> If this PR is integrated, the IDE default encoding for .properties files 
>> needs to be updated to UTF-8. (IntelliJ IDEA treats .properties files as 
>> ISO-8859-1 unless this is changed manually.)
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Replace InputStreamReader with BufferedReader

Looks good to me, although I did not look at each l10n file, but sampled some. 
Thanks for tackling this conversion.

-

Marked as reviewed by naoto (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15694#pullrequestreview-1625154951


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-09-13 Thread Justin Lu
> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It 
> would improve readability to see the native characters instead of escape 
> sequences (especially for the L10n process). The majority of files changed 
> are localized resource files.
> 
> This change converts the Unicode escape sequences in the JDK .properties 
> files (both in src and test) to UTF-8 native characters. Additionally, the 
> build logic is adjusted to read the .properties files in UTF-8 while 
> generating the ListResourceBundle files.
> 
> The only escape sequence not converted was `\u0020` as this is used to denote 
> intentional trailing white space. (E.g. `key=This is the value:\u0020`)
> 
> The conversion was done using native2ascii with options `-reverse -encoding 
> UTF-8`.
> 
> If this PR is integrated, the IDE default encoding for .properties files needs 
> to be updated to UTF-8. (IntelliJ IDEA treats .properties files as ISO-8859-1 
> unless this is changed manually.)

Justin Lu has updated the pull request incrementally with one additional commit 
since the last revision:

  Replace InputStreamReader with BufferedReader

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15694/files
  - new: https://git.openjdk.org/jdk/pull/15694/files/0f3698a5..ceb48bbe

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=15694=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=15694=00-01

  Stats: 18 lines in 2 files changed: 6 ins; 8 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/15694.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15694/head:pull/15694

PR: https://git.openjdk.org/jdk/pull/15694


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-09-12 Thread Chen Liang
On Tue, 12 Sep 2023 21:57:31 GMT, Justin Lu  wrote:

> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It 
> would improve readability to see the native characters instead of escape 
> sequences (especially for the L10n process). The majority of files changed 
> are localized resource files.
> 
> This change converts the Unicode escape sequences in the JDK .properties 
> files (both in src and test) to UTF-8 native characters. Additionally, the 
> build logic is adjusted to read the .properties files in UTF-8 while 
> generating the ListResourceBundle files.
> 
> The only escape sequence not converted was `\u0020` as this is used to denote 
> intentional trailing white space. (E.g. `key=This is the value:\u0020`)
> 
> The conversion was done using native2ascii with options `-reverse -encoding 
> UTF-8`.
> 
> If this PR is integrated, the IDE default encoding for .properties files needs 
> to be updated to UTF-8. (IntelliJ IDEA treats .properties files as ISO-8859-1 
> unless this is changed manually.)

make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 
227:

> 225: try (FileInputStream input = new FileInputStream(propertiesPath);
> 226:  // Read in JDK .properties files in UTF-8
> 227:  InputStreamReader streamReader = new 
> InputStreamReader(input, StandardCharsets.UTF_8)

Can we just use `Files.newBufferedReader(Path.of(propertiesPath))` instead?
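The sketch below shows how the suggestion could be used to load a file. The class and method names are hypothetical, and this is not the actual CompileProperties code. `Files.newBufferedReader(Path)` always decodes UTF-8, so no explicit charset argument is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class LoadUtf8Properties {

    static Properties load(String propertiesPath) throws IOException {
        Properties p = new Properties();
        // Decodes UTF-8 regardless of the platform default charset; malformed
        // input raises an IOException instead of being replaced with U+FFFD.
        try (BufferedReader reader = Files.newBufferedReader(Path.of(propertiesPath))) {
            p.load(reader);
        }
        return p;
    }
}
```

Unlike a plain `InputStreamReader`, this reader fails on malformed UTF-8. A file accidentally saved in another encoding is therefore more likely to break the build loudly than to be read with silently corrupted values.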

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15694#discussion_r1323716978


RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-09-12 Thread Justin Lu
JDK .properties files still use ISO-8859-1 encoding with escape sequences. It 
would improve readability to see the native characters instead of escape 
sequences (especially for the L10n process). The majority of files changed are 
localized resource files.

This change converts the Unicode escape sequences in the JDK .properties files 
(both in src and test) to UTF-8 native characters. Additionally, the build 
logic is adjusted to read the .properties files in UTF-8 while generating the 
ListResourceBundle files.

The only escape sequence not converted was `\u0020` as this is used to denote 
intentional trailing white space. (E.g. `key=This is the value:\u0020`)
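The sketch below illustrates why the escape matters. It uses a hypothetical class name, loads the example line with `java.util.Properties`, and shows that the escape decodes to a real trailing space. A literal space at the end of the line would normally give the same value, but that space is invisible and easily stripped by editors.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class TrailingSpaceDemo {

    static String load(String propertiesText, String key) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(propertiesText));
        return p.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The Java literal below holds the properties line from the example above.
        String value = load("key=This is the value:\\u0020", "key");
        System.out.println("[" + value + "]"); // prints [This is the value: ]
    }
}
```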

The conversion was done using native2ascii with options `-reverse -encoding 
UTF-8`.
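The thread does not show native2ascii's internals. The sketch below is a rough, hypothetical Java equivalent of the reverse conversion: it decodes backslash-u escapes into native characters. It deliberately leaves escapes for ASCII characters (such as `\u0020`) untouched, because those may be syntactically significant in keys and values. That choice is an assumption of this sketch and not necessarily how native2ascii behaves.

```java
public class ReverseNative2Ascii {

    /**
     * Replaces each backslash-u escape with the char it denotes, except escapes for
     * ASCII chars (below U+0080), which are copied unchanged.
     */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            if (ch != '\\' || i + 1 >= text.length()) {
                out.append(ch);                       // ordinary char (or a trailing backslash)
                i++;
            } else if (text.charAt(i + 1) == 'u' && i + 6 <= text.length()
                    && isHex(text, i + 2, i + 6)) {
                char decoded = (char) Integer.parseInt(text.substring(i + 2, i + 6), 16);
                if (decoded < 0x80) {
                    out.append(text, i, i + 6);       // keep ASCII escapes such as the trailing space
                } else {
                    out.append(decoded);              // surrogate halves are appended in order
                }
                i += 6;
            } else {
                out.append(ch).append(text.charAt(i + 1)); // any other escape pair, copied verbatim
                i += 2;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char c = s.charAt(k);
            if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescape("zh_CN=\\uFFE5").length()); // 7: the escape became one char
    }
}
```

The result would then be written back with a UTF-8 writer. Because each surrogate half is appended in order, an escaped supplementary character (a `\uD83D\uDE00` pair, for example) becomes a valid UTF-16 pair.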

If this PR is integrated, the IDE default encoding for .properties files needs 
to be updated to UTF-8. (IntelliJ IDEA treats .properties files as ISO-8859-1 
unless this is changed manually.)

-

Commit messages:
 - Update header / copyright for CurrencyFormat
 - Adjust CurrencyFormat test to read in .properties with UTF-8
 - Convert unicode escape sequences to native
 - Add clarifying comment in Bug6204853 for lack of conversion
 - Read JDK properties files in UTF-8 during build process for LRB

Changes: https://git.openjdk.org/jdk/pull/15694/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=15694=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301991
  Stats: 28966 lines in 488 files changed: 14 ins; 0 del; 28952 mod
  Patch: https://git.openjdk.org/jdk/pull/15694.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15694/head:pull/15694

PR: https://git.openjdk.org/jdk/pull/15694


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2023-05-11 Thread Naoto Sato
On Thu, 11 May 2023 20:21:57 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains 16 commits:
> 
>  - Convert the merged master changes to UTF-8
>  - Merge master and fix conflicts
>  - Close streams when finished loading into props
>  - Adjust CF test to read in with UTF-8 to fix failing test
>  - Reconvert CS.properties to UTF-8
>  - Revert all changes to CurrencySymbols.properties
>  - Bug6204853 should not be converted
>  - Copyright year for CompileProperties
>  - Redo translation for CS.properties
>  - Spot convert CurrencySymbols.properties
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a

I think this is fine, as those properties files are the JDK's own. I believe the 
benefit of moving to UTF-8 outweighs the issue you raised, which can be remedied 
by changing the encoding in the IDEs.

-

PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1544722480


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2023-05-11 Thread Justin Lu
On Thu, 11 May 2023 20:21:57 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains 16 commits:
> 
>  - Convert the merged master changes to UTF-8
>  - Merge master and fix conflicts
>  - Close streams when finished loading into props
>  - Adjust CF test to read in with UTF-8 to fix failing test
>  - Reconvert CS.properties to UTF-8
>  - Revert all changes to CurrencySymbols.properties
>  - Bug6204853 should not be converted
>  - Copyright year for CompileProperties
>  - Redo translation for CS.properties
>  - Spot convert CurrencySymbols.properties
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a

Wondering if anyone has thoughts on the consequences of this PR with respect to 
IntelliJ's (and other IDEs') default encoding for .properties files. IntelliJ 
sets the default encoding for .properties files to ISO-8859-1, which would be 
the wrong encoding once the .properties files are converted to UTF-8 native. 
As a result, non-ASCII characters in keys and values would be displayed 
incorrectly in the editor.

Although the default file encoding for .properties files can be switched to 
UTF-8, that is not the default setting.

Any thoughts on possible solutions?
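The skew described above is ordinary mojibake: the UTF-8 bytes get decoded one byte per character. The minimal sketch below, with a hypothetical class name, reproduces what an editor set to ISO-8859-1 would display.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /** Decodes UTF-8 encoded text the way an editor set to ISO-8859-1 would. */
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 is two bytes in UTF-8 (C3 A9), so it shows up as U+00C3 U+00A9.
        String shown = misread("caf\u00e9");
        System.out.println(shown.length()); // 5 instead of 4
    }
}
```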

-

PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1544708830


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]

2023-05-11 Thread Justin Lu
> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request with a new target base due to a merge or 
a rebase. The pull request now contains 16 commits:

 - Convert the merged master changes to UTF-8
 - Merge master and fix conflicts
 - Close streams when finished loading into props
 - Adjust CF test to read in with UTF-8 to fix failing test
 - Reconvert CS.properties to UTF-8
 - Revert all changes to CurrencySymbols.properties
 - Bug6204853 should not be converted
 - Copyright year for CompileProperties
 - Redo translation for CS.properties
 - Spot convert CurrencySymbols.properties
 - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a

-

Changes: https://git.openjdk.org/jdk/pull/12726/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12726=05
  Stats: 28877 lines in 493 files changed: 14 ins; 1 del; 28862 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v5]

2023-03-31 Thread Naoto Sato
On Fri, 17 Mar 2023 22:27:48 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Close streams when finished loading into props

Hmm, I just wonder why they stick with ISO-8859-1 as the default. I know 
j.u.Properties defaults to 8859-1, but PropertyResourceBundle, which is the 
primary use of .properties files, has defaulted to UTF-8 since JDK 9 
(https://openjdk.org/jeps/226).
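The difference can be demonstrated directly. In the sketch below (hypothetical class name), the same UTF-8 bytes are read through `Properties.load(InputStream)` and through the `PropertyResourceBundle(InputStream)` constructor. It assumes the `java.util.PropertyResourceBundle.encoding` system property is not set, since that property can override the JDK 9+ default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

public class DefaultEncodingDemo {

    // UTF-8 bytes of "currency=" followed by the euro sign (3 bytes: E2 82 AC).
    static final byte[] UTF8_BYTES = "currency=\u20AC".getBytes(StandardCharsets.UTF_8);

    static String viaProperties() throws IOException {
        Properties p = new Properties();
        p.load(new ByteArrayInputStream(UTF8_BYTES)); // load(InputStream) decodes ISO-8859-1
        return p.getProperty("currency");
    }

    static String viaBundle() throws IOException {
        // Since JDK 9 this constructor decodes UTF-8 first and falls back to
        // ISO-8859-1 only if the input is not valid UTF-8.
        PropertyResourceBundle bundle =
                new PropertyResourceBundle(new ByteArrayInputStream(UTF8_BYTES));
        return bundle.getString("currency");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(viaProperties().length()); // 3: one char per UTF-8 byte
        System.out.println(viaBundle().length());     // 1: the euro sign itself
    }
}
```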

-

PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1492682703


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v5]

2023-03-31 Thread Justin Lu
On Fri, 17 Mar 2023 22:27:48 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Close streams when finished loading into props

Something to consider is that IntelliJ defaults .properties files to ISO 
8859-1.

https://www.jetbrains.com/help/idea/properties-files.html#encoding

So users of IntelliJ (or other IDEs that default to ISO 8859-1 for .properties 
files) will need to change the default encoding for such files to UTF-8. 
Ideally, the respective IDEs would change their default encoding for 
.properties files if this change is integrated.

-

PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1492640306


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v5]

2023-03-17 Thread Justin Lu
> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with one additional commit 
since the last revision:

  Close streams when finished loading into props

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/007c78a7..19b91e6b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12726=04
 - incr: https://webrevs.openjdk.org/?repo=jdk=12726=03-04

  Stats: 15 lines in 3 files changed: 6 ins; 1 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4]

2023-03-17 Thread Weijun Wang
On Fri, 17 Mar 2023 21:49:33 GMT, Weijun Wang  wrote:

>> Justin Lu has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Adjust CF test to read in with UTF-8 to fix failing test
>
> make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java 
> line 326:
> 
>> 324: outBuffer.append(toHex((aChar >> 8) & 0xF));
>> 325: outBuffer.append(toHex((aChar >> 4) & 0xF));
>> 326: outBuffer.append(toHex(aChar & 0xF));
> 
> Sorry I don't know when this tool is called, but why is it still writing in 
> `\u` style?

I think I understand it now: the generated source code still needs escaping. 
When can we put UTF-8 there as well?
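The escaping step might look like the sketch below. The names are hypothetical, and the actual tool's loop is only partly visible above. Each non-ASCII char is written as a backslash-u escape so the generated source stays pure ASCII.

One subtlety: javac translates Unicode escapes before tokenizing (JLS 3.3). Quotes, backslashes, and line terminators must therefore use ordinary escapes such as `\"`, `\\`, and `\n`. An escaped line feed, for example, would end the string literal and break compilation.

```java
public class JavaSourceEscaper {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Escapes a value so it can sit inside a Java string literal in an ASCII-only file. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                // Ordinary escapes: the backslash-u forms of these would be
                // translated before lexing and break the literal.
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:
                    if (c >= 0x20 && c < 0x7F) {
                        out.append(c);                  // printable ASCII is copied as is
                    } else {
                        out.append('\\').append('u')    // everything else: four hex digits
                           .append(HEX[(c >> 12) & 0xF])
                           .append(HEX[(c >> 8) & 0xF])
                           .append(HEX[(c >> 4) & 0xF])
                           .append(HEX[c & 0xF]);
                    }
            }
        }
        return out.toString();
    }
}
```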

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4]

2023-03-17 Thread Weijun Wang
On Fri, 17 Mar 2023 20:28:13 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adjust CF test to read in with UTF-8 to fix failing test

make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 
326:

> 324: outBuffer.append(toHex((aChar >> 8) & 0xF));
> 325: outBuffer.append(toHex((aChar >> 4) & 0xF));
> 326: outBuffer.append(toHex(aChar & 0xF));

Sorry I don't know when this tool is called, but why is it still writing in 
`\u` style?

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4]

2023-03-17 Thread Naoto Sato
On Fri, 17 Mar 2023 20:31:27 GMT, Andy Goryachev  wrote:

>> Justin Lu has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Adjust CF test to read in with UTF-8 to fix failing test
>
> make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java 
> line 226:
> 
>> 224: Properties p = new Properties();
>> 225: try {
>> 226: FileInputStream input = new FileInputStream(propertiesPath);
> 
> Should this stream be closed in a finally { } block?

Or, better yet, `try-with-resources`?
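Below is a sketch of the try-with-resources version, keeping the explicit UTF-8 `InputStreamReader`. The class and method names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class TryWithResourcesLoad {

    static Properties load(String propertiesPath) throws IOException {
        Properties p = new Properties();
        // Both resources are closed automatically, reader first, even if load() throws.
        try (FileInputStream input = new FileInputStream(propertiesPath);
             InputStreamReader reader = new InputStreamReader(input, StandardCharsets.UTF_8)) {
            p.load(reader);
        }
        return p;
    }
}
```

Resources are closed in reverse order of declaration, and no `finally` block is needed. An `InputStreamReader` built from a `Charset` replaces malformed input with U+FFFD rather than throwing.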

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4]

2023-03-17 Thread Andy Goryachev
On Fri, 17 Mar 2023 20:28:13 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adjust CF test to read in with UTF-8 to fix failing test

make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 
226:

> 224: Properties p = new Properties();
> 225: try {
> 226: FileInputStream input = new FileInputStream(propertiesPath);

Should this stream be closed in a finally { } block?

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4]

2023-03-17 Thread Justin Lu
> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with one additional commit 
since the last revision:

  Adjust CF test to read in with UTF-8 to fix failing test

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/7119830b..007c78a7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12726=03
 - incr: https://webrevs.openjdk.org/?repo=jdk=12726=02-03

  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3]

2023-03-16 Thread Justin Lu
On Thu, 16 Mar 2023 18:31:23 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Reconvert CS.properties to UTF-8
>  - Revert all changes to CurrencySymbols.properties

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1:

> 1: #

CurrencySymbols.properties is fully converted to UTF-8 now

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3]

2023-03-16 Thread Justin Lu
> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with two additional 
commits since the last revision:

 - Reconvert CS.properties to UTF-8
 - Revert all changes to CurrencySymbols.properties

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/6d6bffe8..7119830b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12726=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=12726=01-02

  Stats: 87 lines in 1 file changed: 0 ins; 0 del; 87 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Wed, 15 Mar 2023 16:18:44 GMT, Archie L. Cobbs  wrote:

>> Justin Lu has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1:
> 
>> 1: #
> 
> This file should probably be excluded because it's used in a test that 
> relates to UTF-8 encoding (or not) of property files.

Thank you, removed the changes for this file

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Thu, 16 Mar 2023 18:19:29 GMT, Justin Lu  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Bug6204853 should not be converted
>  - Copyright year for CompileProperties
>  - Redo translation for CS.properties
>  - Spot convert CurrencySymbols.properties

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1:

> 1: #

Conversion did not work as expected, addressing right now.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Wed, 15 Mar 2023 20:19:51 GMT, Naoto Sato  wrote:

>> Justin Lu has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156:
> 
>> 154: zh=\u00A4
>> 155: zh_CN=\uFFE5
>> 156: zh_HK=HK$
> 
> Why are they not encoded into UTF-8 native?

Not sure, thank you for catching it. Working on it right now.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with four additional 
commits since the last revision:

 - Bug6204853 should not be converted
 - Copyright year for CompileProperties
 - Redo translation for CS.properties
 - Spot convert CurrencySymbols.properties

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/1e798f24..6d6bffe8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12726=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=12726=00-01

  Stats: 92 lines in 4 files changed: 0 ins; 0 del; 92 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Naoto Sato
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu  wrote:

> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156:

> 154: zh=\u00A4
> 155: zh_CN=\uFFE5
> 156: zh_HK=HK$

Why are they not encoded into UTF-8 native?

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Archie L . Cobbs
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu  wrote:

> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1:

> 1: #

This file should probably be excluded because it's used in a test that relates 
to UTF-8 encoding (or not) of property files.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Jonathan Gibbons
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu  wrote:

> This PR converts Unicode escape sequences in .properties files to UTF-8 
> native characters (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

make/langtools/tools/compileproperties/CompileProperties.java line 252:

> 250: try {
> 251: writer = new BufferedWriter(
> 252: new OutputStreamWriter(new 
> FileOutputStream(outputPath), StandardCharsets.ISO_8859_1));

Using ISO_8859_1 seems strange.
Since these are generated files, you could write them as UTF-8 and then 
override the default javac encoding option (ASCII) when compiling _just_ these 
files.

Or else just stay with ASCII; no one should be looking at these files!

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Justin Lu
On Tue, 7 Mar 2023 23:15:14 GMT, Jonathan Gibbons  wrote:

>> This PR converts Unicode escape sequences in .properties files to UTF-8 
>> native characters (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> make/langtools/tools/compileproperties/CompileProperties.java line 252:
> 
>> 250: try {
>> 251: writer = new BufferedWriter(
>> 252: new OutputStreamWriter(new 
>> FileOutputStream(outputPath), StandardCharsets.ISO_8859_1));
> 
> Using ISO_8859_1 seems strange.
> Since these are generated files, you could write them as UTF-8 and then 
> override the default javac encoding option (ASCII) when compiling _just_ these 
> files.
> 
> Or else just stay with ASCII; no one should be looking at these files!

I'll go with your latter suggestion: since the .properties files were converted 
via native2ascii, it makes sense to write the output in ASCII.
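The sketch below, with hypothetical names, shows one way to make the ASCII choice self-checking. `Files.newBufferedWriter` with `US_ASCII` reports unmappable characters as an `IOException`. An `OutputStreamWriter` constructed with a `Charset`, by contrast, silently writes `?` for them. With the first approach, any character that slipped past the escaping step fails the build instead of corrupting the generated ListResourceBundle source.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AsciiSourceWriter {

    /** Writes generated Java source as US-ASCII, failing on any unescaped non-ASCII char. */
    static void write(Path outputPath, String source) throws IOException {
        // The encoder behind Files.newBufferedWriter reports unmappable chars, so the
        // IOException surfaces from write() or, at the latest, from close().
        try (BufferedWriter writer = Files.newBufferedWriter(outputPath, StandardCharsets.US_ASCII)) {
            writer.write(source);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("Generated", ".java");
        write(out, "String s = \"caf\\u00E9\";");     // already escaped: succeeds
        try {
            write(out, "String s = \"caf\u00e9\";");  // raw U+00E9: rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```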

-

PR: https://git.openjdk.org/jdk/pull/12726


RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Justin Lu
This PR converts Unicode escape sequences in .properties files to UTF-8 native 
characters (excluding the Unicode space and tab sequences). The conversion was 
done using native2ascii.

In addition, the build logic is adjusted to support reading in the .properties 
files as UTF-8 during the conversion from .properties file to .java 
ListResourceBundle file.

-

Commit messages:
 - Write to ASCII
 - Read in .properties as UTF-8, but write to LRB .java as ISO-8859-1
 - Compile class with ascii (Not ready to make system wide change)
 - Toggle UTF-8 for javac option in JavaCompilation.gmk
 - CompileProperties converts in UTF-8
 - Convert .properties from ISO-8859-1 to UTF-8

Changes: https://git.openjdk.org/jdk/pull/12726/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12726=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301991
  Stats: 29093 lines in 490 files changed: 6 ins; 0 del; 29087 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726