Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2024-08-20 Thread Naoto Sato
On Tue, 20 Aug 2024 09:07:54 GMT, Pavel Rappo wrote:

>> Justin Lu has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Replace InputStreamReader with BufferedReader
>
> src/jdk.jartool/share/classes/sun/tools/jar/resources/jar_pt_BR.properties 
> line 95:
> 
>> 93: 
>> 94: main.usage.summary=Uso: jar [OPTION...] [ [--release VERSION] [-C dir] 
>> files] ...
>> 95: main.usage.summary.try=Tente `jar --ajuda' para obter mais informações.
> 
> I was looking for something unrelated in properties files, and found this. It 
> is surprising to see an option name being localised; it must be a bug.

Good catch, Pavel. It is indeed a bug. This type of overtranslation l10n bug 
happens all the time and is hard to catch.
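
For illustration only (not part of the PR): a minimal sketch of how this kind 
of overtranslation might be flagged automatically. It collects long option 
names from the base value and from the localized value of the same key, and 
reports any that appear only in the translation. The class name, the option 
pattern, and the English base string are assumptions made for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionNameCheck {
    // Long options such as --help or --release. The pattern is an assumption
    // for this sketch, not something taken from the jar tool sources.
    private static final Pattern LONG_OPTION =
            Pattern.compile("--[A-Za-z][A-Za-z0-9-]*");

    // Collects every long option name that occurs in the given text.
    static Set<String> longOptions(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = LONG_OPTION.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Options present in the localized value but absent from the base value.
    static Set<String> suspectOptions(String baseValue, String localizedValue) {
        Set<String> suspects = longOptions(localizedValue);
        suspects.removeAll(longOptions(baseValue));
        return suspects;
    }

    public static void main(String[] args) {
        // Assumed English text for main.usage.summary.try.
        String base = "Try `jar --help' for more information.";
        String ptBR = "Tente `jar --ajuda' para obter mais informações.";
        System.out.println(suspectOptions(base, ptBR)); // prints [--ajuda]
    }
}
```

A check like this only catches option names that follow the `--name` shape; 
translated subcommands or short options would need a different rule.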

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15694#discussion_r1723520963


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2024-08-20 Thread Pavel Rappo
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote:

>> JDK .properties files still use ISO-8859-1 encoding with escape sequences. 
>> It would improve readability to see the native characters instead of escape 
>> sequences (especially for the L10n process). The majority of files changed 
>> are localized resource files.
>> 
>> This change converts the Unicode escape sequences in the JDK .properties 
>> files (both in src and test) to UTF-8 native characters. Additionally, the 
>> build logic is adjusted to read the .properties files in UTF-8 while 
>> generating the ListResourceBundle files.
>> 
>> The only escape sequence not converted was `\u0020` as this is used to 
>> denote intentional trailing white space. (E.g. `key=This is the 
>> value:\u0020`)
>> 
>> The conversion was done using native2ascii with options `-reverse -encoding 
>> UTF-8`.
>> 
>> If this PR is integrated, the IDE default encoding for .properties files 
>> needs to be updated to UTF-8. (IntelliJ IDEA locks .properties files as 
>> ISO-8859-1 unless manually changed).
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Replace InputStreamReader with BufferedReader

src/jdk.jartool/share/classes/sun/tools/jar/resources/jar_pt_BR.properties line 
95:

> 93: 
> 94: main.usage.summary=Uso: jar [OPTION...] [ [--release VERSION] [-C dir] 
> files] ...
> 95: main.usage.summary.try=Tente `jar --ajuda' para obter mais informações.

I was looking for something unrelated in properties files, and found this. It 
is surprising to see an option name being localised; it must be a bug.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15694#discussion_r1722966688


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-09-13 Thread Justin Lu
On Wed, 13 Sep 2023 18:12:15 GMT, Naoto Sato wrote:

> Looks good to me, although I did not look at each l10n file and only 
> sampled some. Thanks for tackling this conversion.

Thanks for the review. In addition to testing, I ran a script to verify that 
only white space escape sequences remain in the JDK .properties files 
(excluding escape sequences in test files, which should remain as is for the 
purpose of the test).
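
The script itself is not included in this thread. As a rough sketch of what 
such a check could look like, the following scans lines of a .properties file 
and reports every Unicode escape other than the white space ones. The class 
name and the allowed set (space and tab) are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeScan {
    // A backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Escapes that are allowed to remain: space and tab (assumed set).
    private static final Set<String> ALLOWED = Set.of("0020", "0009");

    // Returns every escape sequence in the lines that is not in ALLOWED.
    static List<String> unexpectedEscapes(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = UNICODE_ESCAPE.matcher(line);
            while (m.find()) {
                String hex = m.group(1).toUpperCase(Locale.ROOT);
                if (!ALLOWED.contains(hex)) {
                    hits.add(m.group());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "key=This is the value:\\u0020",
                "zh_CN=\\uFFE5",
                "zh_HK=HK$");
        // Only the escape on the zh_CN line is reported.
        System.out.println(unexpectedEscapes(sample));
    }
}
```

This sketch does not treat an escaped backslash in front of the `u` 
specially, so a real check would need to handle that case.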

-

PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-1718139807


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-09-13 Thread Naoto Sato
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote:

>> JDK .properties files still use ISO-8859-1 encoding with escape sequences. 
>> It would improve readability to see the native characters instead of escape 
>> sequences (especially for the L10n process). The majority of files changed 
>> are localized resource files.
>> 
>> This change converts the Unicode escape sequences in the JDK .properties 
>> files (both in src and test) to UTF-8 native characters. Additionally, the 
>> build logic is adjusted to read the .properties files in UTF-8 while 
>> generating the ListResourceBundle files.
>> 
>> The only escape sequence not converted was `\u0020` as this is used to 
>> denote intentional trailing white space. (E.g. `key=This is the 
>> value:\u0020`)
>> 
>> The conversion was done using native2ascii with options `-reverse -encoding 
>> UTF-8`.
>> 
>> If this PR is integrated, the IDE default encoding for .properties files 
>> needs to be updated to UTF-8. (IntelliJ IDEA locks .properties files as 
>> ISO-8859-1 unless manually changed).
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Replace InputStreamReader with BufferedReader

Looks good to me, although I did not look at each l10n file and only sampled 
some. Thanks for tackling this conversion.

-

Marked as reviewed by naoto (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15694#pullrequestreview-1625154951


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-09-13 Thread Justin Lu
> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It 
> would improve readability to see the native characters instead of escape 
> sequences (especially for the L10n process). The majority of files changed 
> are localized resource files.
> 
> This change converts the Unicode escape sequences in the JDK .properties 
> files (both in src and test) to UTF-8 native characters. Additionally, the 
> build logic is adjusted to read the .properties files in UTF-8 while 
> generating the ListResourceBundle files.
> 
> The only escape sequence not converted was `\u0020` as this is used to denote 
> intentional trailing white space. (E.g. `key=This is the value:\u0020`)
> 
> The conversion was done using native2ascii with options `-reverse -encoding 
> UTF-8`.
> 
> If this PR is integrated, the IDE default encoding for .properties files 
> needs to be updated to UTF-8. (IntelliJ IDEA locks .properties files as ISO-8859-1 
> unless manually changed).

Justin Lu has updated the pull request incrementally with one additional commit 
since the last revision:

  Replace InputStreamReader with BufferedReader
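
For readers unfamiliar with the mechanics: a minimal sketch, not the actual 
build code, of loading a .properties stream as UTF-8 through a BufferedReader 
rather than relying on the ISO-8859-1 default of Properties.load(InputStream). 
The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class Utf8PropertiesLoader {
    // Decodes the stream as UTF-8 and parses it with Properties.load(Reader).
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // One value with native characters, one with a trailing-space escape.
        String text = "word=informa\u00e7\u00f5es\n"
                + "key=This is the value:\\u0020\n";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        Properties props = load(new ByteArrayInputStream(utf8));
        System.out.println(props.getProperty("word"));
        System.out.println("[" + props.getProperty("key") + "]");
    }
}
```

Properties.load(Reader) still interprets escape sequences, so the retained 
trailing-space escape keeps working after the switch to UTF-8.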

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15694/files
  - new: https://git.openjdk.org/jdk/pull/15694/files/0f3698a5..ceb48bbe

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15694&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15694&range=00-01

  Stats: 18 lines in 2 files changed: 6 ins; 8 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/15694.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15694/head:pull/15694

PR: https://git.openjdk.org/jdk/pull/15694


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Wed, 15 Mar 2023 16:18:44 GMT, Archie L. Cobbs wrote:

>> Justin Lu has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1:
> 
>> 1: #
> 
> This file should probably be excluded because it's used in a test that 
> relates to UTF-8 encoding (or not) of property files.

Thank you, I removed the changes for this file.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Wed, 15 Mar 2023 20:19:51 GMT, Naoto Sato wrote:

>> Justin Lu has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156:
> 
>> 154: zh=\u00A4
>> 155: zh_CN=\uFFE5
>> 156: zh_HK=HK$
> 
> Why are they not encoded into UTF-8 native?

Not sure, thank you for catching it. Working on it right now.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Thu, 16 Mar 2023 18:19:29 GMT, Justin Lu wrote:

>> This PR converts Unicode escape sequences to UTF-8 native characters in 
>> .properties files (excluding the Unicode space and tab sequences). The 
>> conversion was done using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Bug6204853 should not be converted
>  - Copyright year for CompileProperties
>  - Redo translation for CS.properties
>  - Spot convert CurrencySymbols.properties

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1:

> 1: #

The conversion did not work as expected; addressing it right now.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
> This PR converts Unicode escape sequences to UTF-8 native characters in 
> .properties files (excluding the Unicode space and tab sequences). The 
> conversion was done using native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with four additional 
commits since the last revision:

 - Bug6204853 should not be converted
 - Copyright year for CompileProperties
 - Redo translation for CS.properties
 - Spot convert CurrencySymbols.properties
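
The PR used native2ascii for the conversion. Purely as an illustration of the 
rule described above, and not a reimplementation of that tool, the following 
sketch replaces Unicode escapes in a line with the native character while 
leaving the space and tab escapes alone. Keeping every escape at or below 
U+0020 is an assumption of this sketch, as are the class and method names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeToNative {
    // A backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Converts escapes to native characters, except white space and
    // control characters, which stay escaped so they remain visible.
    static String convert(String line) {
        Matcher m = UNICODE_ESCAPE.matcher(line);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            String replacement = (code <= 0x20)
                    ? m.group()                     // keep the escape as is
                    : String.valueOf((char) code);  // emit the native char
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("zh_CN=\\uFFE5"));
        System.out.println(convert("key=This is the value:\\u0020"));
    }
}
```

The result still has to be written out as UTF-8, and escapes for characters 
with special meaning in .properties syntax, such as the backslash itself, 
would need extra care in a real converter.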

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/1e798f24..6d6bffe8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=00-01

  Stats: 92 lines in 4 files changed: 0 ins; 0 del; 92 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726