Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Tue, 20 Aug 2024 09:07:54 GMT, Pavel Rappo wrote:

>> Justin Lu has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>>   Replace InputStreamReader with BufferedReader
>
> src/jdk.jartool/share/classes/sun/tools/jar/resources/jar_pt_BR.properties
> line 95:
>
>> 93:
>> 94: main.usage.summary=Uso: jar [OPTION...] [ [--release VERSION] [-C dir] files] ...
>> 95: main.usage.summary.try=Tente `jar --ajuda' para obter mais informações.
>
> I was looking for something unrelated in properties files, and found this. It
> is surprising to see an option name being localised; it must be a bug.

Good catch, Pavel. It is indeed a bug. This type of overtranslation l10n bug happens all the time and is hard to catch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15694#discussion_r1723520963
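One way to make this kind of overtranslation easier to spot is to compare the long options (`--name`) that appear in a localized value against those in the English value for the same key. The following is only a minimal sketch: the class `OptionNameCheck` and its methods are hypothetical and not part of the JDK, and the English string used in `main` is assumed rather than quoted from this thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of the PR under review.
final class OptionNameCheck {

    // A long option: two dashes, a letter, then letters, digits or dashes.
    private static final Pattern LONG_OPTION =
            Pattern.compile("--[A-Za-z][A-Za-z0-9-]*");

    /** Collects every long option that occurs in a resource value. */
    static Set<String> longOptions(String value) {
        Set<String> options = new LinkedHashSet<>();
        Matcher m = LONG_OPTION.matcher(value);
        while (m.find()) {
            options.add(m.group());
        }
        return options;
    }

    /** Returns options used in the localized value but absent from the English one. */
    static Set<String> suspectOptions(String english, String localized) {
        Set<String> suspects = longOptions(localized);
        suspects.removeAll(longOptions(english));
        return suspects;
    }

    public static void main(String[] args) {
        // Assumed English text for the key main.usage.summary.try.
        String english = "Try `jar --help' for more information.";
        String localized = "Tente `jar --ajuda' para obter mais informa\u00E7\u00F5es.";
        System.out.println(suspectOptions(english, localized));
    }
}
```

A check like this only flags candidates for human review; it cannot tell a translated option name apart from an option that was legitimately added to one bundle only.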
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote:

>> JDK .properties files still use ISO-8859-1 encoding with escape sequences.
>> It would improve readability to see the native characters instead of escape
>> sequences (especially for the L10n process). The majority of files changed
>> are localized resource files.
>>
>> This change converts the Unicode escape sequences in the JDK .properties
>> files (both in src and test) to UTF-8 native characters. Additionally, the
>> build logic is adjusted to read the .properties files in UTF-8 while
>> generating the ListResourceBundle files.
>>
>> The only escape sequence not converted was `\u0020` as this is used to
>> denote intentional trailing white space. (E.g. `key=This is the
>> value:\u0020`)
>>
>> The conversion was done using native2ascii with options `-reverse -encoding
>> UTF-8`.
>>
>> If this PR is integrated, the IDE default encoding for .properties files
>> need to be updated to UTF-8. (IntelliJ IDEA locks .properties files as
>> ISO-8859-1 unless manually changed).
>
> Justin Lu has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Replace InputStreamReader with BufferedReader

src/jdk.jartool/share/classes/sun/tools/jar/resources/jar_pt_BR.properties line 95:

> 93:
> 94: main.usage.summary=Uso: jar [OPTION...] [ [--release VERSION] [-C dir] files] ...
> 95: main.usage.summary.try=Tente `jar --ajuda' para obter mais informações.

I was looking for something unrelated in properties files, and found this. It is surprising to see an option name being localised; it must be a bug.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15694#discussion_r1722966688
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 13 Sep 2023 18:12:15 GMT, Naoto Sato wrote:

> Looks good to me, although I did not look at each l10n file, but sampled
> some. Thanks for tackling this conversion.

Thanks for the review. In addition to testing, I ran a script to verify that only white space escape sequences remain in JDK .properties files (excluding escape sequences in test files that should remain as is for the purpose of the test).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-1718139807
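The verification script mentioned in this message is not shown in the thread. A check of that kind could look roughly like the sketch below, which scans text for Unicode escape sequences and reports any that do not denote white space. The class `EscapeCheck` and its method are hypothetical, and treating only space and tab as allowed is an assumption based on the PR descriptions, not the author's actual script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker; the script actually used for the PR is not in the thread.
final class EscapeCheck {

    /** Returns every backslash-u escape in the text that is not a space or a tab. */
    static List<String> nonWhitespaceEscapes(String text) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (text.charAt(i) != '\\' || i + 1 >= text.length()) {
                i++;
                continue;
            }
            if (text.charAt(i + 1) == 'u' && i + 6 <= text.length()) {
                String hex = text.substring(i + 2, i + 6);
                if (hex.matches("[0-9a-fA-F]{4}")) {
                    char c = (char) Integer.parseInt(hex, 16);
                    // Assumption: only space and tab escapes are allowed to remain.
                    if (c != ' ' && c != '\t') {
                        found.add(text.substring(i, i + 6));
                    }
                    i += 6;
                    continue;
                }
            }
            // Skip the backslash and the character it escapes, so that an
            // escaped backslash followed by 'u' is not mistaken for an escape.
            i += 2;
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a .properties file, read as UTF-8.
        for (String name : args) {
            String text = Files.readString(Path.of(name), StandardCharsets.UTF_8);
            for (String escape : nonWhitespaceEscapes(text)) {
                System.out.println(name + ": " + escape);
            }
        }
    }
}
```

Test resources that intentionally keep their escapes would have to be left out of the list of files passed to such a tool.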
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote:

>> JDK .properties files still use ISO-8859-1 encoding with escape sequences.
>> It would improve readability to see the native characters instead of escape
>> sequences (especially for the L10n process). The majority of files changed
>> are localized resource files.
>>
>> This change converts the Unicode escape sequences in the JDK .properties
>> files (both in src and test) to UTF-8 native characters. Additionally, the
>> build logic is adjusted to read the .properties files in UTF-8 while
>> generating the ListResourceBundle files.
>>
>> The only escape sequence not converted was `\u0020` as this is used to
>> denote intentional trailing white space. (E.g. `key=This is the
>> value:\u0020`)
>>
>> The conversion was done using native2ascii with options `-reverse -encoding
>> UTF-8`.
>>
>> If this PR is integrated, the IDE default encoding for .properties files
>> need to be updated to UTF-8. (IntelliJ IDEA locks .properties files as
>> ISO-8859-1 unless manually changed).
>
> Justin Lu has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Replace InputStreamReader with BufferedReader

Looks good to me, although I did not look at each l10n file, but sampled some. Thanks for tackling this conversion.

-------------

Marked as reviewed by naoto (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15694#pullrequestreview-1625154951
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It
> would improve readability to see the native characters instead of escape
> sequences (especially for the L10n process). The majority of files changed
> are localized resource files.
>
> This change converts the Unicode escape sequences in the JDK .properties
> files (both in src and test) to UTF-8 native characters. Additionally, the
> build logic is adjusted to read the .properties files in UTF-8 while
> generating the ListResourceBundle files.
>
> The only escape sequence not converted was `\u0020` as this is used to denote
> intentional trailing white space. (E.g. `key=This is the value:\u0020`)
>
> The conversion was done using native2ascii with options `-reverse -encoding
> UTF-8`.
>
> If this PR is integrated, the IDE default encoding for .properties files need
> to be updated to UTF-8. (IntelliJ IDEA locks .properties files as ISO-8859-1
> unless manually changed).

Justin Lu has updated the pull request incrementally with one additional commit since the last revision:

  Replace InputStreamReader with BufferedReader

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/15694/files
  - new: https://git.openjdk.org/jdk/pull/15694/files/0f3698a5..ceb48bbe

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=15694&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15694&range=00-01

  Stats: 18 lines in 2 files changed: 6 ins; 8 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/15694.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15694/head:pull/15694

PR: https://git.openjdk.org/jdk/pull/15694
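The description above says the build logic now reads .properties files as UTF-8. As background, `Properties.load(InputStream)` decodes its input as ISO-8859-1, while `Properties.load(Reader)` uses whatever charset the reader was created with. The sketch below shows the second form with an explicit UTF-8 reader; the class `Utf8PropertiesDemo` is hypothetical and is not the code changed in the PR.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; the JDK build tool itself is not reproduced here.
final class Utf8PropertiesDemo {

    /** Loads properties from a stream whose bytes are UTF-8 encoded text. */
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // load(Reader) takes characters as decoded by the reader and
            // still processes escapes such as the trailing-space escape.
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The Java literals below use escapes so this source stays ASCII;
        // the bytes handed to the loader contain the native characters.
        String content = "greeting=informa\u00E7\u00F5es\n"
                + "prompt=This is the value:\\u0020\n";
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);
        Properties props = loadUtf8(new ByteArrayInputStream(utf8));
        System.out.println(props.getProperty("greeting"));
        System.out.println("[" + props.getProperty("prompt") + "]");
    }
}
```

Passing the same bytes to `load(InputStream)` instead would decode each UTF-8 byte as a separate ISO-8859-1 character, which is why the reading side has to change together with the file encoding.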
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 15 Mar 2023 16:18:44 GMT, Archie L. Cobbs wrote:

>> Justin Lu has updated the pull request incrementally with four additional
>> commits since the last revision:
>>
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1:
>
>> 1: #
>
> This file should probably be excluded because it's used in a test that
> relates to UTF-8 encoding (or not) of property files.

Thank you, removed the changes for this file

-------------

PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 15 Mar 2023 20:19:51 GMT, Naoto Sato wrote:

>> Justin Lu has updated the pull request incrementally with four additional
>> commits since the last revision:
>>
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156:
>
>> 154: zh=\u00A4
>> 155: zh_CN=\uFFE5
>> 156: zh_HK=HK$
>
> Why are they not encoded into UTF-8 native?

Not sure, thank you for catching it. Working on it right now.

-------------

PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Thu, 16 Mar 2023 18:19:29 GMT, Justin Lu wrote:

>> This PR converts Unicode sequences to UTF-8 native in .properties file.
>> (Excluding the Unicode space and tab sequence). The conversion was done
>> using native2ascii.
>>
>> In addition, the build logic is adjusted to support reading in the
>> .properties files as UTF-8 during the conversion from .properties file to
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with four additional
> commits since the last revision:
>
>  - Bug6204853 should not be converted
>  - Copyright year for CompileProperties
>  - Redo translation for CS.properties
>  - Spot convert CurrencySymbols.properties

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1:

> 1: #

Conversion did not work as expected, addressing right now.

-------------

PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
> This PR converts Unicode sequences to UTF-8 native in .properties file.
> (Excluding the Unicode space and tab sequence). The conversion was done using
> native2ascii.
>
> In addition, the build logic is adjusted to support reading in the
> .properties files as UTF-8 during the conversion from .properties file to
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with four additional commits since the last revision:

 - Bug6204853 should not be converted
 - Copyright year for CompileProperties
 - Redo translation for CS.properties
 - Spot convert CurrencySymbols.properties

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/1e798f24..6d6bffe8

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=00-01

  Stats: 92 lines in 4 files changed: 0 ins; 0 del; 92 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726