Please consider this fix to ensure that going from `MessageFormat` to pattern 
string via `toPattern()` and then back via `new MessageFormat()` results in a 
format that is equivalent to the original.

The quoting and escaping rules for `MessageFormat` pattern strings are really 
tricky. I admit not completely understanding them. At a high level, they work 
like this: The normal way one would "nest" strings containing special 
characters is with straightforward recursive escaping like with the `bash` 
command line. For example, if you want to echo `a "quoted string" example` then 
you enter `echo "a "quoted string" example"`. With this scheme it's always the 
"outer" layer's job to (un)escape special characters as needed. That is, the 
echo command never sees the backslash characters.

In contrast, with `MessageFormat` and friends, nested subformat pattern strings 
are always provided "pre-escaped". So to build an "outer" string (e.g., for 
`ChoiceFormat`) the "inner" subformat pattern strings are more or less just 
concatenated, and then only the `ChoiceFormat` option separator characters 
(e.g., `<`, `#`, `|`, etc.) are escaped.

The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
indicates the beginning of a format argument. However, it doesn't escape `}` 
characters. This is OK because the format string parser treats any "extra" 
closing braces (where "extra" means not matching an opening brace) as plain 
characters.

So far, so good... at least, until a format string containing an extra closing 
brace is nested inside a larger format string, where the extra closing brace, 
which was previously "extra", can now suddenly match an opening brace in the 
outer pattern containing it, thus truncating it by "stealing" the match from 
some subsequent closing brace.

An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
{1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
trailing closing brace in plain text. If you create a `MessageFormat` with this 
string, you see a trailing `}` only with the second option.

However, if you then invoke `toPattern()`, the result is `"{0,choice,0.0#option 
A: {1}|1.0#option B: {1}}}"`. Oops, now because the "extra" closing brace is no 
longer quoted, it matches the opening brace at the beginning of the string, and 
the following closing  brace, which was the previous match, is now just plain 
text in the outer `MessageFormat` string.

As a result, invoking `f.format(new Object{} { 0, 5 })` will return `"option A: 
5"` using the original `MessageFormat` but `"option A: 5}"` from a new one 
created with the string from `toPattern()`.

This patch fixes this problem by adding quotes around "extra" closing braces in 
the subformat pattern strings.

-------------

Commit messages:
 - Quote runs of extra unquoted closing braces in subformat patterns.

Changes: https://git.openjdk.org/jdk/pull/17416/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8323699
  Stats: 103 lines in 2 files changed: 100 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Reply via email to