> Please consider this fix to ensure that going from `MessageFormat` to pattern
> string via `toPattern()` and then back via `new MessageFormat()` results in a
> format that is equivalent to the original.
>
> The quoting and escaping rules for `MessageFormat` pattern strings are really
> tricky. I admit not completely understanding them. At a high level, they work
> like this: The normal way one would "nest" strings containing special
> characters is with straightforward recursive escaping like with the `bash`
> command line. For example, if you want to echo `a "quoted string" example`
> then you enter `echo "a \"quoted string\" example"`. With this scheme it's
> always the "outer" layer's job to (un)escape special characters as needed.
> That is, the echo command never sees the backslash characters.
>
> In contrast, with `MessageFormat` and friends, nested subformat pattern
> strings are always provided "pre-escaped". So to build an "outer" string
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or
> less just concatenated, and then only the `ChoiceFormat` option separator
> characters (e.g., `<`, `#`, `|`) are escaped.
>
> The "pre-escape" escaping algorithm escapes `{` characters, because `{`
> indicates the beginning of a format argument. However, it doesn't escape `}`
> characters. This is OK because the format string parser treats any "extra"
> closing braces (where "extra" means not matching an opening brace) as plain
> characters.
>
> So far, so good... at least, until a format string containing an extra
> closing brace is nested inside a larger format string. There, the brace that
> was previously "extra" can suddenly match an opening brace in the outer
> pattern, truncating the outer pattern by "stealing" the match from some
> subsequent closing brace.
>
> An example is the `MessageFormat` string `"{0,choice,0.0#option A:
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a
> trailing closing brace in plain text. If you create a `MessageFormat` with
> this string, you see a trailing `}` only with the second option.
>
> However, if you then invoke `toPattern()`, the result is
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the
> "extra" closing brace is no longer quoted, it matches the opening brace at
> the beginning of the string, and the following closing brace, which was the
> previous match, is now just plain text in the outer `MessageFormat` string.
>
> As a result, invoking `f.format(new Object[] { 0, 5 })` will retur...
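
The brace-matching effect described in the quoted text can be reproduced with a small sketch. It builds the two pattern strings from the description directly, so the formatted results do not depend on how `toPattern()` behaves in a given JDK build. The class name `BraceQuotingDemo` and the helper `render` are made up for illustration and are not part of the patch; `Locale.ROOT` is an assumption added to keep number formatting independent of the default locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class BraceQuotingDemo {

    // Pattern from the description: the second option ends in a quoted '}'.
    static final String QUOTED =
        "{0,choice,0.0#option A: {1}|1.0#option B: {1}'}'}";

    // The same pattern with the quotes around the extra brace dropped.
    static final String UNQUOTED =
        "{0,choice,0.0#option A: {1}|1.0#option B: {1}}}";

    // Hypothetical helper: formats the pair (choice, value) with the pattern.
    static String render(String pattern, int choice, int value) {
        MessageFormat f = new MessageFormat(pattern, Locale.ROOT);
        return f.format(new Object[] { choice, value });
    }

    public static void main(String[] args) {
        // Quoted brace: it belongs to option B only.
        System.out.println(render(QUOTED, 0, 5));   // option A: 5
        System.out.println(render(QUOTED, 1, 5));   // option B: 5}

        // Unquoted brace: it closes the choice argument early, so the last
        // '}' becomes plain text of the outer pattern and follows both options.
        System.out.println(render(UNQUOTED, 0, 5)); // option A: 5}
        System.out.println(render(UNQUOTED, 1, 5)); // option B: 5}

        // The round-trip result depends on whether the JDK in use includes
        // this change, so no expected output is given here.
        System.out.println(new MessageFormat(QUOTED, Locale.ROOT).toPattern());
    }
}
```

Comparing the first and third lines of output shows the non-equivalence: with the quoted pattern, option A has no trailing brace, while with the unquoted pattern it does.
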
Archie Cobbs has updated the pull request incrementally with six additional
commits since the last revision:
- Add more test cases and more pattern string variety.
- Make it easier to debug & show what the test is doing.
- Add comment explaining what MAX_FORMAT_NESTING is for.
- Clean up code a bit by using instanceof patterns.
- Tweak @implNote to clarify only referring to MessageFormat class.
- Update copyright year in MessageFormat.java.
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/17416/files
- new: https://git.openjdk.org/jdk/pull/17416/files/36d70b8a..58e8cc68
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=02-03
Stats: 94 lines in 2 files changed: 50 ins; 12 del; 32 mod
Patch: https://git.openjdk.org/jdk/pull/17416.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416
PR: https://git.openjdk.org/jdk/pull/17416