On Thu, 25 Jan 2024 21:38:54 GMT, Archie Cobbs <aco...@openjdk.org> wrote:

>> Please consider this fix to ensure that going from `MessageFormat` to 
>> pattern string via `toPattern()` and then back via `new MessageFormat()` 
>> results in a format that is equivalent to the original.
>> 
>> The quoting and escaping rules for `MessageFormat` pattern strings are 
>> really tricky. I admit not completely understanding them. At a high level, 
>> they work like this: The normal way one would "nest" strings containing 
>> special characters is with straightforward recursive escaping like with the 
>> `bash` command line. For example, if you want to echo `a "quoted string" 
>> example` then you enter `echo "a "quoted string" example"`. With this scheme 
>> it's always the "outer" layer's job to (un)escape special characters as 
>> needed. That is, the echo command never sees the backslash characters.
>> 
>> In contrast, with `MessageFormat` and friends, nested subformat pattern 
>> strings are always provided "pre-escaped". So to build an "outer" string 
>> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
>> less just concatenated, and then only the `ChoiceFormat` option separator 
>> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
>> 
>> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
>> indicates the beginning of a format argument. However, it doesn't escape `}` 
>> characters. This is OK because the format string parser treats any "extra" 
>> closing braces (where "extra" means not matching an opening brace) as plain 
>> characters.
>> 
>> So far, so good... at least, until a format string containing an extra 
>> closing brace is nested inside a larger format string, where the extra 
>> closing brace, which was previously "extra", can now suddenly match an 
>> opening brace in the outer pattern containing it, thus truncating it by 
>> "stealing" the match from some subsequent closing brace.
>> 
>> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
>> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
>> trailing closing brace in plain text. If you create a `MessageFormat` with 
>> this string, you see a trailing `}` only with the second option.
>> 
>> However, if you then invoke `toPattern()`, the result is 
>> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
>> "extra" closing brace is no longer quoted, it matches the opening brace at 
>> the beginning of the string, and the following closing  brace, which was the 
>> previous match, is now just plain text in the outer `MessageFormat` string.
>> 
>> As a result, invoking `f.format(new ...
>
> Archie Cobbs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Change MessageFormat.toPattern() @implNote to @implSpec.

src/java.base/share/classes/java/text/MessageFormat.java line 558:

> 556:      * @implSpec The implementation in {@link MessageFormat} returns a 
> string
> 557:      * that can be used to create a new instance that is semantically 
> equivalent
> 558:      * to this instance.

The "can be used to create" seems conditional. Perhaps it can be worded as an 
assertion that can be tested.
Suggestion:

     * @implSpec A new MessageFormat created from the string returned from the 
implementation of 
     * `MessageFormat.toPattern()` is semantically equivalent to this instance.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1468118990

Reply via email to