On Thu, 18 Jan 2024 20:08:27 GMT, Justin Lu <[email protected]> wrote:
>> Hi @justin-curtis-lu,
>>
>> Thanks a lot for taking a look, I am glad for any other set of eyes on this
>> tricky stuff!
>>
>>> As it stands, it would be inconsistent to have the closing bracket quoted
>>> and the opening bracket not quoted.
>>
>> You're right - the problem exists with both `{` and `}`, as is shown here
>> (unmodified jshell):
>>
>>
>> $ jshell
>> | Welcome to JShell -- Version 17.0.9
>> | For an introduction type: /help intro
>>
>> jshell> import java.text.*;
>>
>> jshell> new MessageFormat("Test: {0,number,foo'{'#.00}");
>> $2 ==> java.text.MessageFormat@951c5b58
>>
>> jshell> $2.toPattern()
>> $3 ==> "Test: {0,number,foo{#.00}"
>>
>> jshell> new MessageFormat($3)
>> | Exception java.lang.IllegalArgumentException: Unmatched braces in the
>> pattern.
>> | at MessageFormat.applyPattern (MessageFormat.java:521)
>> | at MessageFormat.<init> (MessageFormat.java:371)
>> | at (#4:1)
>>
>>
>> I've been missing the forest for the trees a bit and I think the fix can be
>> simpler now.
>>
>> For the record, here is my latest understanding of what's going on...
>>
>> 1. When `MessageFormat.toPattern()` constructs the combined pattern string,
>> it concatenates the plain text bits, suitably escaped, and the pattern
>> strings from each subformat.
>> 1. The subformat strings are already escaped, in the sense that you can take
>> (for example) a `ChoiceFormat` format string and use it as-is to recreate
>> that same `ChoiceFormat`, but they are escaped _only for their own purposes_.
>> 1. The original example is an example of where this matters - `ChoiceFormat`
>> needs to escape `#`, `|`, etc., but doesn't need to escape `{` or `}` - but
>> a `MessageFormat` format string does need to escape those.
>> 1. Therefore, the correct fix is for `MessageFormat.toPattern()` to modify
>> the subformat strings by quoting any _unquoted_ `{` or `}` characters while
>> leaving any already-quoted characters alone.
>>
>> Updated in a9d78c76f5f. I've also updated the test so it does more thorough
>> round-trip checks.
>
> Hi @archiecobbs , I think the recent commit is good progress.
>
> First off I think this change will need a CSR as there are behavioral
> concerns, although minimal. Please let me know if you have access on JBS to
> file one, otherwise I can file one for you.
>
>> Therefore, the correct fix is for MessageFormat.toPattern() to modify the
>> subformat strings by quoting any unquoted {or } characters while leaving any
>> already-quoted characters alone.
>
> Yes, I think this behavior allows for the String returned by `toPattern()` to
> create a MessageFormat that can format equivalently to the original
> MessageFormat. Although to clarify, the original String pattern will not be
> guaranteed to be equivalent to the one returned by `toPattern()` as we are
> adding quotes to all brackets in the subformatPattern, regardless if they
> were quoted in the original String. I think the former is more important
> here, so the latter is okay.
>
> For DecimalFormat and SimpleDateFormat subformats, adding quotes to brackets
> found in the subformatPattern is always right, those subformats can not be
> used to create recursive format behavior. Thus you would **never** except a
> nested ArgumentIndex in the subformatPattern (ex: `"{0,number,{1}foo0.0"}`),
> and can confidently escape all brackets found for these subFormats (which you
> do).
>
> To me, ChoiceFormat is where there is some concern. Consider the following,
>
> `new MessageFormat("{0,choice,0# {1} {2} {3} }”)`
>
> With this change, invoking `toPattern()` on the created MessageFormat would
> return
>
> `"{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' }"`.
>
> This would technically be incorrect. One would think instead of allowing 3
> nested elements, we are now printing the literal `{1} {2} {3}` since the
> brackets are escaped. But, this is not actually the case. Escaping brackets
> with a choice subFormat does not function properly. This is due to the fact
> that a recursive messageFormat is created, but the pattern passed has already
> lost the quotes.
>
> This means that `new MessageFormat("{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}'
> }")`.
>
> is still equivalent to `new MessageFormat("{0,choice,0# {1} {2} {3} }”).`
>
> So the behavior of quoting all brackets would still guarantee _the String
> returned by `toPattern()` to create a MessageFormat that can format
> equivalently to the original MessageFormat_ but only because the current
> behavior of formatting with a choice subFormat is technically wrong. I think
> this is okay, since this incorrect behavior is long-standing, but a CSR would
> be good to ad...
Hi @justin-curtis-lu,
Thanks for your comments.
> First off I think this change will need a CSR as there are behavioral
> concerns, although minimal. Please let me know if you have access on JBS to
> file one, otherwise I can file one for you.
No problem, I'll add one.
> To me, ChoiceFormat is where there is some concern. Consider the following,
OK I think I understand what you're saying.
The first goal here is to ensure the round trip operation `MessageFormat` →
Pattern string from `toPattern()` → `MessageFormat` takes you back to where you
started.
The patch achieves this so we're good so far. I think we agree on this.
> This would technically be incorrect.
I'm not quite understanding what you meant there (and therefore slightly
worried)...
In my mind, a key observation here is that the following two things are not
equal and therefore shouldn't be conflated:
* A `ChoiceFormat` pattern string (e.g., what you get from invoking
`ChoiceFormat.toPattern()`)
* The `XXX` that you see inside a `MessageFormat` pattern string as part of a
`{0,choice,XXX}` subformat.
Those are not the same, even modulo quoting. As you point out, `MessageFormat`
has this quirky logic
([here](https://github.com/openjdk/jdk/blob/81df265e41d393cdde87729e091dd465934071fd/src/java.base/share/classes/java/text/MessageFormat.java#L1327-L1333))
in which, after it evaluates a `ChoiceFormat` subformat, if the returned
string contains a `{` character, then that string is magically reinterpreted as
a `MessageFormat` pattern string and evaluated again, otherwise it's left alone.
So the `XXX` is not a "`ChoiceFormat` pattern string" but something else, so it
can't be "incorrect" for not looking like a normal `ChoiceFormat` pattern
string.
Maybe it's just a manner of speaking? In any case I want to be careful I'm not
missing something in what you're saying.
Thanks.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1899206551