вт, 30 авг. 2022 г. в 10:13, Mark Thomas <ma...@apache.org>:
>
> On 29/08/2022 14:06, Christopher Schultz wrote:
> > Mark,
> >
> > On 8/29/22 02:39, ma...@apache.org wrote:
>
> <snip/>
>
> >>       public static String get(final String key, final Object... args) {
> >>           String value = get(key);
> >> +        // Convert all Number arguments to String else MessageFormat
> >> may try to
> >> +        // format them in unexpected ways.
> >> +        if (args != null) {
> >> +            for (int i = 0; i < args.length; i++) {
> >> +                if (args[i] instanceof Number) {
> >> +                    args[i] = args[i].toString();
> >> +                }
> >> +            }
> >> +        }
> >> +
> >
> > This might represent a big change in behavior, especially with
> > floating-point numbers. I'm not sure what role MessageFormat plays in
> > the whole EL ecosystem... is it any part of the spec, or only for like
> > error messages and things like that?
>
> It is only for error messages and the like.
>
> oss-fuzz found an edge case where MessageFormat would output a number
> with hundreds of thousands of digits as an integer rather than using
> exponential form.
>
> Any such instances would be application bugs (the issue is in parsing
> the EL expression so there is no way for users to trigger this). It
> seems unlikely that this would occur in practice.

1. I think we are actually dealing with a JRE bug here.

As such, while only MessageFactory in EL was tested, in theory it
concerns other copies of this class.

2. Personally, as a programmer, I do like when numbers are printed in
a programmer-friendly way, i.e. via toString() like here, but
generally it means losing i18n.

3. For future reference
- Javadoc of MessageFormat. Java 17:

https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html

- messages used by EL code come from
static final ResourceBundle bundle =
ResourceBundle.getBundle("org.apache.el.LocalStrings");

and look like

error.convert=Cannot convert [{0}] of type [{1}] to [{2}]


4. I think that the JRE bug is as follows:

(1) The Javadoc of MessageFormat says (see a table there) that to
format a number you specify the format with FormatType = "number". The
FormatStyle may be omitted.

E.g. "{0, number}" can be used to clearly declare that the argument
has to be formatted as a Number.

(2) It says that for FormatType=none, it creates a Subformat of null.

Looking into the source code (OpenJDK 17.0.4)
parsing the format string is implemented in method #makeFormat(...).
See the branch with "case TYPE_NULL:" there.

It results in assigning the null value to an entry in "formats[]" array.

(3) I suspect that a null format was actually intended to serve as a
"text" format. My concerns:

- The MessageFormat lacks an explicit "text" FormatType.

In comparison, with a java.util.Formatter I can use an explicit %s
pattern to specify textual format.

- There is a comment in the source code of MessageFormat #toPattern():

[[[
            if (fmt == null) {
                // do nothing, string format
            } else if ...
]]]

- I do not see any documentation in MessageFormat javadoc on how the
null Subformat is supposed to be treated.

Nor I found it in the official tutorial.
https://docs.oracle.com/javase/tutorial/i18n/format/messageFormat.html

I guess the behaviour may be "specified" in a hidden way, via TCK.


(4) The actual implementation of formatting (implemented in
MessageFormat#subformat(...)) is different:

[[[
                } else if (formats[i] != null) {
                    subFormatter = formats[i];
                    if (subFormatter instanceof ChoiceFormat) {
                        arg = formats[i].format(obj);
                        if (arg.indexOf('{') >= 0) {
                            subFormatter = new MessageFormat(arg, locale);
                            obj = arguments;
                            arg = null;
                        }
                   }
                } else if (obj instanceof Number) {
                    // format number if can
                    subFormatter = NumberFormat.getInstance(locale);
                } else if (obj instanceof Date) {
                    // format a Date if can
                    subFormatter = DateFormat.getDateTimeInstance(
                             DateFormat.SHORT, DateFormat.SHORT, locale);//fix
                } else if (obj instanceof String) {
                    arg = (String) obj;
                } else {
                    arg = obj.toString();
                    if (arg == null) arg = "null";
                }
]]]

I.e. if format[i] is null it does not interpret it as "text", but
tries "to be smart" and does some processing for Numbers.

(5) According to oss-fuzz we now know that such processing is broken.
It assumes that any Number can be sanely formatted by a
"NumberFormat.getInstance(locale);".


5. What to do. I see the following ways for a better workaround of the JRE bug:

(a) It is possible to narrow Mark's patch:
narrowing the type test to look for BigInteger and BigDecimal instead
of a generic Number.

On the fact that
- There are known problems with some of such numbers.
- Those classes know how to print themselves better than a NumberFormat knows.

Pros: It fixes a rare edge case, and doesn't change anything else.

Cons: I think that it breaks formatting if a pattern is using an
explicit FormatType of "number" or of "choice".


(b) It is possible to obtain formats from a MessageFormat. See the
getFormatsByArgumentIndex() method.

I propose
if the format is null then format the value with String.valueOf().
I.e. interpret "{0}" as a text format.

If it causes a regression somewhere, it is possible to update a
specific format pattern to revert to the old behaviour:
just replace {0} with {0,number} for numbers, or with a pair of
{0,date} {0, time} for dates.
I assume that the type of an argument in such patterns is known.

Pros:
- Explicitly controlled behaviour.

Cons:
- If restoring the old behaviour is required in many places,
updating all them may be a notable amount of work.

Other notes:
- The default format for Dates when printed with toString() is"dow mon
dd hh:mm:ss zzz yyyy", as documented in Javadoc
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Date.html#toString()

It may look uglier than the old behaviour, but no data is lost: both
date and time are printed.

- I am not handling the case of nested format patterns, i.e.
the case of ChoiceFormat in the code fragment above.

I think that usage of ChoiceFormat is rare, and that it is always used
with numbers, not with some random values of unknown type.
So I think that we are safe to ignore this case.

(c) It is possible to combine (b) with a type check,
e.g. apply it for Numbers only, but skip Dates.

Is there a worth in such behaviour?


My preference is for (b).

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to