вт, 30 авг. 2022 г. в 10:13, Mark Thomas <ma...@apache.org>: > > On 29/08/2022 14:06, Christopher Schultz wrote: > > Mark, > > > > On 8/29/22 02:39, ma...@apache.org wrote: > > <snip/> > > >> public static String get(final String key, final Object... args) { > >> String value = get(key); > >> + // Convert all Number arguments to String else MessageFormat > >> may try to > >> + // format them in unexpected ways. > >> + if (args != null) { > >> + for (int i = 0; i < args.length; i++) { > >> + if (args[i] instanceof Number) { > >> + args[i] = args[i].toString(); > >> + } > >> + } > >> + } > >> + > > > > This might represent a big change in behavior, especially with > > floating-point numbers. I'm not sure what role MessageFormat plays in > > the whole EL ecosystem... is it any part of the spec, or only for like > > error messages and things like that? > > It is only for error messages and the like. > > oss-fuzz found an edge case where MessageFormat would output a number > with hundreds of thousands of digits as an integer rather than using > exponential form. > > Any such instances would be application bugs (the issue is in parsing > the EL expression so there is no way for users to trigger this). It > seems unlikely that this would occur in practice.
1. I think we are actually dealing with a JRE bug here. As such, while only MessageFactory in EL was tested, in theory it concerns other copies of this class. 2. Personally, as a programmer, I do like when numbers are printed in a programmer-friendly way, i.e. via toString() like here, but generally it means losing i18n. 3. For future reference - Javadoc of MessageFormat. Java 17: https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html - messages used by EL code come from static final ResourceBundle bundle = ResourceBundle.getBundle("org.apache.el.LocalStrings"); and look like error.convert=Cannot convert [{0}] of type [{1}] to [{2}] 4. I think that the JRE bug is as follows: (1) The Javadoc of MessageFormat says (see a table there) that to format a number you specify the format with FormatType = "number". The FormatStyle may be omitted. E.g. "{0, number}" can be used to clearly declare that the argument has to be formatted as a Number. (2) It says that for FormatType=none, it creates a Subformat of null. Looking into the source code (OpenJDK 17.0.4) parsing the format string is implemented in method #makeFormat(...). See the branch with "case TYPE_NULL:" there. It results in assigning the null value to an entry in "formats[]" array. (3) I suspect that a null format was actually intended to serve as a "text" format. My concerns: - The MessageFormat lacks an explicit "text" FormatType. In comparison, with a java.util.Formatter I can use an explicit %s pattern to specify textual format. - There is a comment in the source code of MessageFormat #toPattern(): [[[ if (fmt == null) { // do nothing, string format } else if ... ]]] - I do not see any documentation in MessageFormat javadoc on how the null Subformat is supposed to be treated. Nor I found it in the official tutorial. https://docs.oracle.com/javase/tutorial/i18n/format/messageFormat.html I guess the behaviour may be "specified" in a hidden way, via TCK. (4) The actual implementation of formatting (implemented in MessageFormat#subformat(...)) is different: [[[ } else if (formats[i] != null) { subFormatter = formats[i]; if (subFormatter instanceof ChoiceFormat) { arg = formats[i].format(obj); if (arg.indexOf('{') >= 0) { subFormatter = new MessageFormat(arg, locale); obj = arguments; arg = null; } } } else if (obj instanceof Number) { // format number if can subFormatter = NumberFormat.getInstance(locale); } else if (obj instanceof Date) { // format a Date if can subFormatter = DateFormat.getDateTimeInstance( DateFormat.SHORT, DateFormat.SHORT, locale);//fix } else if (obj instanceof String) { arg = (String) obj; } else { arg = obj.toString(); if (arg == null) arg = "null"; } ]]] I.e. if format[i] is null it does not interpret it as "text", but tries "to be smart" and does some processing for Numbers. (5) According to oss-fuzz we now know that such processing is broken. It assumes that any Number can be sanely formatted by a "NumberFormat.getInstance(locale);". 5. What to do. I see the following ways for a better workaround of the JRE bug: (a) It is possible to narrow Mark's patch: narrowing the type test to look for BigInteger and BigDecimal instead of a generic Number. On the fact that - There are known problems with some of such numbers. - Those classes know how to print themselves better than a NumberFormat knows. Pros: It fixes a rare edge case, and doesn't change anything else. Cons: I think that it breaks formatting if a pattern is using an explicit FormatType of "number" or of "choice". (b) It is possible to obtain formats from a MessageFormat. See the getFormatsByArgumentIndex() method. I propose if the format is null then format the value with String.valueOf(). I.e. interpret "{0}" as a text format. If it causes a regression somewhere, it is possible to update a specific format pattern to revert to the old behaviour: just replace {0} with {0,number} for numbers, or with a pair of {0,date} {0, time} for dates. I assume that the type of an argument in such patterns is known. Pros: - Explicitly controlled behaviour. Cons: - If restoring the old behaviour is required in many places, updating all them may be a notable amount of work. Other notes: - The default format for Dates when printed with toString() is"dow mon dd hh:mm:ss zzz yyyy", as documented in Javadoc https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Date.html#toString() It may look uglier than the old behaviour, but no data is lost: both date and time are printed. - I am not handling the case of nested format patterns, i.e. the case of ChoiceFormat in the code fragment above. I think that usage of ChoiceFormat is rare, and that it is always used with numbers, not with some random values of unknown type. So I think that we are safe to ignore this case. (c) It is possible to combine (b) with a type check, e.g. apply it for Numbers only, but skip Dates. Is there a worth in such behaviour? My preference is for (b). Best regards, Konstantin Kolinko --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org