UnicodeDecoder U+FFFE handling
Hi, I am wondering if UnicodeDecoder handling of U+FFFE is compliant with current Unicode specification. Supsicious code is: if (c == REVERSED_MARK) { // A reversed BOM cannot occur within middle of stream return CoderResult.malformedForLength(2); } Up to Unicode 6.3 Unicode specification said that U+FFFE is a non character and that non characters "should never been interchanged". Returning CR_MALFORMED on U+FFFE appears to be correct for Java 8 (Unicode 6.2). However, Unicode 7 changed that and now says: Applications are free to use any of these noncharacter code points internally. They have no standard interpretation when exchanged outside the context of internal use. However, they are not illegal in interchange, nor does their presence cause Unicode text to be ill-formed. [...] They are not prohibited from occurring in valid Unicode strings which happen to be in terchanged. [...]. If a noncharacter is received in open interchange, an application is not required to interpret it in any way. It is good practice, however, to recognize it as a noncharacter and to take appropriate action, such as replacing it with U+FFFD replacement character, to indicate the problem in the text. It is not recommended to simpl y delete noncharacter code points from such text, because of the potential security issues caused by deleting uninterpreted characters. See: - http://www.unicode.org/versions/corrigendum9.html - https://www.unicode.org/versions/Unicode11.0.0/ch23.pdf (23.7) Do you think that returning CR_MALFORMED is still OK? Regards, Clément MATHIEU
Re: DateTimeFormatter.format() uses exceptions for flow control
On Mon, 2016-10-10 at 06:47 +1000, David Holmes wrote: Hi David, > Please note that patches can only be accepted if they are sent > through, or hosted upon OpenJDK infrastructure. If the patch is small > enough can you send it inline in the email (attachments are often > stripped) Here it is: --- old/src/java.base/share/classes/java/time/format/DateTimePrintContext.java 2016-10-09 17:01:30.326739656 +0200 +++ new/src/java.base/share/classes/java/time/format/DateTimePrintContext.java 2016-10-09 17:01:30.228738595 +0200 @@ -302,13 +302,10 @@ * @throws DateTimeException if the field is not available and the section is not optional */ Long getValue(TemporalField field) { -try { +if (optional == 0) { return temporal.getLong(field); -} catch (DateTimeException ex) { -if (optional > 0) { -return null; -} -throw ex; +} else { +return temporal.isSupported(field) ? temporal.getLong(field) : null; } } Clément MATHIEU
DateTimeFormatter.format() uses exceptions for flow control
Hi ! I noticed that DateTimePrintContext.getValue() relies on exceptions to handle optionality. Using exceptions for flow control seems both unexpected and very costly, ie. I discovered the issue when LocaleDate.format(BASIC_ISO_DATE) showed up as a heavy hitter in my application. Formatting a LocateDate as a "MMdd" string should take ~0.1μs but currently takes from ~2.5μs, stack depth = 0, to ~10μs, stack depth = 128 when an optional parser is defined but the optional field is not supported. This seems unfortunate when exceptions can easily be avoided by testing if the field is supported before trying to get its value. Webrev: http://cdn.unportant.info/openjdk/dtf_exceptions/webrev.00/ JMH benchmarks: https://gist.github.com/cykl/020e1527546a1ba44b4bb3af6dc0484c What do you think ? Note: Many tests within java.time are currently broken because of TestNG upgrade, see https://bugs.openjdk.java.net/browse/JDK-8152944. Would it help if I spend some time adding the missing L suffixes ? Regards, Clément MATHIEU