UnicodeDecoder U+FFFE handling

2018-12-23 Thread Clément MATHIEU
Hi,

I am wondering if UnicodeDecoder handling of U+FFFE is compliant with
current Unicode specification. Supsicious code is:

   if (c == REVERSED_MARK) {
// A reversed BOM cannot occur within middle of stream
return CoderResult.malformedForLength(2);
   }

Up to Unicode 6.3 Unicode specification said that U+FFFE is a non
character and that non characters "should never been interchanged".
Returning CR_MALFORMED on U+FFFE appears to be correct for Java 8
(Unicode 6.2).

However, Unicode 7 changed that and now says: 

  Applications are free to use any of these noncharacter code
  points internally. They have no standard interpretation when
  exchanged outside the context of internal use. However, they are
  not illegal in interchange, nor does their presence cause Unicode
  text to be ill-formed. [...] They are not prohibited from
  occurring  in  valid  Unicode  strings  which  happen  to  be  in
  terchanged. [...]. If a noncharacter is received in open
  interchange, an application is not required to interpret it in
  any way. It is good practice, however, to recognize it as a
  noncharacter and to take appropriate action, such as replacing it
  with U+FFFD replacement character, to indicate
  the  problem  in  the  text.  It  is  not  recommended  to  simpl
  y  delete  noncharacter  code points from such text, because of
  the potential security issues caused by deleting uninterpreted
  characters.

See:
 - http://www.unicode.org/versions/corrigendum9.html
 - https://www.unicode.org/versions/Unicode11.0.0/ch23.pdf (23.7)

Do you think that returning CR_MALFORMED is still OK?

Regards,
Clément MATHIEU



Re: DateTimeFormatter.format() uses exceptions for flow control

2016-10-10 Thread Clément MATHIEU
On Mon, 2016-10-10 at 06:47 +1000, David Holmes wrote:

Hi David,

> Please note that patches can only be accepted if they are sent
> through, or hosted upon OpenJDK infrastructure. If the patch is small
> enough can you send it inline in the email (attachments are often
> stripped) 

Here it is: 

--- old/src/java.base/share/classes/java/time/format/DateTimePrintContext.java  
2016-10-09 17:01:30.326739656 +0200
+++ new/src/java.base/share/classes/java/time/format/DateTimePrintContext.java  
2016-10-09 17:01:30.228738595 +0200
@@ -302,13 +302,10 @@
  * @throws DateTimeException if the field is not available and the section 
is not optional
  */
 Long getValue(TemporalField field) {
-try {
+if (optional == 0) {
 return temporal.getLong(field);
-} catch (DateTimeException ex) {
-if (optional > 0) {
-return null;
-}
-throw ex;
+} else {
+return temporal.isSupported(field) ? temporal.getLong(field) : 
null;
 }
 }

Clément MATHIEU


DateTimeFormatter.format() uses exceptions for flow control

2016-10-09 Thread Clément MATHIEU
Hi ! 

I noticed that DateTimePrintContext.getValue() relies on exceptions to
handle optionality. Using exceptions for flow control seems both
unexpected and very costly, ie. I discovered the issue
when  LocaleDate.format(BASIC_ISO_DATE) showed up as a heavy hitter in
my application. 

Formatting a LocateDate as a "MMdd" string should take ~0.1μs but
currently takes from ~2.5μs, stack depth = 0, to ~10μs, stack depth =
128 when an optional parser is defined but the optional field is not
supported. This seems unfortunate when exceptions can easily be avoided
by testing if the field is supported before trying to get its value. 

Webrev:

  http://cdn.unportant.info/openjdk/dtf_exceptions/webrev.00/

JMH benchmarks:

  https://gist.github.com/cykl/020e1527546a1ba44b4bb3af6dc0484c


What do you think ?


Note: Many tests within java.time are currently broken because of
TestNG upgrade, see https://bugs.openjdk.java.net/browse/JDK-8152944.
Would it help if I spend some time adding the missing L suffixes ?  

Regards,

Clément MATHIEU