Re: RFR:JDK-8032051:"ZonedDateTime" class "parse" method fails with short time zone offset ("+01")

nadeesh tv Thu, 03 Mar 2016 10:39:06 -0800

Hi,

Stephen, Roger Thanks for the comments.

Please see the updated webrevhttp://cr.openjdk.java.net/~ntv/8032051/webrev.04/



Regards,
Nadeesh


On 3/1/2016 12:29 AM, Stephen Colebourne wrote:

I'm happy to go back to the spec I proposed before. That spec would
determine colons dynamically only for pattern HH. Otherwise, it would
use the presence/absence of a colon in the pattern as the signal. That
would deal with the ISO-8601 problem and resolve the original issue
(as ISO_OFFSET_DATE_TIME uses HH:MM:ss, which would leniently parse
using colons).

Writing the spec wording is not easy however. I had:

When parsing in lenient mode, only the hours are mandatory - minutes
and seconds are optional. The colons are required if the specified
pattern contains a colon. If the specified pattern is "+HH", the
presence of colons is determined by whether the character after the
hour digits is a colon or not. If the offset cannot be parsed then an
exception is thrown unless the section of the formatter is optional.

which isn't too bad but alternatives are possible.

Stephen




On 29 February 2016 at 15:52, Roger Riggs <[email protected]> wrote:

Hi Stephen,

As a fix for the original issue[1], not correctly parsing a ISO defined
offset, the use of lenient
was a convenient implementation technique (hack).  But with the expanded
definition of lenient,
it will allow many forms of the offset that are not allowed by the ISO
specification
and should not be accepted forDateTimeFormatter. ISO_OFFSET_DATE_TIME.
In particular, ISO requires the ":" to separate the minutes.
I'm not sure how to correctly fix the original issue with the new
specification of lenient offset
parsing without introducing some more specific implementation information.


WRT the lenient parsing mode for appendOffset:

I was considering that the subfields of the offset were to be treated
leniently but it seems
you were treating the entire offset field and text as the unit to be treated
leniently.
The spec for lenient parsing would be clearer if it were specified as
allowing any
of the patterns of appendOffset.  The current wording around the character
after the hour
may be confusing.

In the specification of appendOffset(pattern, noOffsetText) how about:

"When parsing in lenient mode, the longest valid pattern that matches the
input is used. Only the hours are mandatory, minutes  and seconds are
optional."

Roger


[1] https://bugs.openjdk.java.net/browse/JDK-8032051





On 2/26/2016 1:10 PM, Stephen Colebourne wrote:

It is important to also consider the case where the user wants to
format using HH:MM but parse seconds if they are provided.

As I said above, this is no different to SignStyle, where the user
requests something specific on format, but accepts anything on input.

The pattern is still used for formatting and strict parsing under
these changes. It is effectively ignored in lenient parsing (which is
the very definition of leniency).

Another way to look at it:

using a pattern of HH:MM and strict:
+02 - disallowed
+02:00 - allowed
+02:00:00 - disallowed

using a pattern of HH:mm and strict:
+02 - allowed
+02:00 - allowed
+02:00:00 - disallowed

using any pattern and lenient:
+02 - allowed
+02:00 - allowed
+02:00:00 - allowed

This covers pretty much anything a user needs when parsing.

Stephen


On 26 February 2016 at 17:38, Roger Riggs <[email protected]> wrote:

Hi Stephen,

Even in lenient mode the parser needs to stick to the fields provided in
the
pattern.
If the caller intends to parse seconds, the pattern should include
seconds.
Otherwise the caller has not been able to specify their intent.
That's consistent with lenient mode used in the other fields.
Otherwise, the pattern is irrelevant except for whether it contains a ":"
and makes
the spec nearly useless.

Roger



On 2/26/2016 12:09 PM, Stephen Colebourne wrote:

On 26 February 2016 at 15:00, Roger Riggs <[email protected]>
wrote:

Hi Stephen,

It does not seem natural to me with a pattern of HHMM for it to parse
more
than 4 digits.
I can see lenient modifying the behavior as it it were HHmm, but there
is
no
indication in the pattern
that seconds would be considered.  How it would it be implied from the
spec?

The spec is being expanded to define what happens. Previously it
didn't define it at all, and would throw an error.

Lenient parsing typically accepts much more than the strict parsing.

When parsing numbers, you may set the SignStyle to NEVER, but the sign
will still be parsed in lenient mode

When parsing text, you may select the short output format, but any
length of text will be parsed in lenient mode.

As such, it is very much in line with the behavour of the API to parse
a much broader format than the one requested in lenient mode. (None of
this affects strict mode).

Stephen

In the original issue, appendOffsetId is defined as using the +HH:MM:ss
pattern and
specific to ISO the MM should be allowed to be optional.   There is no
question of parsing
extra digits not included in the requested pattern.

Separately, this is specifying the new lenient behavior of
appendOffset(pattern, noffsetText).
In that case, I don't think it will be understood that patterns
'shorter'
than the input will
gobble up extra digits and ':'s.

Roger





On 2/26/2016 9:42 AM, Stephen Colebourne wrote:

Lenient can be however lenient we define it to be. Allowing minutes
and seconds to be parsed when not specified in the pattern is the key
part of the change. Whether the parser copes with both colons and
no-colons is the choice at hand here. It seems to me that since the
parser can easily handle figuring out whether the colon is present or
not, we should just allow the parser to be fully lenient.

Stephen


On 26 February 2016 at 14:15, Roger Riggs <[email protected]>
wrote:

HI Stephen,

How lenient is lenient supposed to be? Looking at the offset test
cases,
it
seems to allow  minutes
and seconds digits to be parsed even if the pattern did not include
them.

+ @DataProvider(name="lenientOffsetParseData")
+ Object[][] data_lenient_offset_parse() {
+ return new Object[][] {
+ {"+HH", "+01", 3600},
+ {"+HH", "+0101", 3660},
+ {"+HH", "+010101", 3661},
+ {"+HH", "+01", 3600},
+ {"+HH", "+01:01", 3660},
+ {"+HH", "+01:01:01", 3661},

Thanks, Roger



On 2/26/2016 6:16 AM, Stephen Colebourne wrote:

I don't think this is quite right.

if ((length > position + 3) && (text.charAt(position + 3) == ':')) {
       parseType = 10;
}

This code will *always* select type 10 (colons) if a colon is found at
position+3. Whereas the spec now says that it should only do this if
the pattern is "HH". For other patterns, the colon/no-colon choice is
defined to be based on the pattern.

That said, I'm thinking it is better to make the spec more lenient to
match the behaviour as implemented:


When parsing in lenient mode, only the hours are mandatory - minutes
and seconds are optional. If the character after the hour digits is a
colon
then the parser will parse using the pattern "HH:mm:ss", otherwise the
parser will parse using the pattern "HHmmss".


Additional TCKDateTimeFormatterBuilder tests will be needed to
demonstrate the above. There should also be a test for data following
the lenient parse. The following should all succeed:




DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendZoneId();
"+01:00Europe/London"
"+0100Europe/London"




DateTimeFormatterBuilder().parseLenient().appendOffset("HH:MM").appendLiteral(":").appendZoneId();
"+01:Europe/London"

Note this special case, where the colon affects the parse type, but is
not ultimately part of the offset, thus it is left to match the
appendLiteral(":")

You may want to think of some additional nasty edge cases!

Stephen

On 25 February 2016 at 15:44, nadeesh tv <[email protected]> wrote:

Hi all,
Please see the updated webrev
http://cr.openjdk.java.net/~ntv/8032051/webrev.02/

Thanks and Regards,
Nadeesh

On 2/23/2016 5:17 PM, Stephen Colebourne wrote:

Thanks for the changes.

In `DateTimeFormatter`, the code should be

.parseLenient()
.appendOffsetId()
.parseStrict()

and the same in the other case. This ensures that existing callers who
then embed the formatter in another formatter (like the
ZONED_DATE_TIME constant) are unaffected.


The logic for lenient parsing does not look right as it only handles
types 5 and 6. This table shows the mappings needed:

"+HH",  -> "+HHmmss" or "+HH:mm:ss"
"+HHmm",  -> "+HHmmss",
"+HH:mm",  -> "+HH:mm:ss",
"+HHMM",  -> "+HHmmss",
"+HH:MM",  -> "+HH:mm:ss",
"+HHMMss",  -> "+HHmmss",
"+HH:MM:ss",  -> "+HH:mm:ss",
"+HHMMSS",  -> "+HHmmss",
"+HH:MM:SS",  -> "+HH:mm:ss",
"+HHmmss",
"+HH:mm:ss",

Note that the "+HH" pattern is a special case, as we don't know
whether to use the colon or non-colon pattern. Whether to require
colon or not is based on whether the next character after the HH is a
colon or not.

Proposed appendOffsetId() Javadoc:

* Appends the zone offset, such as '+01:00', to the formatter.
* <p>
* This appends an instruction to format/parse the offset ID to the
builder.
* This is equivalent to calling {@code appendOffset("+HH:MM:ss", "Z")}.
* See {@link #appendOffset(String, String)} for details on formatting
and parsing.

Proposed appendOffset(String, String) Javadoc:

* During parsing, the offset...

changed to:

* When parsing in strict mode, the input must contain the mandatory
and optional elements are defined by the specified pattern.
* If the offset cannot be parsed then an exception is thrown unless
the section of the formatter is optional.
* <p>
* When parsing in lenient mode, only the hours are mandatory - minutes
and seconds are optional.
* The colons are required if the specified pattern contains a colon.
* If the specified pattern is "+HH", the presence of colons is
determined by whether the character after the hour digits is a colon
or not.
* If the offset cannot be parsed then an exception is thrown unless
the section of the formatter is optional.

thanks and sorry for delay
Stephen



On 11 February 2016 at 20:22, nadeesh tv <[email protected]> wrote:

Hi all,

Please review a fix for

Bug Id  https://bugs.openjdk.java.net/browse/JDK-8032051

webrev http://cr.openjdk.java.net/~ntv/8032051/webrev.01/

--
Thanks and Regards,
Nadeesh TV

--
Thanks and Regards,
Nadeesh TV


--
Thanks and Regards,
Nadeesh TV

Re: RFR:JDK-8032051:"ZonedDateTime" class "parse" method fails with short time zone offset ("+01")

Reply via email to