Hi Jan,

Thanks for the discussion.

> On Mar 31, 2020, at 16:02, Jan Christoph Uhde <[email protected]> wrote:
> 
> The tests in the tao-file are taken from ISO8601 (2004).
I'm reading it and I do not see any prohibition on using the basic format time 
shift (UTC/GMT offset) with the extended format date and time of day.

Perhaps I am simply not finding it.
I am reading the 2019 edition of ISO 8601, parts 1 and 2 (I don't have an older 
version).
It does not give an example that mixes the extended and basic formats, but it 
does not prohibit mixing them either.
I am also looking at https://en.wikipedia.org/wiki/ISO_8601, which says the 
same.

> `((\+|\-)\d\d)((:)?\d\d)?$` is not correct because the -cake- `:?` is a lie.
I don't promise that it is exhaustive; I have not run it against enough tests.
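
A minimal sketch of what the quoted pattern accepts, assuming Python's `re` module (the choice of Python and the helper name `trailing_offset` are illustrative assumptions, not part of ArangoDB or PEGTL). Because the pattern is anchored only at the end, it is applied with a search rather than a full match:

```python
import re

# Pattern exactly as quoted above. It has no start anchor, only "$",
# so it is used to look for an offset at the end of a longer string.
LENIENT_OFFSET = re.compile(r"((\+|\-)\d\d)((:)?\d\d)?$")


def trailing_offset(text):
    """Return the trailing UTC offset found by the lenient pattern, or None.

    Hypothetical helper for illustration only.
    """
    match = LENIENT_OFFSET.search(text)
    return match.group(0) if match else None
```

For example, `trailing_offset("2020-03-24T04:59:41+00:00")` returns `"+00:00"` and `trailing_offset("2020-03-24T04:59:41+0000")` returns `"+0000"`, so both spellings pass regardless of how the time of day is written. A bare `Z` suffix returns `None`, since the pattern does not cover the UTC designator.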

> It has either to be used everywhere or not at all.
I'm not sure that's accurate in reality.
At least in practice, it's rather common to see the ":" separator between 
hours, minutes, and seconds but no separator in the GMT offset.
An example online:
https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
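
To make the "mixed" case concrete, here is a rough Python sketch that reports whether a timestamp uses the colon in the time of day but not in the offset, or the other way round. The pattern and the function name `mixes_formats` are assumptions made for illustration; the sketch only handles timestamps with a seconds field and an hours-and-minutes offset, and it does not decide whether mixing is valid.

```python
import re

# Illustrative pattern (an assumption of this sketch): date, "T",
# time of day with seconds, then an hours-and-minutes offset.
STAMP = re.compile(
    r"\d{4}-?\d{2}-?\d{2}T"
    r"(?P<time>\d{2}:?\d{2}:?\d{2})"
    r"(?P<offset>[+-]\d{2}:?\d{2})"
)


def mixes_formats(stamp):
    """Return True if time of day and offset disagree on using ':'.

    Returns False if they agree, and None if the string is outside
    what this sketch handles (for example a "Z" suffix or no offset).
    """
    match = STAMP.fullmatch(stamp)
    if match is None:
        return None
    time_has_colon = ":" in match.group("time")
    offset_has_colon = ":" in match.group("offset")
    return time_has_colon != offset_has_colon
```

With this, `mixes_formats("2020-03-24T04:59:41+0000")` is `True` (the commonly seen mixed form), while `mixes_formats("2020-03-24T04:59:41+00:00")` is `False`.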

The time shift is covered in sections 4.3.13 and 5.3.4.1.
They only describe the basic and extended time shift formats for the GMT offset.
I cannot find any rule there saying that the ":" must or must not be used to 
match the rest of the representation; the only difference is that the extended 
format includes it.


From ISO 8601-1:
A time shift, often used in the representation of local standard time against 
UTC, is represented as follows:

4.3.13 Time shift

a) Basic, hours and minutes: [±][hour][min] or [“Z”] EXAMPLE 1 ‘+0500’ or ‘Z’

b) Basic, hours only: [±][hour] or [“Z”] EXAMPLE 2 ‘+05’ or ‘Z’

c) Extended, hours and minutes: [±][hour][“:”][min] or [“Z”] EXAMPLE 3 ‘+05:00’ 
or ‘Z’

The UTC designator ["Z"] indicates that there is no time shift from UTC of day 
and is functionally equivalent to the expressions ‘+0000’ and ‘+00:00’. The 
time shift shall be expressed as positive (i.e. with the leading plus sign 
[“+”]) if it is ahead of or equal to UTC, and as negative (i.e. with the 
leading minus sign [“-”]) if it is behind UTC. 
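
The three forms in the excerpt above can be written as three separate patterns. This is only a sketch in Python: the function name `classify_time_shift` and the labels are made up for illustration, and range checks on the hour and minute digits are deliberately left out.

```python
import re

# One pattern per form listed in the quoted clause 4.3.13.
# Digit ranges (e.g. minutes <= 59) are not checked in this sketch.
TIME_SHIFT_FORMS = {
    "basic, hours and minutes": re.compile(r"[+-]\d\d\d\d"),
    "basic, hours only": re.compile(r"[+-]\d\d"),
    "extended, hours and minutes": re.compile(r"[+-]\d\d:\d\d"),
}


def classify_time_shift(text):
    """Name the form a time shift string is written in, or return None."""
    if text == "Z":
        return "UTC designator"
    for name, pattern in TIME_SHIFT_FORMS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

Here `classify_time_shift("+0500")` gives `"basic, hours and minutes"`, `classify_time_shift("+05:00")` gives `"extended, hours and minutes"`, and a malformed value such as `"+05:0"` gives `None`. Keeping the forms apart like this makes it possible to report which form an offset uses, separately from the question of whether it has to match the form of the date and time of day.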


> https://github.com/ObiWahn/PEGTL/commit/95a825326d734ff035c0979b2019695befda20b0#diff-aca8bef551eaee8610fd39305cc2dcaaR60-R61
> 
> Those lines are as they must be! Only the extended form is allowed to have 
> the colon. This is the whole reason for having 2 different rules. Otherwise a 
> single rule with the RE you provided would suffice.
> 
> It can be done with REs, no question, but it is a lot more to write than 
> when using something like a CFG.
> 
> If ICU offered the functionality we would be happy to use it, but you 
> have to provide all possible format strings, which is exactly what my PEG 
> grammar describes. And as I said, trying out a list of format strings is 
> much slower than using some sort of grammar or regular expression. I could 
> try to parse "xyz" and would not get more than a parse error, which means 
> that I still have to go through the list of all format strings. With the 
> grammar, on the other hand, there is no way to get away from the start symbol 
> and you are immediately done. ICU is a nice and widely used lib, but it is 
> about Unicode encodings and not ISO 8601 parsing. It has some date support but 
> not what we need. The code does not even mention ISO once.
> 
> On Tuesday, March 31, 2020 at 7:50:09 AM UTC+2, じょいすじょん wrote:
> Hi Jan,
> 
> If I understood the taocpp PEGTL correctly, it might be something like this:
> 
> struct extended_offset : sor< Z, seq< sign, zhh, opt< opt<colon>, zmm > > > {};
> struct basic_offset : sor< Z, seq< sign, zhh, opt< opt<colon>, zmm > > > {};
> 
> With some additional tests to include something like this:
> "2020-03-24T04:59:41+0000"
> "2020-03-24T04:59:41+00:00"
> "2020-03-24T04:59:41+00"
> "2020-03-24T04:59:41"
> "2020-03-24T04:59:41Z+0000"
> "2020-03-24T04:59:41Z+00:00"
> "2020-03-24T04:59:41Z+00"
> "2020-03-24T04:59:41Z"
> "2020-03-24T04:59:41-0000"
> "2020-03-24T04:59:41-00:00"
> "2020-03-24T04:59:41-00"
> "2020-03-24T04:59:41"
> "2020-03-24T04:59:41Z-0000"
> "2020-03-24T04:59:41Z-00:00"
> "2020-03-24T04:59:41Z-00"
> "2020-03-24T04:59:41Z"
> 
> Sorry it's my first time looking at taocpp PEGTL, but if it helps, I could 
> express it pretty concisely in a Regular Expression.
> https://rubular.com/r/15kzhvvls0lD3E
> (only for the offset portion, and it may still yet be incomplete)
> 
> I'd recommend looking at ICU:
> https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1DateFormat.html
> https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1DateFormat.html#details
> https://github.com/unicode-org/icu/blob/d3315d98ef82b09aabef12e3f7fb46d171d8bc32/icu4c/source/i18n/datefmt.cpp
> (icu/icu4c/source/i18n/datefmt.cpp)
> 
> Theirs is the canonical reference implementation of everything related to 
> dates, times, and Unicode.
> It's probably a parser only expecting a short string for a date, so you would 
> either want to call their function if you can determine the terminals for a 
> possible date string in your parsing, or just try to examine their logic and 
> their tests. Their tests should be informative.
> 
> Unfortunately, it's an absolutely massive code base that takes time to dig 
> through, but GitHub's search helps…
> 
> 
>> On Mar 31, 2020, at 13:18, dangerwillro...@gmail.com wrote:
>> 
>> Hi Jan
>> 
>> Thanks for confirming and thanks for the code reference. Very helpful to 
>> know. 
>> It might be good to note in the docs how conformant it is. 
>> If I have time I will peek at the code and see if I can contribute. 
>> 
>> I wonder if some cases might already be well covered by ICU4C?
>> 
>>> 
>>> On Mar 31, 2020, at 13:07, Jan Uhde <jan...@arangodb.com> wrote:
>>> 
>>> While the ArangoDB code is not 100% ISO-conformant, it should be correct 
>>> there. I think the ISO standard states that you should either use the 
>>> extended format or not. Therefore it should not be possible to set the 
>>> colon at some points but not at others, as is done in your example.
>>> 
>>> To get an idea please look at this unfinished code:
>>> 
>>> https://github.com/ObiWahn/PEGTL/commit/95a825326d734ff035c0979b2019695befda20b0#diff-88660bc76de9dabacba18430927dfcd1
>>> 
>>> The relevant ArangoDB code can be found here:
>>> https://github.com/arangodb/arangodb/blob/devel/lib/Basics/datetime.cpp
>>> 
>>> Reading the code is your best shot at understanding what will work and 
>>> what will not, because it does not closely follow any standard.
>>> 
>>> At some point I wanted to make it ISO-conformant, but it was decided that 
>>> it is not worth the trouble and that we do not want to change what is 
>>> currently supported.
>>> 
>>> If you are interested, you can help in my free-time effort to finish the 
>>> PEG grammar in the above repository. It might be added to ArangoDB because 
>>> we now use PEGTL in other places. This was not the case when I last looked 
>>> at the problem, so an additional library would have had to be added, which 
>>> resulted in more resistance.
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "ArangoDB" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/arangodb/TWrmSbJq0vY/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> aran...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/arangodb/8e857aa3-fd7a-4a30-a4ff-b1ef208015f0%40googlegroups.com.
> 
> 
