The tests in the tao-file are taken from ISO 8601 (2004). 
`((\+|\-)\d\d)((:)?\d\d)?$` is not correct because the -cake- `:?` is a 
lie. The colon has to be used either everywhere or not at all.

https://github.com/ObiWahn/PEGTL/commit/95a825326d734ff035c0979b2019695befda20b0#diff-aca8bef551eaee8610fd39305cc2dcaaR60-R61

Those lines are as they must be! Only the extended form is allowed to have 
the colon. This is the whole reason for having 2 different rules. Otherwise 
a single rule with the RE you provided would suffice.

It can be done with REs, no question, but it is a lot more to write than 
when using something like a CFG.
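
To make the "two rules" point concrete, here is a minimal sketch using std::regex instead of PEGTL. It only covers the time-of-day plus offset, the names (basic_time, extended_time, is_consistent_time) are made up for this mail and are not taken from the repository, and it reflects my reading that basic and extended forms must not be mixed:

```cpp
#include <regex>
#include <string>

// Basic format: no separators anywhere, e.g. 045941+0000
static const std::regex basic_time(R"(\d{6}(Z|[+-]\d{2}(\d{2})?)?)");

// Extended format: colons everywhere, e.g. 04:59:41+00:00
static const std::regex extended_time(
    R"(\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}(:\d{2})?)?)");

// A string is accepted only if it is entirely basic or entirely extended.
// A single pattern with an optional colon in the offset could not express
// this, because the offset has to agree with the time part before it.
bool is_consistent_time(const std::string& s) {
    return std::regex_match(s, basic_time) ||
           std::regex_match(s, extended_time);
}
```

With this, "04:59:41+00:00" and "045941+0000" are accepted, while "04:59:41+0000" and "045941+00:00" are rejected. Extending the same idea to full dates, week dates, ordinal dates and fractions is where the RE version starts to get long.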

If ICU offered the functionality we would be happy to use it, but you 
have to provide all possible format strings, which is exactly what my PEG 
grammar describes. And as I said, trying out a list of format strings is 
just much slower than using some sort of grammar or regular expression. I 
could try to parse "xyz" and would get nothing more than a parse error, 
which means that I would still have to go through the whole list of format 
strings. With the grammar, on the other hand, there is no way to get past 
the start symbol and you are done immediately. ICU is a nice and widely 
used lib, but it is about Unicode encodings and not ISO 8601 parsing. It 
has some date support, but not what we need. The code does not even 
mention ISO once.
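
A rough sketch of the difference, with heavy assumptions: the 'd' mask syntax below is invented for this mail and is not ICU's pattern syntax or API, the list of masks is far from complete, and recognize() is a hand-written stand-in for what the PEG grammar does, not the actual grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mask syntax: 'd' matches one digit, anything else itself.
bool matches_mask(const std::string& mask, const std::string& input) {
    if (mask.size() != input.size()) return false;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(input[i]);
        if (mask[i] == 'd' ? !std::isdigit(ch) : mask[i] != input[i]) return false;
    }
    return true;
}

// Approach 1: try every candidate format; returns the number of attempts.
std::size_t try_format_list(const std::string& input, bool& ok) {
    static const std::vector<std::string> masks = {
        "dddd-dd-ddTdd:dd:dd",       "dddd-dd-ddTdd:dd:ddZ",
        "dddd-dd-ddTdd:dd:dd+dd:dd", "ddddddddTdddddd",
        "ddddddddTddddddZ",          "ddddddddTdddddd+dddd",
    };
    std::size_t attempts = 0;
    ok = false;
    for (const auto& m : masks) {
        ++attempts;
        if (matches_mask(m, input)) { ok = true; break; }
    }
    return attempts;
}

// Approach 2: one left-to-right pass that stops at the first mismatch.
struct Cursor {
    const std::string& text;
    std::size_t pos;
};

bool digits(Cursor& c, int n) {
    for (int i = 0; i < n; ++i) {
        if (c.pos >= c.text.size() ||
            !std::isdigit(static_cast<unsigned char>(c.text[c.pos]))) return false;
        ++c.pos;
    }
    return true;
}

bool lit(Cursor& c, char ch) {
    if (c.pos < c.text.size() && c.text[c.pos] == ch) { ++c.pos; return true; }
    return false;
}

// zone := 'Z' | sign hh (mm)?   with ':' before mm only in extended form
bool zone(Cursor& c, bool extended) {
    if (lit(c, 'Z')) return true;
    if (!lit(c, '+') && !lit(c, '-')) return false;
    if (!digits(c, 2)) return false;
    if (c.pos == c.text.size()) return true;  // hours only
    if (extended && !lit(c, ':')) return false;
    return digits(c, 2);
}

// Returns true on success; stopped_at reports how far the pass got.
bool recognize(const std::string& input, std::size_t& stopped_at) {
    Cursor c{input, 0};
    bool ok = digits(c, 4);  // year
    if (ok) {
        // The first separator decides between extended and basic form.
        const bool extended = c.pos < input.size() && input[c.pos] == '-';
        if (extended) {
            ok = lit(c, '-') && digits(c, 2) && lit(c, '-') && digits(c, 2) &&
                 lit(c, 'T') && digits(c, 2) && lit(c, ':') && digits(c, 2) &&
                 lit(c, ':') && digits(c, 2);
        } else {
            ok = digits(c, 4) && lit(c, 'T') && digits(c, 6);
        }
        if (ok && c.pos < input.size()) ok = zone(c, extended);
        ok = ok && c.pos == input.size();
    }
    stopped_at = c.pos;
    return ok;
}
```

For "xyz" the format list is walked to the end before giving up, while recognize() fails on the very first character. The single pass also rejects "2020-03-24T04:59:41+0000", because the offset has to follow the form chosen by the date part.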

On Tuesday, March 31, 2020 at 7:50:09 AM UTC+2, じょいすじょん wrote:
>
> Hi Jan,
>
> If I understood the taocpp PEGTL correctly, it might be something like 
> this:
>
> struct extended_offset : sor< Z, seq< sign, zhh, opt< opt<colon>, zmm > > > {};
> struct basic_offset : sor< Z, seq< sign, zhh, opt< opt<colon>, zmm > > > {};
>
> With some additional tests to include something like this:
> "2020-03-24T04:59:41+0000"
> "2020-03-24T04:59:41+00:00"
> "2020-03-24T04:59:41+00"
> "2020-03-24T04:59:41"
> "2020-03-24T04:59:41Z+0000"
> "2020-03-24T04:59:41Z+00:00"
> "2020-03-24T04:59:41Z+00"
> "2020-03-24T04:59:41Z"
> "2020-03-24T04:59:41-0000"
> "2020-03-24T04:59:41-00:00"
> "2020-03-24T04:59:41-00"
> "2020-03-24T04:59:41"
> "2020-03-24T04:59:41Z-0000"
> "2020-03-24T04:59:41Z-00:00"
> "2020-03-24T04:59:41Z-00"
> "2020-03-24T04:59:41Z"
>
> Sorry it's my first time looking at taocpp PEGTL, but if it helps, I could 
> express it pretty concisely in a Regular Expression.
> https://rubular.com/r/15kzhvvls0lD3E
> (only for the offset portion, and it may still yet be incomplete)
>
> I'd recommend looking at ICU
>
> https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1DateFormat.html
>
> https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1DateFormat.html#details
>
> https://github.com/unicode-org/icu/blob/d3315d98ef82b09aabef12e3f7fb46d171d8bc32/icu4c/source/i18n/datefmt.cpp
> icu/icu4c/source/i18n/datefmt.cpp
>
> Theirs is the canonical and reference implementation of everything date, 
> time, and Unicode.
> It's probably a parser only expecting a short string for a date, so you 
> would either want to call their function if you can determine the terminals 
> for a possible date string in your parsing, or just try to examine their 
> logic and their tests. Their tests should be informative.
>
> Unfortunately, it's an absolutely massive code base that takes time to dig 
> through, but Github's search helps…
>
>
> On Mar 31, 2020, at 13:18, [email protected] wrote:
>
> Hi Jan
>
> Thanks for confirming and thanks for the code reference. Very helpful to 
> know.
> Might be good to note in the docs how conformant it is or not.
> If I have time I will peek at the code and see if I can contribute.
>
> I wonder if some cases might be well covered already by ICU4C ?
>
>
> On Mar 31, 2020, at 13:07, Jan Uhde <[email protected]> wrote:
>
> While the ArangoDB code is not 100% ISO conformant, it should be correct 
> there. I think ISO states that you should either use the extended format 
> or not. Therefore it should not be possible to set the colon at some points 
> but not at others, as is done in your example.
>
> To get an idea please look at this unfinished code:
>
>
> https://github.com/ObiWahn/PEGTL/commit/95a825326d734ff035c0979b2019695befda20b0#diff-88660bc76de9dabacba18430927dfcd1
>
> The relevant ArangoDB code can be found here:
> https://github.com/arangodb/arangodb/blob/devel/lib/Basics/datetime.cpp
>
> Reading the code is your best shot at understanding what will work and 
> what will not, because it does not closely follow any standard.
>
> At some point I wanted to make it ISO conformant, but it was decided that 
> it is not worth the trouble and we do not want to change what is currently 
> supported.
>
> If you are interested, you can help in my free-time effort to finish the 
> PEG grammar in the above repository. It might be added to ArangoDB because 
> we use PEGTL in other places by now. This was not the case when I last 
> looked at the problem, so an additional library would have had to be added 
> back then, which met with more resistance.
>
>
> -- 
> You received this message because you are subscribed to a topic in the 
> Google Groups "ArangoDB" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/arangodb/TWrmSbJq0vY/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/arangodb/8e857aa3-fd7a-4a30-a4ff-b1ef208015f0%40googlegroups.com
> .
>
>
>
