[issue39280] Don't allow datetime parsing to accept non-Ascii digits

Paul Ganssle Thu, 09 Jan 2020 14:37:17 -0800


Paul Ganssle <[email protected]> added the comment:

I don't love the inconsistency, but can you elaborate on the actual *danger*
posed by this? What security vulnerabilities involve parsing a datetime using a
non-ASCII digit?

`fromisoformat` doesn't accept non-ASCII digits because it is the inverse of
`datetime.isoformat`, which never *emits* non-ASCII digits. For `strptime`,
we're really aiming for a more general specifier for parsing datetime strings
in a given format. I'll note that `fromisoformat` does accept any valid Unicode
character as the date/time separator.
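A quick illustration of the round-trip relationship described above, as a sketch using only the standard library (the timestamp is arbitrary, not taken from the issue):

```python
from datetime import datetime

# Arbitrary example timestamp.
dt = datetime(2020, 1, 9, 14, 37, 17)

# isoformat() only ever produces ASCII digits and punctuation...
text = dt.isoformat()
print(text)  # 2020-01-09T14:37:17

# ...and fromisoformat() is designed to invert exactly that output.
assert text.isascii()
assert datetime.fromisoformat(text) == dt
```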

From my perspective, there are a few goals, some of which may be in conflict
with the others:

1. Mitigating security vulnerabilities, if they exist.
2. Supporting international locales if possible.
3. Improving consistency in the API.
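Goals 2 and 3 pull in different directions. As a rough sketch (all function names below are hypothetical helpers, not part of `datetime`), a caller who prioritizes consistency could reject non-ASCII input outright, while one who prioritizes international support could translate Unicode decimal digits to ASCII before handing the string to `strptime`:

```python
import unicodedata
from datetime import datetime


def parse_ascii_only(text: str, fmt: str) -> datetime:
    """Hypothetical strict policy: refuse anything that is not pure ASCII."""
    if not text.isascii():
        raise ValueError(f"non-ASCII characters in {text!r}")
    return datetime.strptime(text, fmt)


def normalize_digits(text: str) -> str:
    """Hypothetical lenient policy: map every Unicode decimal digit
    (category Nd, e.g. Arabic-Indic or fullwidth digits) to its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )


def parse_normalized(text: str, fmt: str) -> datetime:
    """Normalize digits first, then parse with the ordinary strptime."""
    return datetime.strptime(normalize_digits(text), fmt)


# "15/04/2014" written with Arabic-Indic digits.
arabic_indic = "\u0661\u0665/\u0660\u0664/\u0662\u0660\u0661\u0664"
print(parse_normalized(arabic_indic, "%d/%m/%Y"))  # 2014-04-15 00:00:00
```

Normalizing up front sidesteps whatever per-directive behavior `strptime` happens to have, at the cost of silently accepting mixed-script input.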

If no one ever actually specifies datetimes in non-ASCII locales (and this
gravestone, which includes the date in both Latin and Chinese/Japanese
characters, seems to suggest otherwise:
https://jbnewall.com/wp-content/uploads/2017/02/LEE-MONUMENT1-370x270.jpg ),
then I don't see a problem with dropping our patchy support, but I think we
need to carefully consider the backwards-compatibility implications if we go
through with that.

One bit of evidence in favor of "no one uses this anyway" is that no one has
yet complained that this apparently doesn't work for `%d` even though it works
for `%y`, so presumably it's not heavily used. If our support for this sort of
thing is so broken that no one could possibly be using it, I suppose we may as
well break it all the way, but it would be nice to try to identify some
resources that the documentation can point to for how to handle international
date parsing.
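One plausible mechanism for the `%d`/`%y` asymmetry lies in how format directives become regular expressions: in a `str` pattern, `\d` matches any Unicode decimal digit, while an explicit class like `[12]` matches only those ASCII characters, and `int()` converts either kind. The patterns below are simplified approximations written for illustration (`DAY` and `YEAR2` are hypothetical names, not the exact expressions in CPython's `_strptime` module):

```python
import re

# Simplified, hypothetical stand-ins for the %d and %y directives.
DAY = r"3[01]|[12]\d|0[1-9]|[1-9]"  # leading digit is an explicit ASCII class
YEAR2 = r"\d\d"                      # \d in a str pattern matches any Nd digit

arabic_15 = "\u0661\u0665"  # "15" in Arabic-Indic digits
arabic_20 = "\u0662\u0660"  # "20" in Arabic-Indic digits

print(re.fullmatch(DAY, arabic_15))    # None: [12], 0 and [1-9] are ASCII-only
print(re.fullmatch(YEAR2, arabic_20))  # a match object
print(int(arabic_20))                  # 20: int() accepts any decimal digits

# Compiling with re.ASCII restricts \d to [0-9], one way to make every
# directive reject non-ASCII digits consistently.
print(re.fullmatch(YEAR2, arabic_20, re.ASCII))  # None
```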

> Note the "unique and unambiguous". By accepting non-Ascii digits, we're
> breaking the uniqueness requirement of ISO 8601.

I think in this case "but the standard says X" is probably not a very strong
argument. Even if we were coding to the ISO 8601 standard (I don't think we
claim to be; we're just using that convention), I don't really know how to
interpret the "unique" portion of that claim, considering that ISO 8601
specifies dozens of ways to represent the same datetime. Here's an example from
[my `dateutil.parser.isoparse` test
suite](https://github.com/dateutil/dateutil/blob/110a09b4ad46fb87ae858a14bfb5a6b92557b01d/dateutil/test/test_isoparser.py#L150):

```
'2014-04-11T00',
'2014-04-10T24',
'2014-04-11T00:00',
'2014-04-10T24:00',
'2014-04-11T00:00:00',
'2014-04-10T24:00:00',
'2014-04-11T00:00:00.000',
'2014-04-10T24:00:00.000',
'2014-04-11T00:00:00.000000',
'2014-04-10T24:00:00.000000'
```

All of these represent the exact same moment in time, and this doesn't even get
into the week-number/day-number configurations or anything with time zones.
The standard also allows `,` as the subsecond separator (so add 4 more variants
for that), and it allows you to leave out the dashes between the date
components and the colons between the time components, so you can multiply the
possible variants by 4.
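To make the variant count concrete, here is a small sketch that expands the list above with the comma separator and the no-dashes/no-colons (basic) forms, then checks that every resulting string denotes the same instant. `parse_simple_iso` is a hypothetical, deliberately minimal parser covering only calendar dates and the time forms shown above; it is not `dateutil`'s `isoparse`:

```python
import re
from datetime import datetime, timedelta

# Hypothetical minimal grammar: YYYY[-]MM[-]DD "T" hh[[:]mm[[:]ss[(.|,)f...]]]
# re.ASCII keeps \d to [0-9], in the spirit of this issue.
_ISO = re.compile(
    r"(\d{4})-?(\d{2})-?(\d{2})"
    r"T(\d{2})(?::?(\d{2})(?::?(\d{2})(?:[.,](\d{1,6}))?)?)?",
    re.ASCII,
)


def parse_simple_iso(text: str) -> datetime:
    m = _ISO.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported format: {text!r}")
    year, month, day, hour = (int(g) for g in m.group(1, 2, 3, 4))
    minute, second = (int(g) if g else 0 for g in m.group(5, 6))
    frac = m.group(7) or ""
    micro = int(frac.ljust(6, "0")) if frac else 0
    if hour == 24:
        # 24:00 is the end of the given day, i.e. midnight of the next one.
        if minute or second or micro:
            raise ValueError("hour 24 is only valid as 24:00:00")
        return datetime(year, month, day) + timedelta(days=1)
    return datetime(year, month, day, hour, minute, second, micro)


base = [
    '2014-04-11T00', '2014-04-10T24',
    '2014-04-11T00:00', '2014-04-10T24:00',
    '2014-04-11T00:00:00', '2014-04-10T24:00:00',
    '2014-04-11T00:00:00.000', '2014-04-10T24:00:00.000',
    '2014-04-11T00:00:00.000000', '2014-04-10T24:00:00.000000',
]

variants = set()
for s in base:
    for form in {s, s.replace(".", ",")}:            # "." or "," before fraction
        date, time = form.split("T")
        for d in (date, date.replace("-", "")):      # extended / basic date
            for t in (time, time.replace(":", "")):  # extended / basic time
                variants.add(f"{d}T{t}")

assert all(parse_simple_iso(v) == datetime(2014, 4, 11) for v in variants)
print(len(variants), "strings, one instant")
```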

Just a random aside: I think there may be strong arguments for doing this even
if we don't care about coding to a specific standard.

----------
nosy: +p-ganssle

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39280>
_______________________________________