Paul Ganssle <p.gans...@gmail.com> added the comment:

I don't love the inconsistency, but can you elaborate on the actual *danger* 
posed by this? What security vulnerabilities involve parsing a datetime using a 
non-ASCII digit?

The reason that `fromisoformat` doesn't accept non-ASCII digits is that it's 
the inverse of `datetime.isoformat`, which never *emits* non-ASCII digits. For 
`strptime`, we're really aiming for a general specifier for parsing datetime 
strings in a given format. I'll note that `fromisoformat` does accept any valid 
Unicode character as the date/time separator.
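A small sketch of the asymmetry described above, using Arabic-Indic digits as an example of non-ASCII digits (`accepts` is a hypothetical helper, not part of `datetime`). `isoformat()` only produces ASCII digits, while `int()` accepts any Unicode decimal digit. The rejection by `date.fromisoformat()` shown here reflects CPython's default C implementation; a pure-Python fallback used by other interpreters may behave differently.

```python
from datetime import date

# "15" spelled with Arabic-Indic digits (U+0661, U+0665), written as escapes.
ARABIC_INDIC_15 = "\u0661\u0665"

# isoformat() only ever emits ASCII digits...
iso = date(2020, 1, 15).isoformat()
print(iso, iso.isascii())  # 2020-01-15 True

# ...while int() happily accepts any Unicode decimal digits.
print(int(ARABIC_INDIC_15))  # 15


def accepts(text: str) -> bool:
    """Hypothetical helper: True if date.fromisoformat() parses `text`."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


print(accepts(iso))                            # True
print(accepts("2020-01-" + ARABIC_INDIC_15))   # False on CPython's C implementation
```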

From my perspective, there are a few goals, some of which may be in conflict 
with the others:

1. Mitigating security vulnerabilities, if they exist.
2. Supporting international locales if possible.
3. Improving consistency in the API.

If no one ever actually specifies datetimes in non-ASCII locales (and this 
gravestone, which includes the date in both Latin and Chinese/Japanese 
characters, seems to suggest otherwise: 
https://jbnewall.com/wp-content/uploads/2017/02/LEE-MONUMENT1-370x270.jpg ), 
then I don't see a problem with dropping our patchy support, but I think we 
need to carefully consider the backwards-compatibility implications if we go 
through with that.

One bit of evidence in favor of "no one uses this anyway" is that no one has 
yet complained that this apparently works for "%y" but not for "%d", so 
presumably it's not heavily used. If our support for this sort of thing is so 
broken that no one could possibly be using it, I suppose we may as well break 
it all the way, but it would be nice to try to identify some resources that the 
documentation can point to for how to handle international date parsing.
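A likely explanation for the `%y`/`%d` discrepancy is how the regular expressions behind `strptime` are written. The two patterns below are modeled on the ones in CPython's private `_strptime` module (an assumption: they are copied for illustration and may differ between Python versions), and `matches` is a hypothetical helper. In a `str` pattern, `\d` matches any Unicode decimal digit, so a pattern built only from `\d` accepts non-ASCII digits, whereas the `%d` pattern pins its first character to an ASCII range. Passing `re.ASCII` is one way such patterns could be restricted.

```python
import re

# Modeled on CPython's private _strptime patterns (assumption: the exact
# text may differ between Python versions).
Y_PATTERN = r"(?P<y>\d\d)"
D_PATTERN = r"(?P<d>3[0-1]|[1-2]\d|0[1-9]|[1-9]| [1-9])"

# "20" and "15" spelled with Arabic-Indic digits.
ARABIC_INDIC_20 = "\u0662\u0660"
ARABIC_INDIC_15 = "\u0661\u0665"


def matches(pattern: str, text: str, ascii_only: bool = False) -> bool:
    """Hypothetical helper: True if `text` fully matches `pattern`."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(pattern, text, flags) is not None


print(matches(Y_PATTERN, ARABIC_INDIC_20))  # True: \d accepts any Unicode digit
print(matches(D_PATTERN, ARABIC_INDIC_15))  # False: first digit must be ASCII
print(matches(D_PATTERN, "1\u0665"))        # True: mixed ASCII/non-ASCII slips in
print(matches(Y_PATTERN, ARABIC_INDIC_20, ascii_only=True))  # False
```

The mixed case is arguably the strangest outcome: a day written as ASCII "1" followed by an Arabic-Indic "5" is accepted, which is exactly the kind of patchy support described above.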


> Note the "unique and unambiguous". By accepting non-Ascii digits, we're 
> breaking the uniqueness requirement of ISO 8601.

I think in this case "but the standard says X" is probably not a very strong 
argument. Even if we were coding to the ISO 8601 standard (I don't think we 
claim to be; we're just using that convention), I don't really know how to 
interpret the "unique" portion of that claim, considering that ISO 8601 
specifies dozens of ways to represent the same datetime. Here's an example from 
[my `dateutil.parser.isoparse` test 
suite](https://github.com/dateutil/dateutil/blob/110a09b4ad46fb87ae858a14bfb5a6b92557b01d/dateutil/test/test_isoparser.py#L150):

```
[
    '2014-04-11T00',
    '2014-04-10T24',
    '2014-04-11T00:00',
    '2014-04-10T24:00',
    '2014-04-11T00:00:00',
    '2014-04-10T24:00:00',
    '2014-04-11T00:00:00.000',
    '2014-04-10T24:00:00.000',
    '2014-04-11T00:00:00.000000',
    '2014-04-10T24:00:00.000000',
]
```

All of these represent the exact same moment in time, and this doesn't even get 
into the week-number/day-number configurations or anything with time zones. 
ISO 8601 also allows `,` as the subsecond-component separator (so add 4 more 
variants for that), and it allows you to leave out the dashes between the date 
components and the colons between the time components, so you can multiply the 
possible variants by 4.
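To make the "same moment" point concrete, here is a minimal sketch of a parser for just the extended-format subset listed above (`parse_iso_subset` and `EQUIVALENT` are hypothetical names, not part of `datetime` or `dateutil`). It accepts `.` or `,` as the subsecond separator and maps hour `24` to midnight of the following day; it deliberately ignores the separator-free format, week dates, and time zones.

```python
import re
from datetime import datetime, timedelta

# The ten strings from the test-suite excerpt above.
EQUIVALENT = [
    "2014-04-11T00",
    "2014-04-10T24",
    "2014-04-11T00:00",
    "2014-04-10T24:00",
    "2014-04-11T00:00:00",
    "2014-04-10T24:00:00",
    "2014-04-11T00:00:00.000",
    "2014-04-10T24:00:00.000",
    "2014-04-11T00:00:00.000000",
    "2014-04-10T24:00:00.000000",
]

# Hypothetical subset: YYYY-MM-DDTHH[:MM[:SS[(.|,)ffffff]]], ASCII digits only.
_ISO_SUBSET = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2})"
    r"(?::(\d{2})(?::(\d{2})(?:[.,](\d{1,6}))?)?)?",
    re.ASCII,
)


def parse_iso_subset(text: str) -> datetime:
    m = _ISO_SUBSET.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported ISO 8601 string: {text!r}")
    year, month, day, hour = (int(g) for g in m.group(1, 2, 3, 4))
    minute = int(m.group(5) or 0)
    second = int(m.group(6) or 0)
    # Right-pad the fraction so ".5" means 500000 microseconds.
    micro = int((m.group(7) or "0").ljust(6, "0"))
    if hour == 24:
        # 24:00 denotes the end of the day, i.e. 00:00 of the next day.
        if (minute, second, micro) != (0, 0, 0):
            raise ValueError(f"hour 24 is only valid as 24:00: {text!r}")
        return datetime(year, month, day) + timedelta(days=1)
    return datetime(year, month, day, hour, minute, second, micro)


# Every string collapses to the same moment.
print({parse_iso_subset(s) for s in EQUIVALENT})
# {datetime.datetime(2014, 4, 11, 0, 0)}
print(parse_iso_subset("2014-04-10T24:00:00,000"))  # 2014-04-11 00:00:00
```

Because the pattern is compiled with `re.ASCII`, this sketch also refuses non-ASCII digits, which is one possible reading of "unambiguous" that does not depend on the representation being unique.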

Just a random aside: I think there may be strong arguments for doing this even 
if we don't care about coding to a specific standard.

----------
nosy: +p-ganssle

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39280>
_______________________________________