Paul Ganssle <p.gans...@gmail.com> added the comment:

> That it allows creating the datetime instance looks like a bug to me, i.e. a 
> time before 0001-01-01 00:00 UTC is invalid. What am I misunderstanding?

`datetime.datetime(1, 1, 1, tzinfo=timezone(timedelta(hours=1)))` is a valid 
datetime; it just cannot be converted to every other time zone, because in some 
time zones the same absolute time falls outside datetime's representable range.
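
To make that concrete, here is a quick REPL illustration (the exact error 
message may vary slightly between versions):

    >>> from datetime import datetime, timedelta, timezone
    >>> dt = datetime(1, 1, 1, tzinfo=timezone(timedelta(hours=1)))
    >>> dt.isoformat()  # constructing it works fine
    '0001-01-01T00:00:00+01:00'
    >>> dt.astimezone(timezone.utc)  # but the UTC equivalent is in year 0
    Traceback (most recent call last):
      ...
    OverflowError: date value out of range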

`datetime.datetime` is a representation of an abstract datetime, and it can 
also be annotated with a time zone to basically tag the civil time with a 
function for converting it into other representations of the same *absolute* 
time. The range of valid `datetime.datetime` objects is based entirely on the 
naïve portion of the datetime, and has nothing to do with the absolute time. So 
this is indeed a natural consequence of the chosen design.
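
You can see that the range check only looks at the naïve portion: attaching 
even extreme fixed offsets to the boundary values raises no error:

    >>> from datetime import datetime, timedelta, timezone
    >>> datetime.min.replace(tzinfo=timezone(timedelta(hours=23))).isoformat()
    '0001-01-01T00:00:00+23:00'
    >>> datetime.max.replace(tzinfo=timezone(timedelta(hours=-23))).isoformat()
    '9999-12-31T23:59:59.999999-23:00'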

If we wanted to change things, it would cause a number of problems, and the 
cure would be much worse than the "disease". For one thing, accessing UTC 
offsets is done lazily, so `.utcoffset()` is not currently called during 
`datetime` creation. The datetime documentation lays out that this is 
consistent with the raison d'être of `datetime`: "While date and time 
arithmetic is supported, the focus of the implementation is on efficient 
attribute extraction for output formatting and manipulation." In order to 
determine whether a given `datetime` can always be converted to an equivalent 
datetime in any time zone, we'd need to actively determine its UTC offset, 
which would be a major performance regression in creating aware datetimes. We 
could avoid most of that regression by only doing the `.utcoffset()` check 
when the datetime falls within 2 days of `datetime.min` or `datetime.max`, but 
while that would be a much smaller performance hit, it would also add new edge 
cases where `.utcoffset()` is sometimes called during the constructor and 
sometimes not, which is not ideal. Not to mention that if we were ever to open 
up the allowed return values for `.utcoffset()`, the logic could get hairier 
still (depending on the nature of the allowed values).
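
For illustration, here is roughly what such an eager check would have to do. 
`would_roundtrip` is hypothetical, not an actual or proposed API:

    from datetime import datetime, timedelta

    def would_roundtrip(dt):
        # Hypothetical eager check: could `dt` be converted to any other
        # fixed-offset zone without overflowing?
        if dt.tzinfo is None:
            return True  # naive datetimes exempt under the current rules
        offset = dt.utcoffset()  # the eager call the constructor avoids today
        if offset is None:
            return True
        try:
            as_utc = dt.replace(tzinfo=None) - offset
        except OverflowError:
            return False  # can't even compute the UTC equivalent
        # Valid offsets are strictly within +/-24 hours, so a 24-hour margin
        # on either side of the UTC value is (slightly conservatively) enough.
        margin = timedelta(hours=24)
        return datetime.min + margin <= as_utc <= datetime.max - margin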

Another issue with "fixing" this is that it would take currently-valid 
datetimes and turn them into invalid datetimes, which violates backwards 
compatibility. I imagine in most cases such values only appear in test suites, 
since TZ-aware datetimes near years 1 and 9999 CE are anachronistic and 
unlikely to be of much practical value, but the same can be said of these 
potentially "invalid" dates in the first place.

Worse still, even naïve datetimes can be converted to UTC or other time zones, 
and if we wanted to add a new constraint that 
`some_datetime.astimezone(some_timezone)` must always work, then you wouldn't 
even be able to *construct* `datetime.min` or `datetime.max`, since 
`datetime.min.astimezone(timezone(timedelta(hours=-23)))` would fail under any 
real system locale (offsets must be strictly within ±24 hours, so `hours=-24` 
isn't even constructible), and, worse, the minimum datetime value you could 
construct would depend on your system locale! Again, the alternative would be 
to carve out an exception for naïve datetimes, but given that this change is 
of dubious value to start with, I don't think it is worth it.
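
The naïve case depends on the system locale, but you can see the same failure 
deterministically by pinning the source zone to UTC:

    >>> from datetime import datetime, timedelta, timezone
    >>> datetime.min.replace(tzinfo=timezone.utc).astimezone(
    ...     timezone(timedelta(hours=-1)))
    Traceback (most recent call last):
      ...
    OverflowError: date value out of range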

> So I'm pretty sure this is "not a bug" but it's a bit of a problem, and I 
> have a user ringing the "security vulnerability" bell on this one, and to be 
> honest I don't even know what any library would do to "prevent" this.

I don't really know why it would be a "security vulnerability", but presumably 
a library could either convert datetimes to UTC as soon as it receives them 
from the user (if it intends to treat them as UTC later), or simply refuse to 
accept any datetimes outside the range 
`datetime.min + timedelta(hours=48) < dt.replace(tzinfo=None) < 
datetime.max - timedelta(hours=48)`, or, if the concern is only about UTC, 
refuse datetimes outside the range 
`datetime.min.replace(tzinfo=timezone.utc) < dt < 
datetime.max.replace(tzinfo=timezone.utc)`.
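
As a sketch of what such a library-side guard could look like (the function 
names here are illustrative, not any real library's API):

    from datetime import datetime, timedelta, timezone

    _MARGIN = timedelta(hours=48)  # comfortably covers all offsets (< 24h)

    def check_any_zone(dt):
        """Reject datetimes too close to the limits to convert anywhere."""
        naive = dt.replace(tzinfo=None)
        if not (datetime.min + _MARGIN < naive < datetime.max - _MARGIN):
            raise ValueError(f"datetime too close to representable limits: {dt!r}")
        return dt

    def check_utc_only(dt):
        """Reject aware datetimes that cannot be represented in UTC."""
        # Note: `dt` must be aware, or these comparisons raise TypeError.
        if not (datetime.min.replace(tzinfo=timezone.utc) < dt
                < datetime.max.replace(tzinfo=timezone.utc)):
            raise ValueError(f"datetime not representable in UTC: {dt!r}")
        return dt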

> Why's this a security problem?   ish?    because PostgreSQL has a data type 
> "TIMESTAMP WITH TIMEZONE" and if you take said date and INSERT it into your 
> database, then SELECT it back using any Python DBAPI that returns datetime() 
> objects like psycopg2, if your server is in a timezone with zero or negative 
> offset compared to the given date, you get an error.  So the mischievous user 
> can create that datetime for some reason and now they've broken your website 
> which can't SELECT that table anymore without crashing.

Can you clarify why this crashes? Is it because it always returns the datetime 
value in UTC?

> So, suppose you maintain the database library that helps people send data in 
> and out of psycopg2.    We have, the end user's application, we have the 
> database abstraction library, we have the psycopg2 driver, we have Python's 
> datetime() object with MIN_YEAR, and finally we have PostgreSQL with the 
> TIMEZONE WITH TIMESTAMP datatype that I've never liked.
>
> Of those five roles, whose bug is this? I'd like to say it's the end user's, 
> for letting in untrusted input that can put unusual timezones and timestamps 
> into their system. But at the same time it's a little weird that Python is 
> letting me create this date that lacks the ability to convert into UTC. 
> Thanks for reading!

It sounds like the fundamental issue here is that PostgreSQL supports a 
different range of datetimes than Python does, regardless of the question of 
whether any Python datetime can be converted into another time zone. That's a 
mismatch between PostgreSQL's data type and Python's data type, and I'm not 
sure how the two are squared up now. If the database abstraction layer 
guarantees that anything you can INSERT into the database can be SELECTed back 
out (meaning that it artificially restricts the range of valid values to the 
intersection of those supported by Postgres and Python), and it also lets you 
choose which time zone applies when you read values out, then it should 
probably make sure it doesn't write anything that can't be represented in 
every such time zone.

If the abstraction layer's guarantee is that it can read out anything that can 
be stored in a postgres database, then I still think it might be the 
abstraction layer's bug, since apparently Postgres lets you store values that 
cannot be represented in all the supported time zones that you could read them 
out as — the abstraction layer may need to use a different type (possibly a 
subclass) for "out-of-range" values.

I'll note that I don't actually understand the difference between the 
"abstraction layer" and the "psycopg2 driver", so I may be conflating the two, 
but at the end of the day I think one of them is the right place to fix this. 
If there are good reasons for not fixing it in either of those layers, then 
the buck stops in the user's application, though it would obviously be better 
to fix it lower in the stack, since it's difficult for users to keep track of 
minutiae like this. I think it likely that neither CPython nor Postgres is the 
right level of abstraction at which to fix it, since neither is in the 
business of making its datetime type compatible with the other's (and each 
presumably has a thousand other integrations to worry about).

Hopefully that's helpful information! Understandable if the answer is somewhat 
disappointing.
