Well, I was going to stay silent, but math is something I can do without
wasting anyone's time or embarrassing myself. I don't think this mail answers
Lennart's concerns, but I do want to get it out there to compete with the
comment in `datetime.py`. I apologize if the LaTeX density is too high; I don't
trust that my e-mail client would transmit the message faithfully were I to
render it myself.
I disagree with the view Tim had of time zones when he wrote that comment (and
that code). It sounds like he views US/Eastern and US/Central as time zones
(which they are), but thinks of the various America/Indiana zones as switching
back and forth between them, rather than being time zones in their own right. I
think the right perspective is that a time zone *is* the function that its
`fromutc()` method implements, although of course we need additional
information in order to actually compute (rather than merely mathematically
define) its inverse. Daylight Saving Time is a red herring, and assumptions 2
and 4 in that exposition are just wrong from this point of view. In the worst
case, Asia/Riyadh's two years of solar time completely shatter these
assumptions.
I'm convinced that the right viewpoint on this is to view local time and UTC
time each as isomorphic to $\RR$ (i.e., effectively as UNIX timestamps, minus
the oft-violated guarantee that timestamps are in UTC), and to consider the
time zone as
\[ fromutc : \RR \to \RR. \]
(Leap seconds are a headache for this perspective, but it can still support
them with well-placed epicycles.) Then our assumptions (inspired by zoneinfo)
about the nature of this map are as follows:
* $fromutc$ is piecewise defined, with each piece being continuous and strictly
monotonic increasing. Let us call the set of discontinuities $\{ utc_i \in \RR
| i \in \ZZ \}$, where the labels are in increasing order, and define
$fromutc_i$ to be the $i$-th piece. (The theoretical treatment doesn't suffer
if there are only finitely many discontinuities, since we can place additional
piece boundaries at will where no discontinuities exist; obviously, an
implementation would not take this view.)
* The piece $fromutc_i : [utc_i, utc_{i+1}) \to [local_{start, i}, local_{end,
i})$ and its inverse, which we will call $fromlocal_i$, are both readily
computable. In particular, this means that $local_{start, i} = fromutc(utc_i)$
and $local_{end, i}$ is the limit of $fromutc(t)$ as $t$ approaches $utc_{i+1}$
from the left, and that these values are known. Note that the (tzfile(5))[1]
format and (zic(8)[2]) both assume that $fromutc_i$ is of the form $t \mapsto t
+ off_i$, where $off_i$ is a constant. This assumption is true in practice,
but is stronger than we actually need.
* The sequences $\{ local_{start, i} | i \in \ZZ \}$ and $\{ local_{end, i} | i
\in \ZZ \}$ are strictly increasing, and $local_{end, i-1} < local_{start,
i+1}$ for all $i \in \ZZ$. This final condition is enough to guarantee that
the preimage of any local time under $fromutc$ contains at most two UTC times.
This assumption would be violated if, for example, some jurisdiction decided to
fall back two hours by falling back one hour and then immediately falling back
a second hour. I recommend the overthrow of any such jurisdiction and its
(annexation by the Netherlands)[3].
Without the third assumption, it's impossible to specify a UTC time by a (local
time, time zone, DST flag) triple since there may be more than two UTC times
corresponding to the same local time, and computing $fromlocal$ becomes more
complicated, but the problem can still be solved by replacing the DST flag by
an index into the preimage. (Lennart, I think this third assumption is the
important part of your "no changes within 48 hours of each other" assumption,
which is violated by Asia/Riyadh. Is it enough?)
Once we take this view, computing $fromutc(t)$ is trivial: find $i$ with $utc_i
\le t < utc_{i+1}$ by binary search (presumably optimized to an $O(1)$ average
case by using a good initial guess), and compute $fromutc_i(t)$.
Computing $fromlocal(t)$ is somewhat more difficult. The first thing to
address is that, as written, $fromlocal$ is not a function; in order to make it
one, we need to pass it more information. We could define $fromlocal(t, i) =
fromlocal_i(t)$, but that's too circular to be useful. Likewise with my
(silly) earlier proposal to store $(local, offset)$ pairs-- then $fromlocal(t,
off) = t - off$. What we really need is a (partial, which is better than
multi-valued!) function $fromlocal : \RR \times \{True, False\} \to \RR$ that
takes a local time and a DST flag and returns a UTC time. We define
$fromlocal(local, flag)$ to be the first $utc \in \RR$ such that $fromutc(utc)
= local$ when $flag$ is $True$ and the last such $utc$ when $flag$ is $False$.
(Our implementation will presumably also allow $flag$ to be $None$, in which
case we require $utc$ to be unique.)
To compute $fromlocal$, we'll begin with a lemma:
If $t < utc_i$, then $fromutc(t) < local_{end, i-1}$; and likewise, if
$utc_i \le t$, then $local_{start, i} \le fromutc(t)$.
If $t < utc_i$, then we must have $utc_j \le t < utc_{j+1}$ for some $j \le
i-1$, so $local_{start, j} \le fromutc(t) < local_{end, j} \le local_{end,
i-1}$ because $local_{end, i}$ is increasing in $i$. The proof of the second
part is similar, using the fact that $local_{start, i}$ is increasing.
Now we compute $fromlocal(t, True)$ by finding the minimal $i$ such that $t <
local_{end, i}$. Then for $s < utc_i$ we must have by the lemma that
$fromutc(s) < local_{end, i-1} < t$, so $s < fromlocal(t, True)$ if the latter
exists. For $s \ge utc_i$ the lemma gives us $fromutc(s) \ge local_{start, i}$,
so if $t < local_{start, i}$ we see that there is no $s$ with $fromutc(s) = t$,
so $fromlocal(t, True)$ is undefined (which we will implement as a
NonexistentTimeError). If on the other hand $t \ge local_{start, i}$, we have
$t \in [local_{start, i}, local_{end, i})$, so that $fromutc(fromlocal_i(t)) =
t$. Because $fromlocal_i$ is monotonic on this interval, $fromlocal(t, True) =
fromlocal_i(t)$ is minimal. An analogous argument establishes that if $i$ is
maximal with $local_{start, i} \le t$, then $fromlocal(t, False) =
fromlocal_i(t)$ if $t < local_{end, i}$ and $fromlocal(t, False)$ is undefined
otherwise. All of these computations can be accomplished by searches of
ordered lists and applications of $fromlocal_i$.
We can also include the option for $fromlocal(t, None)$ to require uniqueness
by computing $fromlocal(t, True)$ and verifying that $t < local_{start, i+1}$
where $i$ is as in the computation of $fromlocal(t, True)$, raising an
AmbiguousTimeError if this condition is not met.
Notice that the definition of time zone that I've given here does not mention
Daylight Saving Time, but this isn't a problem in most cases because ambiguity
happens almost exclusively at "fall back" transitions, in which the first
period is DST and the second period is STD. I argue that the rare ambiguities
for which this does not hold are best resolved by divorcing our DST flag from
Daylight Saving Time; that is, by defining the flag to mean "choose the first
ambiguous time"; otherwise, we fail to handle jurisdictional transitions not
involving DST.
With this perspective, arithmetic becomes "translate to UTC, operate, translate
back", which is as it should be. The only arithmetic that is performed outside
of this framework is in the implementation of $fromutc_i$ and $fromlocal_i$,
which operate on naive timestamps. But IIUC what Lennart is complaining about
is the fact that the DST flag isn't part of and can't be embedded into a local
time, so it's impossible to fold the second parameter to $fromlocal$ into $t$.
Without that, a local time isn't rich enough to designate a single point in
time and the whole edifice breaks.
ijs
[1]: http://linux.die.net/man/5/tzfile
[2]: http://linux.die.net/man/8/zic
[3]: https://what-if.xkcd.com/53/
Top-posted from Microsoft Outlook Web App; may its designers be consigned for
eternity to that circle of hell in which their dog food is consumed.
From: Python-Dev <[email protected]> on
behalf of Alexander Belopolsky <[email protected]>
Sent: Thursday, July 23, 2015 20:28
To: Lennart Regebro
Cc: Python-Dev
Subject: Re: [Python-Dev] Status on PEP-431 Timezones
On Thu, Jul 23, 2015 at 12:22 PM, Lennart Regebro <[email protected]> wrote:
It turns out it's very complex to solve this when internally storing
the time as the local time. Basically you have to normalize the time
(ie check if daylight savings have changed) when doing arithmetic, but
normalize is doing arithmetic, and you get infinite recursion.
This is not true. Tim's analysis immortalized [1] at the end of the
datetime.py file,
shows that UTC to local mapping can be unambiguously recovered from the
local to UTC rules using a simple finite algorithm. Tim assumes [2] that
standard (non-DST)
time offset is constant throughout the history, but this requirement can be
relaxed to offset
changing no more than once in any 48 hour period (if you generously allow
timezones
from -24 to 24 hours).
Actually, it looks like I am repeating what I wrote back in April, so I'll stop
with a
reference [3] to that post.
[1]: https://hg.python.org/cpython/file/v3.5.0b1/Lib/datetime.py#l1935
[2]: https://hg.python.org/cpython/file/v3.5.0b1/Lib/datetime.py#l1948
[3]: https://mail.python.org/pipermail/python-dev/2015-April/139171.html
_______________________________________________
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com