Re: [Python-Dev] Status on PEP-431 Timezones

ISAAC J SCHWABACHER Fri, 24 Jul 2015 17:19:14 -0700

Well, I was going to stay silent, but math is something I can do without 
wasting anyone's time or embarrassing myself. I don't think this mail answers 
Lennart's concerns, but I do want to get it out there to compete with the 
comment in `datetime.py`. I apologize if the LaTeX density is too high; I don't 
trust that my e-mail client would transmit the message faithfully were I to 
render it myself.


I disagree with the view Tim had of time zones when he wrote that comment (and 
that code). It sounds like he views US/Eastern and US/Central as time zones 
(which they are), but thinks of the various America/Indiana zones as switching 
back and forth between them, rather than being time zones in their own right. I 
think the right perspective is that a time zone *is* the function that its 
`fromutc()` method implements, although of course we need additional 
information in order to actually compute (rather than merely mathematically  
define) its inverse. Daylight Saving Time is a red herring, and assumptions 2 
and 4 in that exposition are just wrong from this point of view. In the worst 
case, Asia/Riyadh's two years of solar time completely shatter these 
assumptions.

I'm convinced that the right viewpoint on this is to view local time and UTC 
time each as isomorphic to $\RR$ (i.e., effectively as UNIX timestamps, minus 
the oft-violated guarantee that timestamps are in UTC), and to consider the 
time zone as
    \[ fromutc : \RR \to \RR. \]
(Leap seconds are a headache for this perspective, but it can still support 
them with well-placed epicycles.) Then our assumptions (inspired by zoneinfo) 
about the nature of this map are as follows:

* $fromutc$ is piecewise defined, with each piece being continuous and strictly 
monotonic increasing.  Let us call the set of discontinuities $\{ utc_i \in \RR 
| i \in \ZZ \}$, where the labels are in increasing order, and define 
$fromutc_i$ to be the $i$-th  piece. (The theoretical treatment doesn't suffer 
if there are only finitely many discontinuities, since we can place additional 
piece boundaries at will where no discontinuities exist; obviously, an 
implementation would not take this view.) 
* The piece $fromutc_i : [utc_i, utc_{i+1}) \to [local_{start, i}, local_{end, 
i})$ and its inverse, which we will call $fromlocal_i$, are both readily 
computable.  In particular, this means that $local_{start, i} = fromutc(utc_i)$ 
and $local_{end, i}$ is the limit of $fromutc(t)$ as $t$ approaches $utc_{i+1}$ 
from the left, and that these values are known.  Note that the (tzfile(5))[1] 
format and (zic(8)[2]) both assume that $fromutc_i$ is of the form $t \mapsto t 
+ off_i$, where $off_i$ is a constant.  This assumption is true in practice, 
but is stronger than we actually need.
* The sequences $\{ local_{start, i} | i \in \ZZ \}$ and $\{ local_{end, i} | i 
\in \ZZ \}$ are strictly increasing, and $local_{end, i-1} < local_{start, 
i+1}$ for all $i \in \ZZ$.  This final condition is enough to guarantee that 
the preimage of any local time under $fromutc$ contains at most two UTC times.  
This assumption would be violated if, for example, some jurisdiction decided to 
fall back two hours by falling back one hour and then immediately falling back 
a second hour.  I recommend the overthrow of any such jurisdiction and its 
(annexation by the Netherlands)[3].

Without the third assumption, it's impossible to specify a UTC time by a (local 
time, time zone, DST flag) triple since there may be more than two UTC times 
corresponding to the same local time, and computing $fromlocal$ becomes more 
complicated, but the problem can still be solved by replacing the DST flag by 
an index into the preimage.  (Lennart, I think this third assumption is the 
important part of your "no changes within 48 hours of each other" assumption, 
which is violated by Asia/Riyadh. Is it enough?)

Once we take this view, computing $fromutc(t)$ is trivial: find $i$ with $utc_i 
\le t < utc_{i+1}$ by binary search (presumably optimized to an $O(1)$ average 
case by using a good initial guess), and compute $fromutc_i(t)$.

Computing $fromlocal(t)$ is somewhat more difficult.  The first thing to 
address is that, as written, $fromlocal$ is not a function; in order to make it 
one, we need to pass it more information.  We could define $fromlocal(t, i) = 
fromlocal_i(t)$, but that's too circular to be useful.  Likewise with my 
(silly) earlier proposal to store $(local, offset)$ pairs-- then $fromlocal(t, 
off) = t - off$.  What we really need is a (partial, which is better than 
multi-valued!) function $fromlocal : \RR \times \{True, False\} \to \RR$ that 
takes a local time and a DST flag and returns a UTC time.  We define 
$fromlocal(local, flag)$ to be the first $utc \in \RR$ such that $fromutc(utc) 
= local$ when $flag$ is $True$ and the last such $utc$ when $flag$ is $False$.  
(Our implementation will presumably also allow $flag$ to be $None$, in which 
case we require $utc$ to be unique.)

To compute $fromlocal$, we'll begin with a lemma:
    If $t < utc_i$, then $fromutc(t) < local_{end, i-1}$; and likewise, if 
$utc_i \le t$, then $local_{start, i} \le fromutc(t)$.
If $t < utc_i$, then we must have $utc_j \le t < utc_{j+1}$ for some $j \le 
i-1$, so $local_{start, j} \le fromutc(t) < local_{end, j} \le local_{end, 
i-1}$ because $local_{end, i}$ is increasing in $i$.  The proof of the second 
part is similar, using the fact that $local_{start, i}$ is increasing.

Now we compute $fromlocal(t, True)$ by finding the minimal $i$ such that $t < 
local_{end, i}$. Then for $s < utc_i$ we must have by the lemma that 
$fromutc(s) < local_{end, i-1} < t$, so $s < fromlocal(t, True)$ if the latter 
exists. For $s \ge utc_i$ the lemma gives us $fromutc(s) \ge local_{start, i}$, 
so if $t < local_{start, i}$ we see that there is no $s$ with $fromutc(s) = t$, 
so $fromlocal(t, True)$ is undefined (which we will implement as a 
NonexistentTimeError). If on the other hand $t \ge local_{start, i}$, we have 
$t \in [local_{start, i}, local_{end, i})$, so that $fromutc(fromlocal_i(t)) = 
t$.  Because $fromlocal_i$ is monotonic on this interval, $fromlocal(t, True) = 
fromlocal_i(t)$ is minimal.  An analogous argument establishes that if $i$ is 
maximal with $local_{start, i} \le t$, then $fromlocal(t, False) = 
fromlocal_i(t)$ if $t < local_{end, i}$ and $fromlocal(t, False)$ is undefined 
otherwise.  All of these computations can be accomplished by searches of 
ordered lists and applications of $fromlocal_i$.

We can also include the option for $fromlocal(t, None)$ to require uniqueness 
by computing $fromlocal(t, True)$ and verifying that $t < local_{start, i+1}$ 
where $i$ is as in the computation of $fromlocal(t, True)$, raising an 
AmbiguousTimeError if this condition is not met.

Notice that the definition of time zone that I've given here does not mention 
Daylight Saving Time, but this isn't a problem in most cases because ambiguity 
happens almost exclusively at "fall back" transitions, in which the first 
period is DST and the second period is STD.  I argue that the rare ambiguities 
for which this does not hold are best resolved by divorcing our DST flag from 
Daylight Saving Time; that is, by defining the flag to mean "choose the first 
ambiguous time"; otherwise, we fail to handle jurisdictional transitions not 
involving DST.

With this perspective, arithmetic becomes "translate to UTC, operate, translate 
back", which is as it should be.  The only arithmetic that is performed outside 
of this framework is in the implementation of  $fromutc_i$ and $fromlocal_i$, 
which operate on naive timestamps. But IIUC what Lennart is complaining about 
is the fact that the DST flag isn't part of and can't be embedded into a local 
time, so it's impossible to fold the second parameter to $fromlocal$ into $t$.  
Without that, a local time isn't rich enough to designate a single point in 
time and the whole edifice breaks.

ijs

[1]:  http://linux.die.net/man/5/tzfile
[2]: http://linux.die.net/man/8/zic
[3]: https://what-if.xkcd.com/53/

Top-posted from Microsoft Outlook Web App; may its designers be consigned for 
eternity to that circle of hell in which their dog food is consumed.


From: Python-Dev <[email protected]> on 
behalf of Alexander Belopolsky <[email protected]>
Sent: Thursday, July 23, 2015 20:28
To: Lennart Regebro
Cc: Python-Dev
Subject: Re: [Python-Dev] Status on PEP-431 Timezones
  
On Thu, Jul 23, 2015 at 12:22 PM, Lennart Regebro  <[email protected]> wrote:

It turns out it's very complex to solve this when internally storing
the time as the local time. Basically you have to normalize the time
(ie check if daylight savings have changed) when doing arithmetic, but
normalize is doing arithmetic, and you get infinite recursion.   
This is not true.  Tim's analysis immortalized [1] at the end of the 
datetime.py file,
shows that UTC to local mapping can be unambiguously recovered from the
local to UTC rules using a simple finite algorithm.  Tim assumes [2] that 
standard (non-DST)
time offset is constant throughout the history, but this requirement can be 
relaxed to offset
changing no more than once in any 48 hour period (if you generously allow 
timezones
from -24 to 24 hours).


Actually, it looks like I am repeating what I wrote back in April, so I'll stop 
with a
reference [3] to that post.


[1]: https://hg.python.org/cpython/file/v3.5.0b1/Lib/datetime.py#l1935
[2]: https://hg.python.org/cpython/file/v3.5.0b1/Lib/datetime.py#l1948
[3]: https://mail.python.org/pipermail/python-dev/2015-April/139171.html        
   
_______________________________________________
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Status on PEP-431 Timezones

Reply via email to