[issue35723] Add "time zone index" cache to datetime objects

Paul Ganssle Mon, 14 Jan 2019 05:59:27 -0800


Paul Ganssle <p.gans...@gmail.com> added the comment:


> I dislike adding a public API for an optimization. Would it be possible to 
> make it private? Would it make sense? tzidx => _tzidx.

This also would have been my preference, but it is unfortunately not possible 
because of the way tzinfo works. tzinfo is an abstract base class, which means 
that any changes to how `tzinfo` subclasses work *must* be implemented by the 
third party time zone providers. The tzidx() method is intended to be called 
*only* from `tzinfo` an optional feature, as such it must be public.

Maybe a better option would be to make it a magic method, to indicate that it 
is public but not intended to be called directly by users? `__tzidx__` or 
`__tzcache__`?


> In test of your PR, tzinfo allocates memory for its cache: 

>            offsets = [timedelta(hours=0), timedelta(hours=1)]
>            names = ['+00:00', '+01:00']
>            dsts = [timedelta(hours=0), timedelta(hours=1)]

>This memory isn't free. I don't see how using an index completely prevents the 
>need to allocate memory for a cache.

That part is not free, but that's also not a cache, it's the underlying data 
for the tzinfo and is not modified over the object's lifetime. Most `tzinfo` 
subclasses are *already* implemented like this in one way or another, for 
example `dateutil.tz.tzfile` reads one of the binary tzdata files and produces 
a list of `ttinfo` objects, then looks up which one applies for each datetime.

The cache part of this is that *that lookup* is expensive, but also super easy 
to cache. In pytz this cache comes for free, because instead of using a list of 
offsets and figuring out which one to apply when you call `utcoffset()`, pytz 
creates a list of `tzinfo` objects with *fixed* offsets and eagerly attaches 
one to a datetime when you call `pytz.timezone.localize()/normalize()`. It's 
using the `tzinfo` field as a cache, but at the cost of breaking time zone 
semantics (which unfortunately use `is` to determine whether two datetimes are 
in the same zone or not), plus the cost of requiring *eager* resolution to this 
question rather than lazy.

> Somehow, we need a method to clear the cache and decide a caching policy. The 
> simplest policy is to have no limit. The re.compile() uses a cache of 512 
> entries. functools.lru_cache uses a default limit of 128 entries.

This is not necessary, actually, because of the way this cache works. This is a 
"write-once" cache *stored on the datetime itself*, and its cost is built in to 
the memory cost of the datetime itself. For CPython it will currently consume 
one of the alignment bytes, so there will be no additional memory use.

> Instead of adding a new API, would it be possible to reuse 
> functools.lru_cache somehow?

If it were possible to use an LRU cache in this situation, it would not be 
necessary to make a change in CPython at all. The problem with LRU cache or any 
other cache implemented as a mapping between datetime -> offset are as I 
mentioned, large memory use and the performance cost of creating an appropriate 
hash would greatly reduce the utility of this function. The proposed tzidx lets 
tzinfo implementers opt in to nearly all the advantages of pytz without any of 
the disadvantages.

>> 2. Because the implementation of datetime.__hash__ invokes utcoffset(), it 
>> is impossible to implement utcoffset in terms of a dictionary of tz-aware 
>> datetimes. This means that you need to construct a new, naive datetime, 
>> which is a fairly slow operation and really puts a damper in the utility of 
>> the cache.

> For special local timezones, would it be possible to explicitly exclude them, 
> and restrict the cache the simple timespace (fixed offset)?

I think there are many problems with this approach, but I'm going to hold off 
on answering this, because I think even if this were possible, an on-datetime 
cache makes much more sense than a mapping-based cache. If you still want this 
addressed given my other arguments let me know and I can give you a more 
thorough response.

Thank you for taking the time to review my proposal, Victor. I realize that it 
is a bit of a wall of text accompanying a pretty big PR, which is kinda the 
result of the fact that I had this idea 6+ months ago and I've been privately 
reworking and refining it (both in a code sandbox and in my head) for some time 
now, only to dump out the results of all my internal arguments all at once. I 
appreciate the feedback and I'm happy to do whatever possible to clarify my 
reasoning.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35723>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35723] Add "time zone index" cache to datetime objects

Reply via email to