Nick Coghlan <ncogh...@gmail.com> added the comment:

Increasing the number of stat calls required for a successful import is a good 
reason to close the submitted PR, but I'm not sure it's a good reason to close 
the *issue*, as there may be other ways to solve it that don't result in an 
extra stat call for every successful cache hit.

Restating the problem: the pyc file format currently discards the fractional 
portion of the source file mtime. This means that even if the source filesystem 
offers better-than-1-second timestamp resolution, the bytecode cache doesn't.

So I think it's worth asking ourselves what would happen if, instead of storing 
the source mtime as an integer directly, we instead stored "int(mtime * N) & 
0xFFFF_FFFF".
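
Roughly, the idea would look something like this (a sketch only, not the
actual importlib code; the N value and helper names here are hypothetical):

    N = 1000  # hypothetical scaling factor

    def scaled_mtime_field(mtime: float) -> int:
        # Keep only the low 32 bits of the scaled timestamp, so it still
        # fits in the existing 32-bit source-mtime slot in the pyc header.
        return int(mtime * N) & 0xFFFF_FFFF

    def bytecode_is_stale(source_mtime: float, cached_field: int) -> bool:
        # A cache hit requires the stored field to match the scaled mtime of
        # the current source file (the size check would stay as it is now).
        return scaled_mtime_field(source_mtime) != cached_field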

The source timestamp is stored in a 32-bit field, so the current pyc format is 
technically already subject to a variant of the 2038 epoch problem (i.e. it 
will wrap in 2106 and start re-using timestamps). We just don't care, since the 
only impact is that there's a tiny risk that we'll fail to recompile an updated 
source file if it hasn't changed size and we try importing it at exactly the 
wrong time. That window is currently 1 second every ~136 years.

That means we have a trade-off available between the size of each individual 
"erroneous cache hit" window, and how often we encounter that window. Some 
examples:

N=2: 500 ms window every ~68 years
N=10: 100 ms window every ~13.6 years
N=100: 10 ms window every ~1.36 years
N=1000: 1 ms window every ~7 weeks (~0.136 years)
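
For anyone who wants to double-check those figures: they assume a 32-bit
field, so each window is 1/N seconds wide and the field wraps every
2**32 / N seconds. A throwaway script:

    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    for n in (2, 10, 100, 1000):
        window_ms = 1000 / n                       # width of each erroneous-hit window
        wrap_years = 2**32 / n / SECONDS_PER_YEAR  # how often such a window recurs
        print(f"N={n}: {window_ms:g} ms window every ~{wrap_years:.2f} years")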

The odds of a file being edited exactly ~7 weeks after it was last compiled 
(down to the millisecond) *and* ending up different without changing size are 
going to be lower than those of a single (or N) character change being made 
*right now* (e.g. fixing a typo in a variable name that transposed characters, 
or got a letter wrong).

The case where problems with the status quo would most plausibly be 
encountered is a text editor with autosave enabled combined with a test web 
service that has hot reloading enabled.

Don't get me wrong, I think the odds of that actually happening are already 
very low, and the human fix is simple (make another edit, save the source file 
again, and grumble about computers not seeing changes that are right in front 
of them).


However, encountering problems with an N=100 or N=1000 multiplier seems even 
less plausible to me, and in cases where it is deemed a concern, PEP 552's 
hash-based caching seems like a solution people should be looking at anyway.
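
For reference, once PEP 552 lands, opting in to a hash-based pyc should just
be a py_compile call along these lines (the filename is obviously made up):

    import py_compile

    # Emit a "checked hash" pyc: the interpreter revalidates it against the
    # source file's hash on import instead of comparing mtime and size.
    py_compile.compile(
        "example.py",
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )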

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31772>
_______________________________________