A number of core devs have taken strong positions against this
change, citing reasons ranging from "the addition of a function that
returns a constant will cause bloat in the interpreter / needs to be tested /
etc" to "what you really mean to ask for is set iteration stability, and we
don't want that" to "identity based hashing is the default correct choice of a
hashing function to use in any situation, unless we are forced by the
requirements not to (even if it's disadvantageous compared to other choices)"
to just straight appeals to authority ("rhettinger closed the issue on github
so he must have done it for a good reason").
I'm not sure if they actually believe what they say in all of these cases. To
me, it sounds more like "please go away" than an honest argument on technical
merit, but it matters little.
I don't think anything can be changed with further technical discussion.
---
I do have another suggestion that I think merits a discussion. Maybe it will
fare better. This change has a bit broader scope.
What if we were to subtract some statically allocated “anchor” address from the
pointer in _Py_HashPointerRaw and the id function?
It’s arguably a security fix, since these operations currently leak the ASLR
offset, and after that they won’t. It also makes the hashes of statically
allocated PyObjects with defaulted tp_hash stable per build of Python, which I
think is a good thing for reasons we’ve already discussed at great length.
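To make the idea concrete, here is a rough Python model of it, not actual CPython code. The rotate-by-4 step mirrors what _Py_HashPointerRaw does today; the anchor, the offsets and the two load bases are hypothetical values I made up for illustration, and the model assumes 64-bit pointers.

```python
BITS = 64                      # assumed pointer width
MASK = (1 << BITS) - 1

def hash_pointer_raw(addr):
    """Model of the pointer hash: rotate the address right by 4 bits."""
    return ((addr >> 4) | (addr << (BITS - 4))) & MASK

def anchored_hash(addr, anchor):
    """Proposed variant: subtract a static anchor address before hashing."""
    return hash_pointer_raw((addr - anchor) & MASK)

# Hypothetical offsets of a static object and of the anchor inside the binary image.
OBJECT_OFFSET = 0x2A40
ANCHOR_OFFSET = 0x1000

def simulate(load_base):
    """Return (current hash, anchored hash) for one simulated ASLR load base."""
    obj = load_base + OBJECT_OFFSET
    anchor = load_base + ANCHOR_OFFSET
    return hash_pointer_raw(obj), anchored_hash(obj, anchor)

# Two simulated runs of the same build, loaded at different (made-up) bases.
raw_a, anchored_a = simulate(0x55D0_1234_0000)
raw_b, anchored_b = simulate(0x7F3A_9876_0000)
```

In this model raw_a and raw_b differ because they carry the load base, while anchored_a and anchored_b are equal because only the offset within the image survives the subtraction. This only covers objects that live at a fixed offset from the anchor; heap allocations would presumably still vary from run to run.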
The downside of this suggestion is that it adds one integer subtraction to
each of these functions.
If this tiny perf cost is a concern, we could even disable the countermeasure
when Python can determine that it is guaranteed to load at a fixed memory
location.
At least two core devs responded with "don't care" / “it works on my machine”
because they happen to have ASLR disabled. The current situation ties together
two completely separate concerns, and adds a non-portable aspect to the
behavior of the runtime - you can write a program that behaves
deterministically on system A and then see non-deterministic behavior on system
B. I don’t think I should have to explain why this is bad.
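For anyone who wants to see the coupling on their own machine, here is a small sketch. It assumes CPython, where id() returns the object's address and the default hash is that address rotated right by 4 bits; Token is just a hypothetical class that does not define __hash__. Other interpreters may behave differently.

```python
import struct
import sys

POINTER_BITS = struct.calcsize("P") * 8   # pointer width of this build
MASK = (1 << POINTER_BITS) - 1

class Token:
    """Hypothetical class with no __hash__, so it inherits object.__hash__."""

def rotated_address(obj):
    """Rotate id(obj) right by 4 bits, as the default pointer hash is assumed to do."""
    addr = id(obj)
    return ((addr >> 4) | (addr << (POINTER_BITS - 4))) & MASK

tokens = [Token() for _ in range(5)]

# On CPython each entry should be True: the hash is a function of the address.
matches = [(hash(t) & MASK) == rotated_address(t) for t in tokens]

# The set's iteration order follows from those hashes, hence from the addresses.
order = [tokens.index(t) for t in set(tokens)]
print(matches, order)
```

Whether order changes between runs depends on whether the addresses do, which is exactly the dependency on the system's ASLR setting that I am complaining about.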
Regarding language requirements, nothing changes.
It is an interpreter-specific change, since not all id and hash
implementations depend on the object’s memory location (and since some runtime
environments, like the JVM, cannot be attacked with out of bounds memory
accesses from inside the program, so an ASLR offset leak might not be deemed a
risk there). At most, it is an advisory that implementations which do derive
id and hash from memory addresses should act similarly, and even that is
tenuous at best.
WDYT?
P.S. The other way to implement the security fix is to add a randomly chosen
64-bit secret (and then you wouldn’t know what part of the “offset” is due to
ASLR and what’s due to the secret). And at least then, the behavior becomes
non-deterministic on all systems, as opposed to just some of them.
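A rough model of that alternative, under the same assumptions as before (64-bit pointers, rotate-by-4 pointer hash). The make_masked_hash helper and its optional secret argument are hypothetical names of mine; the argument exists only so the sketch can be checked with a fixed value.

```python
import secrets

BITS = 64                      # assumed pointer width
MASK = (1 << BITS) - 1

def hash_pointer_raw(addr):
    """Model of the pointer hash: rotate the address right by 4 bits."""
    return ((addr >> 4) | (addr << (BITS - 4))) & MASK

def make_masked_hash(secret=None):
    """Return a hash function that adds a per-process secret to every address."""
    if secret is None:
        secret = secrets.randbits(BITS)   # would be drawn once at interpreter startup
    def masked(addr):
        return hash_pointer_raw((addr + secret) & MASK)
    return masked

# Two simulated processes hashing the same (made-up) address.
hash_in_process_a = make_masked_hash()
hash_in_process_b = make_masked_hash()
same_address = 0x7F00_0000_1000
print(hash_in_process_a(same_address), hash_in_process_b(same_address))
```

Adding a constant modulo 2**64 is a bijection, so distinct addresses still get distinct values within one process; an observer just cannot tell how much of the value comes from ASLR and how much from the secret.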