On 16Mar2022 10:57, Chris Angelico <ros...@gmail.com> wrote:
>> Is it sensible to compute the hash only from the immutable parts?
>> Bearing in mind that usually you need an equality function as well and
>> it may have the same stability issues.
>
>My understanding - and I'm sure Marco will correct me if I'm wrong
>here - is that this behaves like a tuple: if it contains nothing but
>hashable objects, it is itself hashable, but if it contains anything
>unhashable, the entire tuple isn't hashable.
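
The tuple behaviour described in the quoted text is easy to check directly. This is only a minimal sketch; the helper name `is_hashable` is made up for illustration and is not part of any frozendict code:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple holding only hashable objects is itself hashable.
all_hashable = (1, "two", (3, 4))
# A tuple holding a list is not, because the list is unhashable.
has_list = (1, "two", [3, 4])

print(is_hashable(all_hashable))  # True
print(is_hashable(has_list))      # False
```
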

A significant difference is that tuples have no keys, unlike a dict.

A hash does not have to hash all the internal state, only the relevant
state, and not even all of that. The objective of the hash is twofold to
my mind:
- that "equal" objects (in the `==` sense) have the same hash, so that
  they hash to the same bucket in dicts and can therefore be found
- that hash values are widely distributed, to maximise the evenness of
  the object distribution in buckets

For dicts to work, the former has to hold. The latter has more
flexibility.

A dict has keys. If the dicts are quite varied (sets of tags for
example), it may be effective to just hash the keys. But if the dict keys
are similar (labelled CSV-rows-as-dicts, or likewise with database rows)
this will go badly because the hashes will all (or maybe mostly) collide.

>As such, any valid hash value will be stable, and "asking for a hash
>will raise TypeError" is also stable.

I would seek to avoid a TypeError for a frozendict, but as you can see
above I have not thought of a way to do that which would also have
desirable hash characteristics in almost all circumstances. (I think we
can accept that almost anything will have pathological cases, but the bad
cases in my hash-the-keys notion are not, to my mind, rare.)

>> >The problem is the first time I get an error with details, for example:
>> >TypeError: unhashable type: 'list'
>> >The subsequent times I simply raise a generic error:
>> >TypeError
>>
>> You could, you know, cache the original exception. That does keep links
>> to the traceback and therefore all the associated stack frames, so that
>> isn't cheap (it is perfectly fast - just the reference, it just prevents
>> those things from being reclaimed).
>
>I don't like that idea myself, for that exact reason - it'll keep
>arbitrary numbers of objects alive.

I don't like it either, for that exact reason. That reason is the same
reason which has Python 3 exception variables get unset as you leave an
`except` clause.
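
To make the `except` behaviour concrete, here is a small sketch. The function name `exception_name_after_except` is invented for this example; the point is only that the name bound by `as e` is gone once the clause ends, while a string copied out of it survives:

```python
def exception_name_after_except():
    """Report whether the `except ... as e` name survives the clause."""
    try:
        hash([])  # lists are unhashable, so this raises TypeError
    except TypeError as e:
        message = str(e)  # a plain string holds no traceback or frames
    # Python 3 deletes `e` on leaving the except clause, which breaks the
    # exception -> traceback -> frame -> local variable reference cycle.
    return "e" in locals(), message


still_bound, message = exception_name_after_except()
print(still_bound)  # whether `e` is still a local name
print(message)      # the text saved from the exception
```
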
I'm sure it irks everyone, but the memory penalty of not doing so is
high.

>But caching the stringified form
>would be more reasonable here, and have similar effect.

Mmm, yes.

Cheers,
Cameron Simpson <c...@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list
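
For what it's worth, caching the stringified form might look roughly like the sketch below. This is an assumption-laden illustration, not the frozendict implementation: the class `CachedHash` and its attributes are hypothetical, and it simply hashes a tuple of its items.

```python
class CachedHash:
    """Hypothetical wrapper that caches either the hash or the error text."""

    def __init__(self, items):
        self._items = tuple(items)
        self._hash = None        # cached hash value, once computed
        self._hash_error = None  # cached str() of the TypeError, if any

    def __hash__(self):
        if self._hash_error is not None:
            # Raise a fresh exception carrying the original details.
            raise TypeError(self._hash_error)
        if self._hash is None:
            try:
                self._hash = hash(self._items)
            except TypeError as e:
                # Keep only the message: no traceback or frames stay alive.
                self._hash_error = str(e)
                raise
        return self._hash


bad = CachedHash([("a", 1), ("b", [2])])
messages = []
for _ in range(2):
    try:
        hash(bad)
    except TypeError as e:
        messages.append(str(e))

# The second failure carries the same details as the first.
print(messages[0] == messages[1])  # True
```

The second and later failures then carry the same message as the first, without holding a reference to the original exception object.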