[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-22 Thread Chris Angelico
On Thu, Dec 23, 2021 at 5:40 PM Stephen J. Turnbull wrote:
>
> Hao Hu writes:
>  > On 12/18/21 08:44, Stephen J. Turnbull wrote:
>  > > Hao Hu writes:
>
>  > >   > >> For instance, if we create a caching programming interface that
>  > >   > >> relies on a distributed kv store,
>  > >
>  > > I would be very suspicious of using Python's hash builtin for such a
>  > > purpose.  The Python hash functions are very carefully tuned for high
>  > > performance in one application only: equality testing in Python,
>  > > especially for dicts.  [...]
>  >
>  > It is pretty much the same use case as python's dictionary though, the
>  > goal is just to generalize it to use with a distributed kv store.
>
> Sure, you know that because it's your application.  But I don't know
> that, and it's only an example you give to justify a change to
> Python.  The burden on you is not to argue that it works in one
> application; it's to argue that it works broadly enough to be worth
> changing a lot of stuff in Python, imposing a change burden on any
> project that implements __hash__ for any of its classes, and for
> anybody who supports both pre- and post-change versions of Python, they
> need to support both __hash__(object) and __hash__(object, salt)
> (probably trivial, just def __hash__(self, salt=None):, but I haven't
> thought about it).
>

A bit more complicated for anything that builds its hash out of other
objects' hashes (eg a tuple), since it would have to avoid calling
hash(object, salt) if it was called as __hash__(self, None). Changing
the signature of a dunder is generally a pain.
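
A rough sketch of that complication, under stated assumptions: the
salt parameter and the salted_hash() helper below are hypothetical
stand-ins for the proposed hash(object, salt), and neither exists in
Python today. The fallback for old-style objects is likewise only a
guess at what such a protocol might do.

```python
# Hypothetical sketch: neither salted_hash() nor a salt argument to
# __hash__ exists in Python; both are invented here for illustration.

def salted_hash(obj, salt=None):
    """Stand-in for the proposed hash(object, salt)."""
    if salt is None:
        return hash(obj)  # today's one-argument protocol
    try:
        return obj.__hash__(salt)  # proposed two-argument protocol
    except TypeError:
        # obj only implements the old __hash__(self) signature.
        # Mixing the salt in afterwards is an assumed fallback.
        return hash((salt, hash(obj)))


class Pair:
    """A composite whose hash is built from its members' hashes."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __eq__(self, other):
        return (isinstance(other, Pair)
                and (self.first, self.second) == (other.first, other.second))

    def __hash__(self, salt=None):
        if salt is None:
            # Called as plain hash(pair): must not pass a salt down,
            # so members are hashed exactly as they are today.
            return hash((self.first, self.second))
        # Called with a salt: every member has to be re-hashed with it.
        return hash((salted_hash(self.first, salt),
                     salted_hash(self.second, salt)))
```

Every container-like class would need the same two branches, which is
where the extra burden comes from.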

Python's hashing function is designed with some extremely specific
use-cases in mind. For example, small integers hash to themselves,
because this gives good results for dictionaries whose keys are all
small integers. That won't be as beneficial if the key-value store is
distributed (since each node will only have part of the full
dictionary), and it also means that the application would be
vulnerable to hash collision attacks. As soon as something is
networked, the rules change, and I do not see this as a safe choice
for a distributed kv store.
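
To make both points concrete, here is a small sketch. It relies on
CPython behaviour (small integers hashing to themselves, and integer
hashes being reduced modulo sys.hash_info.modulus); the sharding
scheme hash(key) % NODES is an assumed example, not anything from the
original proposal.

```python
import sys
from collections import Counter

# In CPython, small non-negative integers hash to themselves.
assert all(hash(n) == n for n in range(1000))

# Assumed sharding scheme: pick a node by hash(key) % NODES.
NODES = 8
keys = range(0, 8000, NODES)  # keys that happen to be multiples of 8
placement = Counter(hash(k) % NODES for k in keys)
# Every one of these keys is assigned to node 0.

# Integer hashes are taken modulo a fixed, public prime, so anyone
# can construct as many colliding keys as they like.
colliding = [1 + i * sys.hash_info.modulus for i in range(5)]
collision_hashes = {hash(k) for k in colliding}
```

Here placement ends up with a single entry for node 0, and
collision_hashes contains only the value 1.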

Exposing the string hashing algorithm *only*, as a convenient and fast
way to hash strings, would have some value. Trying to expose, but also
control, the overall hash function? Not something I would recommend.
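
For reproducible string keys today, without touching hash() at all,
one option is a keyed digest from hashlib. This is a swapped-in
technique, not something proposed in this thread, and stable_key and
its salt parameter are illustrative names only.

```python
import hashlib


def stable_key(text: str, salt: bytes = b"") -> int:
    """Return a 64-bit integer that is the same in every process.

    Unlike hash(str), the result does not depend on PYTHONHASHSEED.
    The salt is passed as the BLAKE2b key (at most 64 bytes).
    """
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8, key=salt)
    return int.from_bytes(digest.digest(), "big")


# The same input and salt always give the same key.
same = stable_key("user:42", b"cache-v1") == stable_key("user:42", b"cache-v1")
```

It is slower than the builtin string hash, but the result is stable
across processes and machines.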

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2KLISIHSW2OLDWPP36CILY5PGZGWZDZ5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-22 Thread Stephen J. Turnbull
Hao Hu writes:
 > On 12/18/21 08:44, Stephen J. Turnbull wrote:
 > > Hao Hu writes:

 > >   > >> For instance, if we create a caching programming interface that
 > >   > >> relies on a distributed kv store,
 > >
 > > I would be very suspicious of using Python's hash builtin for such a
 > > purpose.  The Python hash functions are very carefully tuned for high
 > > performance in one application only: equality testing in Python,
 > > especially for dicts.  [...]
 > 
 > It is pretty much the same use case as python's dictionary though, the 
 > goal is just to generalize it to use with a distributed kv store. 

Sure, you know that because it's your application.  But I don't know
that, and it's only an example you give to justify a change to
Python.  The burden on you is not to argue that it works in one
application; it's to argue that it works broadly enough to be worth
changing a lot of stuff in Python, imposing a change burden on any
project that implements __hash__ for any of its classes, and for
anybody who supports both pre- and post-change versions of Python, they
need to support both __hash__(object) and __hash__(object, salt)
(probably trivial, just def __hash__(self, salt=None):, but I haven't
thought about it).

 > Another big advantage is that it is more user friendly to apply
 > *hash* directly on a type.

Sure, that was the whole point of proposing it and nobody denies it.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Z5I3V4AS6XYC76DL6KM4ZSG3X4AMVD32/