[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-23 Thread Chris Angelico
On Fri, Dec 24, 2021 at 12:14 AM Hao Hu  wrote:
>- generalize the signature of __hash__(object) to __hash__(object,
> *args, **kwargs). In the default implementation, we discard the keyword
> "salt" and use the default fallback salt if the keyword is not present,
> otherwise we use the salt specified by the keyword.
>

That means that EVERY __hash__ implementation that calls hash() MUST
pass along all spare arguments. This is a breaking change that will
affect a lot of third-party code.

Changing a dunder's signature is a very big deal and needs a lot more
justification than this.

How important *is* it to be able to lock in the seed/salt (they're not
the same thing) when hashing arbitrary objects?

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XAZIFH4GHWIREGIGU5KHGZJJM3Y6NDJZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-23 Thread Hao Hu

On 12/23/21 07:39, Stephen J. Turnbull wrote:


Hao Hu writes:
  > On 12/18/21 08:44, Stephen J. Turnbull wrote:
  > > Hao Hu writes:

  > >   > >> For instance, if we create a caching programming interface that
  > >   > >> relies on a distributed kv store,
  > >
  > > I would be very suspicious of using Python's hash builtin for such a
  > > purpose.  The Python hash functions are very carefully tuned for high
  > > performance in one application only: equality testing in Python,
  > > especially for dicts.  [...]
  >
  > It is pretty much the same use case as python's dictionary though, the
  > goal is just to generalize it to use with a distributed kv store.

Sure, you know that because it's your application.  But I don't know
that, and it's only an example you give to justify a change to
Python.  The burden on you is not to argue that it works in one
application; it's to argue that it works broadly enough to be worth
changing a lot stuff in Python, imposing a change burden on any
project that implements __hash__ for any of its classes, and for
anybody who supports both pre- and post-change version of Python, they
need to support both __hash__(object) and __hash__(object, salt)
(probably trivial, just def __hash__(self, salt=None):, but I haven't
thought about it).


Thanks for thinking about all this. Those are reasonable concerns.

- The motivation of having such a discussion is indeed to figure out 
whether this small requirement could be crafted well enough for python's 
best interest.


- For the example mentioned above, I am wondering whether we could work 
around it like this:


  - generalize the signature of __hash__(object) to __hash__(object, 
*args, **kwargs). In the default implementation, we discard the keyword 
"salt" and use the default fallback salt if the keyword is not present, 
otherwise we use the salt specified by the keyword.



  > Another big advantage is that it is more user friendly to apply
  > *hash* directly on a type.

Sure, that was the whole point of proposing it and nobody denies it.




smime.p7s
Description: S/MIME Cryptographic Signature
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DJNRQBSPZ2VBZXVKMIDPESNKK4FE2NJW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-22 Thread Chris Angelico
On Thu, Dec 23, 2021 at 5:40 PM Stephen J. Turnbull
 wrote:
>
> Hao Hu writes:
>  > On 12/18/21 08:44, Stephen J. Turnbull wrote:
>  > > Hao Hu writes:
>
>  > >   > >> For instance, if we create a caching programming interface that
>  > >   > >> relies on a distributed kv store,
>  > >
>  > > I would be very suspicious of using Python's hash builtin for such a
>  > > purpose.  The Python hash functions are very carefully tuned for high
>  > > performance in one application only: equality testing in Python,
>  > > especially for dicts.  [...]
>  >
>  > It is pretty much the same use case as python's dictionary though, the
>  > goal is just to generalize it to use with a distributed kv store.
>
> Sure, you know that because it's your application.  But I don't know
> that, and it's only an example you give to justify a change to
> Python.  The burden on you is not to argue that it works in one
> application; it's to argue that it works broadly enough to be worth
> changing a lot stuff in Python, imposing a change burden on any
> project that implements __hash__ for any of its classes, and for
> anybody who supports both pre- and post-change version of Python, they
> need to support both __hash__(object) and __hash__(object, salt)
> (probably trivial, just def __hash__(self, salt=None):, but I haven't
> thought about it).
>

A bit more complicated for anything that builds its hash out of other
objects' hashes (eg a tuple), since it would have to avoid calling
hash(object, salt) if it was called as __hash__(self, None). Changing
the signature of a dunder is generally a pain.

Python's hashing function is designed with some extremely specific
use-cases in mind. For example, small integers hash to themselves,
because this gives good results for dictionaries whose keys are all
small integers. That won't be as beneficial if the keyvalue store is
distributed (since each node will only have part of the full
dictionary), and it also means that the application would be
vulnerable to hash collision attacks. As soon as something is
networked, the rules change, and I do not see this as a safe choice
for a distributed kv store.

Exposing the string hashing algorithm *only*, as a convenient and fast
way to hash strings, would have some value. Trying to expose, but also
control, the overall hash function? Not something I would recommend.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2KLISIHSW2OLDWPP36CILY5PGZGWZDZ5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-22 Thread Stephen J. Turnbull
Hao Hu writes:
 > On 12/18/21 08:44, Stephen J. Turnbull wrote:
 > > Hao Hu writes:

 > >   > >> For instance, if we create a caching programming interface that
 > >   > >> relies on a distributed kv store,
 > >
 > > I would be very suspicious of using Python's hash builtin for such a
 > > purpose.  The Python hash functions are very carefully tuned for high
 > > performance in one application only: equality testing in Python,
 > > especially for dicts.  [...]
 > 
 > It is pretty much the same use case as python's dictionary though, the 
 > goal is just to generalize it to use with a distributed kv store. 

Sure, you know that because it's your application.  But I don't know
that, and it's only an example you give to justify a change to
Python.  The burden on you is not to argue that it works in one
application; it's to argue that it works broadly enough to be worth
changing a lot stuff in Python, imposing a change burden on any
project that implements __hash__ for any of its classes, and for
anybody who supports both pre- and post-change version of Python, they
need to support both __hash__(object) and __hash__(object, salt)
(probably trivial, just def __hash__(self, salt=None):, but I haven't
thought about it).

 > Another big advantage is that it is more user friendly to apply
 > *hash* directly on a type.

Sure, that was the whole point of proposing it and nobody denies it.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Z5I3V4AS6XYC76DL6KM4ZSG3X4AMVD32/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-18 Thread Hao Hu

On 12/18/21 08:44, Stephen J. Turnbull wrote:


Hao Hu writes:
  > > On 17 Dec 2021, at 15:28, Chris Angelico  wrote:

  > > The built-in hash() function is extremely generic, so it can't really
  > > work that way. Adding a parameter to it would require (a) adding the
  > > parameter to every __hash__ method of every object, including
  > > user-defined objects;
  >
  > I would not say the opposite, however maybe it appears to be more
  > complicated than it is really is. Probably it is worth a small
  > analysis?

It's the user-defined objects that are the killer here.  We don't want
to go wrecking dozens of projects' objects.

  > >> For instance, if we create a caching programming interface that
  > >> relies on a distributed kv store,

I would be very suspicious of using Python's hash builtin for such a
purpose.  The Python hash functions are very carefully tuned for high
performance in one application only: equality testing in Python,
especially for dicts.  Many __hash__ methods omit much of the object
being hashed; if the variation in your keys depends only on those
attributes, you'll get a lot of collisions.  Others are extremely
predictable.  E.g., most integers and other numbers equal to integers
hash to themselves mod 2**61 - 1, I believe -1 is only exception.
Being predictable as such may not be a problem for your kv store
cache, but predictable == pattern, and if your application happens to
match that pattern, you could again end up with a massive collision
problem.  I imagine this is much less likely to be a problem than the
case where keys depend on omitted attributes, since presumably the
__hash__ method is designed to cover the whole range.  And numbers are
the only case I know of offhand.


It is pretty much the same use case as python's dictionary though, the 
goal is just to generalize it to use with a distributed kv store. 
Another big advantage is that it is more user friendly to apply *hash* 
directly on a type.




  > > I'd recommend hashlib:

+1

  > Otherwise, would that be useful to add siphash24 or fnv in the
  > hashlib as well?

I think that is a good idea.  To me, it seems relatively likely to be
accepted quickly.  However, many cryptographic algorithms are delicate
(eg, to avoid timing attacks), so I could be wrong about that.  Folks
like Christian Heimes might be very concerned about the implementation
as well as the algorithm.

Note that Python/pyhash.c seems to have implementations of both of
these algorithms, although I don't know if these implementations
satisfy cryptographic needs.


According to the doc, there seems to be 2 categories of hash function. 
One is for cryptographic purpose, another one is for message 
authentication code.


The algorithms mentioned above could be mostly put into the second category.


Steve


smime.p7s
Description: S/MIME Cryptographic Signature
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FBYX6MPTZGQUPQICYGYOPMLGAELUVF2H/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Stephen J. Turnbull
Hao Hu writes:
 > > On 17 Dec 2021, at 15:28, Chris Angelico  wrote:

 > > The built-in hash() function is extremely generic, so it can't really
 > > work that way. Adding a parameter to it would require (a) adding the
 > > parameter to every __hash__ method of every object, including
 > > user-defined objects;
 > 
 > I would not say the opposite, however maybe it appears to be more
 > complicated than it is really is. Probably it is worth a small
 > analysis?

It's the user-defined objects that are the killer here.  We don't want
to go wrecking dozens of projects' objects.

 > >> For instance, if we create a caching programming interface that
 > >> relies on a distributed kv store,

I would be very suspicious of using Python's hash builtin for such a
purpose.  The Python hash functions are very carefully tuned for high
performance in one application only: equality testing in Python,
especially for dicts.  Many __hash__ methods omit much of the object
being hashed; if the variation in your keys depends only on those
attributes, you'll get a lot of collisions.  Others are extremely
predictable.  E.g., most integers and other numbers equal to integers
hash to themselves mod 2**61 - 1, I believe -1 is only exception.
Being predictable as such may not be a problem for your kv store
cache, but predictable == pattern, and if your application happens to
match that pattern, you could again end up with a massive collision
problem.  I imagine this is much less likely to be a problem than the
case where keys depend on omitted attributes, since presumably the
__hash__ method is designed to cover the whole range.  And numbers are
the only case I know of offhand.

 > > I'd recommend hashlib:

+1

 > Otherwise, would that be useful to add siphash24 or fnv in the
 > hashlib as well?

I think that is a good idea.  To me, it seems relatively likely to be
accepted quickly.  However, many cryptographic algorithms are delicate
(eg, to avoid timing attacks), so I could be wrong about that.  Folks
like Christian Heimes might be very concerned about the implementation
as well as the algorithm.

Note that Python/pyhash.c seems to have implementations of both of
these algorithms, although I don't know if these implementations
satisfy cryptographic needs.

Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/H3BBUYLMJAPGGD66MN3R7A7M7SEYAX66/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Hao Hu


> On 17 Dec 2021, at 15:49, Chris Angelico  wrote:
> 
> On Sat, Dec 18, 2021 at 1:44 AM Hao Hu  wrote:
>>> For that sort of thing, it may be more practical to use your own
>>> hashing function, possibly a cryptographically secure one. The precise
>>> hashing function used by Python isn't guaranteed, so if you need it to
>>> be stable across different runs, and especially if you need to seed it
>>> in a specific way, I'd recommend hashlib:
>>> 
>>> https://docs.python.org/3/library/hashlib.html
>> 
>> I’ve explored that option, however the siphash24 or fnv under the hood of 
>> *hash* seems to be more adapted for this type of use cases in terms of 
>> *performance*.
>> Otherwise, would that be useful to add siphash24 or fnv in the hashlib as 
>> well?
> 
> That's a more viable option, although maybe it wouldn't even matter.
> How does hashlib.sha1() performance stack up, and what about a
> handrolled simple string hashing function in Python? Is performance
> actually going to be a problem with one of those?
> 

Great question. I agree that there could be other factors which slow things 
down much more than the hash function.
I assume that this is a function that’ll be potentially called a lot of times, 
and the cumulated cost won’t be negligible.
Maybe for the similar reason, some high performance networking tools adopt the 
same algorithm .

> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/LKXP3P7J7RBHC57SVS4VASDTVACIEM5S/
> Code of Conduct: http://python.org/psf/codeofconduct/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4P6EZRWWNFYS3PFRNZSW3NDLY2AGYSFK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Chris Angelico
On Sat, Dec 18, 2021 at 2:21 AM Hao Hu  wrote:
> Great question. I agree that there could be other factors which slow things 
> down much more than the hash function.
> I assume that this is a function that’ll be potentially called a lot of 
> times, and the cumulated cost won’t be negligible.
> Maybe for the similar reason, some high performance networking tools adopt 
> the same algorithm.

Yup. It's always better to measure than to assume, though :)

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ISK4YY4QEPS6QMY64HX4EBOWEV7AB3CI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Hao Hu



> On 17 Dec 2021, at 15:42, Steven D'Aprano  wrote:
> 
> On Fri, Dec 17, 2021 at 02:07:38PM -, Hao Hu wrote:
>> Hi,
>> 
>> I am wondering if it would be good to add an additional keyword `seed` 
>> to the builtin function *hash* to allow us to set arbitrary seed to 
>> ensure reproducible results.
> 
> I assume you are talking about hashing strings, I believe they are the 
> only objects that are effected by hash randomization.

I am not sure about this, though I agree with you that strings are affected by 
this.

> 
> 
>> In parallel, we have identified a couple of real use cases that 
>> require that an arbitrary seed is used for a limited scope.
>> For instance, if we create a caching programming interface that relies 
>> on a distributed kv store, it would be very important to make sure 
>> that the hash key stays the same when the application is rebooted or 
>> replicated.
> 
> Does your distributed key/value store have to use the built-in hash 
> function?

If you limit the use case strictly to a string and a value, I agree with you 
that I can rely on the hashing capability of the kv store.
However, sometimes I expect something more generic, such as it might be a tuple 
that mixes numbers, strings and other hashable types at least.
Another option might be using serialisation to produce key, but there are 2 
things to tackle
- serialisation vs hash in terms of speed
- a key of dynamic size vs key of fixed size (I tend to believe that fixed size 
key is more friendly for random access on kv storage side)  

> 
> 
> -- 
> Steve
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZEKR2Y673GSHYBU5MBIHHLCSFF5YUALJ/
> Code of Conduct: http://python.org/psf/codeofconduct/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OSGNNVTHVU4BSMC65CGBFZ75PQETKPTA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Chris Angelico
On Sat, Dec 18, 2021 at 1:44 AM Hao Hu  wrote:
> > For that sort of thing, it may be more practical to use your own
> > hashing function, possibly a cryptographically secure one. The precise
> > hashing function used by Python isn't guaranteed, so if you need it to
> > be stable across different runs, and especially if you need to seed it
> > in a specific way, I'd recommend hashlib:
> >
> > https://docs.python.org/3/library/hashlib.html
>
> I’ve explored that option, however the siphash24 or fnv under the hood of 
> *hash* seems to be more adapted for this type of use cases in terms of 
> *performance*.
> Otherwise, would that be useful to add siphash24 or fnv in the hashlib as 
> well?

That's a more viable option, although maybe it wouldn't even matter.
How does hashlib.sha1() performance stack up, and what about a
handrolled simple string hashing function in Python? Is performance
actually going to be a problem with one of those?

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LKXP3P7J7RBHC57SVS4VASDTVACIEM5S/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Steven D'Aprano
On Fri, Dec 17, 2021 at 02:07:38PM -, Hao Hu wrote:
> Hi,
> 
> I am wondering if it would be good to add an additional keyword `seed` 
> to the builtin function *hash* to allow us to set arbitrary seed to 
> ensure reproducible results.

I assume you are talking about hashing strings, I believe they are the 
only objects that are effected by hash randomization.


> In parallel, we have identified a couple of real use cases that 
> require that an arbitrary seed is used for a limited scope.
> For instance, if we create a caching programming interface that relies 
> on a distributed kv store, it would be very important to make sure 
> that the hash key stays the same when the application is rebooted or 
> replicated.

Does your distributed key/value store have to use the built-in hash 
function?


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZEKR2Y673GSHYBU5MBIHHLCSFF5YUALJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Hao Hu

> On 17 Dec 2021, at 15:28, Chris Angelico  wrote:
> 
> On Sat, Dec 18, 2021 at 1:21 AM Hao Hu  wrote:
>> 
>> Hi,
>> 
>> I am wondering if it would be good to add an additional keyword `seed` to 
>> the builtin function *hash* to allow us to set arbitrary seed to ensure 
>> reproducible results.
>> 
> 
> The built-in hash() function is extremely generic, so it can't really
> work that way. Adding a parameter to it would require (a) adding the
> parameter to every __hash__ method of every object, including
> user-defined objects; and (b) defining what that would mean when
> multiple objects' hashes are combined (eg hashing a tuple).
> 

I would not say the opposite, however maybe it appears to be more complicated 
than it is really is. Probably it is worth a small analysis?

>> 
>> As far as I know, there exists already the environment variable 
>> PYTHONHASHSEED that allows us to set arbitrary seed or disable the seed 
>> globally for the python interpreter.
>> However, it looks like that it would be too bold to use that environment 
>> variable to change the default behavior because the random seed generation 
>> helps improve the security my reducing the risk of hash flooding.
>> 
>> In parallel, we have identified a couple of real use cases that require that 
>> an arbitrary seed is used for a limited scope.
>> For instance, if we create a caching programming interface that relies on a 
>> distributed kv store, it would be very important to make sure that the hash 
>> key stays the same when the application is rebooted or replicated. It is 
>> generally more cautious to use the above capability to limit the scope to 
>> the caching library itself instead of applying the same for all the hash 
>> functions of all the python interpreters.
>> 
> 
> For that sort of thing, it may be more practical to use your own
> hashing function, possibly a cryptographically secure one. The precise
> hashing function used by Python isn't guaranteed, so if you need it to
> be stable across different runs, and especially if you need to seed it
> in a specific way, I'd recommend hashlib:
> 
> https://docs.python.org/3/library/hashlib.html

I’ve explored that option, however the siphash24 or fnv under the hood of 
*hash* seems to be more adapted for this type of use cases in terms of 
*performance*.
Otherwise, would that be useful to add siphash24 or fnv in the hashlib as well?
There are obviously also other third party libraries such as *mmh*, however 
that’ll introduce unnecessary immature dependencies.

WDYT? 
Thank you.

> 
> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/CU4TI3AEMI7Y6USIIHDWWSQW7WGGPNJ7/
> Code of Conduct: http://python.org/psf/codeofconduct/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HNMTY3SSKLAYNCLC5CN5SPUOMZP7IYGJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: set arbitrary hash random seed to ensure reproducible results

2021-12-17 Thread Chris Angelico
On Sat, Dec 18, 2021 at 1:21 AM Hao Hu  wrote:
>
> Hi,
>
> I am wondering if it would be good to add an additional keyword `seed` to the 
> builtin function *hash* to allow us to set arbitrary seed to ensure 
> reproducible results.
>

The built-in hash() function is extremely generic, so it can't really
work that way. Adding a parameter to it would require (a) adding the
parameter to every __hash__ method of every object, including
user-defined objects; and (b) defining what that would mean when
multiple objects' hashes are combined (eg hashing a tuple).

>
> As far as I know, there exists already the environment variable 
> PYTHONHASHSEED that allows us to set arbitrary seed or disable the seed 
> globally for the python interpreter.
> However, it looks like that it would be too bold to use that environment 
> variable to change the default behavior because the random seed generation 
> helps improve the security my reducing the risk of hash flooding.
>
> In parallel, we have identified a couple of real use cases that require that 
> an arbitrary seed is used for a limited scope.
> For instance, if we create a caching programming interface that relies on a 
> distributed kv store, it would be very important to make sure that the hash 
> key stays the same when the application is rebooted or replicated. It is 
> generally more cautious to use the above capability to limit the scope to the 
> caching library itself instead of applying the same for all the hash 
> functions of all the python interpreters.
>

For that sort of thing, it may be more practical to use your own
hashing function, possibly a cryptographically secure one. The precise
hashing function used by Python isn't guaranteed, so if you need it to
be stable across different runs, and especially if you need to seed it
in a specific way, I'd recommend hashlib:

https://docs.python.org/3/library/hashlib.html

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CU4TI3AEMI7Y6USIIHDWWSQW7WGGPNJ7/
Code of Conduct: http://python.org/psf/codeofconduct/