Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Messages (3)
msg309956 - (view)      Author: Johnny Dude (JohnnyD)   Date: 2018-01-15 01:08
When seeding with a tuple that includes a string, the results are not 
consistent across new interpreters or processes.

For example, running the following command several times on a Linux machine 
yields a different result each time:
python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

Please note that the doc string of random.seed states: "Initialize internal 
state from hashable object."

The online Python documentation does not mention this:
(https://docs.python.org/3.6/library/random.html#random.seed)

This is very confusing. I hope you can fix the behavior rather than the doc 
string.
msg309957 - (view)      Author: STINNER Victor (vstinner) * (Python committer)  Date: 2018-01-15 01:13
random.seed(str) uses:

        if version == 2 and isinstance(a, (str, bytes, bytearray)):
            if isinstance(a, str):
                a = a.encode()
            a += _sha512(a).digest()
            a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is 
randomized by default in Python 3.
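The effect of hash randomization can be observed directly, without going 
through random.seed(). The sketch below is only an illustration: the helper 
name hash_in_fresh_interpreter is hypothetical, and it assumes the standard 
PYTHONHASHSEED environment variable is honored by the interpreter being run. 
It calls hash() on the tuple rather than random.seed(), because newer Python 
versions no longer accept arbitrary hashable objects as seeds.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(hash_seed):
    """Return hash(("a", 1)) as computed by a new interpreter process.

    hash_seed is passed through the PYTHONHASHSEED environment variable.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash(("a", 1)))'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same hash seed: the tuple hash is reproducible across processes.
print(hash_in_fresh_interpreter("1") == hash_in_fresh_interpreter("1"))
# Different hash seeds: the string inside the tuple hashes differently.
print(hash_in_fresh_interpreter("1") == hash_in_fresh_interpreter("2"))
```

Setting PYTHONHASHSEED to a fixed value (or to 0, which disables 
randomization) makes hash-based seeding repeatable, at the cost of turning 
off the protection that randomization provides.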

Yeah, the random.seed() documentation should describe the implementation and 
explain that hash(obj) is used and that the hash function is randomized by 
default:
https://docs.python.org/dev/library/random.html#random.seed
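Until the behavior or the documentation changes, one possible workaround is 
to derive an integer seed that does not depend on hash(). The following is a 
minimal sketch, not what random.seed() itself does: stable_seed is a 
hypothetical helper, and it assumes that repr() of the object is the same in 
every process (true for tuples of strings and integers).

```python
import hashlib
import random


def stable_seed(obj):
    """Derive an int seed from repr(obj) using SHA-512.

    Hypothetical helper: relies on repr(obj) being deterministic.
    """
    data = repr(obj).encode("utf-8")
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


# An int seed is used as-is, so the sequence is the same in every process.
rng = random.Random(stable_seed(("a", 1)))
print(rng.random())
```

Passing repr(("a", 1)) directly to random.seed() should also work, since a 
str seed takes the SHA-512 path quoted above rather than the hash() path.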
msg310006 - (view)      Author: Raymond Hettinger (rhettinger) * (Python committer)     Date: 2018-01-15 10:41
I'm getting a nice improvement in dispersion statistics by shuffling in higher 
bits right at the end:

     /* Disperse patterns arising in nested frozensets */
  +  hash ^= (hash >> 11) ^ (~hash >> 25);
     hash = hash * 69069U + 907133923UL;
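To experiment with this mixing step outside of C, the two lines can be 
emulated in Python. This is only a sketch under stated assumptions: it 
assumes a 64-bit unsigned hash, the name disperse is hypothetical, and it 
covers just these two lines, not the rest of the frozenset hash.

```python
MASK = (1 << 64) - 1  # assumption: hash is a 64-bit unsigned integer


def disperse(h):
    """Emulate the two C lines above on a 64-bit unsigned value."""
    h &= MASK
    # hash ^= (hash >> 11) ^ (~hash >> 25);
    # ~h is negative in Python, so mask it back to 64 bits before shifting.
    h ^= (h >> 11) ^ ((~h & MASK) >> 25)
    # hash = hash * 69069U + 907133923UL;  (wraps around modulo 2**64)
    return (h * 69069 + 907133923) & MASK


for value in (0, 1, 2, 1 << 40):
    print(f"{value:#018x} -> {disperse(value):#018x}")
```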

Results for range() check:

                     range       range
                    baseline      new
  1st percentile     35.06%      40.63%
  1st decile         48.03%      51.34%
  mean               61.47%      63.24%      
  median             63.24%      65.58% 

Results for the letter_range() check:

                     letter      letter
                    baseline      new
  1st percentile     39.59%      40.14%
  1st decile         50.90%      51.07%
  mean               63.02%      63.04%
  median             65.21%      65.23%

Test code:

    import itertools
    import string

    def letter_range(n):
        return string.ascii_letters[:n]

    def powerset(s):
        for i in range(len(s)+1):
            yield from map(frozenset, itertools.combinations(s, i))

    # range() check: percentage of distinct values among the low n bits
    for i in range(10000):
        for n in range(5, 19):
            t = 2 ** n
            mask = t - 1
            u = len({h & mask for h in map(hash, powerset(range(i, i+n)))})
            print(u/t*100)

    # letter_range() check: rerun in a fresh interpreter for each sample,
    # since string hashes are reseeded on every run
    for n in range(5, 19):
        t = 2 ** n
        mask = t - 1
        u = len({h & mask for h in map(hash, powerset(letter_range(n)))})
        print(u/t*100)
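
For reference when reading the percentages: each check hashes the 2**n 
subsets of an n-element set and counts how many of the 2**n possible masked 
values are hit. Assuming an idealized, uniformly random hash (an assumption 
made here for comparison, not a claim from the measurements), the expected 
fraction is 1 - (1 - 1/t)**t, which approaches 1 - 1/e, or about 63.2%. A 
small sketch, with expected_occupancy as a hypothetical helper name:

```python
import math


def expected_occupancy(t, m):
    """Expected fraction of t buckets hit by m uniformly random values."""
    return 1 - (1 - 1 / t) ** m


for n in range(5, 19):
    t = 2 ** n
    # m == t because a set of n elements has 2**n subsets
    print(n, f"{expected_occupancy(t, t) * 100:.2f}%")

print("limit:", f"{(1 - math.exp(-1)) * 100:.2f}%")
```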

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue26163>
_______________________________________
