Note that hashing in Python 2.7, and in Python 3 before 3.4, is simply broken: the randomization does not do nearly enough. See https://bugs.python.org/issue14621
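A rough way to check which types are salted on a given interpreter is to evaluate hash() in fresh processes started with different PYTHONHASHSEED values. This is a minimal sketch, assuming a CPython 3 interpreter (whatever sys.executable points to); hash_in_fresh_interpreter and is_salted are hypothetical helpers made up for this example. It only shows whether a hash depends on the seed. It says nothing about how strong the randomization is, which is the problem described in issue14621.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expr, seed):
    """Return hash(<expr>) as computed by a new interpreter run with PYTHONHASHSEED=seed.

    Hypothetical helper written for this example only.
    """
    # Copy the current environment so the child process still starts normally,
    # then pin the hash seed for that child only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr],
        env=env,
        universal_newlines=True,  # decode the child's stdout to str
    )
    return int(out.strip())


def is_salted(expr, seeds=(1, 2, 3)):
    """True if hash(<expr>) changes when only the hash seed changes (hypothetical helper)."""
    values = {hash_in_fresh_interpreter(expr, seed) for seed in seeds}
    return len(values) > 1


if __name__ == "__main__":
    for expr in ("'abc'", "b'abc'", "42", "(23, 42, 99, 100)"):
        print("%-20s %s" % (expr, "varies" if is_salted(expr) else "stable"))
```

On CPython 3 one would expect 'abc' and b'abc' to be reported as varying, and 42 and (23, 42, 99, 100) as stable, which matches Steven's results quoted below.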
On Wed, Feb 17, 2016 at 4:45 AM, Shell Xu <shell909...@gmail.com> wrote:
> I think you are right. Here is the source code in Python 2.7.11:
>
> long
> PyObject_Hash(PyObject *v)
> {
>     PyTypeObject *tp = v->ob_type;
>     if (tp->tp_hash != NULL)
>         return (*tp->tp_hash)(v);
>     /* To keep to the general practice that inheriting
>      * solely from object in C code should work without
>      * an explicit call to PyType_Ready, we implicitly call
>      * PyType_Ready here and then check the tp_hash slot again
>      */
>     if (tp->tp_dict == NULL) {
>         if (PyType_Ready(tp) < 0)
>             return -1;
>         if (tp->tp_hash != NULL)
>             return (*tp->tp_hash)(v);
>     }
>     if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) {
>         return _Py_HashPointer(v); /* Use address as hash value */
>     }
>     /* If there's a cmp but no hash defined, the object can't be hashed */
>     return PyObject_HashNotImplemented(v);
> }
>
> If the object's type has a hash function, it is used. If not,
> _Py_HashPointer is used, so _Py_HashSecret is not involved.
> I also checked the references to _Py_HashSecret: only bufferobject,
> unicodeobject and stringobject use it.
>
> On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano <st...@pearwood.info> wrote:
>>
>> On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
>> > On 2/16/2016 1:48 AM, Christoph Groth wrote:
>> > >Hello,
>> > >
>> > >Recent Python versions randomize the hashes of str, bytes and datetime
>> > >objects. I suppose that the choice of these three types is the result
>> > >of a compromise. Has this been discussed somewhere publicly?
>> >
>> > Search archives of this list... it was discussed at length.
>>
>> There's a lot of discussion on the mailing list.
>> I think that this is the very start of it, in Dec 2011:
>>
>> https://mail.python.org/pipermail/python-dev/2011-December/115116.html
>>
>> and continuing into 2012, for example:
>>
>> https://mail.python.org/pipermail/python-dev/2012-January/115577.html
>> https://mail.python.org/pipermail/python-dev/2012-January/115690.html
>>
>> and a LOT more, spread over many different threads and subject lines.
>>
>> You should also read the issue on the bug tracker:
>>
>> http://bugs.python.org/issue13703
>>
>> My recollection is that it was decided that only strings and bytes need
>> to have their hashes randomized, because only strings and bytes can be
>> used directly from user input without first having a conversion step
>> with likely input range validation. In addition, changing the hash for
>> ints would break too much code for too little benefit: unlike strings,
>> where hash collision attacks on web apps are proven and easy, hash
>> collision attacks based on ints are more difficult and rare.
>>
>> See also the comment here:
>>
>> http://bugs.python.org/issue13703#msg151847
>>
>> > >I'm not a web programmer, but don't web applications also use
>> > >dictionaries that are indexed by, say, tuples of integers?
>> >
>> > Sure, and that is the biggest part of the reason they were randomized.
>>
>> But they aren't, as far as I can see:
>>
>> [steve@ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
>> 1071302475
>> [steve@ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
>> 1071302475
>>
>> Web apps can use dicts indexed by anything that they like, but unless
>> there is an actual attack, what does it matter? Guido makes a good point
>> about security here:
>>
>> https://mail.python.org/pipermail/python-dev/2013-October/129181.html
>>
>> > I think hashes of all types have been randomized, not _just_ the list
>> > you mentioned.
>>
>> I'm pretty sure that's not actually the case.
>> Using 3.6 from the repo (admittedly not fully up to date though), I can
>> see hash randomization working for strings:
>>
>> [steve@ando 3.6]$ ./python -c "print(hash('abc'))"
>> 11601873
>> [steve@ando 3.6]$ ./python -c "print(hash('abc'))"
>> -2009889747
>>
>> but not for ints:
>>
>> [steve@ando 3.6]$ ./python -c "print(hash(42))"
>> 42
>> [steve@ando 3.6]$ ./python -c "print(hash(42))"
>> 42
>>
>> which agrees with my recollection that only strings and bytes would be
>> randomized.
>>
>> --
>> Steve
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/shell909090%40gmail.com
>
> --
> The joints have spaces between them, and the blade's edge has no thickness;
> insert what has no thickness into such spaces, and there is more than enough
> room for the blade to move about.
> blog: http://shell909090.org/blog/
> twitter: @shell909090
> about.me: http://about.me/shell909090