INADA Naoki added the comment: Current Py_HASH_CUTOFF implementation seems weak. ``` switch(len) { /* ((hash << 5) + hash) + *p == hash * 33 + *p */ case 7: hash = ((hash << 5) + hash) + *p++; /* fallthrough */ ... case 1: hash = ((hash << 5) + hash) + *p++; break; default: assert(0); } hash ^= len; hash ^= (Py_uhash_t) _Py_HashSecret.djbx33a.suffix; x = (Py_hash_t)hash; ```
HashSecret used at last. Conflicting pair without secret conflict with secrets too. ``` >>> def djbx33a(bs): ... h = 5381 ... for i in bs: ... h = h*33+i ... h ^= len(bs) ... h ^~ secret ... return h & 0xffff_ffff_ffff_ffff ... >>> for i in [0, 3, 7]: # range(8) ... for j in [0, 4, 7]: # range(8) ... for k in [0, 5, 7]: # range(8) ... print(i,j,k, djbx33a([7-i, 7-j+33*i, 7-k+33*j, 33*k])) ... 0 0 0 6381700318 0 0 5 6381700318 0 0 7 6381700318 0 4 0 6381700318 ... 7 4 7 6381700318 7 7 0 6381700318 7 7 5 6381700318 7 7 7 6381700318 >>> ``` So there are 8^(Py_HASH_CUTOFF-1) conflicting sequence can be built. Maybe, we should remove Py_HASH_CUTOFF completely? ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29410> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com