Josh Rosenberg added the comment:

Correct me if I'm wrong, but wouldn't this only become a concern if:

1. You're storing both IPv4 and IPv6 addresses side-by-side
2. You're storing well over a billion IP addresses
3. Hash codes for the hex string of an IP address were predictably sequential 
(they're not)

On point #3 alone, you can check for yourself. In a quick test within a single 
process on Python 3.4, hash(hex(0x1)) == 7060637827927985012 while 
hash(hex(0x2)) == -4275917786525356978 (your numbers may vary thanks to per 
process string hash seeding, but they should be quite random). As such, you 
couldn't easily fill more than two sequential buckets reliably; you could 
guarantee collision chaining occurs at least once (since as you noted, you can 
create two IP addresses with the same hash reliably), but the chains will be 
evenly distributed; you can't build on that to get a second, third, ..., nth 
collision.

There wouldn't be a meaningful "imbalance" between low and high IP addresses 
either; below a billion or so IP addresses, random chance would dictate the 
occasional hash code would collide, and you could guarantee that collisions 
with the sub-32 bit values would collide one extra time before finding an empty 
bucket, but I seem to recall a typical dict insertion involves 1-5 collisions 
already; adding one extra to every single dict insertion/lookup costs 
something, but it's not that much, and the scenarios required to take advantage 
of it would be incredibly contrived.

----------
nosy: +josh.rosenberg

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20446>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to