New submission from Michael Hudson <m...@users.sourceforge.net>: If you call email.utils.make_msgid a number of times within the same second, the uniqueness of the results depends on random.randint(100000) returning different values each time.
A little mathematics proves that you don't have to call make_msgid *that* often to get the same message id twice: if you call it 'n' times, the probability of a collision is approximately "1 - math.exp(-n*(n-1)/200000.0)", and for n == 100, that's about 5%. For n == 1000, it's over 99%. These numbers are born out by experiment: >>> def collisions(n): ... msgids = [make_msgid() for i in range(n)] ... return len(msgids) - len(set(msgids)) ... >>> sum((collisions(100)>0) for i in range(1000)) 49 >>> sum((collisions(1000)>0) for i in range(1000)) 991 I think probably having a counter in addition to the randomness would be a good fix for the problem, though I guess then you have to worry about thread safety. ---------- components: Library (Lib) messages: 91073 nosy: mwh severity: normal status: open title: calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids type: behavior versions: 3rd party, Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1, Python 3.2 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue6598> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com