[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-11-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Perhaps will open a separate issue for more compact non-decimal message ID.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-11-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 3c0a817ab616 by Serhiy Storchaka in branch '3.4':
Issue #6598: Avoid clock wrapping around in test_make_msgid_collisions.
https://hg.python.org/cpython/rev/3c0a817ab616

New changeset e259c0ab7a69 by Serhiy Storchaka in branch '3.5':
Issue #6598: Avoid clock wrapping around in test_make_msgid_collisions.
https://hg.python.org/cpython/rev/e259c0ab7a69

New changeset d1c11a78b43a by Serhiy Storchaka in branch 'default':
Issue #6598: Avoid clock wrapping around in test_make_msgid_collisions.
https://hg.python.org/cpython/rev/d1c11a78b43a

New changeset bdd257d60da2 by Serhiy Storchaka in branch '2.7':
Issue #6598: Avoid clock wrapping around in test_make_msgid_collisions.
https://hg.python.org/cpython/rev/bdd257d60da2

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-11-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

http://buildbot.python.org/all/builders/x86%20Tiger%202.7/builds/3246/steps/test/logs/stdio
==
FAIL: test_make_msgid_collisions (email.test.test_email.TestMiscellaneous)
--
Traceback (most recent call last):
  File 
"/Users/db3l/buildarea/2.7.bolen-tiger/build/Lib/email/test/test_email.py", 
line 2434, in test_make_msgid_collisions
pass
  File "/Users/db3l/buildarea/2.7.bolen-tiger/build/Lib/contextlib.py", line 
24, in __exit__
self.gen.next()
  File "/Users/db3l/buildarea/2.7.bolen-tiger/build/Lib/test/test_support.py", 
line 1570, in start_threads
raise AssertionError('Unable to join %d threads' % len(started))
AssertionError: Unable to join 5 threads

--

Threads that should be finished for 3 seconds can't be finished for 15 minutes. 
It looks as clock() wrapped around. From clock (3) manpage:

Note that the time can wrap around.  On a 32-bit system where 
CLOCKS_PER_SEC equals 100 this function will return the same value 
approximately every 72 minutes.

The solution is to use monotonic() (time() on older releases) instead of 
clock().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Now there is a question. Is it worth to use base16 (hexadecimal) to compact 
message id to 34 characters or base32 to compact it to 27 characters? Using 
base16 is pretty easy: just replace %d with %x. Using base32 is more 
complicated.

With base64 and base85 the length can be decreased to 23 and 21, but I doubt 
that this is safe.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
assignee:  - serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-19 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 6969bac411fa by Serhiy Storchaka in branch '2.7':
Issue #6598: Increased time precision and random number range in
https://hg.python.org/cpython/rev/6969bac411fa

New changeset ea878f847eee by Serhiy Storchaka in branch '3.4':
Issue #6598: Increased time precision and random number range in
https://hg.python.org/cpython/rev/ea878f847eee

New changeset 933addbc7041 by Serhiy Storchaka in branch 'default':
Issue #6598: Increased time precision and random number range in
https://hg.python.org/cpython/rev/933addbc7041

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-19 Thread Barry A. Warsaw

Barry A. Warsaw added the comment:

On May 19, 2015, at 07:23 AM, Serhiy Storchaka wrote:

Now there is a question. Is it worth to use base16 (hexadecimal) to compact
message id to 34 characters or base32 to compact it to 27 characters? Using
base16 is pretty easy: just replace %d with %x. Using base32 is more
complicated.

It might not be relevant, but we're using base 32 in Message-ID-Hash:

http://wiki.list.org/DEV/Stable%20URLs

We settled on base 32 after consulting with the curators of mail-archive.com
and others.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-17 Thread Barry A. Warsaw

Barry A. Warsaw added the comment:

An increase of 13 characters doesn't seem so bad.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Currently the maximal length of local part is

14+1+len(str(2**16))+1+len(str(10**5-1)) == 26

With the patch it will be

len(str(2**31*100))+1+len(str(2**16))+1+len(str(2**64)) = 39

If encode all three numbers with hexadecimal, it will be

len('%x'%(2**31*100))+1+len('%x'%(2**16))+1+len('%x'%(2**64)) = 34

If encode their with base32:


math.ceil((31+math.log(100)/math.log(2))/5)+1+math.ceil(16/5)+1+math.ceil(64/5) 
= 27

Using base64 or base85 can be not safe, because message ID can be used as file 
name in case-insensitive file system.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch that changes the generating of message IDs:

1. The datetime is taken with higher precision, 2 decimal digits after dot.
2. The datetime is written as Unix time in 0.01 second units, not as 
mmddHHMMSS. This is more compact and faster.
3. The random part is taken as the random integer in the range 0 from to 2**64, 
not from 0 to 10**5.

This increases the length of generated part of the ID from 26 to 39 characters 
in average. Perhaps in new releases we can use non-decimal or even 
non-alphanumeric characters in ID for compactness.

--
nosy: +barry
stage:  - patch review
Added file: http://bugs.python.org/file39405/make_msgid_2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-05-17 Thread R. David Murray

R. David Murray added the comment:

Looking at my mail store, it looks like one more more mail clients use a GUID 
for the messageid, which is 36 characters.  So 39 doesn't seem so bad.  There's 
even one that is 74 characters long.  There are also shorter ones, though, so 
compactifying the result wouldn't be a bad thing.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-02-24 Thread Mark Lawrence

Mark Lawrence added the comment:

Can we have a patch review please.  If nothing else xrange will have to change 
for Python 3.

--
nosy: +BreamoreBoy
versions: +Python 3.4, Python 3.5 -Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2015-02-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The probability of a collision when generated n numbers from 0 to m-1 is

1 - factorial(m)/factorial(m-n)/m**n

When n  sqrt(m), it is approximately equal to n**2/(2*m). When m = 100, 
we have problems for n about 1000. When increase m to 2**32, the probability of 
collisions is decreased to 0.01%. When increase m to 2**64 as recommended in 
[1], it is decreased to 2.7e-14. I.e. when generate 1000 IDs per second, 
collisions happen one per a million years.

We could also take datetime with larger precision. E.g with 2 digits after the 
dot, as recommended in [1].

When apply both changes, we could generate 10 IDs per second with one 
collision per 1 years or 1 IDs per second with one collision per 
100 years.

[1] https://tools.ietf.org/html/draft-ietf-usefor-message-id-01

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2014-02-28 Thread Varun Sharma

Changes by Varun Sharma varunsharmal...@gmail.com:


--
nosy: +varun

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2010-11-29 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
nosy: +r.david.murray
versions:  -Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-11-15 Thread Gabriel Genellina

Gabriel Genellina gagsl-...@yahoo.com.ar added the comment:

An updated version of the make_msgid patch.
Includes a random part plus a sequential part. Testing takes at most 3 
seconds now (we're interested in those msgids generated in a whole 
second)

--
Added file: http://bugs.python.org/file15339/make_msgid.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-11-15 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


Removed file: http://bugs.python.org/file14644/test_email.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-11-15 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


Removed file: http://bugs.python.org/file14676/utils.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-07 Thread Gabriel Genellina

Gabriel Genellina gagsl-...@yahoo.com.ar added the comment:

--- El jue 6-ago-09, Antoine Pitrou rep...@bugs.python.org escribió:

 Antoine Pitrou pit...@free.fr
 added the comment:
 
 Is it ok if the message id is predictable?

I don't know of any use of message ids apart from uniquely identifying the 
message, but we could still keep a random part followed by the sequence.

 Besides, _gen_next_number() can more efficiently be written
 as:
 
 _gen_next_number = itertools.cycle(xrange(N))

Written that way it will eventually hold a list with 10 integers forever.

  Yahoo! Cocina

Encontra las mejores recetas con Yahoo! Cocina.

http://ar.mujer.yahoo.com/cocina/

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-07 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


Removed file: http://bugs.python.org/file14643/utils.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-07 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


Added file: http://bugs.python.org/file14676/utils.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-06 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Is it ok if the message id is predictable?
Besides, _gen_next_number() can more efficiently be written as:

_gen_next_number = itertools.cycle(xrange(N))

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-03 Thread Gabriel Genellina

Gabriel Genellina gagsl-...@yahoo.com.ar added the comment:

This patch replaces the random part with an increasing sequence (in a 
thread safe way).
Also, added a test case for make_msgid (there was none previously)

--
keywords: +patch
nosy: +gagenellina
Added file: http://bugs.python.org/file14643/utils.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-03 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


Added file: http://bugs.python.org/file14644/test_email.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-08-03 Thread Gabriel Genellina

Changes by Gabriel Genellina gagsl-...@yahoo.com.ar:


--
versions:  -3rd party, Python 2.4, Python 2.5, Python 3.0

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-07-30 Thread Gavin Panella

Changes by Gavin Panella ga...@gromper.net:


--
nosy: +allenap

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-07-29 Thread Michael Hudson

New submission from Michael Hudson m...@users.sourceforge.net:

If you call email.utils.make_msgid a number of times within the same
second, the uniqueness of the results depends on random.randint(10)
returning different values each time.

A little mathematics proves that you don't have to call make_msgid
*that* often to get the same message id twice: if you call it 'n' times,
the probability of a collision is approximately 1 -
math.exp(-n*(n-1)/20.0), and for n == 100, that's about 5%.  For n
== 1000, it's over 99%.

These numbers are born out by experiment:

 def collisions(n):
... msgids = [make_msgid() for i in range(n)]
... return len(msgids) - len(set(msgids))
... 
 sum((collisions(100)0) for i in range(1000))
49
 sum((collisions(1000)0) for i in range(1000))
991

I think probably having a counter in addition to the randomness would be
a good fix for the problem, though I guess then you have to worry about
thread safety.

--
components: Library (Lib)
messages: 91073
nosy: mwh
severity: normal
status: open
title: calling email.utils.make_msgid frequently has a non-trivial probability 
of generating colliding ids
type: behavior
versions: 3rd party, Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 
3.0, Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6598] calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids

2009-07-29 Thread Michael Hudson

Michael Hudson m...@users.sourceforge.net added the comment:

A higher resolution timer would also help, of course.

(Thanks to James Knight for the prod).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6598
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com