[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-08-16 Thread Antoine Pitrou

Antoine Pitrou [EMAIL PROTECTED] added the comment:

Hi Jack,

> Antoine, I looked at your patch and I'm not sure why you applied it
> instead of applying mine (or saying +1 on me applying my patch).
>
> Yours uses str.partition which I pointed out is sub-optimal (same big-Oh
> but with a larger constant factor) and also adds a function that returns
> two things, one of which is thrown away after having a str.strip
> performed on it.

I added that function so that the header splitting facility is
explicitly exposed as an internal API, as was the case with the regular
expression. I tried to mimic the behaviour of the regex as closely as
possible, which meant returning two things as well :-)
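
For readers following along, a helper of the kind described here might look roughly like the sketch below. The name split_param is hypothetical and this is not the committed code; it only assumes that the helper is built on str.partition and returns two values, as described above.

```python
def split_param(value):
    """Split a header value at the first ';' into (first, rest).

    Hypothetical helper for illustration only: both parts are stripped,
    and the remainder is None when the value contains no ';'.
    """
    first, sep, rest = value.partition(';')
    if not sep:
        return first.strip(), None
    return first.strip(), rest.strip()


# Only the first element is needed to obtain the content type.
ctype = split_param('Text/Plain ; charset="utf-8"')[0].lower()
```

Returning both parts keeps the helper usable wherever the remainder of the header is needed, at the cost of one extra strip() when only the first part is wanted.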

I think the point of the issue is to remove the pathological
(exponential) behaviour when parsing some headers, not to squeeze the
last microseconds out of content-type parsing (which shouldn't be, IMO,
the limiting factor in email handling performance as soon as it's not
super-linear).

That said, I've timed the function against the regular expression and
the former is always faster, even for tiny strings (e.g. a;b).
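
A comparison of this kind could be set up with timeit along the following lines. This is only a sketch: with_partition is an assumed stand-in for the new function, not the committed code, and the figures will vary by machine and Python version, so none are quoted here.

```python
import re
import timeit

paramre = re.compile(r'\s*;\s*')  # the pattern quoted in the original report


def with_regex(value):
    # The expression from the report: split on the regex, keep the first part.
    return paramre.split(value)[0].lower().strip()


def with_partition(value):
    # Assumed stand-in for the new function; not the committed code.
    return value.partition(';')[0].strip().lower()


def compare(value, number=10000):
    """Return (regex_seconds, partition_seconds) for `number` calls each."""
    t_regex = timeit.timeit(lambda: with_regex(value), number=number)
    t_part = timeit.timeit(lambda: with_partition(value), number=number)
    return t_regex, t_part


if __name__ == '__main__':
    for sample in ('a;b', 'text/plain; charset="utf-8"'):
        print(sample, compare(sample))
```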

Your patch was keeping the regular expression as a module-level constant
while replacing all uses of it with a function, which I found a bit
strange (I don't think people are using paramre from the outside since
it's undocumented; IMO it's an internal API, not a public one). I also found
it strange to devote a docstring to the discussion of a performance
detail. But I don't have any strong feeling against it either, so you
can still apply it if you think it's important performance-wise.

Regards

Antoine.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2676
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-08-15 Thread Antoine Pitrou

Antoine Pitrou [EMAIL PROTECTED] added the comment:

This should really be fixed. Hanging on a rather normal email message
(not a theoretical example) is not right.

--
nosy: +pitrou
priority:  -> high
versions: +Python 3.0




[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-08-15 Thread Antoine Pitrou

Antoine Pitrou [EMAIL PROTECTED] added the comment:

Fixed in r65700. Thanks for the report!

--
resolution:  -> fixed
status: open -> closed




[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-08-15 Thread Jack Diederich

Jack Diederich [EMAIL PROTECTED] added the comment:

Antoine, I looked at your patch and I'm not sure why you applied it
instead of applying mine (or saying +1 on me applying my patch).

Yours uses str.partition which I pointed out is sub-optimal (same big-Oh
but with a larger constant factor) and also adds a function that returns
two things, one of which is thrown away after having a str.strip
performed on it.

If my patch was deficient please let me know.




[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-07-31 Thread Jack Diederich

Jack Diederich [EMAIL PROTECTED] added the comment:

Augmented version of Daniel's patch.

This makes an internal function that does the same work.  It uses
txt.find() instead of split() or partition() because for pathologically
long strings find() is noticeably faster.  It also does the strip()
before the lower() which helps with evilly long strings.
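
The approach described above can be sketched as follows. The function name leading_value is hypothetical and the body is an illustration of the find()-based idea, not the contents of the attached diff.

```python
def leading_value(value):
    """Return the text before the first ';', stripped and lower-cased."""
    end = value.find(';')
    if end == -1:
        end = len(value)  # no parameters: use the whole value
    # strip() runs first so that lower() only processes the trimmed text
    return value[:end].strip().lower()
```

Slicing up to the index returned by find() copies only the leading part of the string, so the rest of a very long header is never touched.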

I didn't remove the module global paramre because an external module
might be using it.  I did update its comment.

Do bugfixes get applied to 2.6 or 3.0?  I'm a bit out of practice.

--
nosy: +jackdied
Added file: http://bugs.python.org/file11022/email.message.diff




[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-04-23 Thread Daniel Diniz

New submission from Daniel Diniz [EMAIL PROTECTED]:

[Reported by Alberto Casado Martín [1]]

Message.get_content_type() hangs when very large values are split by the
regex:
ctype = paramre.split(value)[0].lower().strip() #line 439

paramre comes from line 26:
paramre = re.compile(r'\s*;\s*')

Unless the full-fledged parser cited in the comment before line 26 is in
the works, I suggest splitting the string by ; to get exactly the same
behavior in a more reliable way.
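
As a rough illustration of the suggestion, the sketch below compares the current expression with a plain split on ';' for a few sample values. The helper names are made up for this example and may not match the attached patch; the long whitespace run is deliberately kept small so that the regex version still finishes quickly.

```python
import re

paramre = re.compile(r'\s*;\s*')  # as quoted from line 26 above


def ctype_regex(value):
    # The expression quoted from line 439 above.
    return paramre.split(value)[0].lower().strip()


def ctype_plain_split(value):
    # Split on the literal ';' only; strip() removes the whitespace
    # that the regex would otherwise have consumed around it.
    return value.split(';', 1)[0].lower().strip()


samples = [
    'text/plain',
    'Text/HTML ; charset="utf-8"',
    ' multipart/mixed;boundary=x; foo=bar',
    'text/plain' + ' ' * 2000,  # long whitespace run, kept small here
]
results = [(ctype_regex(s), ctype_plain_split(s)) for s in samples]
```

Both functions return the same content type for each sample, because the whitespace around the first ';' is either consumed by the regex or removed by strip().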


[1] http://mail.python.org/pipermail/python-dev/2008-April/078840.html

--
components: Library (Lib)
files: message.py.patch
keywords: patch
messages: 65702
nosy: ajaksu2, barry
severity: normal
status: open
title: email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input
versions: Python 2.6
Added file: http://bugs.python.org/file10079/message.py.patch




[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input

2008-04-23 Thread Benjamin Peterson

Changes by Benjamin Peterson [EMAIL PROTECTED]:


--
assignee:  -> barry
type:  -> resource usage
