[issue2676] email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input
Antoine Pitrou [EMAIL PROTECTED] added the comment:

Hi Jack,

> Antoine, I looked at your patch and I'm not sure why you applied it
> instead of applying mine (or saying +1 on me applying my patch).
> Yours uses str.partition which I pointed out is sub-optimal (same
> big-Oh but with a larger constant factor) and also adds a function
> that returns two things, one of which is thrown away after having a
> str.strip performed on it.

I added that function so that the header-splitting facility is explicitly exposed as an internal API, as was the case with the regular expression. I tried to mimic the behaviour of the regex as closely as possible, which meant returning two things as well :-)

I think the point of the issue is to remove the pathological (exponential) behaviour when parsing some headers, not to try to squeeze the last microseconds out of content-type parsing (which shouldn't be, IMO, the limiting factor in email handling performance as soon as it's not super-linear). That said, I've timed the function against the regular expression and the former is always faster, even for tiny strings (e.g. a;b).

Your patch was keeping the regular expression as a module-level constant while replacing all uses of it with a function, which I found a bit strange (I don't think people are using paramre from the outside since it's not documented; it's an internal, not public, API IMO). I also found it strange to devote a docstring to the discussion of a performance detail. But I don't have any strong feeling against it either, so you can still apply it if you think it's important performance-wise.

Regards

Antoine.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2676
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
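For readers following the thread, the kind of helper discussed in this message (a function that returns two things, built on str.partition) and the timing comparison against the regex can be sketched roughly as below. This is an illustration under assumptions, not the code committed in r65700: the names split_param and time_both are hypothetical, and the numbers printed will vary by machine and Python version.

```python
import re
import timeit

# Pattern quoted in the original report.
paramre = re.compile(r'\s*;\s*')


def split_param(value):
    """Hypothetical helper: return (main value, remaining parameters)."""
    head, _sep, rest = value.partition(';')
    return head.strip(), rest.strip()


def time_both(value, number=100_000):
    """Time the regex split and the helper on the same input."""
    t_regex = timeit.timeit(lambda: paramre.split(value)[0], number=number)
    t_func = timeit.timeit(lambda: split_param(value)[0], number=number)
    return t_regex, t_func


# Tiny input, as in the "a;b" example mentioned in the message.
t_regex, t_func = time_both('a;b')
print(f'regex: {t_regex:.4f}s  helper: {t_func:.4f}s')
```

Only the first element of the returned pair is needed for the content type; the second is returned to mirror what splitting with the regex made available.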
Antoine Pitrou [EMAIL PROTECTED] added the comment:

This should really be fixed. Hanging on a rather normal email message (not a theoretical example) is not right.

--
nosy: +pitrou
priority: -> high
versions: +Python 3.0
Antoine Pitrou [EMAIL PROTECTED] added the comment:

Fixed in r65700. Thanks for the report!

--
resolution: -> fixed
status: open -> closed
Jack Diederich [EMAIL PROTECTED] added the comment:

Antoine, I looked at your patch and I'm not sure why you applied it instead of applying mine (or saying +1 on me applying my patch). Yours uses str.partition which I pointed out is sub-optimal (same big-Oh but with a larger constant factor) and also adds a function that returns two things, one of which is thrown away after having a str.strip performed on it. If my patch was deficient please let me know.
Jack Diederich [EMAIL PROTECTED] added the comment:

Augmented version of Daniel's patch. This makes an internal function that does the same work. It uses txt.find() instead of split() or partition() because for pathologically long strings find() is noticeably faster. It also does the strip() before the lower() which helps with evilly long strings.

I didn't remove the module global paramre because an external module might be using it. I did update its comment.

Do bugfixes get applied to 2.6 or 3.0? I'm a bit out of practice.

--
nosy: +jackdied
Added file: http://bugs.python.org/file11022/email.message.diff
New submission from Daniel Diniz [EMAIL PROTECTED]:

[Reported by Alberto Casado Martín [1]]

Message.get_content_type() hangs when very large values are split by the regex:

    ctype = paramre.split(value)[0].lower().strip()  # line 439

paramre comes from line 26:

    paramre = re.compile(r'\s*;\s*')

Unless the full-fledged parser cited in the comment before line 26 is in the works, I suggest splitting the string by ; to get exactly the same behavior in a more reliable way.

[1] http://mail.python.org/pipermail/python-dev/2008-April/078840.html

--
components: Library (Lib)
files: message.py.patch
keywords: patch
messages: 65702
nosy: ajaksu2, barry
severity: normal
status: open
title: email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input
versions: Python 2.6
Added file: http://bugs.python.org/file10079/message.py.patch
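The suggestion in this report, splitting on a literal ; instead of using the regex, can be sketched as follows. This is an illustration only, not the attached message.py.patch; the helper names ctype_regex and ctype_split are hypothetical, and the sample values are made up for the comparison.

```python
import re

# Pattern quoted in the report (line 26 of email/message.py at the time).
paramre = re.compile(r'\s*;\s*')


def ctype_regex(value):
    """Original expression from the report."""
    return paramre.split(value)[0].lower().strip()


def ctype_split(value):
    """Hypothetical replacement: plain split on the first ';'."""
    # The trailing strip() removes the whitespace that the regex
    # would otherwise have consumed around the separator.
    return value.split(';', 1)[0].lower().strip()


# Made-up sample headers; both versions should agree on each of them.
samples = [
    'text/plain',
    'Text/HTML; charset="utf-8"',
    '  multipart/mixed ;boundary=xyz; foo=bar',
    '',
    ';',
]
for sample in samples:
    assert ctype_regex(sample) == ctype_split(sample)
```

Because only the first field is kept and it is stripped afterwards, the whitespace handling in the pattern adds nothing to the result here, which is why a plain string split can stand in for it.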
Changes by Benjamin Peterson [EMAIL PROTECTED]:

--
assignee: -> barry
type: -> resource usage