Bugs item #1633678, was opened at 2007-01-11 20:14
Message generated for change (Comment added) made by akuchling
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1633678&group_id=5470
Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: Python Library
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Matthias Klose (doko)
Assigned to: A.M. Kuchling (akuchling)
Summary: mailbox.py _fromlinepattern regexp does not support positive

Initial Comment:
[forwarded from http://bugs.debian.org/254757]

mailbox.py _fromlinepattern regexp does not support positive GMT offsets.
The pattern didn't change in 2.5.

bug submitter writes:

archivemail incorrectly splits up messages in my mbox-format mail
archives. I use Squirrelmail, which seems to create mbox lines that
look like this:

>From [EMAIL PROTECTED] Mon Jan 26 12:29:24 2004 -0400

The "-0400" appears to be throwing it off. If the first message of an
mbox file has such a line on it, archivemail flat out stops, saying the
file is not mbox. If the later messages in an mbox file are in this
style, they are not counted, and archivemail thinks that the preceding
message is just kind of long, and the decision to archive or not is
broken.

I stumbled on this bug when I wanted to archive my mails on a Sarge
system. Since my TZ is positive, the regexp did not work. I think the
correct regexp for /usr/lib/python2.3/mailbox.py should be:

    _fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
        r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*((\+|-)\d\d\d\d)?\s*$"

This should handle positive and negative timezones in From lines. I
have tested it successfully with an email beginning with this line:

>From [EMAIL PROTECTED] Mon May 31 13:24:50 2004 +0200

as well as one without a TZ reference.

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2007-01-22 15:55

Message:
Logged In: YES
user_id=11375
Originator: NO

According to qmail's description of the mbox format
(http://www.qmail.org/qmail-manual-html/man5/mbox.html), the 'from'
lines shouldn't contain timezone info, but may contain additional
information after the date. So I think a better change is just to add
[^\s]*\s* to the end of the pattern.

Note that the docs recommend the PortableUnixMailbox class as
preferable for just this reason: there's too much variation in from
lines to make the strict parsing useful.

Change committed to trunk in rev. 53519, and to release25-maint in rev.
53521. Thanks for your report!

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
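
For illustration, the patterns discussed in this thread can be compared
with a short script. This is only a sketch under stated assumptions: the
"strict" pattern is reconstructed by dropping the optional timezone group
from the submitter's proposal, the "relaxed" pattern is one reading of
"add [^\s]*\s* to the end of the pattern" (inserted just before the final
$), and user@example.com is a hypothetical sender standing in for the
obfuscated address. It does not reproduce the committed revisions.

```python
import re

# Shared prefix, copied from the pattern in the report:
# "From ", sender, weekday name, month name, day of month.
_PREFIX = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
# Time of day, one optional extra token, four-digit year, trailing blanks.
_TIME_AND_YEAR = r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*"

# Assumption: the unpatched pattern is the submitter's proposal minus
# the optional timezone group.
STRICT = _PREFIX + _TIME_AND_YEAR + r"$"

# The submitter's proposal: an optional signed four-digit offset.
SUBMITTER = _PREFIX + _TIME_AND_YEAR + r"((\+|-)\d\d\d\d)?\s*$"

# Assumption: akuchling's suggestion, read as allowing one arbitrary
# whitespace-free token after the year.
RELAXED = _PREFIX + _TIME_AND_YEAR + r"[^\s]*\s*$"

# Hypothetical From lines modelled on the ones quoted in the report.
SAMPLES = [
    "From user@example.com Mon Jan 26 12:29:24 2004",
    "From user@example.com Mon Jan 26 12:29:24 2004 -0400",
    "From user@example.com Mon May 31 13:24:50 2004 +0200",
]


def matches(pattern, line):
    """Return True if `line` matches `pattern` starting at its first character."""
    return re.match(pattern, line) is not None


if __name__ == "__main__":
    for line in SAMPLES:
        print(line)
        for name, pattern in (("strict", STRICT),
                              ("submitter", SUBMITTER),
                              ("relaxed", RELAXED)):
            print("  %-9s %s" % (name, matches(pattern, line)))
```

Under these assumptions, the strict pattern accepts only the line without
an offset, while both the submitter's pattern and the relaxed pattern
accept all three samples. The relaxed form also tolerates a trailing
token that is not a numeric offset, which fits the qmail description of
additional information after the date.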