On 3/1/19 9:15 AM, Mark Sapiro wrote:
> On 2/28/19 5:26 AM, Lothar Schilling wrote:
>> Hi everybody,
>>
>> a few weeks ago I upgraded from 2.1.16 (as far as I can remember...) to
>> 2.1.29. Everything seemed to work fine at first. But then I found out
>> that a lot of posts - actually far more than half of them - aren't
>> archived any longer. What logging the errors tells me is this:
>>
>> Feb 28 12:29:02 2019 (3123) Uncaught runner exception: 'ascii' codec
>> can't decode byte 0xb5 in position 26: ordinal not in range(128)
>> Feb 28 12:29:02 2019 (3123) Traceback (most recent call last):
>>   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
>>     self._onefile(msg, msgdata)
>>   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
>>     keepqueued = self._dispose(mlist, msg, msgdata)
>>   File "/usr/lib/mailman/Mailman/Queue/ArchRunner.py", line 77, in _dispose
>>     mlist.ArchiveMail(msg)
>>   File "/usr/lib/mailman/Mailman/Archiver/Archiver.py", line 216, in
>> ArchiveMail
>>     h.processUnixMailbox(f)
>>   File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 596, in
>> processUnixMailbox
>>     self.add_article(a)
>>   File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 640, in
>> add_article
>>     author = fixAuthor(article.decoded['author'])
>>   File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 63, in
>> fixAuthor
>>     while i>0 and (L[i-1][0] in lowercase or [error message stops right
>> here]
>>
>> As I read in a previous thread the reason for this may be non-ascii
>> compliant characters in the post, especially the "from:"-line. But why
>> would Python or Mailman now all of a sudden use ASCII instead of UTF-8
>> in the first place? And if so: How can I change that behaviour?
>
>
> Yes, this is due to non-ascii in the display name portion of the From:
> header. I'm investigating a fix, but I'm not sure if this is an RFC 2047
> encoded header or if the raw header contains non-ascii. If the latter,
> the message is non-compliant - RFC 5321 and predecessors require all raw
> headers to contain only ascii characters.
>
>
> As far as "all of a sudden use ASCII" is concerned. Mailman's character
> set for English has always been ascii, and for German, iso-8859-1.
>
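
For anyone following along, the two possibilities mentioned above (a raw
header containing non-ascii bytes versus an RFC 2047 encoded header) can
be sketched roughly as follows. This is only an illustration in Python 3
using the standard library, not Mailman code. The display name is made
up, and the names try_ascii, raw_latin1, encoded and roundtrip are
hypothetical. Mailman 2.1 itself runs on Python 2, where this kind of
error typically comes from an implicit str/unicode coercion rather than
an explicit decode call.

```python
from email.header import Header, decode_header

# Hypothetical display name containing the micro sign, which is the
# single byte 0xb5 in iso-8859-1 (the byte reported in the log above).
display_name = "\u00b5 Example"

# Possibility 1: the raw header bytes contain non-ascii.
raw_latin1 = display_name.encode("iso-8859-1")


def try_ascii(raw):
    """Return the decoded text, or the error message if ascii decoding fails."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


# Decoding the raw bytes as ascii fails with the same kind of message
# that appears in the error log.
ascii_error = try_ascii(raw_latin1)

# Possibility 2: the display name is RFC 2047 encoded, so the header is
# pure ascii on the wire and carries its own charset label.
encoded = Header(display_name, "iso-8859-1").encode()
decoded_bytes, charset = decode_header(encoded)[0]
roundtrip = decoded_bytes.decode(charset)

if __name__ == "__main__":
    print(ascii_error)
    print(encoded)
    print(roundtrip)
```

The point of the sketch is only that the second form survives an ascii
decode and the first does not; it says nothing about which form the
failing posts actually used.
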
I am unable to duplicate this with 3 tests: a message with non-ascii
utf-8 characters in the display name in a raw From:, a message with
non-ascii iso-8859-1 characters in the display name in a raw From: and a
message with non-ascii iso-8859-1 characters in an RFC 2047 encoded
display name in From:.

All 3 messages were archived on an English language list. The display
name in the archive for the first case was garbled, i.e. the separate
bytes of the utf-8 encoding were shown rather than the character they
represented. Other than that, there were no issues with the archive.

Further, I examined the diff of all the archiver modules between 2.1.16
and 2.1.29 and also between 2.1.15 and 2.1.29, and I see nothing that
seems relevant to this exception.

To try to diagnose this further, you could try:

=== modified file 'Mailman/Archiver/pipermail.py'
--- Mailman/Archiver/pipermail.py	2018-05-03 21:23:47 +0000
+++ Mailman/Archiver/pipermail.py	2019-03-02 04:51:23 +0000
@@ -60,9 +60,12 @@
     else:
         # Mixed case; assume that small parts of the last name will be
         # in lowercase, and check them against the list.
-        while i>0 and (L[i-1][0] in lowercase or
-                       L[i-1].lower() in smallNameParts):
-            i = i - 1
+        try:
+            while i>0 and (L[i-1][0] in lowercase or
+                           L[i-1].lower() in smallNameParts):
+                i = i - 1
+        except:
+            syslog('error', 'Exception in fixAuthor: %s', author)
     author = SPACE.join(L[-1:] + L[i:-1]) + ', ' + SPACE.join(L[:i])
     return author
 

and see what gets logged in Mailman's error log and what the archived
message looks like.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: https://mail.python.org/mailman/options/mailman-users/archive%40jab.org