Folks, I'm OK with changing the recomposing part in Scrubber.py:

    if not part or part.is_multipart():
        continue

to

    if part.is_multipart():
        continue

It looks like the email package is more robust than it was when the bug
report was issued and the Scrubber code was patched.

But as for the "default charset is 'us-ascii'" problem: if we put the
parts together, text in some languages (like Japanese) becomes
irreversibly unreadable. It is safer to keep such a part in a separate
file if you can't archive the whole message as multipart, as in Pipermail.

Additionally, the diff file which was said to be lost in the first post
is at:
http://lists.gnupg.org/pipermail/gnupg-users/attachments/20061207/6bd11edc/attachment.diff

I believe the folks at gnupg.org can fix the reference in the pipermail
archive by fixing PUBLIC_ARCHIVE_URL in mm_cfg.py and regenerating the
archive with the bin/arch --wipe command.

Mark Sapiro wrote:
> Todd Zullinger wrote:
>> Related to the second part of Werner's message being scrubbed with the
>> message:
>>
>>     An embedded and charset-unspecified text was scrubbed...
>>
>> Poking in the email package (on python 2.4.4) shows:
>>
>>     def get_content_charset(self, failobj=None):
>>         """Return the charset parameter of the Content-Type header.
>>
>>         The returned string is always coerced to lower case.  If there is no
>>         Content-Type header, or if that header has no charset parameter,
>>         failobj is returned.
>>         """
>>
>> This seems to violate section 5.2 of RFC 2045 which says parts lacking
>> a Content-type header should be assumed to be text/plain with a
>> charset of us-ascii.  The get_content_type method in email.Message
>> does mention RFC 2045 and uses text/plain if the content-type is
>> invalid.
>
>
> It does seem inconsistent, but I don't think we can call it a violation
> of the RFC yet, it depends on what the caller does with it.
>
>
>> Would it be appropriate to set failobj="us-ascii" when
>> calling this method in Scrubber.py?
>
>
> It might be, but I'd like to hear from Tokio first.
>
> Clearly this was considered at one point as a specific case, and a
> message exists for it, where it would have been simpler to just assume
> it is us-ascii.  Thus, I think there must be messages in the wild with
> parts with unspecified character sets that aren't us-ascii.
>

-- 
Tokio Kikuchi, [EMAIL PROTECTED]
http://weather.is.kochi-u.ac.jp/
------------------------------------------------------
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
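
For illustration only, here is a minimal sketch of the failobj question
discussed in this thread. It assumes the Python 3 standard-library email
package rather than the Python 2.4.4 one quoted above, and it is not the
code in Scrubber.py. The sample message and the helper name part_charset
are hypothetical.

```python
from email import message_from_string

# Hypothetical two-part message: the first part declares a charset,
# the second part has no Content-Type header at all.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XXXX"

--XXXX
Content-Type: text/plain; charset="ISO-2022-JP"

first part
--XXXX

second part, no Content-Type header at all
--XXXX--
"""


def part_charset(part, default=None):
    """Return the declared charset of a part, or `default` if none is declared.

    Hypothetical helper: it only forwards `default` as the failobj argument
    of Message.get_content_charset().
    """
    return part.get_content_charset(failobj=default)


msg = message_from_string(RAW)

# Skip multipart containers, as in the `if part.is_multipart(): continue` test.
leaf_parts = [p for p in msg.walk() if not p.is_multipart()]

# What each part actually declares (None when nothing is declared).
declared = [part_charset(p) for p in leaf_parts]

# What you get if the caller assumes us-ascii for undeclared parts.
assumed = [part_charset(p, "us-ascii") for p in leaf_parts]

# get_content_type() already falls back to a default type on its own.
content_types = [p.get_content_type() for p in leaf_parts]

print(declared)
print(assumed)
print(content_types)
```

Leaving failobj at None keeps "charset not declared" distinguishable from
"charset declared as us-ascii", so the caller can still choose to store
such a part in a separate file instead of merging it with the others.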