Re: [Mailman-Users] Scrubbing charset-unspecified text
Sorry for breaking the thread; I seem to have accidentally deleted Mark's original reply, and the web archive doesn't include the Message-ID.

On 02/05/2006 17:47, Mark Sapiro wrote:
> Here is a suggested change to the code you quoted. Replace
>
>     if part.get('content-disposition') and \
>        not part.get_content_charset():
>         omask = os.umask(002)
>
> with
>
>     if part.get('content-disposition') and \
>        msg.is_multipart() and \
>        not part.get_content_charset():
>         omask = os.umask(002)
>
> This is not really a proper fix, but I think it will avoid the problem in your case.

Thank you. Having applied it, I can confirm it does now allow that particular email through.

Roger

-- 
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
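Why the extra msg.is_multipart() test lets this message through can be checked with a small Python 3 sketch using the standard email package. It restates the two versions of the condition as standalone functions (original_check, patched_check and scrubbed_parts are hypothetical names, not Mailman functions), and it assumes the test is only applied to text/plain parts, as the surrounding Scrubber.py code is understood to do. Both sample messages are made up for illustration.

```python
from email import message_from_string

# Single-part message shaped like the one in this thread: the
# Content-Disposition header is on the top-level message itself.
SINGLE_PART = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain\n"
    "Content-Disposition: inline\n"
    "\n"
    "Hello list\n"
)

# Multipart message with a genuine text attachment that lacks a charset.
MULTIPART = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Body text\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "\n"
    "Attachment text\n"
    "--XX--\n"
)


def original_check(msg, part):
    # Quoted Scrubber.py test: any Content-Disposition header and no charset.
    return bool(part.get('content-disposition')) and not part.get_content_charset()


def patched_check(msg, part):
    # Suggested test: additionally require the whole message to be multipart.
    return (bool(part.get('content-disposition'))
            and msg.is_multipart()
            and not part.get_content_charset())


def scrubbed_parts(msg, check):
    # Walk every part (a single-part message yields only itself) and report,
    # by filename, which text/plain parts the given check would scrub.
    return [part.get_filename('not available')
            for part in msg.walk()
            if part.get_content_type() == 'text/plain' and check(msg, part)]


if __name__ == '__main__':
    for label, raw in (('single-part', SINGLE_PART), ('multipart', MULTIPART)):
        msg = message_from_string(raw)
        print(label, scrubbed_parts(msg, original_check),
              scrubbed_parts(msg, patched_check))
```

Run as a script, it should print single-part ['not available'] [] followed by multipart ['notes.txt'] ['notes.txt']: the patched test leaves the single-part message alone but still catches a real charset-less attachment.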
Re: [Mailman-Users] Scrubbing charset-unspecified text
Roger Lynn wrote:
> Sorry for breaking the thread; I seem to have accidentally deleted Mark's original reply, and the web archive doesn't include the Message-ID.

That's OK, but if you want to maintain the thread in these cases, you can find the Message-ID: in the "Downloadable version" monthly text file.

> On 02/05/2006 17:47, Mark Sapiro wrote:
> [snip]
>> This is not really a proper fix, but I think it will avoid the problem in your case.
>
> Thank you. Having applied it, I can confirm it does now allow that particular email through.

Thanks for the feedback.

-- 
Mark Sapiro [EMAIL PROTECTED]        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
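Looking up a Message-ID in the monthly file can be scripted. This Python 3 sketch assumes the "Downloadable version" is an mbox-format text file, either plain or gzipped, as Pipermail archives usually are; find_message_ids is a hypothetical helper, and the file name in the example comment is only illustrative.

```python
import gzip
import mailbox
import os
import shutil
import tempfile


def find_message_ids(archive_path, subject_fragment):
    """Return (Message-ID, From, Subject) for messages whose Subject
    contains subject_fragment.

    Assumes the monthly archive file is in mbox format, optionally gzipped.
    """
    tmp_path = None
    path = archive_path
    if archive_path.endswith('.gz'):
        # mailbox.mbox needs a real file on disk, so decompress to a temp copy.
        fd, tmp_path = tempfile.mkstemp(suffix='.txt')
        with os.fdopen(fd, 'wb') as out, gzip.open(archive_path, 'rb') as src:
            shutil.copyfileobj(src, out)
        path = tmp_path
    box = mailbox.mbox(path, create=False)
    try:
        return [(m['Message-ID'], m['From'], m['Subject'])
                for m in box
                if subject_fragment in (m['Subject'] or '')]
    finally:
        box.close()
        if tmp_path:
            os.remove(tmp_path)

# Example (hypothetical file name):
#   find_message_ids('2006-May.txt.gz', 'Scrubbing charset-unspecified')
```

The Message-ID found this way can then be copied into the In-Reply-To: and References: headers of a reply so that mail clients and the archive thread it correctly.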
[Mailman-Users] Scrubbing charset-unspecified text
Hi,

I'm running Mailman 2.1.7, packaged for Debian (although I don't think that's relevant to this question). A list that I administer has non-digest scrubbing enabled. An email was recently sent to it with the following headers:

    Content-Type: text/plain
    Content-Disposition: inline
    MIME-Version: 1.0
    X-Mailer: MIME-tools 5.411 (Entity 5.404)
    Date: Mon, 01 May 2006 18:47:30 +0100
    Subject: [...]
    To: [...]
    From: [...]
    X-Mailer: SINA Webmail 6.00.
    Reply-To: [...]
    X-Sina-Mail-Agent: sinadeliver-6.00-1.97
    Message-Id: [...]
    X-Virus-Scanned: by myinternet myAV on ngflrtr1
    Content-Transfer-Encoding: quoted-printable

This resulted in the contents of the email being replaced with:

    An embedded and charset-unspecified text was scrubbed...
    Name: not available
    Url: http://[...]/attachments/20060501/aad799ed/attachment.ksh

Why is it necessary to scrub plain text in this instance, when no character set is specified? Couldn't it just be assumed that it is us-ascii?

If I were to comment out the following code from process() in Scrubber.py, would there be any consequences other than allowing messages like the above through to the list?

    # TK: if part is attached then check charset and scrub if none
    if part.get('content-disposition') and \
       not part.get_content_charset():
        omask = os.umask(002)
        try:
            url = save_attachment(mlist, part, dir)
        finally:
            os.umask(omask)
        filename = part.get_filename(_('not available'))
        filename = Utils.oneline(filename, lcset)
        replace_payload_by_text(part, _("""\
    An embedded and charset-unspecified text was scrubbed...
    Name: %(filename)s
    Url: %(url)s
    """), lcset)

Incidentally, why does the attachment have the suffix .ksh? It seems rather unusual. I'm using the following settings:

    SCRUBBER_DONT_USE_ATTACHMENT_FILENAME = False
    SCRUBBER_USE_ATTACHMENT_FILENAME_EXTENSION = True

Thanks for any help,

Roger
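Assuming us-ascii when no charset parameter is present matches the default RFC 2046 specifies for text/plain. In Python's email package that fallback is the failobj argument of get_content_charset(); this minimal Python 3 sketch uses a made-up message with the same relevant headers.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain\n"
    "Content-Disposition: inline\n"
    "\n"
    "Plain ASCII body\n"
)

msg = message_from_string(raw)

# No charset parameter: the bare call returns None, which is the value
# the quoted Scrubber.py test treats as "charset-unspecified".
print(msg.get_content_charset())

# Passing a fallback applies the RFC 2046 default instead.
print(msg.get_content_charset('us-ascii'))
```

Whether that fallback is safe for a real attachment is a separate question: the declared default says nothing about the bytes the sender actually used.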
Re: [Mailman-Users] Scrubbing charset-unspecified text
Roger Lynn wrote:
> I'm running Mailman 2.1.7, packaged for Debian (although I don't think that's relevant to this question). A list that I administer has non-digest scrubbing enabled. An email was recently sent to it with the following headers:
>
>     Content-Type: text/plain
>     Content-Disposition: inline
>     MIME-Version: 1.0
>     X-Mailer: MIME-tools 5.411 (Entity 5.404)
>     Date: Mon, 01 May 2006 18:47:30 +0100
>     Subject: [...]
>     To: [...]
>     From: [...]
>     X-Mailer: SINA Webmail 6.00.
>     Reply-To: [...]
>     X-Sina-Mail-Agent: sinadeliver-6.00-1.97
>     Message-Id: [...]
>     X-Virus-Scanned: by myinternet myAV on ngflrtr1
>     Content-Transfer-Encoding: quoted-printable

Which seems like a malformed message. The issue is the "Content-Disposition: inline", which should only appear in sub-part headers, not in the message headers.

> This resulted in the contents of the email being replaced with:
>
>     An embedded and charset-unspecified text was scrubbed...
>     Name: not available
>     Url: http://[...]/attachments/20060501/aad799ed/attachment.ksh
>
> Why is it necessary to scrub plain text in this instance, when no character set is specified? Couldn't it just be assumed that it is us-ascii?

It is a bug, or at least insufficiently robust code. We shouldn't be relying on the Content-Disposition: header to determine a sub-part.

> If I were to comment out the following code from process() in Scrubber.py, would there be any consequences other than allowing messages like the above through to the list?

Yes. You could then receive a message containing a genuine text attachment with no charset specified, whose actual character set differs from that of the first text/plain part; these two parts, with possibly incompatible character sets, would then be 'flattened' together into one part.

Here is a suggested change to the code you quoted. Replace

    if part.get('content-disposition') and \
       not part.get_content_charset():
        omask = os.umask(002)

with

    if part.get('content-disposition') and \
       msg.is_multipart() and \
       not part.get_content_charset():
        omask = os.umask(002)

This is not really a proper fix, but I think it will avoid the problem in your case.

> Incidentally, why does the attachment have the suffix .ksh? It seems rather unusual. I'm using the following settings:
>
>     SCRUBBER_DONT_USE_ATTACHMENT_FILENAME = False
>     SCRUBBER_USE_ATTACHMENT_FILENAME_EXTENSION = True

There is no filename in what we mistakenly think is an attachment, so we guess the extension from the Content-Type:, which is text/plain. We effectively use the Python library call

    mimetypes.guess_all_extensions('text/plain', strict=False)

which returns this list

    ['.ksh', '.asc', '.h', '.c', '.txt']

and we pick the first one.

-- 
Mark Sapiro [EMAIL PROTECTED]        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
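The .ksh guess can be reproduced with the same mimetypes call. In this Python 3 sketch, guess_scrubbed_extension is a hypothetical helper that just takes the first entry, as described; it is not Mailman's own code. The order of the list depends on the Python version and on any system mime.types files the module reads, so a current interpreter may not put '.ksh' first; the list quoted above is what the Python of the day returned.

```python
import mimetypes


def guess_scrubbed_extension(ctype):
    # Hypothetical helper: with no filename to go on, take the first
    # extension Python's mimetypes tables list for this Content-Type.
    # Returns None if the tables know no extension for the type.
    candidates = mimetypes.guess_all_extensions(ctype, strict=False)
    return candidates[0] if candidates else None


if __name__ == '__main__':
    # Full candidate list, then the one that would be chosen.
    print(mimetypes.guess_all_extensions('text/plain', strict=False))
    print(guess_scrubbed_extension('text/plain'))
```

Because the choice is simply "first in the list", any change in the table's ordering changes the suffix, which is why an arbitrary-looking extension such as .ksh can appear.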