Re: [Mailman-Users] Scrubbing charset-unspecified text

2006-05-11 Thread Roger Lynn
Sorry for breaking the thread, I seem to have accidentally deleted Mark's
original reply, and the web archive doesn't include the Message-ID.

On 02/05/2006 17:47, Mark Sapiro wrote:
 Here is a suggested change to the code you quoted.
 
 Replace
 
 if part.get('content-disposition') and \
    not part.get_content_charset():
     omask = os.umask(002)
 
 with
 
 if part.get('content-disposition') and \
    msg.is_multipart() and \
    not part.get_content_charset():
     omask = os.umask(002)
 
 This is not really a proper fix, but I think it will avoid the problem
 in your case.

Thank you. Having applied it I can confirm it does now allow that particular
email through.

Roger

--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp


Re: [Mailman-Users] Scrubbing charset-unspecified text

2006-05-11 Thread Mark Sapiro
Roger Lynn wrote:

Sorry for breaking the thread, I seem to have accidentally deleted Mark's
original reply, and the web archive doesn't include the Message-ID.


That's OK, but if you want to maintain the thread in these cases, you
can find the Message-ID: header in the monthly text file linked as the
"Downloadable version" of the archive.
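
As a rough sketch of that lookup, assuming the monthly file has been
downloaded, uncompressed, and is mbox-like (messages separated by lines
starting with "From "), Python's standard mailbox module can list the
Message-ID of every post whose subject matches. The helper name
find_message_ids and the sample data, including the Message-ID, are made
up for illustration.

```python
import mailbox
import os
import tempfile


def find_message_ids(mbox_path, subject_fragment):
    """Return (Date, Message-ID) pairs for posts whose Subject contains the fragment."""
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            subject = str(message.get('Subject') or '')
            if subject_fragment.lower() in subject.lower():
                hits.append((message.get('Date'), message.get('Message-ID')))
    finally:
        box.close()
    return hits


# Made-up sample standing in for a downloaded monthly archive file.
SAMPLE = (
    "From someone at example.invalid  Tue May  2 17:47:00 2006\n"
    "From: someone at example.invalid (Some One)\n"
    "Date: Tue, 02 May 2006 17:47:00 +0100\n"
    "Subject: [Mailman-Users] Scrubbing charset-unspecified text\n"
    "Message-ID: <made-up-id@example.invalid>\n"
    "\n"
    "Body text.\n"
    "\n"
)

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write(SAMPLE)
    sample_path = handle.name

found = find_message_ids(sample_path, 'charset-unspecified')
os.unlink(sample_path)
```

The Message-ID found this way can then go into the In-Reply-To: header
of the reply so that it stays in the original thread.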


On 02/05/2006 17:47, Mark Sapiro wrote:
 
<snip>
 This is not really a proper fix, but I think it will avoid the problem
 in your case.

Thank you. Having applied it I can confirm it does now allow that particular
email through.


Thanks for the feedback.

-- 
Mark Sapiro [EMAIL PROTECTED]   The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



[Mailman-Users] Scrubbing charset-unspecified text

2006-05-02 Thread Roger Lynn
Hi,

I'm running Mailman 2.1.7, packaged for Debian (although I don't think
that's relevant to this question). A list that I administer has non-digest
scrubbing enabled. An email was recently sent to it with the following headers:

Content-Type: text/plain
Content-Disposition: inline
MIME-Version: 1.0
X-Mailer: MIME-tools 5.411 (Entity 5.404)
Date: Mon, 01 May 2006 18:47:30 +0100
Subject: [...]
To: [...]
From: [...]
X-Mailer: SINA Webmail 6.00.
Reply-To: [...]
X-Sina-Mail-Agent: sinadeliver-6.00-1.97
Message-Id: [...]
X-Virus-Scanned: by myinternet myAV on ngflrtr1
Content-Transfer-Encoding: quoted-printable

This resulted in the contents of the email being replaced with:

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://[...]/attachments/20060501/aad799ed/attachment.ksh

Why is it necessary to scrub plain text in this instance, when no character
set is specified? Couldn't it just be assumed that it is us-ascii?

If I were to comment out the following code from process() in Scrubber.py,
would there be any consequences other than allowing messages like the above
through to the list?

# TK: if part is attached then check charset and scrub if none
if part.get('content-disposition') and \
   not part.get_content_charset():
    omask = os.umask(002)
    try:
        url = save_attachment(mlist, part, dir)
    finally:
        os.umask(omask)
    filename = part.get_filename(_('not available'))
    filename = Utils.oneline(filename, lcset)
    replace_payload_by_text(part, _("""\
An embedded and charset-unspecified text was scrubbed...
Name: %(filename)s
Url: %(url)s
"""), lcset)

Incidentally, why does the attachment have the suffix .ksh? It seems
rather unusual. I'm using the following settings:

SCRUBBER_DONT_USE_ATTACHMENT_FILENAME   = False
SCRUBBER_USE_ATTACHMENT_FILENAME_EXTENSION = True

Thanks for any help,

Roger



Re: [Mailman-Users] Scrubbing charset-unspecified text

2006-05-02 Thread Mark Sapiro
Roger Lynn wrote:

I'm running Mailman 2.1.7, packaged for Debian (although I don't think
that's relevant to this question). A list that I administer has non-digest
scrubbing enabled. An email was recently sent to it with the following headers:

Content-Type: text/plain
Content-Disposition: inline
MIME-Version: 1.0
X-Mailer: MIME-tools 5.411 (Entity 5.404)
Date: Mon, 01 May 2006 18:47:30 +0100
Subject: [...]
To: [...]
From: [...]
X-Mailer: SINA Webmail 6.00.
Reply-To: [...]
X-Sina-Mail-Agent: sinadeliver-6.00-1.97
Message-Id: [...]
X-Virus-Scanned: by myinternet myAV on ngflrtr1
Content-Transfer-Encoding: quoted-printable


That looks like a malformed message. The issue is the

Content-Disposition: inline

header, which should only appear in sub-part headers, not in the message
headers.


This resulted in the contents of the email being replaced with:

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://[...]/attachments/20060501/aad799ed/attachment.ksh

Why is it necessary to scrub plain text in this instance, when no character
set is specified? Couldn't it just be assumed that it is us-ascii?


It is a bug, or at least insufficiently robust code. We shouldn't be
relying on the Content-Disposition: header to identify a sub-part.


If I were to comment out the following code from process() in Scrubber.py,
would there be any consequences other than allowing messages like the above
through to the list?


Yes. You could get a message containing an actual charset-unspecified
text attachment whose real character set differs from that of the first
text/plain part. The two parts, with perhaps incompatible character
sets, would then be 'flattened' together into one part.
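
To illustrate why that flattening is a problem, here is a small
stand-alone sketch. It is not Mailman code: it simply joins the bytes of
two text parts that use different character sets (UTF-8 and ISO-8859-1,
chosen as an assumption for the example) and shows that no single
charset decodes the result correctly.

```python
# Two parts carrying the same word in different character sets.
first_part = "café".encode("utf-8")        # declared charset: utf-8
second_part = "café".encode("iso-8859-1")  # charset unspecified, really latin-1

# "Flattening" the parts into one body just concatenates their bytes.
flattened = first_part + b"\n" + second_part

# Decoding the whole body as UTF-8 fails on the latin-1 bytes.
try:
    flattened.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Decoding as ISO-8859-1 never fails, but garbles the UTF-8 half instead.
as_latin1 = flattened.decode("iso-8859-1")
```

Whichever charset is assumed for the combined part, one of the two
halves comes out wrong, which is what the scrubbing check guards against.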


Here is a suggested change to the code you quoted.

Replace

if part.get('content-disposition') and \
   not part.get_content_charset():
    omask = os.umask(002)

with

if part.get('content-disposition') and \
   msg.is_multipart() and \
   not part.get_content_charset():
    omask = os.umask(002)

This is not really a proper fix, but I think it will avoid the problem
in your case.
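
The effect of the extra test can be checked outside Mailman with the
standard email package. This is only a sketch under assumptions: the
helper would_scrub is hypothetical, the message is a cut-down version of
the headers quoted above with a made-up body, and it assumes the parts
are visited with msg.walk().

```python
import email

# Cut-down version of the reported message: no charset, top-level
# Content-Disposition, and not multipart. The body is made up.
RAW = (
    "Content-Type: text/plain\n"
    "Content-Disposition: inline\n"
    "MIME-Version: 1.0\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Hello, list.\n"
)


def would_scrub(msg, part, require_multipart):
    """Evaluate the scrub condition, with or without the is_multipart() test."""
    scrub = bool(part.get('content-disposition')) and \
        not part.get_content_charset()
    if require_multipart:
        scrub = scrub and msg.is_multipart()
    return scrub


msg = email.message_from_string(RAW)
# For a non-multipart message, walk() yields only the message itself.
part = next(msg.walk())

original_condition = would_scrub(msg, part, require_multipart=False)
patched_condition = would_scrub(msg, part, require_multipart=True)
```

With the original condition the single-part message is treated as a
charset-unspecified attachment; with the added is_multipart() test it is
left alone, while parts of genuinely multipart messages are still checked.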


Incidentally, why does the attachment have the suffix .ksh? It seems
rather unusual. I'm using the following settings:

SCRUBBER_DONT_USE_ATTACHMENT_FILENAME  = False
SCRUBBER_USE_ATTACHMENT_FILENAME_EXTENSION = True


There is no 'filename' in what we mistakenly think is an attachment, so
we guess the extension based on the Content-Type: which is text/plain.

We effectively use the Python library call

mimetypes.guess_all_extensions('text/plain', strict=False)

which returns this list

['.ksh', '.asc', '.h', '.c', '.txt']

and we pick the first one.
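
The guess is easy to reproduce. The sketch below makes the same library
call and also shows one possible alternative, preferring '.txt' when it
is among the candidates. The helper pick_extension and its '.bin'
fallback are assumptions for illustration, not what Mailman does, and
the order of the returned list can differ between Python versions and
systems.

```python
import mimetypes


def pick_extension(content_type, preferred=('.txt',), fallback='.bin'):
    """Pick a preferred extension if offered, else the first guess, else a fallback."""
    candidates = mimetypes.guess_all_extensions(content_type, strict=False)
    for extension in preferred:
        if extension in candidates:
            return extension
    return candidates[0] if candidates else fallback


# The same call as above; "pick the first one" is all_extensions[0].
all_extensions = mimetypes.guess_all_extensions('text/plain', strict=False)
first_guess = all_extensions[0] if all_extensions else None

# A more predictable choice for plain text.
chosen = pick_extension('text/plain')
```

Taking the first element depends on the ordering of the library's
internal tables, which is why an unexpected suffix such as .ksh can
appear.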

-- 
Mark Sapiro [EMAIL PROTECTED]   The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
