[Mailman-Users] Fwd: HTML filter on the lists

2006-08-03 Thread David Andrews
I got the message below from a user and am not quite sure what to do.  Any
advice?

Dave


Date: Thu, 3 Aug 2006 00:30:29 -0600
From: T. Joseph Carter [EMAIL PROTECTED]
To: David Andrews [EMAIL PROTECTED]
Subject: HTML filter on the lists

The filter you are using on text/html messages to the list really is very,
very broken.  First, it leaves parts of the HTML behind.  Second, it lies
about its output, claiming that all messages are now us-ascii (which
breaks character set conversion tools that need to know the original
character set in order to map to the correct one).
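A minimal Python sketch of the failure described above. It assumes a body written in Windows-1252 (Python's `cp1252` codec, the usual meaning of "Windows-Latin-1"). Decoding those bytes under the `us-ascii` label the filter applies loses every accented character. Decoding them under the real charset recovers the text. The sample string and the helper name are illustrative only.

```python
# Bytes as a Windows-1252 ("Windows-Latin-1") client might produce them.
ORIGINAL = "naïve café".encode("cp1252")


def decode_as_labelled(body: bytes, charset: str) -> str:
    """Decode a body using whatever charset its MIME header claims.

    Undecodable bytes become U+FFFD, which is roughly what a conversion
    tool produces when it trusts a wrong label.
    """
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    # Trusting the (wrong) us-ascii label: the accented letters are lost.
    print(ascii(decode_as_labelled(ORIGINAL, "us-ascii")))
    # Trusting the real charset: the text survives and can become UTF-8.
    print(ascii(decode_as_labelled(ORIGINAL, "cp1252").encode("utf-8")))
```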

The situation as it exists now is that you have almost everyone on the
list using Microsoft Outhous--er, I mean Outlook, which renders plain text
us-ascii messages as HTML in Windows-Latin-1 encoding.

My native character set is not Windows-Latin-1, it's UTF-8.  This requires
conversion, and the conversion tools assume that because your filter says
the message is us-ascii, it actually is.  I am also one of about three
people on the lists whose mail client does not support HTML natively.  I have
fixed that with a mail filter, but it only works if the message is
actually HTML.

Essentially, the three people for whom your mail filter still serves a
purpose are having to deal with HTML emails we can't read precisely
because your filter doesn't actually do what it says it does.

My thought on this is to switch to a filter that simply defangs HTML
without stripping it, or to replace the existing filter with some suitable
lynx command line.  My filter:

LANG=en.UTF-8 lynx -dump -localhost -stdin -dont-wrap-pre -minimal

You might want to use en.iso8859-1 instead for LANG, since just about
everyone on the list speaks a Latin-1 language natively and Outlook does
know how to convert that to a Windows character set rather easily.  Just
make sure that when the output is stuffed back into MIME format the
charset is set to match the output.


I tried to write something to correct this--if I take an affected message,
correct the MIME headers so mutt knows it's HTML and what charset it
really is, mutt does properly extract the message.  The problem is that
there is no automated way to determine which messages are mangled, and any
filter would be forced to make as many assumptions about what the filter
broke as the filter made in breaking it.  An Eastern-European poster's
messages would be garbled beyond recovery.  The proper solution is to not
break the messages.  *smile*
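The user's point can be illustrated with a rough heuristic. This is a Python sketch offered as an assumption, not a fix. It flags `text/*` parts that are labelled `us-ascii` but whose decoded bytes are not ASCII. It catches only that one symptom and says nothing about what the real charset was, which is exactly why such messages cannot be repaired automatically. The function name is hypothetical.

```python
from email import message_from_bytes


def mislabelled_ascii_parts(raw: bytes) -> list:
    """Return content types of text parts labelled us-ascii that hold non-ASCII bytes."""
    msg = message_from_bytes(raw)
    flagged = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and binary attachments
        # A text part with no charset parameter defaults to us-ascii.
        if part.get_content_charset("us-ascii") not in ("us-ascii", "ascii"):
            continue
        # decode=True undoes quoted-printable or base64 first.
        body = part.get_payload(decode=True) or b""
        try:
            body.decode("ascii")
        except UnicodeDecodeError:
            flagged.append(part.get_content_type())
    return flagged
```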



--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp


Re: [Mailman-Users] Fwd: HTML filter on the lists

2006-08-03 Thread Mark Sapiro
David Andrews wrote:

> I got the below message from a user, and am not quite sure what to do?  Any
> advice?
>
> Date: Thu, 3 Aug 2006 00:30:29 -0600
> From: T. Joseph Carter [EMAIL PROTECTED]
> To: David Andrews [EMAIL PROTECTED]
> Subject: HTML filter on the lists
>
> The filter you are using on text/html messages to the list really is very,
> very broken.  First, it leaves parts of the HTML behind.  Second, it lies
> about its output, claiming that all messages are now us-ascii (which
> breaks character set conversion tools which need to know the original
> character set in order to map to the correct one.)


Presumably the issue here is the conversion done by the Content filtering
setting convert_html_to_plaintext. The simplest solution is just to set
this to 'No' and allow the HTML to go to the list unchanged, but you
may not want to allow 'non-defanged' HTML or any HTML at all on your
list.
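If you take the simplest route, you can also change the setting from the command line. What follows is a hedged sketch. The fragment is a Python file in the form read by Mailman 2.1's `bin/config_list -i FILE LISTNAME`, and the file name is arbitrary. Compare it against `bin/config_list -o` output for your own list before applying it.

```python
# disable_html_conversion.py: hypothetical input for bin/config_list -i
# 0 = No: let text/html parts through unchanged instead of running
# HTML_TO_PLAIN_TEXT_COMMAND on them.
convert_html_to_plaintext = 0
```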

Another solution is to remove text/html from the MIME types allowed on
your list and thus force your members to post plain text or at least
multipart/alternative. See http://www.expita.com/nomime.html.
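Here is a sketch of the second option, again in `bin/config_list -i` form. It assumes `pass_mime_types` is stored as a list of type strings, so check that against `config_list -o` on your own list. The fragment turns content filtering on and lists only the types to keep, leaving text/html out. HTML-only posts will then not get through, while multipart/alternative posts keep their text/plain alternative. The file name is hypothetical.

```python
# html_off.py: hypothetical input for bin/config_list -i
filter_content = 1  # Yes: enable Content filtering

# Only these types survive; text/html is deliberately absent, so HTML
# alternatives are removed and HTML-only posts are filtered out.
pass_mime_types = [
    "multipart/mixed",
    "multipart/alternative",
    "text/plain",
]
```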

If you want to continue to convert HTML to plaintext, there are a few
issues. You don't say what Mailman version this is, but from your
user's complaint, it seems it is pre-2.1.7. In versions prior to
2.1.7, HTML that was quoted-printable or base64 encoded was not
decoded prior to passing to HTML_TO_PLAIN_TEXT_COMMAND which caused
many problems. If this is the issue, you need to upgrade. Beyond that,
the default for HTML_TO_PLAIN_TEXT_COMMAND is '/usr/bin/lynx -dump
%(filename)s'. This may not be appropriate. Redefining this in
mm_cfg.py to the command suggested by your user may or may not be a
solution because of the way Mailman/Handlers/MimeDel.py resets the
converted payload.

-- 
Mark Sapiro [EMAIL PROTECTED]   The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
