On Tue, May 07, 2002 at 01:43:43AM +0200, Miklos Bagi Jr. wrote:
> Is there any way to filter out html mails and "convert" them to plain
> text?

If I were asked to do this, my first question would be about load.
Under light to moderate load, I would strongly consider hooking lynx's
-dump option into, say, sendmail's milter functionality.
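
A minimal sketch of the external-tool approach, in Python, assuming a
lynx binary on the PATH that accepts -dump, -stdin, -force_html and
-nolist. The function name, the timeout and the way a milter or filter
script would call it are hypothetical, not part of any particular
mailserver's API:

```python
import subprocess

# Assumed command line: have lynx read HTML on stdin and print rendered text.
LYNX_CMD = ["lynx", "-dump", "-stdin", "-force_html", "-nolist"]


def render_with_external_tool(html, cmd=LYNX_CMD, timeout=10):
    """Pipe one HTML body through an external renderer and return its output.

    Every call spawns a new process, which is the per-message cost that
    matters once the mail volume goes up.
    """
    result = subprocess.run(
        cmd,
        input=html,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout
```

A filter would call render_with_external_tool() on each text/html body
and substitute the result as a text/plain part.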

If the load were higher, I'd definitely look at embedding the
functionality in the mailserver's process itself, or in a daemon process
of some sort, since a fork()/exec() to strip html from every incoming
email would be pretty costly on highly loaded systems.
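
To illustrate the in-process alternative, here is a rough sketch that
strips markup without spawning anything, using only Python's standard
html.parser module. The class and function names, and the choice of
which tags count as block-level or skipped, are my own assumptions; the
output will not match what lynx would produce:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, dropping the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # start block-level elements on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Since this runs inside a long-lived process, the per-message cost is a
function call rather than a fork()/exec().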

Note that you might run into problems when the MUA's idea of html
differs from that of whatever stripping tool you choose. For example,
the tool might require <html>.....</html> before it treats the contents
as html, whereas a mostly plaintext message with a <script> embedded
somewhere in it (in UTF-8, ASCII, locale-specific encodings, etc.) might
be rendered as html by your MUA but passed through untouched by the
stripper.
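
One way to narrow that gap, sketched under loose assumptions, is to flag
a part for stripping either when it is declared text/html or when a
text part merely looks like it contains markup. The tag list and the
function name are hypothetical, and a regex sniff like this is only a
heuristic that will miss cases and raise false alarms:

```python
import re
from email import message_from_string

# Assumed list of tags worth worrying about; not exhaustive.
TAG_RE = re.compile(
    r"<\s*(html|body|script|iframe|img|a|table|div|p|br)\b", re.IGNORECASE
)


def parts_needing_stripping(raw_message):
    """Return (content_type, reason) for each part that should be stripped."""
    msg = message_from_string(raw_message)
    flagged = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            text = payload.decode("latin-1")
        declared = ctype == "text/html"
        sniffed = ctype.startswith("text/") and TAG_RE.search(text) is not None
        if declared or sniffed:
            flagged.append((ctype, "declared" if declared else "sniffed"))
    return flagged
```

A message labelled text/plain that carries a <script> tag comes back as
"sniffed", which is exactly the case a stripper keyed only on the
declared type would let through.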

(I think that last paragraph is easier said as: Don't expect stripping
to be a magic bullet; the real answer is to use MUAs that are more
robust against untrusted input.)

-- 
New GPG key coming soon, please grab D9B0A099 before this one expires.
