On Tue, May 07, 2002 at 01:43:43AM +0200, Miklos Bagi Jr. wrote:
> Is there any way to filter out html mails and "convert" them to plain
> text?
If I were asked to do this, my first question would be about load. Under light to moderate load, I would strongly consider combining lynx's --dump option with, say, sendmail's milter functionality. Under higher load, I'd definitely look at embedding the functionality in the mailserver's process itself, or in a daemon process of some sort, since a fork()/exec() to strip html from every incoming email would be pretty expensive on a heavily loaded system.

Note that you might run into problems where the MUA's idea of html differs from that of whatever stripping tool you choose. The tool might require something such as:

<html>.....</html>

before it treats the contents as html, whereas a message that is mostly plaintext but has a <script> embedded somewhere within (in UTF-8, ASCII, a locale-specific encoding, etc.) might be treated as html by your MUA but not by the stripper.

(I think that last paragraph is easier said as: don't expect stripping to be a magic bullet; the real answer is to use MUAs that are more robust against untrusted input.)

-- 
New GPG key coming soon, please grab D9B0A099 before this one expires.
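
To illustrate the in-process approach and the detection mismatch described above, here is a minimal sketch in Python using only the standard library's html.parser. It is not a milter and not a replacement for lynx; the names TextExtractor, html_to_text, naive_is_html and to_plain are hypothetical, and the "must be wrapped in <html>...</html>" sniffing rule is an assumption chosen only to show how a stripper can disagree with an MUA.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Strip tags in-process (no fork()/exec()) and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any trailing text still buffered
    return " ".join("".join(parser.chunks).split())


def naive_is_html(body):
    """Assumed sniffing rule: only bodies wrapped in <html>...</html> count."""
    lowered = body.strip().lower()
    return lowered.startswith("<html") and lowered.endswith("</html>")


def to_plain(body, declared_type="text/plain"):
    """Strip when the declared type or the naive sniffer says html."""
    if declared_type == "text/html" or naive_is_html(body):
        return html_to_text(body)
    return body


if __name__ == "__main__":
    sneaky = "Just text <script>alert(1)</script> here"
    print(naive_is_html(sneaky))          # the sniffer does not see html
    print(to_plain(sneaky))               # so the <script> passes through
    print(to_plain(sneaky, "text/html"))  # stripped only if declared html
```

A message that is mostly plaintext with an embedded <script> slips past the sniffer unchanged, which is exactly the case where an MUA that renders it as html would still be exposed.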