Re: filtering html tags from email

2005-02-22 Thread Louis LeBlanc
On 02/22/05 11:16 PM, Mike Hauber sat at the `puter and typed:
> Without going through the hassle of setting up proxy servers, 
> isn't there a way that one can filter out html tags from a 
> message (say, pipe the email through the filter from kmail for 
> instance?)
> 
> Perhaps I'm looking too hard for it, but I didn't see anything in 
> the ports tree except for /mail/nohtml.  I tried to pipe an HTML 
> message through nohtml.py from kmail, but it doesn't seem to work 
> (although I'm getting no errors from kmail's filter log).
> 
> Any ideas?  Thx.

Mutt saves to a temp file then calls the following command:
lynx -localhost -dump %s
where '%s' is the temporary file you saved it to.

You could also just pipe it to the following:
lynx -localhost -dump -stdin

the -localhost argument prevents lynx from simply following links
external to your machine - helpful to avoid generating hits for
unscrupulous spammers that get paid for hits on a URL.

Just make sure lynx is installed.
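
For anyone who wants to drive the same command from a script, here is a
minimal Python sketch. It assumes lynx is on the PATH; the names LYNX_CMD
and lynx_dump are made up for illustration, and only the flags shown
above are used.

```python
import shutil
import subprocess

# The exact command line from the message above.
LYNX_CMD = ["lynx", "-localhost", "-dump", "-stdin"]


def lynx_dump(html):
    """Pipe an HTML string through lynx and return the rendered plain text."""
    if shutil.which(LYNX_CMD[0]) is None:
        raise RuntimeError("lynx is not installed or not on PATH")
    result = subprocess.run(
        LYNX_CMD,
        input=html,           # sent to lynx on standard input
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # work with str rather than bytes
        check=True,           # raise if lynx exits with an error
    )
    return result.stdout

# Example call (needs lynx installed):
#   print(lynx_dump("<p>Hello, <b>world</b></p>"))
```

Note that this should only be fed the HTML body, not a whole message
with its headers.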

Lou
-- 
Louis LeBlanc  FreeBSD-at-keyslapper-DOT-net
Fully Funded Hobbyist,   KeySlapper Extrordinaire :)
Please send off-list email to: leblanc at keyslapper d.t net
Key fingerprint = C5E7 4762 F071 CE3B ED51  4FB8 AF85 A2FE 80C8 D9A2

Habit is habit, and not to be flung out of the window by any man, but
coaxed down-stairs a step at a time.
-- Mark Twain, "Pudd'nhead Wilson's Calendar"




Re: filtering HTML tags from email

2005-02-22 Thread Mike Hauber
On Wednesday 23 February 2005 12:50 am, Louis LeBlanc wrote:
> On 02/22/05 11:16 PM, Mike Hauber sat at the `puter and typed:
> > Without going through the hassle of setting up proxy servers,
> > isn't there a way that one can filter out html tags from a
> > message (say, pipe the email through the filter from kmail
> > for instance?)
> >
> > Perhaps I'm looking too hard for it, but I didn't see
> > anything in the ports tree except for /mail/nohtml.  I tried
> > to pipe an HTML message through nohtml.py from kmail, but it
> > doesn't seem to work (although I'm getting no errors from
> > kmail's filter log).
> >
> > Any ideas?  Thx.
>
> Mutt saves to a temp file then calls the following command:
> lynx -localhost -dump %s
> where '%s' is the temporary file you saved it to.
>
> You could also just pipe it to the following:
> lynx -localhost -dump -stdin
>
> the -localhost argument prevents lynx from simply following
> links external to your machine - helpful to avoid generating
> hits for unscrupulous spammers that get paid for hits on a URL.
>
> Just make sure lynx is installed.
>
> Lou

Okay, so to be sure, there is no filter (as of yet) to simply open 
an email file, strip the HTML tags, and resave it?  I'm not 
complaining, as this may actually be something I'm capable of 
creating myself.  (I'll make this my first python project. :) )

I'm just making sure I'm not missing anything obvious before I 
start working on it.  It's irritating to spend time on something 
only to find out that it's already been done.

Thanks,

Mike

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: filtering HTML tags from email

2005-02-23 Thread Simon Barner
Mike Hauber wrote:
> > Mutt saves to a temp file then calls the following command:
> > lynx -localhost -dump %s
> > where '%s' is the temporary file you saved it to.
> >
> > You could also just pipe it to the following:
> > lynx -localhost -dump -stdin
> >
> > the -localhost argument prevents lynx from simply following
> > links external to your machine - helpful to avoid generating
> > hits for unscrupulous spammers that get paid for hits on a URL.
> >
> > Just make sure lynx is installed.
> >
> > Lou
> 
> Okay, so to be sure, there is no filter (as of yet) to simply open 
> an email file, strip the HTML tags, and resave it?  I'm not 
> complaining, as this may actually be something I'm capable of 
> creating myself.  (I'll make this my first python project. :) )
> 
> I'm just making sure I'm not missing anything obvious before I 
> start working on it.  It's irritating to spend time on something 
> only to find out that it's already been done.

You probably could do it also with procmail + lynx (or w3m) during the
delivery process.

Another possibility is to have the following entries in your ~/.mailcap
file, which converts html, doc and rtf to plain text.

text/html; w3m -dump -T text/html; copiousoutput;
application/msword; antiword %s; copiousoutput
application/rtf; rtfreader %s; copiousoutput

As for your python script: I don't think that just stripping everything
matching the following expressions is correct because they might appear
in non html emails, too: <.*> <\/.*> (perl syntax).

At least, you'd need a list of valid html tags, i.e. a regular grammar
for html:  |  |  |  | ... (BNF notation).
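
To illustrate the concern with the greedy pattern, here is a rough Python
sketch. It compares the <.*> expression with the standard library's
html.parser, which is one possible alternative to writing a tag grammar
by hand. The function and class names are hypothetical, and this is not
a complete HTML-to-text converter.

```python
import re
from html.parser import HTMLParser


def strip_with_regex(text):
    # Greedy match: removes everything from the first '<' to the last '>'
    # on a line, whether or not it is markup.
    return re.sub(r"<.*>", "", text)


class TextExtractor(HTMLParser):
    """Collects only the character data between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_with_parser(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Plain text that merely contains '<' and '>' is damaged by the regex:
print(repr(strip_with_regex("x < y and y > z")))
# Real markup is swallowed whole, text included:
print(repr(strip_with_regex("<p>Hello <b>world</b></p>")))
# The parser keeps the text and drops only the tags:
print(repr(strip_with_parser("<p>Hello <b>world</b></p>")))
```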

While this is not too hard to implement (and possibly a good project for
learning a new programming language), it would be too much work for
something that can be achieved more easily with existing tools (that is,
for me, personally ;-)

Simon




Re: filtering HTML tags from email

2005-02-23 Thread Mike Hauber
On Wednesday 23 February 2005 04:43 am, Simon Barner wrote:
> > > You could also just pipe it to the following:
> > > lynx -localhost -dump -stdin
> > >
> > > Lou
> >
> > Okay, so to be sure, there is no filter (as of yet) to simply
> > open an email file, strip the HTML tags, and resave it?  I'm
> > not complaining, as this may actually be something I'm
> > capable of creating myself.  (I'll make this my first python
> > project. :) )
> >
>
> You probably could do it also with procmail + lynx (or w3m)
> during the delivery process.
>
> Another possibility is to have the following entries in your
> ~/.mailcap file, which converts html, doc and rtf to plain
> text.
>
> text/html; w3m -dump -T text/html; copiousoutput;
> application/msword; antiword %s; copiousoutput
> application/rtf; rtfreader %s; copiousoutput
>
> Simon

Just after destroying the headers in who-knows-how-many emails 
(backed up...  whew!), I finally realized that piping the 
messages through html2text (or lynx or w3m) was probably not such 
a great idea after all.  :)
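
One way to avoid the header damage is to parse the message first and
convert only the text/html parts, leaving everything else alone. Below is
a minimal sketch using Python's standard email package. The names
html_to_text and convert_html_parts are hypothetical, and the built-in
converter is deliberately crude: html2text, lynx or w3m could be swapped
in through the convert argument.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keeps the character data and discards the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    # Crude stand-in converter; replace with a call to an external tool
    # if better rendering is wanted.
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def convert_html_parts(raw_message, convert=html_to_text):
    """Return the parsed message with every text/html part made text/plain.

    Headers such as From and Subject are not touched; only the Content-*
    headers of the converted parts are rewritten.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            part.set_content(convert(part.get_content()))
    return msg


raw = (
    "From: a@example.org\n"
    "Subject: hello\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hi <b>there</b></p>\n"
)
converted = convert_html_parts(raw)
print(converted.as_string())
```

The same function could be applied to each message of an mbox or maildir
through Python's mailbox module, though that part is not shown here.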

This is something that really should be implemented as part of 
kmail itself (it would help to remain compatible with both 
maildir/mbox).  I'll continue to be frustrated with html2text for 
a while (it's a pretty cool tool), and who knows...  Mayhaps I'll 
figure out a reasonable way to set it up so that everything is 
done automatically.

Thanks for the feeds.

Mike


RE: filtering HTML tags from email

2005-02-24 Thread Ted Mittelstaedt


> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Mike Hauber
> Sent: Wednesday, February 23, 2005 4:19 AM
> To: freebsd-questions@freebsd.org
> Subject: Re: filtering HTML tags from email
> 
> 
> Just after destroying the headers in who-knows-how-many emails 
> (backed up...  whew!), I finally realized that piping the 
> messages through html2text (or lynx or w3m) was probably not such 
> a great idea after all.  :)
> 
> This is something that really should be implemented as part of 
> kmail itself (it would help to remain compatible with both 
> maildir/mbox).  I'll continue to be frustrated with html2text for 
> a while (it's a pretty cool tool), and who knows...  Mayhaps I'll 
> figure out a reasonable way to set it up so that everything is 
> done automatically.

Mike, why are you torturing yourself when http://www.mimedefang.org/
does this?  Afraid of Sendmail?

Ted