OK guys, I've completed a very crude HTML parser.

It reads a mail from stdin and writes it to stdout.  Sandesh, you will
have to take care of piping the data into and out of it.

It reads the mail header, and if the content-type is text/html it
parses the body and tries to convert it to text (something an
intelligent human can read, at least).  It also changes the content-type
to text/plain so that mail readers don't get confused.
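
For anyone picking this up, here is a rough Python sketch of the header
handling described above.  It is not the attached program, just an
illustration of the idea.  The name convert_message and the
html_to_text callback are made up for this example, and it assumes a
single-part mail whose body is not base64 or quoted-printable encoded.

```python
from email import message_from_string


def convert_message(raw, html_to_text):
    """Return the mail as a string, with a text/html body turned into text/plain.

    html_to_text is a hypothetical callback (HTML string -> plain string);
    any converter can be plugged in here.
    """
    msg = message_from_string(raw)
    if msg.get_content_type() != "text/html":
        # Not an HTML mail: pass it through untouched.
        return raw
    # Assumption: the payload is a plain string (no transfer encoding).
    msg.set_payload(html_to_text(msg.get_payload()))
    # Rewrite the Content-Type so mail readers treat the body as text.
    msg.set_type("text/plain")
    return msg.as_string()
```

As a filter it would be driven with something like
sys.stdout.write(convert_message(sys.stdin.read(), converter)).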

I have only included support for <h?>, <p>, <div>, <br>, <ul><li>, and
<ol><li>, plus a few escape chars (&nbsp;, &amp;, &lt;, &gt;).
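
To make the tag list concrete, here is a minimal Python sketch of a
converter limited to those tags and those four escapes.  It uses the
standard html.parser module rather than whatever the attached program
does, and the names CrudeTextifier and html_to_text, the "* " bullets
and the "1. " numbering are all choices made up for this example.

```python
import re
from html.parser import HTMLParser

# Only the four escapes mentioned above; anything else is left as-is.
ENTITIES = {"nbsp": " ", "amp": "&", "lt": "<", "gt": ">"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


class CrudeTextifier(HTMLParser):
    def __init__(self):
        # convert_charrefs=False so that handle_entityref gets called.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.lists = []  # one entry per open list: None for <ul>, a counter for <ol>

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.out.append("\n")
        elif tag in BLOCK_TAGS:
            self.out.append("\n\n")
        elif tag == "ul":
            self.lists.append(None)
        elif tag == "ol":
            self.lists.append(0)
        elif tag == "li":
            if self.lists and self.lists[-1] is not None:
                self.lists[-1] += 1
                self.out.append("\n%d. " % self.lists[-1])
            else:
                self.out.append("\n* ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            if self.lists:
                self.lists.pop()
            self.out.append("\n")

    def handle_data(self, data):
        # HTML treats runs of whitespace as a single space.
        self.out.append(re.sub(r"\s+", " ", data))

    def handle_entityref(self, name):
        self.out.append(ENTITIES.get(name, "&%s;" % name))


def html_to_text(html):
    parser = CrudeTextifier()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip() + "\n"
```

Unknown tags are simply dropped and their text kept, which is about as
crude as the original description suggests.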

There is no <pre> support, and text wrapping at 72 chars is not very
good.  I hacked this up in an hour and haven't tested it much (I used
two HTML files and added a mail header to them).  They came out
decently, and most mail readers will either wrap any lines that were
missed or add scroll bars.
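
If someone wants to improve the wrapping, one possible approach is
sketched below in Python using the standard textwrap module.  The
function name wrap_text is made up, and this is only a guess at
reasonable behaviour (wrap each over-long line on its own, never split
a long word or URL), not a description of the attached code.

```python
import textwrap


def wrap_text(text, width=72):
    """Wrap every line longer than width; leave shorter lines alone."""
    wrapped = []
    for line in text.split("\n"):
        if len(line) <= width:
            wrapped.append(line)
            continue
        pieces = textwrap.wrap(
            line,
            width=width,
            break_long_words=False,  # keep long URLs in one piece
            break_on_hyphens=False,
        )
        # textwrap.wrap returns [] for a whitespace-only line.
        wrapped.extend(pieces or [""])
    return "\n".join(wrapped)
```

A line with a single word longer than 72 chars still comes out too
long, so the mail reader's own wrapping or scroll bars remain the
fallback there.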

If anyone wants to finish the job, feel free.  GPL holds.

Philip


=======================================================================

Nature always sides with the hidden flaw.

htmlparse.gz
