On 20 Jul 2006 15:12:27 GMT, Duncan Booth <[EMAIL PROTECTED]> wrote:
> Ksenia Marasanova wrote:
> > i want to send plain text alternative of html email, and would prefer
> > to do it automatically from HTML source.
> > Any hints?
>
> Use htmllib:
>
> >>> import htmllib, formatter, StringIO
> >>> def cleanup(s):
>     out = StringIO.StringIO()
>     p = htmllib.HTMLParser(
>         formatter.AbstractFormatter(formatter.DumbWriter(out)))
>     p.feed(s)
>     p.close()
>     if p.anchorlist:
>         print >>out
>         for idx,anchor in enumerate(p.anchorlist):
>             print >>out, "\n[%d]: %s" % (idx+1,anchor)
>     return out.getvalue()
>
> >>> print cleanup('''<div><h1>Title</h1><p>This is a <br
> />test</p></div>''')
>
> Title
>
> This is a
> test
> >>> print cleanup('''<div><h1>Title</h1><p>This is a <br />test with <a
> href="http://python.org">a link</a> to the Python homepage</p></div>''')
>
> Title
>
> This is a
> test with a link[1] to the Python homepage
>
> [1]: http://python.org
>
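
For anyone reading this on Python 3: htmllib and the formatter module no longer exist there, and "print >>out" is Python 2 syntax. Below is a rough sketch of the same idea using the standard library's html.parser. The names html_to_text, _TextExtractor and BLOCK_TAGS are mine, not from the quoted message, and the whitespace handling is a simplification, so the output may differ from what htmllib produced. Like the quoted cleanup(), it makes no attempt to skip script or style content.

```python
import re
from html.parser import HTMLParser

# Illustrative choice of tags that start and end a paragraph-like block.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}


class _TextExtractor(HTMLParser):
    """Collect text fragments and link targets while parsing."""

    def __init__(self):
        super().__init__()
        self.parts = []      # text fragments, joined at the end
        self.anchors = []    # href values, in document order
        self._href = None    # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            # Number the link right after its text, e.g. "a link[1]".
            self.anchors.append(self._href)
            self.parts.append("[%d]" % len(self.anchors))
            self._href = None
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return a plain-text rendering of html with numbered link references."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+\n", "\n", text)   # drop spaces before line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    text = text.strip()
    if parser.anchors:
        refs = "\n".join(
            "[%d]: %s" % (idx, url)
            for idx, url in enumerate(parser.anchors, 1)
        )
        text += "\n\n" + refs
    return text
```

Calling html_to_text() on the second sample from the quoted session should give much the same lines as the quoted output, minus the leading blank line.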
cleanup() doesn't handle scripts and styles too well. html2text does a
much better job with these and gives more structured output (compatible
with Markdown):

http://www.aaronsw.com/2002/html2text/

>>> import html2text
>>> print html2text.html2text('''<div><h1>Title</h1><p>This is a <br />test with <a href="http://python.org">a link</a> to the Python homepage</p></div>''')
# Title

This is a
test with [a link][1] to the Python homepage

[1]: http://python.org

HTH :)
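
To tie this back to the original question (sending a plain text alternative alongside the HTML): once you have the text version from whichever converter you pick, the two bodies can be combined into one multipart/alternative message. A minimal sketch with the Python 3 standard library follows. The function name build_alternative_message and its parameters are made up for illustration, and actually sending the message (for example via smtplib) is left out.

```python
from email.message import EmailMessage


def build_alternative_message(subject, sender, recipient, text_body, html_body):
    """Return a multipart/alternative message: plain text first, HTML second."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain-text part goes first; mail clients generally show the last
    # alternative they are able to render, so HTML-capable clients pick HTML.
    msg.set_content(text_body)
    # Adding an alternative turns the message into multipart/alternative.
    msg.add_alternative(html_body, subtype="html")
    return msg
```

The text_body argument would be the output of the HTML-to-text step, and html_body the original HTML source.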