Re: unescape HTML entities

Fredrik Lundh Sat, 28 Oct 2006 19:05:12 -0700

Rares Vernica wrote:

> How can I unescape HTML entities like "&nbsp;"?


run it through an HTML parser.
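
A minimal sketch of the parser route, assuming a current Python 3, where the standard parser lives in html.parser and converts character references itself when convert_charrefs is true. The names TextCollector and text_of are made up for this illustration, and this version discards the tags and returns only the text:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text content; the parser decodes entities for us."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn &nbsp;, &#160;,
        # &amp; and friends into characters before calling handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_of(markup):
    """Return the text of an HTML fragment with entities unescaped."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)
```

For example, text_of("<p>a&nbsp;b &amp; c</p>") gives the text with a real non-breaking space and a plain ampersand.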

or use something like this:

     http://effbot.org/zone/re-sub.htm#strip-html

(if you want to keep the elements and only replace the entities, change
the regular expression in the re.sub call to "(?s)&#?\w+;")
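
A rough sketch of that entities-only variant, written here for Python 3 (an assumption: the module called htmlentitydefs in this thread is html.entities there). The function names unescape and fixup are chosen for illustration; the linked page is the place to look for the original code. Unknown or malformed references are left untouched:

```python
import re
from html.entities import name2codepoint  # htmlentitydefs in Python 2


def unescape(text):
    """Replace character references and named entities; keep the markup."""

    def fixup(match):
        ref = match.group(0)
        try:
            if ref[:3] in ("&#x", "&#X"):
                # hexadecimal character reference, e.g. &#xa0;
                return chr(int(ref[3:-1], 16))
            if ref[:2] == "&#":
                # decimal character reference, e.g. &#160;
                return chr(int(ref[2:-1]))
            # named entity, e.g. &nbsp;
            return chr(name2codepoint[ref[1:-1]])
        except (KeyError, ValueError, OverflowError):
            return ref  # not something we recognise: leave it as is

    return re.sub(r"(?s)&#?\w+;", fixup, text)
```

Since only the entity pattern is matched, tags such as <b> pass through unchanged. On Python 3.4 and later the standard library's html.unescape() does the same job and also knows the full HTML5 entity list.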

> I know about xml.sax.saxutils.unescape() but it only deals with "&amp;", 
> "&lt;", and "&gt;".
> 
> Also, I know about htmlentitydefs.entitydefs, but not only this 
> dictionary is the opposite of what I need, it does not have "&nbsp;".

 >>> import htmlentitydefs
 >>> htmlentitydefs.entitydefs.get("nbsp")
'\xa0'

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
