Rares Vernica wrote:
> How can I unescape HTML entities like " "?
>
> I know about xml.sax.saxutils.unescape() but it only deals with
> "&", "<", and ">".
>
> Also, I know about htmlentitydefs.entitydefs, but not only is this
> dictionary the opposite of what I need, it does not have
> " ".
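For what it's worth, xml.sax.saxutils.unescape() also takes an optional second argument: a dict mapping extra entity strings to their replacements. The sketch below is written for Python 3 (the code later in this post is Python 2), and the `extra` mapping is an illustrative assumption; only the entities you list in it get decoded.

```python
from xml.sax.saxutils import unescape

# unescape() always decodes &amp;, &lt; and &gt;; the optional second
# argument maps any extra entity strings to their replacement text.
# This particular mapping is only an example.
extra = {"&nbsp;": "\xa0", "&quot;": '"'}

text = unescape("Fish&nbsp;&amp;&nbsp;chips &quot;to go&quot;", extra)
print(repr(text))
```

This is fine for a handful of known entities, but it does not scale to the full HTML entity set.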
How about something like:

#v+
#!/usr/bin/env python
'''dehtml.py'''

import re
import htmlentitydefs

# Match "&name;" for every entity name in htmlentitydefs.
myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')

def dehtml(s):
    # Replace each matched entity with the Unicode character it names.
    return re.sub(
        myrx,
        lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
        s
    )
# end def dehtml

if __name__ == '__main__':
    import sys
    print dehtml(sys.stdin.read()).encode('utf-8')
# end if
#v-

E.g.:

#v+
$ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
frække frølår
$
#v-

--
Klaus Alexander Seistrup
Copenhagen, Denmark, EU
http://klaus.seistrup.dk/
--
http://mail.python.org/mailman/listinfo/python-list
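The same regex technique carries over to Python 3, where htmlentitydefs became html.entities and unichr() became chr(). The port below is a sketch under that assumption, not the poster's code; it also runs each name through re.escape() for safety. Like the original, it only handles named entities and leaves unknown names untouched.

```python
import re
from html.entities import name2codepoint

# One alternation over every known entity name: "&(amp|lt|nbsp|...);".
# The trailing ";" forces backtracking, so "&notin;" is not cut short at "not".
_ENTITY_RX = re.compile('&(' + '|'.join(map(re.escape, name2codepoint)) + ');')

def dehtml(s):
    """Replace named HTML entities in s with the characters they denote."""
    return _ENTITY_RX.sub(lambda m: chr(name2codepoint[m.group(1)]), s)

if __name__ == '__main__':
    import sys
    sys.stdout.write(dehtml(sys.stdin.read()))
```

On Python 3.4 and later, the standard library's html.unescape() does this job directly and also decodes numeric references such as &#160;.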