"Adam Atlas" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> As far as I know, there isn't a standard idiom to do this, but it's
> still a one-liner. Untested, but I think this should work:
>
> import re
> from htmlentitydefs import name2codepoint
> def htmlentitydecode(s):
>     return re.sub('&(%s);' % '|'.join(name2codepoint), lambda m:
>         unichr(name2codepoint[m.group(1)]), s)
>
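For readers on Python 3, where `htmlentitydefs` became `html.entities` and `unichr` became `chr`, the quoted one-liner might be sketched as follows. `name2codepoint` maps each entity name to an integer code point, so the callback given to `re.sub` has to convert that integer into a string before returning it. Precompiling the pattern is just a convenience choice here, not something the original suggests.

```python
import re
from html.entities import name2codepoint  # Python 3 location of htmlentitydefs.name2codepoint

# Build the alternation of all known entity names once, not on every call.
_NAMED_RE = re.compile('&(%s);' % '|'.join(name2codepoint))

def htmlentitydecode(s):
    # name2codepoint['amp'] is the int 38; re.sub needs a str back, hence chr().
    return _NAMED_RE.sub(lambda m: chr(name2codepoint[m.group(1)]), s)

print(htmlentitydecode('Fish &amp; chips'))  # → Fish & chips
```

Unknown names such as `&bogus;` are left untouched because they never appear in the alternation.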
'&(%s);' won't quite work: HTML (and, I assume, SGML, but not XHTML, which is XML) allows you to omit the semicolon after an entity if it's followed by whitespace (IIRC). If that should be respected, the pattern looks more like this:

r'&(%s)(?:;|(?=\s)|$)'

(The lookahead checks for the whitespace without consuming it, so the space isn't swallowed by the substitution.)

Also, this completely ignores numeric character references, which are also found in XML (e.g. &#32; or &#x20; for ' ').

Maybe some part of the HTMLParser module is useful; I wouldn't know. IMHO, these particular batteries aren't needed very often.

Regards,
Thomas Jollans
--
http://mail.python.org/mailman/listinfo/python-list
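Both points above can be combined in a small Python 3 sketch: named entities whose semicolon may be dropped before whitespace or the end of the string, plus decimal (`&#32;`) and hexadecimal (`&#x20;`) numeric references. It assumes the whitespace-only rule recalled above, which is a simplification of what real parsers accept; the helper name `_replace` is hypothetical. On Python 3.4 and later the standard library's `html.unescape` already handles named and numeric references, so a hand-rolled version like this is mainly illustrative.

```python
import re
from html.entities import name2codepoint

# Numeric reference: hexadecimal '&#x20;' (group 1) or decimal '&#32;' (group 2).
_NUMERIC = r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));'
# Named reference (group 3): ';' required unless whitespace or end of string follows.
# The lookahead (?=\s|$) does not consume the whitespace, so it survives the sub.
_NAMED = r'&(%s)(?:;|(?=\s|$))' % '|'.join(map(re.escape, name2codepoint))
_ENTITY_RE = re.compile('%s|%s' % (_NUMERIC, _NAMED))

def _replace(m):
    hex_digits, dec_digits, name = m.group(1, 2, 3)
    if name is not None:
        return chr(name2codepoint[name])
    code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    try:
        return chr(code)
    except (ValueError, OverflowError):
        # Not a valid code point: leave the reference as written.
        return m.group(0)

def htmlentitydecode(s):
    return _ENTITY_RE.sub(_replace, s)

print(htmlentitydecode('fish &amp chips &#x26; peas'))  # → fish & chips & peas
```

Putting the numeric alternative first is harmless, since no entity name starts with `#`.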