
How does your code deal with entities like &#39;?


Klaus Alexander Seistrup wrote:
> Rares Vernica wrote:
>> How can I unescape HTML entities like "&nbsp;"?
>> I know about xml.sax.saxutils.unescape() but it only deals with
>> "&amp;", "&lt;", and "&gt;".
>> Also, I know about htmlentitydefs.entitydefs, but not only is this
>> dictionary the opposite of what I need, it also does not have
>> "&nbsp;".
> How about something like:
> #v+
> #!/usr/bin/env python
> '''dehtml.py'''
> import re
> import htmlentitydefs
> # Match any named entity known to htmlentitydefs, e.g. &aelig;
> myrx = re.compile(
>     '&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');'
> )
> def dehtml(s):
>     return re.sub(
>         myrx,
>         lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
>         s
>     )
> # end def dehtml
> if __name__ == '__main__':
>     import sys
>     print dehtml(sys.stdin.read()).encode('utf-8')
> # end if
> #v-
> E.g.:
> #v+
> $ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
> frække frølår
> $ 
> #v-
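
As far as I can tell, the regex above only matches the named entities
listed in name2codepoint, so numeric references such as &#39; or &#x27;
pass through unchanged. Here is a minimal sketch of one way to cover them
too. It assumes Python 3 (where htmlentitydefs is html.entities and
unichr is chr), and the names ENTITY_RX and _replace are my own, not part
of the quoted script:

```python
import re
from html.entities import name2codepoint  # Python 3 home of htmlentitydefs

# Matches decimal (&#39;), hexadecimal (&#x27;) and named (&aelig;) references.
ENTITY_RX = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')

def _replace(m):
    ref = m.group(1)
    try:
        if ref[:2] in ('#x', '#X'):
            return chr(int(ref[2:], 16))   # hexadecimal reference
        if ref.startswith('#'):
            return chr(int(ref[1:]))       # decimal reference
        return chr(name2codepoint[ref])    # named entity
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: leave the text as it was.
        return m.group(0)

def dehtml(s):
    return ENTITY_RX.sub(_replace, s)
```

On Python 3.4 and later, html.unescape() in the standard library also
decodes both named and numeric references, so a hand-written regex may
not be needed there at all.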

