Re: xhtml encoding question

Peter Otten Wed, 01 Feb 2012 01:37:25 -0800

Ulrich Eckhardt wrote:

> On 31.01.2012 19:09, Tim Arnold wrote:
>> high_chars = {
>>     0x2014:'&mdash;', # 'EM DASH',
>>     0x2013:'&ndash;', # 'EN DASH',
>>     0x0160:'&Scaron;',# 'LATIN CAPITAL LETTER S WITH CARON',
>>     0x201d:'&rdquo;', # 'RIGHT DOUBLE QUOTATION MARK',
>>     0x201c:'&ldquo;', # 'LEFT DOUBLE QUOTATION MARK',
>>     0x2019:"&rsquo;", # 'RIGHT SINGLE QUOTATION MARK',
>>     0x2018:"&lsquo;", # 'LEFT SINGLE QUOTATION MARK',
>>     0x2122:'&trade;', # 'TRADE MARK SIGN',
>>     0x00A9:'&copy;', # 'COPYRIGHT SIGN',
>> }
> 
> You could use Unicode string literals directly instead of using the
> codepoint, making it a bit more self-documenting and saving you the
> later call to ord():
> 
> high_chars = {
>      u'\u2014': '&mdash;',
>      u'\u2013': '&ndash;',
>      ...
> }
> 
>> for c in string:
>>     if ord(c) in high_chars:
>>         c = high_chars.get(ord(c))
>>     s += c
>> return s
> 
> Instead of checking if there is a replacement and then looking up the
> replacement again, just use the default:
> 
>    for c in string:
>        s += high_chars.get(c, c)
> 
> Alternatively, if you find that clearer, you could also check if the
> return value of get() is None to find out if there is a replacement:
> 
>    for c in string:
>        r = high_chars.get(c)
>        if r is None:
>            s += c
>        else:
>            s += r
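
For reference, the loop with the get() default can be written as a small
self-contained function. This is only a sketch: the name replace_high_chars
is made up for illustration, the table is a subset of the one quoted above,
and str.join is swapped in for the repeated s += c:

```python
# Subset of the quoted table, keyed by character as Ulrich suggested.
high_chars = {
    u'\u2014': u'&mdash;',
    u'\u2013': u'&ndash;',
    u'\u00a9': u'&copy;',
}

def replace_high_chars(text):
    # Hypothetical helper name, not from the original thread.
    # get(c, c) returns the entity if c is in the table, else c unchanged.
    return u''.join(high_chars.get(c, c) for c in text)

result = replace_high_chars(u'a \u2014 b \u00a9')
print(result)
```

The u'' prefixes keep the sketch usable on Python 2 and on Python 3.3 or
later.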


It doesn't matter for the OP (see Stefan Behnel's post), but if you want to
replace characters in a unicode string, the best way is probably the
translate() method:

>>> print u"\xa9\u2122"
©™
>>> u"\xa9\u2122".translate({0xa9: u"&copy;", 0x2122: u"&trade;"})
u'&copy;&trade;'
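
Note that translate() expects a table keyed by code points (integers), so
the OP's original dict can be passed as-is, while a dict keyed by
one-character strings would not match anything. A minimal sketch, assuming
a subset of the OP's table and a made-up function name to_entities:

```python
# Keys are integer code points, as in the OP's original high_chars dict.
high_chars = {
    0x2014: u'&mdash;',
    0x2013: u'&ndash;',
    0x2122: u'&trade;',
    0x00A9: u'&copy;',
}

def to_entities(text):
    # Hypothetical helper name; it only wraps the translate() call.
    # Characters whose code point is not in the table pass through unchanged.
    return text.translate(high_chars)

result = to_entities(u'\xa9 2012 \u2014 Foo\u2122')
print(result)
```

This does the whole replacement in one call, with no explicit loop and no
ord() lookups.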


-- 
http://mail.python.org/mailman/listinfo/python-list
