Re: xhtml encoding question

Ulrich Eckhardt Wed, 01 Feb 2012 01:23:10 -0800

On 31.01.2012 19:09, Tim Arnold wrote:

high_chars = {
    0x2014:'&mdash;', # 'EM DASH',
    0x2013:'&ndash;', # 'EN DASH',
    0x0160:'&Scaron;',# 'LATIN CAPITAL LETTER S WITH CARON',
    0x201d:'&rdquo;', # 'RIGHT DOUBLE QUOTATION MARK',
    0x201c:'&ldquo;', # 'LEFT DOUBLE QUOTATION MARK',
    0x2019:"&rsquo;", # 'RIGHT SINGLE QUOTATION MARK',
    0x2018:"&lsquo;", # 'LEFT SINGLE QUOTATION MARK',
    0x2122:'&trade;', # 'TRADE MARK SIGN',
    0x00A9:'&copy;', # 'COPYRIGHT SYMBOL',
}

You could use Unicode string literals directly instead of using the codepoint, making it a bit more self-documenting and saving you the later call to ord():


high_chars = {
    u'\u2014': '&mdash;',
    u'\u2013': '&ndash;',
    ...
}

for c in string:
    if ord(c) in high_chars:
        c = high_chars.get(ord(c))
    s += c
return s

Instead of checking if there is a replacement and then looking up the replacement again, just use the default:


  for c in string:
      s += high_chars.get(c, c)
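
Put together as a complete function, that could look roughly like the sketch below. The function name replace_high_chars and the sample text are made up for illustration, and the table holds only the two entries shown above. It also builds the result with ''.join() rather than repeated +=, which is a small change from the loop above, not something from the original code.

```python
# Table keyed by the characters themselves (only the two entries shown above).
high_chars = {
    u'\u2014': '&mdash;',
    u'\u2013': '&ndash;',
}

def replace_high_chars(text):
    # Hypothetical helper: look up each character and fall back to the
    # character itself when the table has no replacement for it.
    return ''.join(high_chars.get(c, c) for c in text)

print(replace_high_chars(u'a\u2014b \u2013 c'))
```

Characters without an entry pass through unchanged, so plain ASCII text comes back as it went in.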

Alternatively, if you find that clearer, you could also check if the return value of get() is None to find out if there is a replacement:


  for c in string:
      r = high_chars.get(c)
      if r is None:
          s += c
      else:
          s += r
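
A different route, not one of the suggestions above: if you would rather keep the table keyed by codepoint as in your original, the translate() method of Unicode strings accepts exactly that kind of mapping (ordinal to replacement string). A minimal sketch, assuming the input is a Unicode string and not a byte string; the names high_chars_by_ord and replace_with_translate are made up here, and the table again holds only two entries:

```python
# Table keyed by codepoint, as in the original post (two entries only).
high_chars_by_ord = {
    0x2014: u'&mdash;',
    0x2013: u'&ndash;',
}

def replace_with_translate(text):
    # Hypothetical helper: translate() maps each character's ordinal through
    # the table and leaves characters without an entry untouched.
    return text.translate(high_chars_by_ord)

print(replace_with_translate(u'x\u2013y'))
```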


Uli

--
http://mail.python.org/mailman/listinfo/python-list
