Ulrich Eckhardt wrote:

> On 31.01.2012 19:09, Tim Arnold wrote:
>> high_chars = {
>>     0x2014:'&mdash;', # 'EM DASH',
>>     0x2013:'&ndash;', # 'EN DASH',
>>     0x0160:'&Scaron;',# 'LATIN CAPITAL LETTER S WITH CARON',
>>     0x201d:'&rdquo;', # 'RIGHT DOUBLE QUOTATION MARK',
>>     0x201c:'&ldquo;', # 'LEFT DOUBLE QUOTATION MARK',
>>     0x2019:"&rsquo;", # 'RIGHT SINGLE QUOTATION MARK',
>>     0x2018:"&lsquo;", # 'LEFT SINGLE QUOTATION MARK',
>>     0x2122:'&trade;', # 'TRADE MARK SIGN',
>>     0x00A9:'&copy;', # 'COPYRIGHT SYMBOL',
>> }
>
> You could use Unicode string literals directly instead of using the
> codepoint, making it a bit more self-documenting and saving you the
> later call to ord():
>
> high_chars = {
>     u'\u2014': '&mdash;',
>     u'\u2013': '&ndash;',
>     ...
> }
>
>> for c in string:
>>     if ord(c) in high_chars:
>>         c = high_chars.get(ord(c))
>>     s += c
>> return s
>
> Instead of checking if there is a replacement and then looking up the
> replacement again, just use the default:
>
>     for c in string:
>         s += high_chars.get(c, c)
>
> Alternatively, if you find that clearer, you could also check if the
> return value of get() is None to find out if there is a replacement:
>
>     for c in string:
>         r = high_chars.get(c)
>         if r is None:
>             s += c
>         else:
>             s += r

It doesn't matter for the OP (see Stefan Behnel's post), but if you want to
replace characters in a unicode string, the best way is probably the
translate() method:

>>> print u"\xa9\u2122"
©™
>>> u"\xa9\u2122".translate({0xa9: u"&copy;", 0x2122: u"&trade;"})
u'&copy;&trade;'
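
For completeness, here is a rough sketch of how translate() might be applied
to the whole table. It assumes Python 3, where every str is a Unicode string,
and it assumes the replacements are the usual HTML entity names. The names
HIGH_CHARS, escape_high_chars and escape_high_chars_loop are made up for
illustration and do not come from the original code.

```python
# Sketch for Python 3 (str is Unicode). All names below are illustrative.
# Assumption: each code point maps to its named HTML entity.
HIGH_CHARS = {
    0x2014: "&mdash;",   # EM DASH
    0x2013: "&ndash;",   # EN DASH
    0x0160: "&Scaron;",  # LATIN CAPITAL LETTER S WITH CARON
    0x201D: "&rdquo;",   # RIGHT DOUBLE QUOTATION MARK
    0x201C: "&ldquo;",   # LEFT DOUBLE QUOTATION MARK
    0x2019: "&rsquo;",   # RIGHT SINGLE QUOTATION MARK
    0x2018: "&lsquo;",   # LEFT SINGLE QUOTATION MARK
    0x2122: "&trade;",   # TRADE MARK SIGN
    0x00A9: "&copy;",    # COPYRIGHT SIGN
}


def escape_high_chars(text):
    """Replace every character that has an entry in HIGH_CHARS."""
    # str.translate() looks up each character's code point in the table;
    # characters without an entry are copied through unchanged.
    return text.translate(HIGH_CHARS)


def escape_high_chars_loop(text):
    """Equivalent loop using dict.get() with a default, for comparison."""
    return "".join(HIGH_CHARS.get(ord(c), c) for c in text)


sample = "\u201cHello\u201d \u2014 \xa9 2012"
print(escape_high_chars(sample))  # &ldquo;Hello&rdquo; &mdash; &copy; 2012
```

Because translate() keys on integer code points, the table can stay in the
form the OP wrote it, and no explicit ord() call or character-by-character
string concatenation is needed.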