On 31.01.2012 19:09, Tim Arnold wrote:
high_chars = {
    0x2014: '—',  # 'EM DASH',
    0x2013: '–',  # 'EN DASH',
    0x0160: 'Š',  # 'LATIN CAPITAL LETTER S WITH CARON',
    0x201d: '”',  # 'RIGHT DOUBLE QUOTATION MARK',
    0x201c: '“',  # 'LEFT DOUBLE QUOTATION MARK',
    0x2019: "’",  # 'RIGHT SINGLE QUOTATION MARK',
    0x2018: "‘",  # 'LEFT SINGLE QUOTATION MARK',
    0x2122: '™',  # 'TRADE MARK SIGN',
    0x00A9: '©',  # 'COPYRIGHT SIGN',
}
You could use Unicode string literals directly instead of using the
codepoint, making it a bit more self-documenting and saving you the
later call to ord():
high_chars = {
    u'\u2014': '—',
    u'\u2013': '–',
    ...
}
for c in string:
    if ord(c) in high_chars:
        c = high_chars.get(ord(c))
    s += c
return s
Instead of checking if there is a replacement and then looking up the
replacement again, just use the default:
for c in string:
    s += high_chars.get(c, c)
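Put together as a complete function, that could look roughly like the
sketch below. The function name replace_high_chars and the replacement
values are made up for illustration, and the table is keyed by character
rather than by codepoint. It assumes unicode strings in Python 2 or plain
str in Python 3.3+:

```python
# Hypothetical replacement table, keyed by character instead of codepoint.
# The values are placeholders, not the ones from the original post.
high_chars = {
    u'\u2014': u'--',  # EM DASH
    u'\u2013': u'-',   # EN DASH
    u'\u2019': u"'",   # RIGHT SINGLE QUOTATION MARK
}


def replace_high_chars(string):
    # get(c, c) returns the replacement if there is one and the
    # character itself otherwise, so a single lookup is enough.
    return u''.join(high_chars.get(c, c) for c in string)
```

Joining the pieces once at the end also avoids growing the result string
step by step with +=.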
Alternatively, if you find that clearer, you could also check whether the
return value of get() is None to find out if there is a replacement:
for c in string:
    r = high_chars.get(c)
    if r is None:
        s += c
    else:
        s += r
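A different technique, not one of the loops above: a table keyed by
codepoint, like the one in the original post, can be handed to the
string's translate() method, which does the per-character lookup itself.
This is only a sketch. The name replace_with_translate and the
replacement values are invented for the example, and it assumes unicode
strings (unicode in Python 2, str in Python 3):

```python
# Hypothetical table keyed by codepoint, as in the original post.
# The replacement values are placeholders for illustration.
high_chars = {
    0x2014: u'--',  # EM DASH
    0x2013: u'-',   # EN DASH
    0x2019: u"'",   # RIGHT SINGLE QUOTATION MARK
}


def replace_with_translate(string):
    # translate() maps each character's ordinal through the table and
    # leaves characters without an entry unchanged.
    return string.translate(high_chars)
```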
Uli
--
http://mail.python.org/mailman/listinfo/python-list