richard wrote:
> Leon wrote:
> > example:
> > s = ' ' --->
>
> That's technically not HTML encoding, that's replacing a perfectly valid
> space character with a *non-breaking* space character.
How can you tell?
s = ' ' # non-breaking space
s = ' ' # normal space
s = ' ' # em-space
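One way to actually tell them apart is to ask the unicodedata module
what each character is called. A minimal sketch: the helper name
describe() is made up for this example, and the three characters are
written as escapes so they survive copy and paste.

```python
import unicodedata

# The three look-alike strings from above, spelled with escapes.
samples = [u'\u00a0', u' ', u'\u2003']

def describe(s):
    """Return (code point, Unicode name) for a one-character unicode string."""
    return 'U+%04X' % ord(s), unicodedata.name(s)

for s in samples:
    print(describe(s))
```

The names that come back are NO-BREAK SPACE, SPACE and EM SPACE, so
the difference is visible even though the glyphs are not.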
But you might want to do something like:
def escapechar(s):
    import htmlentitydefs
    n = ord(s)
    if n < 128:
        return s.encode('ascii')
    elif n in htmlentitydefs.codepoint2name:
        return '&%s;' % htmlentitydefs.codepoint2name[n]
    else:
        return '&#%d;' % ord(s)
This requires unicode strings: in a byte string a non-ASCII character
may be encoded as several bytes, and ord() only accepts a single
character. Demonstration:
>>> escapechar(u'ò')
'&ograve;'
>>> escapechar(u'ş')
'&#351;'
>>> escapechar(u's')
's'
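To escape a whole string rather than one character, the same test can
be applied character by character. A rough sketch, not a definitive
implementation: escapestring() is a hypothetical name, and the
try/except import is only there because htmlentitydefs was renamed to
html.entities in Python 3. Like the function above, it leaves '&', '<'
and '>' alone.

```python
try:
    from html.entities import codepoint2name   # Python 3 name of the module
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2 name, as used above

def escapestring(text):
    """Escape every non-ASCII character of a unicode string (hypothetical helper)."""
    out = []
    for ch in text:
        n = ord(ch)
        if n < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif n in codepoint2name:
            # A named entity exists, e.g. 0xF2 -> &ograve;
            out.append(u'&%s;' % codepoint2name[n])
        else:
            # Fall back to a decimal character reference.
            out.append(u'&#%d;' % n)
    return u''.join(out)

print(escapestring(u'\xf2 and \u015f'))
```

With the two characters from the demonstration, this prints
'&ograve; and &#351;'.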
yours,
Gerrit Holl.
--
Weather in Lulea / Kallax, Sweden 13/12 10:20:
-15.0°C wind 0.9 m/s NNW (34 m above NAP)
--
In the councils of government, we must guard against the acquisition of
unwarranted influence, whether sought or unsought, by the
military-industrial complex. The potential for the disastrous rise of
misplaced power exists and will persist.
-Dwight David Eisenhower, January 17, 1961