On Mon, 29 Jan 2007 00:05:24 -0300, Steven D'Aprano <[EMAIL PROTECTED]> wrote:

> I have a string containing Latin-1 characters:
>
> s = u"© and many more..."
>
> I want to convert it to HTML entities:
>
> result =>
> "&copy; and many more..."
>

Module htmlentitydefs contains the tables you're looking for, but you need a few transforms:

<code>
# -*- coding: iso-8859-15 -*-
from htmlentitydefs import codepoint2name

# Map each character to its named entity, e.g. u'\xa9' -> u'&copy;'.
# "&" (code point 38) is excluded here because it must be replaced first.
unichr2entity = dict((unichr(code), u'&%s;' % name)
                     for code, name in codepoint2name.iteritems()
                     if code != 38)  # exclude "&"

def htmlescape(text, d=unichr2entity):
    # Escape "&" before anything else, so the "&" of the entities
    # inserted below is not escaped a second time.
    if u"&" in text:
        text = text.replace(u"&", u"&amp;")
    for key, value in d.iteritems():
        if key in text:
            text = text.replace(key, value)
    return text

print '%r' % htmlescape(u'hello')
print '%r' % htmlescape(u'"©® áé&ö <²³>')
</code>

Output:

u'hello'
u'&quot;&copy;&reg; &aacute;&eacute;&amp;&ouml; &lt;&sup2;&sup3;&gt;'

The result is a unicode object, with all known entities replaced. It does not handle characters that have no named entity in the table; as the docs for htmlentitydefs say, "the definition provided here contains all the entities defined by XHTML 1.0 that can be handled using simple textual substitution in the Latin-1 character set (ISO-8859-1)."

-- 
Gabriel Genellina
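
For anyone reading this on Python 3, where htmlentitydefs was renamed to html.entities, the same idea can be sketched roughly as below. This is only a minimal sketch, not a drop-in replacement: the names entity_table and htmlescape_ascii are made up for illustration, and the numeric-reference fallback for characters without a named entity is an addition that the code above does not have.

```python
from html.entities import codepoint2name

# Translation table keyed by code point, as str.translate expects.
# "&" (code point 38) is left out because it is replaced first.
entity_table = {code: "&%s;" % name
                for code, name in codepoint2name.items()
                if code != 38}

def htmlescape(text, table=entity_table):
    # Escape "&" first so the entities inserted by translate()
    # are not escaped a second time.
    return text.replace("&", "&amp;").translate(table)

def htmlescape_ascii(text):
    # Hypothetical helper: after the named entities are applied, any
    # remaining non-ASCII character becomes a numeric reference (&#NNN;).
    return htmlescape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")

print(htmlescape("hello"))      # hello
print(htmlescape("© & more"))   # &copy; &amp; more
```

Using str.translate makes a single pass over the text instead of one replace per entity, which is a design choice of this sketch rather than something the original code requires.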