Asterix wrote:
> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\xc3\xa9d')
>
> u'séd' (repr is u's\xe9d')

You may also want to look at unicodedata.normalize(). For example, é can be represented multiple ways:

    >>> import unicodedata
    >>> unicodedata.normalize('NFC', u'é')
    u'\xe9'
    >>> unicodedata.normalize('NFD', u'é')
    u'e\u0301'
    >>> u'\xe9' == u'e\u0301'
    False

The first form is "composed", just being U+00E9 (LATIN SMALL LETTER E WITH ACUTE). The second form is "decomposed", being made up of U+0065 (LATIN SMALL LETTER E) and U+0301 (COMBINING ACUTE ACCENT). Even though they represent the same thing to a human, they don't compare as equal. But if you normalize them to the same form, they will.

For more information, look at the unicodedata module's documentation:

<http://docs.python.org/lib/module-unicodedata.html>
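
To tie this back to the original question, here is a minimal sketch of one way to compare the two strings. The helper name same_text is made up for illustration, and it assumes the byte string is UTF-8 encoded (the bytes \xc3\xa9 are the UTF-8 encoding of é). It decodes any byte string first, then normalizes both sides to the same form before comparing:

```python
import unicodedata


def same_text(a, b, encoding="utf-8", form="NFC"):
    """Return True if a and b are the same text once decoded and normalized.

    Hypothetical helper, not part of any library. Byte strings are
    ASSUMED to be encoded in `encoding` (UTF-8 by default).
    """
    def to_normalized(value):
        # Decode byte strings to unicode first; unicode passes through.
        if isinstance(value, bytes):
            value = value.decode(encoding)
        # Bring both sides to the same normalization form.
        return unicodedata.normalize(form, value)

    return to_normalized(a) == to_normalized(b)


utf8_bytes = b"s\xc3\xa9d"    # the byte string from the question
composed = u"s\xe9d"          # uses U+00E9
decomposed = u"se\u0301d"     # uses U+0065 followed by U+0301

print(same_text(utf8_bytes, composed))    # True
print(same_text(composed, decomposed))    # True
print(composed == decomposed)             # False
```

If the byte string might be in some other encoding (Latin-1, say), pass that as the encoding argument; decoding with the wrong one will either raise UnicodeDecodeError or silently give you different text.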