On Oct 16, 2:33 am, Peter Bengtsson <[EMAIL PROTECTED]> wrote:
> In UTF8, \u0141 is a capital L with a little dash through it as can be
> seen in this image: http://static.peterbe.com/lukasz.png
>
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
>
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it becoming a normal ascii L before
> in other programs such as Thunderbird.
>
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
>
> What am I doing wrong?

The character in question is NOT composed (in the way that Unicode means) of an 'L' and a little slash. U+0141 is a single code point with no decomposition mapping, so the concepts of "normalization" and "decomposition" don't apply: every normalization form leaves it unchanged, and encode('ascii', 'ignore') then silently drops it, which is why you get ''.

To "asciify" such text, you need to build a look-up table that suits your purpose. unicodedata.decomposition() is (accidentally) useful in providing *some* of the entries for such a table, namely for characters that do decompose, such as accented letters.
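
For what it's worth, here is a minimal sketch of the look-up-table approach, assuming Python 3 (where every str is Unicode). The names EXTRA_ASCII_MAP and asciify are made up for illustration, and the table entries are only examples; you would pick the ones that suit your data. Characters not in the table fall back to NFKD plus dropping whatever is not ASCII.

```python
import unicodedata

# Hypothetical hand-built table for characters that Unicode does NOT
# decompose. The entries are illustrative, not a complete list.
EXTRA_ASCII_MAP = {
    "\u0141": "L",   # LATIN CAPITAL LETTER L WITH STROKE
    "\u0142": "l",   # LATIN SMALL LETTER L WITH STROKE
    "\u00d8": "O",   # LATIN CAPITAL LETTER O WITH STROKE
    "\u00f8": "o",   # LATIN SMALL LETTER O WITH STROKE
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
}

def asciify(text):
    """Return an ASCII-only approximation of text (illustrative helper)."""
    out = []
    for ch in text:
        if ch in EXTRA_ASCII_MAP:
            # Explicit table entry wins.
            out.append(EXTRA_ASCII_MAP[ch])
        else:
            # Fall back to compatibility decomposition, then drop anything
            # that is still not ASCII (e.g. combining accents).
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)

# The stroke letter has no decomposition, so normalization cannot help:
print(repr(unicodedata.decomposition("\u0141")))
# An accented letter does decompose (base letter + combining accent):
print(repr(unicodedata.decomposition("\u00e9")))
print(asciify("\u0141ukasz"))
```

Anything that is neither in the table nor decomposable to ASCII is silently dropped by this sketch; you may prefer to substitute '?' or raise an error instead.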