Re: Normalize a polish L

John Machin Mon, 15 Oct 2007 15:24:39 -0700

On Oct 16, 2:33 am, Peter Bengtsson <[EMAIL PROTECTED]> wrote:
> In UTF8, \u0141 is a capital L with a little dash through it as can be
> seen in this image: http://static.peterbe.com/lukasz.png
>
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
>
> ''
>
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it becoming a normal ascii L before
> in other programs such as Thunderbird.
>
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
>
> What am I doing wrong?


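The character in question (U+0141, LATIN CAPITAL LETTER L WITH STROKE)
is NOT composed (in the way that Unicode means) of an 'L' and a little
slash: the Unicode Character Database gives it no decomposition
mapping. Hence the concepts of "normalization" and "decomposition"
don't apply.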
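One way to see the difference is to compare U+0141 with a character that really is decomposable, such as U+00C9 (E with acute). This is a small sketch in Python 3 (the quoted session is Python 2, hence its u'' literal); `describe` is a hypothetical helper, while `unicodedata.name`, `unicodedata.decomposition` and `unicodedata.normalize` are the standard-library functions.

```python
import unicodedata

def describe(ch):
    """Return the character's Unicode name and its decomposition mapping."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# U+00C9 is canonically 'E' followed by a combining acute accent,
# so NFKD splits it and the ASCII encode keeps the base letter.
print(describe('\u00c9'))  # ('LATIN CAPITAL LETTER E WITH ACUTE', '0045 0301')
print(unicodedata.normalize('NFKD', '\u00c9').encode('ascii', 'ignore'))  # b'E'

# U+0141 has an empty decomposition mapping, so no normalization form
# splits it, and encoding with 'ignore' simply drops it.
print(describe('\u0141'))  # ('LATIN CAPITAL LETTER L WITH STROKE', '')
print(unicodedata.normalize('NFKD', '\u0141').encode('ascii', 'ignore'))  # b''
```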

To "asciify" such text, you need to build a look-up table that suits
your purpose. unicodedata.decomposition() is (accidentally) useful in
providing *some* of the entries for such a table.
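A minimal sketch of such a table in Python 3, under these assumptions: only U+0080 to U+024F (Latin-1 Supplement and Latin Extended-A/B) is scanned, only canonical (untagged) decomposition mappings are followed, and the `MANUAL` overrides are a hypothetical, hand-picked list for letters like U+0141 that have no mapping at all. Compatibility mappings (tagged `<compat>`, `<fraction>` and so on) are skipped because their first field is often not a sensible ASCII stand-in (U+00BD, one half, would become '1').

```python
import unicodedata

# Hypothetical hand-written entries for letters with no decomposition mapping;
# extend to suit your own purpose.
MANUAL = {
    '\u0141': 'L',   # LATIN CAPITAL LETTER L WITH STROKE
    '\u0142': 'l',   # LATIN SMALL LETTER L WITH STROKE
    '\u00d8': 'O',   # LATIN CAPITAL LETTER O WITH STROKE
    '\u00f8': 'o',   # LATIN SMALL LETTER O WITH STROKE
    '\u00df': 'ss',  # LATIN SMALL LETTER SHARP S
}

def canonical_base(ch):
    """Follow canonical decomposition mappings down to the first base character."""
    while True:
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith('<'):
            return ch  # no mapping, or only a compatibility mapping
        ch = chr(int(decomp.split()[0], 16))

def build_table(start=0x80, stop=0x250):
    """Build a {code point: ASCII string} dict usable with str.translate()."""
    table = {}
    for cp in range(start, stop):
        base = canonical_base(chr(cp))
        if base != chr(cp) and ord(base) < 128:
            table[cp] = base
    # Hand-written entries fill the gaps the Unicode data leaves.
    table.update({ord(k): v for k, v in MANUAL.items()})
    return table

TABLE = build_table()

def asciify(text):
    """Replace mapped characters; anything not in TABLE passes through unchanged."""
    return text.translate(TABLE)

print(asciify('\u0141ukasz'))  # Lukasz
print(asciify('Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144'))  # Zazolc gesla jazn
```

Characters that are neither decomposable nor in `MANUAL` are left as they are; if strict ASCII is required, a final `.encode('ascii', 'ignore')` would drop them, as in the original attempt.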


-- 
http://mail.python.org/mailman/listinfo/python-list
