Python has very good support for Unicode, UTF-8, encodings ... but I have some difficulty with the concepts and the vocabulary. The documentation is not bad, but, for example, when reading http://docs.python.org/lib/module-unicodedata.html it took me a long time to figure out what unicodedata.digit(unichr) means; a simple example is badly lacking.
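To show the kind of minimal example I have in mind, here is a sketch of how unicodedata.digit() relates to its siblings decimal() and numeric(). It is written for Python 3, where plain strings are already Unicode and chr() replaces unichr(); the helper name describe is made up for illustration and is not part of the module. Passing None as the second argument avoids the ValueError that these functions raise when a character has no such value.

```python
import unicodedata


def describe(ch):
    """Return (decimal, digit, numeric) for one character.

    Each entry is None when the character has no value of that kind.
    'describe' is a hypothetical helper, not part of unicodedata.
    """
    return (
        unicodedata.decimal(ch, None),  # int, only for true decimal digits
        unicodedata.digit(ch, None),    # int, also covers e.g. superscript digits
        unicodedata.numeric(ch, None),  # float, also covers fractions
    )


# An ASCII digit, an Arabic-Indic digit, a superscript, a fraction, a letter.
for ch in ("7", "\u0663", "\u00b2", "\u00bd", "A"):
    print(unicodedata.name(ch), describe(ch))
```

With this, SUPERSCRIPT TWO gives (None, 2, 2.0): it has a digit value but is not a decimal digit. VULGAR FRACTION ONE HALF gives (None, None, 0.5), and a plain letter gives None for all three.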
So I wrote the following script:

    #!/usr/bin/env python
    """Example of use of the unicodedata module

    http://docs.python.org/lib/module-unicodedata.html
    """

    import unicodedata
    import sys

    # Output codec; override it with the first command-line argument.
    # outcodec = 'latin_1'
    outcodec = 'iso8859_15'
    if len(sys.argv) > 1:
        outcodec = sys.argv[1]

    for c in range(256):
        uc = unichr(c)
        uname = unicodedata.name(uc, None)
        if uname:  # skip code points without a name (control characters)
            unfd = unicodedata.normalize('NFD', uc).encode(outcodec, 'replace')
            unfc = unicodedata.normalize('NFC', uc).encode(outcodec, 'replace')
            print str(c).ljust(3), uname.ljust(42), unfd.ljust(2), unfc.ljust(2), \
                  unicodedata.category(uc), unicodedata.numeric(uc, None)

And here are some samples of the output:

    44  COMMA                                      ,  ,  Po None
    45  HYPHEN-MINUS                               -  -  Pd None
    46  FULL STOP                                  .  .  Po None
    47  SOLIDUS                                    /  /  Po None
    48  DIGIT ZERO                                 0  0  Nd 0.0
    49  DIGIT ONE                                  1  1  Nd 1.0
    50  DIGIT TWO                                  2  2  Nd 2.0

It seems that the 'Nd' category means 'Numerical digit'. Doh!

    64  COMMERCIAL AT                              @  @  Po None
    65  LATIN CAPITAL LETTER A                     A  A  Lu None
    66  LATIN CAPITAL LETTER B                     B  B  Lu None

'Lu' should read 'Letter upper'?

    94  CIRCUMFLEX ACCENT                          ^  ^  Sk None
    95  LOW LINE                                   _  _  Pc None
    96  GRAVE ACCENT                               `  `  Sk None
    97  LATIN SMALL LETTER A                       a  a  Ll None
    98  LATIN SMALL LETTER B                       b  b  Ll None

'Ll' == 'Letter lower'

    124 VERTICAL LINE                              |  |  Sm None
    125 RIGHT CURLY BRACKET                        }  }  Pe None
    126 TILDE                                      ~  ~  Sm None
    160 NO-BREAK SPACE                                   Zs None
    161 INVERTED EXCLAMATION MARK                  ¡  ¡  Po None

What a gap!

    245 LATIN SMALL LETTER O WITH TILDE            o? õ  Ll None
    246 LATIN SMALL LETTER O WITH DIAERESIS        o? ö  Ll None
    247 DIVISION SIGN                              ÷  ÷  Sm None
    248 LATIN SMALL LETTER O WITH STROKE           ø  ø  Ll None

'Sm' should read 'sign mathematics'?

I think that such code snippets should be included in the documentation or on a wiki.

Regards

Sorry for my bad English; I'm not a native speaker.
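
P.S. For reference, the two-letter codes in the output above are Unicode general categories. A small lookup table could spell them out. This is only a Python 3 sketch that covers just the codes seen in the samples; CATEGORY_NAMES and category_name are hypothetical names chosen for illustration, not part of unicodedata.

```python
import unicodedata

# Unicode general-category codes that appear in the sample output.
# This is a deliberately partial table, not the full list of categories.
CATEGORY_NAMES = {
    "Lu": "Letter, uppercase",
    "Ll": "Letter, lowercase",
    "Nd": "Number, decimal digit",
    "Pc": "Punctuation, connector",
    "Pd": "Punctuation, dash",
    "Pe": "Punctuation, close",
    "Po": "Punctuation, other",
    "Sk": "Symbol, modifier",
    "Sm": "Symbol, math",
    "Zs": "Separator, space",
}


def category_name(ch):
    """Return a readable category name, or the raw code if it is not in the table."""
    code = unicodedata.category(ch)
    return CATEGORY_NAMES.get(code, code)


for ch in "0Aa-_}~^":
    print(repr(ch), unicodedata.category(ch), category_name(ch))
```

So 'Lu' and 'Ll' are indeed uppercase and lowercase letters, and 'Sm' stands for 'Symbol, math'. A character whose category is missing from the table simply prints its raw code.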