> 1) Why doesn't the category method raise an Exception, like the name method
> does?

As Chris explains, the resulting category means "Other, Not Assigned". Python returns this category because it is the truth: for those characters, the value of the "category" property really *is* Cn, which means that they are not assigned.

If you are wondering how unicodedata.c comes up with the result: unassigned characters get a record index of 0, and that record has a category value of 0, which is "Cn".

> 2) Given that the category method doesn't currently raise an Exception,
> please could someone explain how the category is calculated? I have tried to
> figure it out based on the CPython code, but I have thus far failed, and I
> would also prefer to have it explicitly defined, rather than mandating that
> a Jython (.NET, etc) implementation uses the same (possibly non-optimal for
> Java) data structures and algorithms.

You definitely should *not* follow the Python implementation. The Unicode database is defined by the Unicode Consortium, so the Unicode standard is the ultimate specification.

To implement it in Java, I recommend using java.lang.Character.getType. If that returns java.lang.Character.UNASSIGNED, return "Cn".

Regards,
Martin
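
P.S. For reference, here is a minimal sketch of the difference between the two methods on the Python side. The helper name describe is made up for this illustration, and U+FFFF is my choice of test character: it is a noncharacter, which the Unicode standard keeps in category Cn, so the result should not depend on which Unicode version your Python ships with.

```python
import unicodedata


def describe(ch):
    """Return (category, name) for a single character.

    Hypothetical helper for illustration only, not part of unicodedata.
    The name is None when the character has no name in the database.
    """
    # category() returns a value for every code point, including "Cn".
    category = unicodedata.category(ch)
    # name() raises ValueError for nameless characters unless a default is given.
    name = unicodedata.name(ch, None)
    return category, name


print(describe("A"))
print(describe("\uffff"))
```

Calling unicodedata.name("\uffff") without the second argument raises ValueError, whereas unicodedata.category("\uffff") simply reports "Cn". A Jython implementation would only need to reproduce those results, not the lookup tables behind them.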