Re: Incorrect title case?

John Machin Fri, 16 Jan 2009 18:01:22 -0800

On Jan 17, 9:07 am, MRAB <[email protected]> wrote:
> Python 2.6.1
>
> I've just found that the following 4 Unicode characters/codepoints don't
> behave as I'd expect: ǅ (U+01C5), ǈ (U+01C8), ǋ (U+01CB), ǲ (U+01F2).
>
> For example, u"\u01C5".istitle() returns True and
> unicodedata.category(u"\u01C5") returns "Lt", but u"\u01C5".title()
> returns u'\u01C4', which is the uppercase equivalent. Are these mistakes
> in the Unicode database?


Doesn't look like it. AFAICT it's a mistake in Objects/unicodectype.c,
function _PyUnicode_ToTitlecase.

See 
http://svn.python.org/view/python/trunk/Objects/unicodectype.c?rev=66362&view=markup

The code that says:
    if (ctype->title)
        delta = ctype->title;
    else
        delta = ctype->upper;
should IMHO merely be:
    delta = ctype->title;

A value of zero for ctype->title should be interpreted simply as the
offset to add to the ordinal, as it is in the sibling
_PyUnicode_To(Upper|Lower)case functions. See also
Tools/unicode/makeunicodedata.py, which treats upper, lower and title
identically when preparing the tables used by those 3 functions.
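
To make the difference concrete, here is a rough Python model of the two
lookups. It is only a sketch: TypeRecord, DZ_TITLE and both function names
are made up for illustration and are not the C structures, and the real C
function does more than this single addition. The deltas are the ones for
U+01C5 (upper is U+01C4, lower is U+01C6, title is the character itself).

```python
from collections import namedtuple

# Hypothetical stand-in for the C type record; only the three case
# deltas are modelled here.
TypeRecord = namedtuple("TypeRecord", "upper lower title")

# Deltas for U+01C5: upper is U+01C4 (-1), lower is U+01C6 (+1),
# title is the character itself (0).
DZ_TITLE = TypeRecord(upper=-1, lower=1, title=0)


def to_title_current(ordinal, rec):
    """Mirror the quoted code: a zero title delta falls back to upper."""
    if rec.title:
        delta = rec.title
    else:
        delta = rec.upper
    return ordinal + delta


def to_title_proposed(ordinal, rec):
    """Proposed change: always add the title delta, even when it is zero."""
    return ordinal + rec.title


print(hex(to_title_current(0x01C5, DZ_TITLE)))   # 0x1c4, the uppercase form
print(hex(to_title_proposed(0x01C5, DZ_TITLE)))  # 0x1c5, unchanged
```

With the fallback, a title delta of zero cannot be told apart from "no
title mapping", which is how a titlecase letter ends up uppercased.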

AFAICT making that change will fix the problem for those four
characters and not ruin any others.
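
A quick way to check the four characters on any given build might look
like the following. This is a sketch in Python 3 syntax (on 2.x the
literals would need a u prefix); SUSPECTS and report are names invented
here. It only reports what the interpreter does, so whether
title_is_identity comes out True depends on the version being run.

```python
import unicodedata

# The four code points from the original report.
SUSPECTS = ["\u01C5", "\u01C8", "\u01CB", "\u01F2"]


def report(ch):
    """Collect the case-related facts the interpreter gives for one character."""
    return {
        "codepoint": "U+%04X" % ord(ch),
        "category": unicodedata.category(ch),
        "istitle": ch.istitle(),
        # True when title() leaves the titlecase letter alone.
        "title_is_identity": ch.title() == ch,
    }


for ch in SUSPECTS:
    print(report(ch))
```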

The error that you noticed occurs as far back as I've looked (2.1) and
also occurs in 3.0.

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
