Tom Christiansen <tchr...@perl.com> added the comment:

Guido van Rossum <rep...@bugs.python.org> wrote
   on Fri, 26 Aug 2011 21:16:57 -0000: 

> Yeah, this should be fixed in 3.3 and probably backported to 3.2
> and 2.7.  (There is already no guarantee that len(s) ==
> len(s.title()), right?)

Well, *I* don't know of any such guarantee, 
but I don't know Python very well.

In general, Unicode makes very few guarantees about casing.  Under full
casemapping, which is the only way to do the silly Turkish stuff amongst
quite a bit else, any of the three casemappings (lowercase, uppercase,
titlecase) can change the length of the string.
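
A small sketch of the length change, assuming Python 3.3 or later, where
the str methods apply full case mappings (earlier versions used only the
simple one-to-one mappings).  The variable names are just for illustration.

```python
# Assumes Python 3.3+ (full case mappings in str.upper/lower/title).
sharp_s = "\u00df"    # LATIN SMALL LETTER SHARP S
dotted_i = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

upper_len = len(sharp_s.upper())    # uppercases to "SS"
title_len = len(sharp_s.title())    # titlecases to "Ss"
lower_len = len(dotted_i.lower())   # lowercases to "i" + U+0307 COMBINING DOT ABOVE

# Each input is one code point; each result is two.
print(upper_len, title_len, lower_len)   # prints: 2 2 2
```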

Other things you can't rely on are round tripping and "single paths".  For
round tripping, just look at the two lowercase sigmas: both uppercase to
the same capital sigma, so lowercasing again can't get you back to both of
them.  By single paths, I mean that code which first lowercases everything
and then titlecases the first letter can produce something different from
code which titlecases just the original first letter and then lowercases
the rest.  That's because tc(x) and tc(lc(x)) can be different.
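
Both points can be checked with a short sketch, again assuming Python 3.3
or later.  The sigma case is the one named above.  The KELVIN SIGN is my
own choice of example for tc(x) != tc(lc(x)), not one taken from the
report, and the variable names are hypothetical.

```python
# Assumes Python 3.3+ (full case mappings in str.upper/lower/title).

# Round tripping: both lowercase sigmas share one uppercase form.
final_sigma = "\u03c2"    # GREEK SMALL LETTER FINAL SIGMA
medial_sigma = "\u03c3"   # GREEK SMALL LETTER SIGMA
capital_sigma = "\u03a3"  # GREEK CAPITAL LETTER SIGMA
assert final_sigma.upper() == medial_sigma.upper() == capital_sigma

# A lone capital sigma lowercases to the medial form, so the final
# sigma does not survive the upper-then-lower trip.
round_trip = final_sigma.upper().lower()
print(round_trip == final_sigma)     # prints: False

# Single paths: tc(x) versus tc(lc(x)) for U+212A KELVIN SIGN
# (illustrative choice; it has a lowercase mapping but no titlecase one).
kelvin = "\u212a"
tc_direct = kelvin.title()           # stays U+212A
tc_after_lc = kelvin.lower().title() # "k", then "K" (U+004B)
print(tc_direct == tc_after_lc)      # prints: False
```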

--tom

----------
title: str.title()  is overzealous by upcasing combining marks inappropriately 
-> str.title() is overzealous by upcasing combining marks inappropriately

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________
