Tom Christiansen <tchr...@perl.com> added the comment:

Guido van Rossum <rep...@bugs.python.org> wrote on Fri, 26 Aug 2011 21:16:57 -0000:
> Yeah, this should be fixed in 3.3 and probably backported to 3.2
> and 2.7. (There is already no guarantee that len(s) ==
> len(s.title()), right?)

Well, *I* don't know of any such guarantee, but I don't know Python very well.

In general, Unicode makes very few guarantees about casing. Under full casemapping, which is the only way to do the silly Turkish stuff amongst quite a bit else, any of the three casemappings can change the length of the string.

Other things you can't rely on are round tripping and "single paths".

By roundtripping, just look at the two lowercase sigmas and think about how you can't get back to one of them if you uppercase them both.

By single paths, I mean that code that does some sort of conversion where it first lowercases everything and then titlecases the first letter can produce something different from titlecasing just the original first letter and then lowercasing the rest of them. That's because tc(x) and tc(lc(x)) can be different.

--tom

----------
title: str.title() is overzealous by upcasing combining marks inappropriately -> str.title() is overzealous by upcasing combining marks inappropriately

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________
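The three caveats in the comment above (length changes, lost round trips, and differing "single paths") can be checked with a short Python sketch. It assumes Python 3.3 or later, where the str methods apply full case mappings. The helper names changes_length, round_trips and single_path are made up for illustration and are not part of any library, and str.title() on a one-character string stands in for tc(x).

```python
# Sketch only: assumes Python 3.3+, where str.lower/upper/title use the
# full (possibly multi-character) Unicode case mappings.
# All helper names here are hypothetical.

def changes_length(s):
    """Return the names of the case operations that change len(s)."""
    ops = {"lower": str.lower, "upper": str.upper, "title": str.title}
    return sorted(name for name, op in ops.items() if len(op(s)) != len(s))


def round_trips(s):
    """True if uppercasing and then lowercasing gives back s."""
    return s.upper().lower() == s


def single_path(s):
    """True if tc(x) == tc(lc(x)), using str.title() as tc."""
    return s.title() == s.lower().title()


SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S
DOTTED_I = "\u0130"         # LATIN CAPITAL LETTER I WITH DOT ABOVE
SIGMA = "\u03c3"            # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"      # GREEK SMALL LETTER FINAL SIGMA

if __name__ == "__main__":
    for ch in (SHARP_S, DOTTED_I):
        print(ascii(ch), "length changed by:", changes_length(ch))
    for ch in (SIGMA, FINAL_SIGMA):
        print(ascii(ch), "round trips:", round_trips(ch))
    for ch in ("a", CAPITAL_SHARP_S):
        print(ascii(ch), "single path:", single_path(ch))
```

Both lowercase sigmas uppercase to the same capital sigma, so only one of them can come back. The capital sharp s titlecases to itself, while its lowercase form titlecases to two letters, which is one way tc(x) and tc(lc(x)) end up different.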