[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2021-07-24 Thread Terry J. Reedy
Terry J. Reedy added the comment: Which comes out 'Tr̥Tīyā'. The combining ring below '̥' is U+0325.

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2021-07-24 Thread Vishvas Vasuki
Vishvas Vasuki added the comment: This case still fails with 3.9: 'Tr̥tīyā'.title() -- nosy: +vishvas.vasuki

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2020-10-26 Thread STINNER Victor
Change by STINNER Victor: -- nosy: -vstinner

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2020-10-26 Thread Irit Katriel
Irit Katriel added the comment: You're right, I see that too when I don't tamper with the test. -- components: +Library (Lib)

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2020-10-25 Thread Guido van Rossum
Guido van Rossum added the comment: Are you sure? Running Ezio's titletest.py, I get this output (note that the UCD major version is in the double digits so the test for that misfires :-). titletest.py: Please set your PYTHONIOENCODING envariable to utf8 WARNING: Your old UCD is out of date,

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2020-10-25 Thread Irit Katriel
Irit Katriel added the comment: Of the examples given two seem ok now, but the Istanbul one is still wrong:
>>> "déme un café".title()
'Déme Un Café'
>>> "ᾲ στο διάολο".title()
'Ὰͅ Στο Διάολο'
>>> "i̇stanbul".title()
'İStanbul'
-- nosy: +iritkatriel versions: +Python 3.10,

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-10-18 Thread Florent Xicluna
Changes by Florent Xicluna florent.xicl...@gmail.com: -- nosy: +flox

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-10-01 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: "* Word characters are Alphabetic + Mn+Mc+Me + Nd + Pc." Where did you get that definition from? UTS#18 defines word_character, which is Alphabetic + U+200C + U+200D (i.e. not including marks, but including those I think you are looking
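The quoted definition (Alphabetic + Mn+Mc+Me + Nd + Pc) can be sketched as a predicate. This is only an illustration: `is_word_char` is a hypothetical helper, and since `unicodedata` does not expose the Alphabetic property, `str.isalpha()` stands in for it here, which is an assumption and not an exact match.

```python
import unicodedata

# General categories named in the quoted definition, besides Alphabetic.
WORD_CATEGORIES = {"Mn", "Mc", "Me", "Nd", "Pc"}

def is_word_char(ch: str) -> bool:
    """Hypothetical helper approximating the quoted word-character set."""
    # Assumption: str.isalpha() is used in place of the Alphabetic property.
    return ch.isalpha() or unicodedata.category(ch) in WORD_CATEGORIES

for ch in ("a", "\u0325", "7", "_", " ", "-"):
    print(repr(ch), unicodedata.category(ch), is_word_char(ch))
```

Under this approximation a combining mark such as U+0325 counts as part of the word, while a space or hyphen does not.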

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-10-01 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Martin v. Löwis rep...@bugs.python.org wrote on Sat, 01 Oct 2011 10:59:48 -: * Word characters are Alphabetic + Mn+Mc+Me + Nd + Pc. Where did you get that definition from? UTS#18 defines word_character, which is Alphabetic +

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-10-01 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: As for terminology: I think the documentation should continue to speak about "words" and "letters", and then define what is meant in this context. It's not that the Unicode consortium invented the term "letter", so we should use it more

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-09-30 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: "Martin, do you think that str.title() should follow the Unicode standard?" I don't think that "follow the Unicode standard" has any meaning in this context: the Unicode standard doesn't specify (AFAIK) what a .title() method in a programming

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-09-30 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Martin v. Löwis mar...@v.loewis.de added the comment: "Split S into words. Change the first letter in a word to upper-case," Except that I think you actually mean that the first letter is changed into titlecase, not uppercase. One might
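The titlecase/uppercase distinction is visible with a digraph character that has three distinct case forms. A small illustration using U+01C6 (LATIN SMALL LETTER DZ WITH CARON); the variable names are only for this example:

```python
import unicodedata

dz_small = "\u01c6"            # LATIN SMALL LETTER DZ WITH CARON
dz_upper = dz_small.upper()    # both letters of the digraph capitalised
dz_title = dz_small.title()    # only the first letter capitalised

for ch in (dz_small, dz_upper, dz_title):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The uppercase form is U+01C4 and the titlecase form is U+01C5, so "first letter to upper-case" and "first letter to titlecase" give different results for such characters.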

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-09-30 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: I like how we're actually converging on an implementable and maximally-useful algorithm.

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-09-29 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: After PEP 393 the result is still the same (I attached a slightly improved version of the script): titlecase of 'deme un cafe' should be 'Deme Un Cafe' not 'DeMe Un Cafe' titlecase of 'istanbul' should be 'Istanbul' not
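One way to get the expected results listed above is to let combining marks pass through without resetting the "inside a word" state. This is a sketch only, not the patch discussed on the tracker: `title_skipping_marks` is a hypothetical name, and the cased test is approximated with `isupper()`/`islower()`/`istitle()`.

```python
import unicodedata

def title_skipping_marks(s: str) -> str:
    """Hypothetical mark-aware variant of a simple titlecasing loop."""
    out = []
    previous_is_cased = False
    for ch in s:
        if unicodedata.category(ch).startswith("M"):
            # Combining mark: copy it and keep the current word state.
            out.append(ch)
            continue
        out.append(ch.lower() if previous_is_cased else ch.title())
        # Assumption: this approximates the "cased" character test.
        previous_is_cased = ch.isupper() or ch.islower() or ch.istitle()
    return "".join(out)

print(title_skipping_marks("de\u0301me un cafe\u0301"))
print(title_skipping_marks("i\u0307stanbul"))
```

With decomposed input, the letter after a combining mark stays lowercase because the mark no longer looks like a word boundary.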

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-09-18 Thread Martin v . Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Tom: it's intentional that .title() doesn't use traditional word break algorithms. In 2.x, "foo3bar".title() is "Foo3Bar", i.e. the 3 counts as a word end. So neither UTS#18 \w nor UAX#29 applies. So in UTS#18 terminology, .title() matches more
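The word-end behaviour mentioned here is easy to check in Python 3 as well. A short illustration (the apostrophe case is the one noted in the `str.title()` documentation):

```python
digit_case = "foo3bar".title()       # the digit ends the first "word"
apostrophe_case = "they're".title()  # the apostrophe does too

print(digit_case)
print(apostrophe_case)
```

Any non-cased character, whether a digit or an apostrophe, causes the following letter to be capitalised.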

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-09-17 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I think string methods (and other parts of the stdlib) assume NFC and leave normalization to NFC up to the user. Before fixing str.title() we should take a more general decision about handling strings that use other normalization forms.
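Leaving normalization to the caller would look roughly like this: normalize to NFC with `unicodedata.normalize` before calling `.title()`. A minimal sketch; the variable names are illustrative.

```python
import unicodedata

decomposed = "de\u0301me un cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
titled = composed.title()

print(len(decomposed), len(composed))
print(titled)
```

This only helps where a precomposed character exists. Sequences that stay decomposed under NFC, which appears to be the case for 'r' followed by U+0325, would still contain a combining mark after normalization.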

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-26 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: Yeah, this should be fixed in 3.3 and probably backported to 3.2 and 2.7. (There is already no guarantee that len(s) == len(s.title()), right?) -- nosy: +gvanrossum
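On current CPython 3 releases, which apply full case mappings, the length question can be checked directly. A small illustration with two characters whose titlecase form expands to two code points:

```python
samples = ["\u00df", "\ufb01"]  # LATIN SMALL LETTER SHARP S, LATIN SMALL LIGATURE FI
results = [(s, s.title()) for s in samples]

for original, titled in results:
    print(repr(original), len(original), repr(titled), len(titled))
```

Both inputs have length 1 and their titlecased forms have length 2, so `len(s) == len(s.title())` does not hold in general.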

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-26 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:16:57 -: Yeah, this should be fixed in 3.3 and probably backported to 3.2 and 2.7. (There is already no guarantee that len(s) == len(s.title()), right?) Well,

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-15 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: See also #12746.

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-15 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: So the issue here is that when using combining chars, str.title() fails to titlecase the string properly. The algorithm implemented by str.title() [0] is quite simple: it loops through the code units, and uppercases all the chars that
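A simple per-character loop of this kind can be approximated in pure Python to show why combining marks cause trouble. This is a sketch, not the C implementation: `naive_title` is a hypothetical name, and the cased test is approximated with `isupper()`/`islower()`/`istitle()`.

```python
def naive_title(s: str) -> str:
    """Hypothetical approximation of a simple titlecasing loop."""
    out = []
    previous_is_cased = False
    for ch in s:
        # Titlecase a character that follows a non-cased one, lowercase otherwise.
        out.append(ch.lower() if previous_is_cased else ch.title())
        # A combining mark is not cased, so it resets the state here.
        previous_is_cased = ch.isupper() or ch.islower() or ch.istitle()
    return "".join(out)

print(naive_title("de\u0301me un cafe\u0301"))
print(naive_title("Tr\u0325t\u012by\u0101"))
```

Because the combining mark is treated like a word boundary, the letter after it is capitalised, which reproduces the 'Tr̥Tīyā' style of output reported in this thread.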

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-13 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +haypo, loewis stage: - needs patch versions: +Python 3.3

[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-12 Thread Terry J. Reedy
Terry J. Reedy tjre...@udel.edu added the comment: I changed the title because 'string' is a module that once contained the functions that are now attached to the str class as methods. So 'string.title' is an obsolete attribute reference. -- nosy: +terry.reedy title: string.title()