[issue7008] str.title() misbehaves with apostrophes
Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

I agree with the OP that str.title should be made smarter. As it stands, it is a likely bug factory that would pass unit tests, then generate unpleasant results with real user inputs.

Extending Thomas's comment, I think string.capwords() needs to be deprecated and eliminated. It is an egregious hack that has unfortunate effects such as dropping runs of repeated spaces and incorrectly handling strings in quotes.

As it stands, we have two methods, neither of which quite does what we would really want in a title-casing method (correct handling of apostrophes and quotation marks, keeping the string length unchanged, and only changing desired letters from lowercase to uppercase with no other side effects).

--
nosy: +rhettinger
versions: +Python 2.7, Python 3.2 -Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7008
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
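For concreteness, here is a minimal sketch of the behaviour under discussion, using only str.title() and string.capwords() as they behave in current CPython releases. The sample strings are illustrative assumptions and do not come from the report itself.

```python
import string

sample = "they're bill's friends"

# str.title() treats the apostrophe as a word boundary, so the letter
# after it is capitalized as well.
titled = sample.title()
print(titled)    # They'Re Bill'S Friends

# string.capwords() splits on whitespace and rejoins with single spaces,
# so a run of spaces collapses and the string length changes.
spaced = "hello   world"
capped = string.capwords(spaced)
print(capped)    # Hello World

# A leading quotation mark counts as the first character of the word,
# so the first letter inside the quotes stays lowercase.
quoted = string.capwords('"hello" world')
print(quoted)    # "hello" World
```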
[issue7008] str.title() misbehaves with apostrophes
Thomas W. Barr t...@rice.edu added the comment:

If correct handling of apostrophes and quotation marks, keeping the string length unchanged, and only changing desired letters from lowercase to uppercase with no other side effects is the criterion we want, then what I suggested (toupper() the first character, and any character that follows a space or punctuation character) should work. (Unless I'm missing something.)

Do we want to tolower() all other characters, like the interpreter does now?

I can make a test and patch for this if this is what we decide.
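The rule suggested above could be sketched roughly as follows. The function name, the lower_rest flag and the boundaries parameter are hypothetical, introduced only for illustration. Leaving the apostrophe out of the boundary set is an assumption made here so that "they're" comes out as wanted; taken literally, "any punctuation character" would include it and reproduce the original problem.

```python
import string

# Hypothetical default: every ASCII punctuation character except the
# apostrophe counts as a word boundary (an assumption, see lead-in).
DEFAULT_BOUNDARIES = frozenset(string.punctuation) - {"'"}


def title_after_boundaries(text, lower_rest=True, boundaries=DEFAULT_BOUNDARIES):
    """Uppercase the first character and any character that follows
    whitespace or a boundary character; optionally lowercase the rest."""
    out = []
    at_word_start = True
    for ch in text:
        if ch.isspace() or ch in boundaries:
            # Whitespace and boundary characters are copied unchanged.
            out.append(ch)
            at_word_start = True
        elif at_word_start:
            out.append(ch.upper())
            at_word_start = False
        else:
            out.append(ch.lower() if lower_rest else ch)
    return "".join(out)


print(title_after_boundaries("they're bill's friends"))
print(title_after_boundaries("a  b\tc"))
print(title_after_boundaries("hello WORLD", lower_rest=False))
```

Whitespace is copied through untouched, so runs of spaces survive. Note that str.upper() can expand a few characters (for example 'ß' becomes 'SS'), so the "length unchanged" property holds only for text without such characters. The lower_rest flag corresponds to the open question of whether the remaining characters should be lowercased.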
[issue7008] str.title() misbehaves with apostrophes
Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

I'm still researching what other languages do. MS-Excel matches what Python currently does. Django uses the Python version and then fixes up apostrophe errors:

    title=lambda value: re.sub("([a-z])'([A-Z])", lambda m: m.group(0).lower(), value.title())

It would also be nice to handle hyphenates like xray --> X-ray.

Am thinking that it would be nice if the user could pass in an optional argument to list all desired characters to prevent transitions (such as apostrophes and hyphens).

A broader solution would be to replace string.capwords() with a more sophisticated set of rules that generally match what people are really trying to accomplish with title casing:

http://aitech.ac.jp/~ckelly/midi/help/caps.html
http://search.cpan.org/dist/Text-Capitalize/Capitalize.pm

Headline Style in the Chicago Manual of Style or Associated Press Stylebook:

http://grammar.about.com/b/2008/04/11/rules-for-capitalizing-the-words-in-a-title.htm

Any such attempt at a broad solution needs to provide ways for users to modify the list of exception words and options for quoted text.
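The Django-style fix-up quoted above can be tried out as a standalone function. This is only a sketch: the function name is made up here, and the regular expression is the one given in the comment, applied after str.title().

```python
import re


def title_with_apostrophe_fixup(value):
    """Apply str.title(), then undo capitalization after an apostrophe
    that directly follows a lowercase letter (the quoted fix-up)."""
    titled = value.title()
    # "y'R" in "They'Re" matches and is lowercased back to "y'r".
    # "O'B" in "O'Brien" does not match, because "O" is uppercase.
    return re.sub("([a-z])'([A-Z])", lambda m: m.group(0).lower(), titled)


print(title_with_apostrophe_fixup("they're bill's friends"))  # They're Bill's Friends
print(title_with_apostrophe_fixup("o'brien"))                 # O'Brien
```

The character classes are ASCII-only, so an apostrophe between non-ASCII letters would not be fixed up by this pattern.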
[issue7008] str.title() misbehaves with apostrophes
Guido van Rossum gu...@python.org added the comment:

Raymond, please refrain from emotional terms like "bug factory".

I have nothing to say about whether string.capwords() should be removed, but I want to note that it does a split on whitespace and then rejoins using a single space, so that string.capwords('A B\tC\r\nD') returns 'A B C D'.

The title() method exists primarily because the Unicode standard has a definition of title case. I wouldn't want to change its default behavior because there is no reasonable behavior that isn't locale-dependent, and Unicode methods shouldn't depend on locale; and even then it won't be perfect, as the O'Brien example shows.

Also note that .title() matches .istitle() in the sense that x.title().istitle() is supposed to be true (except in end cases like a string containing no letters).

I worry that providing an API that adds a way to specify a set of characters to be treated as letters (for the purpose of deciding where words start) will just make the bugs in apps harder to find because the examples are rarer (like l'Aperitif or O'Brien -- or RSVP for that matter). With the current behavior at least app authors will easily notice the problem, decide whether it matters to them, and implement their own algorithm if it does. And they are free to be as elaborate or simplistic as they care.

What's a realistic use case for .title() anyway?

(Proposal: close as won't fix.)
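The points about whitespace handling and the title()/istitle() pairing can be checked with a short sketch. It relies only on standard str methods and string.capwords(); the lowercase and digit-only inputs are chosen here for illustration.

```python
import string

# capwords() splits on any whitespace and rejoins with single spaces.
s = 'A B\tC\r\nD'
rejoined = string.capwords(s)
print(repr(rejoined))    # 'A B C D'

# title() output satisfies istitle(), even when the result looks wrong.
titled = "they're".title()
print(titled, titled.istitle())    # They'Re True

# A string with no letters is the end case: istitle() is False.
digits = "123".title()
print(digits.istitle())    # False

# The rarer examples mentioned above, as title() currently handles them.
print("o'brien".title())       # O'Brien
print("l'aperitif".title())    # L'Aperitif
print("RSVP".title())          # Rsvp
```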