On Fri, 2007-11-02 at 18:35 +0100, Erich Schubert wrote:
> Package: exfalso
> Version: 1.0-1
> Severity: normal
> Tags: Patch
>
> I'm not very happy with the included Title-Case function of
> exfalso/quodlibet. For example, (test) is not being title-cased
> properly, since the t isn't preceded by a space.
I'm attaching a version of title that uses the unicodedata module to find word boundaries, as an alternative approach. I included some tests, which show that both Unicode and 8-bit strings are handled correctly according to Unicode rules (which for Python might or might not consider the current locale, I'm not sure, to be honest). At least it works for your example.

It would be great if Python exported Unicode properties such as "Word_Break", but it doesn't, so this is a bit of a hack. Still, it seems to work better than the current algorithm. Feel free to bring more test cases to my attention.

By the way, Python's Unicode case mapping is broken: capitalizing ß yields ß, whereas Unicode, following German spelling, defines it as SS (see the first entry in SpecialCasing.txt).

Disclaimer: title casing in QL and Python is oriented towards the English language (and others, by coincidence). Spanish, for instance, uses lower case for titles, except for the first word and personal names.

Greetings,
--
Javier Kohen <[EMAIL PROTECTED]>
ICQ: blashyrkh #2361802
Jabber: [EMAIL PROTECTED]
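To make the word-boundary idea above concrete, here is a minimal sketch that looks up the Unicode general category of a few characters taken from the attached test cases. It assumes Python 3 (the attachment itself targets Python 2), and the names `samples` and `categories` exist only for this illustration. Note that the plain apostrophe also falls in a punctuation category, which is why the attachment special-cases it.

```python
import unicodedata

# Characters taken from the attached test cases (illustrative selection).
samples = [" ", "(", "-", '"', "'", "\u2019", "\u201e", "\u00ab", "a", "\u00f1"]

# Map each character to its Unicode general category, e.g. "Zs" or "Ps".
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    # Rule used in the attachment: space separators ("Zs") and every
    # punctuation category (names starting with "P") end a word.
    by_category = cat == "Zs" or cat.startswith("P")
    print(repr(ch), cat, "boundary" if by_category else "not a boundary")
```

The loop reports the apostrophe as a boundary because it only applies the category rule; the special case is what keeps "Mama's" from becoming "Mama'S".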
#!/usr/bin/python
# -*- coding: utf-8 -*-

import unicodedata
from locale import getpreferredencoding


def iswbound(char):
    """Returns whether the given character is a word boundary."""
    # Special-case the apostrophe: it is punctuation, but in song titles
    # it is more commonly used to form the possessive.
    if u"'" == char:
        return False
    category = unicodedata.category(char)
    # It is a boundary if it is a space separator or any punctuation.
    return 'Zs' == category or 'P' == category[0]


def utitle(string):
    """Title-case a string using a less destructive method than str.title."""
    new_string = u""
    # Start in the capitalizing state, so that a leading boundary such as
    # "(" does not keep the first word from being capitalized.
    cap = True
    for s in string:
        if iswbound(s):
            cap = True
        elif cap and s.isalpha():
            cap = False
            s = s.capitalize()
        else:
            cap = False
        new_string += s
    return new_string


def title(string):
    """Title-case a string using a less destructive method than str.title."""
    if not string:
        return ""
    # 8-bit strings are decoded with the locale's preferred encoding.
    if not isinstance(string, unicode):
        string = string.decode(getpreferredencoding())
    return utitle(string)


assert u"Mama's Boy" == title(u"mama's boy")
assert u"Mama’S Boy" == title(u"mama’s boy")
assert u"The A-Sides" == title(u"the a-sides")
assert u"Hello Goodbye" == title(u"hello goodbye")
assert u"HELLO GOODBYE" == title(u"HELLO GOODBYE")
assert u"Hello Goodbye (A Song)" == title(u"hello goodbye (a song)")
assert u"Hello Goodbye \"A Song\"" == title(u"hello goodbye \"a song\"")
assert u"Hello Goodbye „A Song”" == title(u"hello goodbye „a song”")
assert u"Hello Goodbye “A Song”" == title(u"hello goodbye “a song”")
assert u"Hello Goodbye »A Song«" == title(u"hello goodbye »a song«")
assert u"Hello Goodbye «A Song»" == title(u"hello goodbye «a song»")
# The example from the bug report.
assert u"(Test)" == title(u"(test)")
assert u"Fooäbar" == title(u"fooäbar")
assert u"Los Años Felices" == title(u"los años felices")
assert u"Ñandú" == title(u"ñandú")
# Not a real word, but still Python doesn't capitalize the es-zed properly.
#assert u"SSbahn" == title(u"ßbahn")

# 8-bit strings; these rely on the locale's preferred encoding being UTF-8.
assert u"Fooäbar" == title("fooäbar")
assert u"Los Años Felices" == title("los años felices")
assert u"Ñandú" == title("ñandú")
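For comparison with the built-in `str.title`, the following is a rough Python 3 sketch of the same approach, not the attached code itself. The names `is_word_boundary` and `title_case` are hypothetical, and two choices are assumptions of this sketch: it uses `str.upper` rather than `capitalize` for the first letter of each word, and it starts in the capitalizing state so that a leading "(" is handled.

```python
import unicodedata


def is_word_boundary(char):
    """Hypothetical helper mirroring the attachment's boundary rule."""
    if char == "'":
        # The apostrophe usually forms a possessive, so it is not a boundary.
        return False
    category = unicodedata.category(char)
    return category == "Zs" or category.startswith("P")


def title_case(text):
    """Uppercase the first letter after each boundary; leave the rest as-is."""
    pieces = []
    capitalize_next = True  # sketch assumption: the string start is a boundary
    for char in text:
        if is_word_boundary(char):
            capitalize_next = True
        elif capitalize_next and char.isalpha():
            char = char.upper()
            capitalize_next = False
        else:
            capitalize_next = False
        pieces.append(char)
    return "".join(pieces)


for sample in ["mama's boy", "HELLO GOODBYE", "hello goodbye (a song)"]:
    # Print the input, the built-in result, and the sketch's result.
    print(repr(sample), repr(sample.title()), repr(title_case(sample)))
```

The built-in turns "mama's boy" into "Mama'S Boy" and lowercases everything after the first letter of each word, while the sketch leaves existing capitals alone. Because Python 3's `str.upper` applies the full case mapping, the sketch also maps a leading ß to SS.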
signature.asc
Description: This part of the message is digitally signed