On Sat, 03-11-2007 at 19:21 +0100, Erich Schubert wrote:
> Hi,
>
> > That's exactly why I mentioned it. Some people love to uppercase
> > everything in any language and I usually find myself fixing it by hand.
> > Ignoring proper names, the conversion is trivial ("s[0].upper() +
> > s[1:].lower()" will do it), but I was a bit lazy to actually write the
> > plug-in...
>
> That is slightly different from what I had in mind: it actively
> lowercases anything else, ignoring potential proper names.
> I'd have offered s[0].upper() + s[1:] as an option.
> Your option does make sense though, maybe as "Remove Title Case"
> (I.e. automatically uppercase the first character, leave the rest alone.
> For example, a song might be titled "für Elise". In that situation, it
> should be transformed to "Für Elise". Another example is abbreviations.
> "JD's blues" for example should be left unchanged.
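The two one-liners contrasted in the quoted exchange can be compared side by side. This is a minimal Python 3 sketch. The function names upper_first and sentence_case are hypothetical, and s[:1] is used instead of s[0] so that an empty string does not raise IndexError.

```python
def upper_first(s: str) -> str:
    """Uppercase the first character and leave the rest alone (s[0].upper() + s[1:])."""
    return s[:1].upper() + s[1:]


def sentence_case(s: str) -> str:
    """Uppercase the first character and lowercase the rest (s[0].upper() + s[1:].lower())."""
    return s[:1].upper() + s[1:].lower()


print(upper_first("für Elise"))    # Für Elise
print(upper_first("JD's blues"))   # JD's blues
print(sentence_case("JD's blues")) # Jd's blues
```

The second variant destroys the capitals in "JD's", which is the proper-name trade-off discussed in the quote. The first variant leaves an all-uppercase title such as "EXAMPLE TRACK TITLE" untouched.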
What I wrote is exactly what I had in mind. Unlike German, languages such as Spanish, French or Norwegian, to name a few, very rarely capitalize words other than at the beginning of a sentence. Titles in those languages, as in German, follow the regular spelling rules.

> Another thing my titlecase function did that yours doesn't is process
> all-uppercase strings.
> I'm aware that occasionally you'll have a track titled "TLA" that
> shouldn't be converted to "Tla". But a very common bad title is "EXAMPLE
> TRACK TITLE", where I'd like to have an easy way to make that into
> "Example Track Title".

My code only attempts to fix faults in QL's current algorithm, which by design never removes uppercase letters. Your suggestion is trivial to implement (you actually only have to remove code), but I didn't have an argument to justify the change. We could add a heuristic such as: "if everything is uppercase, lowercase it first; otherwise, preserve the existing uppercase letters in the string." What do you think? I'm attaching a new version of my function that does that.

> Speaking of which, did anyone manage to get the re.LOCALE flag to work?
> I've yet to see a proper way of using that...

I don't know whether Python's regular expressions support Unicode categories. Maybe you should try that (e.g. \P{L}) instead of \W, since by default \W only covers the ASCII range, as you found out. In that case, I'm guessing you won't need that flag. To be honest, though, I haven't read the documentation, so I don't know what it's supposed to do.

Regards,
-- 
Javier Kohen <[EMAIL PROTECTED]>
ICQ: blashyrkh #2361802
Jabber: [EMAIL PROTECTED]
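For reference on the regex question: Python's built-in re module does not accept \p{L} or \P{L} property escapes, although the third-party regex package does. In Python 2, the re.UNICODE flag makes \w and \W follow the Unicode character database. In Python 3, str patterns are Unicode-aware by default, re.ASCII restores the ASCII-only behaviour, and re.LOCALE is only accepted with bytes patterns.

The following Python 3 sketch illustrates this. The helper names split_words and supports_property_escapes are hypothetical.

```python
import re


def split_words(text: str, ascii_only: bool = False) -> list:
    """Split on runs of non-word characters; \\W is Unicode-aware unless re.ASCII is set."""
    flags = re.ASCII if ascii_only else 0
    return [w for w in re.split(r"\W+", text, flags=flags) if w]


def supports_property_escapes() -> bool:
    """Report whether the built-in re module accepts \\p{L}-style escapes."""
    try:
        re.compile(r"\p{L}")
    except re.error:
        return False
    return True


print(split_words("los años felices"))                   # ['los', 'años', 'felices']
print(split_words("los años felices", ascii_only=True))  # ['los', 'a', 'os', 'felices']
# [^\W\d_] approximates "letters only": word characters minus digits and underscore.
print(re.findall(r"[^\W\d_]+", "Ñandú 2007"))           # ['Ñandú']
```

The character class [^\W\d_] is a common workaround for the missing \p{L}. It is only an approximation, because \w also matches some numeric characters that \d does not.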
#!/usr/bin/python
# -*- coding: utf-8 -*-

import unicodedata
from types import UnicodeType
from locale import getpreferredencoding

def iswbound(char):
    """Returns whether the given character is a word boundary."""
    category = unicodedata.category(char)
    # If it's a space separator or punctuation
    return 'Zs' == category or 'P' == category[0]

def utitle(string):
    """Title-case a string using a less destructive method than str.title."""
    # Heuristic: an all-uppercase string is lowercased first; otherwise
    # existing uppercase letters are preserved.
    if string.upper() == string:
        string = string.lower()
    new_string = string[0].capitalize()
    # cap is True when the next letter starts a new word.
    cap = False
    for i in xrange(1, len(string)):
        s = string[i]
        # Special case apostrophe in the middle of a word.
        if u"'" == s and string[i-1].isalpha():
            cap = False
        elif iswbound(s):
            cap = True
        elif cap and s.isalpha():
            cap = False
            s = s.capitalize()
        else:
            cap = False
        new_string += s
    return new_string

def title(string):
    """Title-case a string using a less destructive method than str.title."""
    if not string:
        return u""
    if not isinstance(string, UnicodeType):
        # Byte strings are assumed to be in the locale's preferred encoding.
        string = string.decode(getpreferredencoding())
    return utitle(string)

assert u"Mama's Boy" == title(u"mama's boy")
# This character is not an ASCII apostrophe but a right single quotation
# mark (U+2019), which is punctuation and therefore a word boundary.
assert u"Mama’S Boy" == title(u"mama’s boy")
assert u"The A-Sides" == title(u"the a-sides")
assert u"Hello Goodbye" == title(u"hello goodbye")
assert u"Hello Goodbye" == title(u"HELLO GOODBYE")
assert u"Hello GOODBYE" == title(u"hello GOODBYE")
assert u"Hello G.O.O.D.B.Y.E." == title(u"hello G.O.O.D.B.Y.E.")
assert u"Hello G.O.O.D.B.Y.E." == title(u"HELLO G.O.O.D.B.Y.E.")
assert u"Hello Goodbye (A Song)" == title(u"hello goodbye (a song)")
assert u"Hello Goodbye 'A Song'" == title(u"hello goodbye 'a song'")
assert u"Hello Goodbye \"A Song\"" == title(u"hello goodbye \"a song\"")
assert u"Hello Goodbye „A Song”" == title(u"hello goodbye „a song”")
assert u"Hello Goodbye ‘A Song’" == title(u"hello goodbye ‘a song’")
assert u"Hello Goodbye “A Song”" == title(u"hello goodbye “a song”")
assert u"Hello Goodbye »A Song«" == title(u"hello goodbye »a song«")
assert u"Hello Goodbye «A Song»" == title(u"hello goodbye «a song»")
assert u"Fooäbar" == title(u"fooäbar")
assert u"Los Años Felices" == title(u"los años felices")
assert u"Ñandú" == title(u"ñandú")
# Not a real word, but Python still doesn't capitalize the eszett properly.
#assert u"SSbahn" == title(u"ßbahn")
# The byte-string tests below assume the locale's preferred encoding is UTF-8.
assert u"Fooäbar" == title("fooäbar")
assert u"Los Años Felices" == title("los años felices")
assert u"Ñandú" == title("ñandú")
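The attached script is Python 2 (xrange, types.UnicodeType, byte-string decoding). For anyone on Python 3, where str is already Unicode, the same algorithm might be ported roughly as follows. This is a sketch that assumes the behaviour should match the asserts above. The names is_word_boundary and title_case are hypothetical stand-ins for iswbound and title.

```python
import unicodedata


def is_word_boundary(char: str) -> bool:
    """Space separators (Zs) and any punctuation (P*) start a new word."""
    category = unicodedata.category(char)
    return category == "Zs" or category.startswith("P")


def title_case(text: str) -> str:
    """Capitalize the first letter of each word, keeping existing capitals.

    An all-uppercase input is lowercased first, mirroring the heuristic
    proposed in the mail.
    """
    if not text:
        return ""
    if text.upper() == text:
        text = text.lower()
    result = [text[0].capitalize()]
    cap = False  # True when the next letter should start a new word
    for i in range(1, len(text)):
        ch = text[i]
        if ch == "'" and text[i - 1].isalpha():
            # ASCII apostrophe inside a word ("mama's"): not a boundary.
            cap = False
        elif is_word_boundary(ch):
            cap = True
        elif cap and ch.isalpha():
            cap = False
            ch = ch.capitalize()
        else:
            cap = False
        result.append(ch)
    return "".join(result)


print(title_case("hello goodbye «a song»"))  # Hello Goodbye «A Song»
```

Building a list and joining it at the end avoids repeated string concatenation. Otherwise the control flow is kept identical to utitle so the two versions can be compared line by line.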