Hi all, in a python re pattern, how do I match all unicode uppercase characters (in a unicode string/in a utf-8 string)?
I know that there is string.uppercase/.lowercase which are 'locale-aware', but I don't think there is a "all locales" locale. I know that there is a re.U switch that makes \w match all unicode word characters, but there are no subclasses of that ([[:upper:]] or preferably \u). Or is there a module/extension to get that? There is the module unicodedata, but it has no unicodedata.uppercase that would correspond to string.uppercase. <wishful thinking> re.compile('|'.join([x.encode('utf8') for x in unicode.uppercase])) or:: re.compile('(?u)[[:upper:]]') or:: re.compile('(?u)\u') for the latter two, to work on utf-8 strings, would I have to set the defaultencoding to utf-8? </wishful thinking> -- http://mail.python.org/mailman/listinfo/python-list