I need a regex that will match strings containing only Unicode letter characters (no digits and no underscore). I was surprised to find that the 're' module (Python 2.6) does not already include a special character class for this. Or did I miss something?
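A quick sketch of the gap, as I understand it: \w with re.UNICODE comes close, but it also accepts digits and the underscore. The sample strings here are made up purely for illustration.

```python
# -*- coding: utf-8 -*-
import re

# \w under re.UNICODE matches Unicode letters, but also digits and "_".
word = re.compile(u'^\\w+$', re.UNICODE)

# Letters only (the second letter is o-umlaut): should match, and does.
print(word.match(u'L\xf6wis') is not None)

# Contains an underscore and digits: I would like this rejected,
# but \w accepts it.
print(word.match(u'abc_123') is not None)
```

Both lines print True, which is exactly the problem: there is no built-in class that keeps the first match and rejects the second.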
It seems like this would be a very common need. Is the following the only option to generate the character class (based on an old post by Martin v. Löwis)?

    import unicodedata, sys

    def letters():
        start = end = None
        result = []
        for index in xrange(sys.maxunicode + 1):
            c = unichr(index)
            if unicodedata.category(c)[0] == 'L':
                # Inside a run of letters: open a new range or extend it.
                if start is None:
                    start = end = c
                else:
                    end = c
            elif start:
                # The run just ended: emit it as "x" or "x-y".
                if start == end:
                    result.append(start)
                else:
                    result.append(start + "-" + end)
                start = None
        return u'[' + u''.join(result) + u']'

Seems rather cumbersome.

-Brad