Hi,

Given the following Python code:

    import locale
    import re

    locale.setlocale(locale.LC_ALL, 'zh_CN.utf8')
    re.findall('(?uL)\s+', u'\u2001\u3000\x20', re.U|re.L)
    re.findall('\s+', u'\u2001\u3000\x20', re.U|re.L)
    re.findall('(?uL)\s+', u'\u2001\u3000\x20')

I was wondering why it doesn't find the Unicode space characters \u2001 and \u3000.

The Python docs for the re module say of \s:

    When the LOCALE and UNICODE flags are not specified, matches any
    whitespace character; this is equivalent to the set [ \t\n\r\f\v].
    With LOCALE, it will match this set plus whatever characters are
    defined as space for the current locale. If UNICODE is set, this
    will match the characters [ \t\n\r\f\v] plus whatever is classified
    as space in the Unicode character properties database.

which doesn't seem to work. Any ideas?
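
For comparison, here is a minimal sketch of the documented UNICODE behaviour on its own, without LOCALE. It assumes Python 3, where str patterns are Unicode-aware by default, rather than the Python 2 interpreter the code above targets; the variable names are only illustrative.

```python
import re

# Same three characters as in the question:
# EM QUAD (U+2001), IDEOGRAPHIC SPACE (U+3000), ASCII space.
text = '\u2001\u3000\x20'

# UNICODE only (already the default for str patterns in Python 3;
# passed explicitly here to mirror the flag named in the docs).
unicode_matches = re.findall(r'\s+', text, re.UNICODE)

# ASCII-only matching, i.e. just the set [ \t\n\r\f\v].
ascii_matches = re.findall(r'\s+', text, re.ASCII)

print(unicode_matches)  # one run covering all three characters
print(ascii_matches)    # only the trailing ASCII space
```

If the UNICODE-only call matches all three characters while the combined re.U|re.L calls do not, that would suggest the combination of the two flags is what changes the result.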