On 4/17/07, Christian Heimes <[EMAIL PROTECTED]> wrote:
> Neal Norwitz wrote:
> > I don't have any plans, just considering options. Move them
> > somewhere? Perhaps trim the ones that are unused. In a unicode
> > world, I'm not sure how much some of these make sense. letters stands
> > out more than others. I don't know enough about unicode to know if
> > digits or whitespace can be diff.
>
> What do you think about replacing the definitions by information from
> the unicode character properties database? The information is available
> somewhere in Python:
>
> http://docs.python.org/lib/re-syntax.html
>
> \w ... With LOCALE, it will match the set [0-9_] plus whatever
> characters are defined as alphanumeric for the current locale. If
> UNICODE is set, this will match the characters [0-9_] plus whatever is
> classified as alphanumeric in the Unicode character properties database.
Yes, unicode.islower() and friends have this information. It would be
silly to set e.g. letters to a string of all unicode letters -- that
would be a string of 46618 characters! Similarly, there are 304 unicode
digits. (And this is in a narrow Unicode build, only supporting the
basic Unicode plane, 0--2**16!)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
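A minimal sketch of the point above, assuming a modern Python 3 interpreter, where the 2007 `unicode` type is now `str`, so `str.isalpha()`, `str.isdigit()` and `str.isspace()` stand in for "unicode.islower() and friends". It builds the would-be `letters`, `digits` and `whitespace` strings by scanning the Basic Multilingual Plane, plus the set of characters `\w` matches. The helper `bmp_chars` is hypothetical. The counts depend on the Unicode database version bundled with the interpreter, so they will not reproduce the 46618 and 304 figures from 2007.

```python
import re
import unicodedata

# \w on a str pattern is Unicode-aware by default in Python 3
# (pass re.ASCII to restrict it to [a-zA-Z0-9_]).
WORD = re.compile(r"\w")

# Python 3.3+ has no "narrow" builds, so the scan is limited to the
# Basic Multilingual Plane (U+0000..U+FFFF) explicitly to mirror the
# setup described in the message.
BMP_SIZE = 0x10000


def bmp_chars(predicate):
    """Hypothetical helper: return a string of every BMP character
    for which predicate(ch) is true."""
    return "".join(ch for ch in map(chr, range(BMP_SIZE)) if predicate(ch))


# The would-be string-module constants, derived from character properties.
letters = bmp_chars(str.isalpha)
digits = bmp_chars(str.isdigit)
whitespace = bmp_chars(str.isspace)
word_chars = bmp_chars(lambda ch: WORD.fullmatch(ch) is not None)

print("Unicode database version:", unicodedata.unidata_version)
print("letters:", len(letters))
print("digits:", len(digits))
print("whitespace:", len(whitespace))
print(r"\w matches:", len(word_chars))
```

Calling the predicate on demand (e.g. `ch.isalpha()`) avoids materializing strings of this size at all; the sketch builds them only to show how large they get.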
