John Nagle wrote:
> Regular expressions are compiled in ASCII mode unless
> Unicode mode is specified to "re.compile". The difference is that regular
> expressions in ASCII mode don't recognize things like
> Unicode whitespace, even when applied to Unicode strings.
> For example, Unicode character 0x00A0 is a "NO-BREAK SPACE", which is
> a form of whitespace. It's the Unicode equivalent of HTML's "&nbsp;".
> This can create some strange bugs.
>
> Is the current default good? Or is it time to compile all regular
> expressions in Unicode mode by default? It shouldn't hurt processing of
> ASCII strings to do that. The current setup is really a legacy of when
> most things in Python didn't work in Unicode mode, and you didn't want to
> introduce Unicode unnecessarily. It's another one of those obscure
> Unicode "gotchas" that really should go away.
>
> John Nagle
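For anyone who wants to see the behaviour described above, here is a minimal sketch. It assumes a Python 3 interpreter, where str patterns are already Unicode-aware and the re.ASCII flag reproduces the old ASCII-mode matching; on Python 2 the equivalent would be unicode literals with and without re.UNICODE. The variable names are only illustrative.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE
text = "foo" + NBSP + "bar"

# ASCII mode: \s only matches space, \t, \n, \r, \f and \v.
ascii_ws = re.compile(r"\s", re.ASCII)

# Unicode mode: \s also matches Unicode whitespace such as U+00A0.
# (re.UNICODE is redundant for str patterns on Python 3; it is spelled
# out here only to make the contrast explicit.)
unicode_ws = re.compile(r"\s", re.UNICODE)

ascii_hit = ascii_ws.search(text)      # no match: NBSP is not ASCII whitespace
unicode_hit = unicode_ws.search(text)  # matches the NBSP

ascii_parts = ascii_ws.split(text)     # string is left in one piece
unicode_parts = unicode_ws.split(text) # string is split at the NBSP

print(ascii_hit is None, unicode_hit is not None)
print(ascii_parts, unicode_parts)
```

The same pattern gives two different answers on the same string, which is the kind of "strange bug" the quoted message is talking about.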
Personally I'd leave it to go away with Python 3.0, when all strings will be Unicode.

regards
 Steve
--
Steve Holden        +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden
Recent Ramblings       http://holdenweb.blogspot.com