Xah Lee wrote:
> Thanks. Is it true that any unicode chars can also be used inside regex
> literally?
>
> e.g.
> re.search(ur' +',mystring,re.U)
>
> I tested this case and apparently i can.

Yes. In fact, whether you write u"\u2003" or the literal character u" " doesn't matter to re.search. Either way you get a Unicode object with U+2003 in it, which is processed by SRE.

> But is it true that any
> unicode char can be embedded in regex literally. (does this apply to
> the esoteric ones such as other non-printing chars and combining
> forms...)

Yes. To SRE, only the Unicode ordinal values matter. For something to match, the string must contain the same ordinal values as the expression. No interpretation of the characters is performed, except for the few characters that have markup meaning in regular expressions (e.g. $, \, [, etc.).

Regards,
Martin
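
A small sketch of the points above, in case it helps. It assumes Python 3, where every str is already Unicode and the ur'' prefix no longer exists; the sample strings and variable names are made up for illustration only.

```python
import re

# U+2003 (EM SPACE) written as an escape and built from its ordinal:
# both give the same one-character string, so the pattern is identical.
escaped = "\u2003"
from_ordinal = chr(0x2003)
same_string = (escaped == from_ordinal)

# The character is matched literally; "+" keeps its usual markup meaning.
text = "a\u2003\u2003b"
span = re.search(escaped + "+", text).span()

# Combining forms get no special treatment: matching is by ordinal only.
decomposed = "e\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"    # the single code point U+00E9
no_match = re.search(decomposed, precomposed)       # different ordinals
accent_pos = re.search("\u0301", decomposed).start()

# Characters with markup meaning still have to be escaped.
dollar_pos = re.search(re.escape("$"), "cost: $5").start()

print(same_string, span, no_match, accent_pos, dollar_pos)
```

If the decomposed and precomposed forms should be treated as equal, both the pattern and the string would have to be normalized first (for example with unicodedata.normalize), since the regex engine does not do that by itself.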