On 30.05.2012 08:52, ru...@yahoo.com wrote:

> This breaks a lot of my code because in python 2
>        re.split (ur'[\u3000]', u'A\u3000A') ==>  [u'A', u'A']
> but in python 3 (the result of running 2to3),
>        re.split (r'[\u3000]', 'A\u3000A' ) ==>  ['A\u3000A']
>
> I can remove the "r" prefix from the regex string but then
> if I have other regex backslash symbols in it, I have to
> double all the other backslashes -- the very thing that
> the r-prefix was invented to avoid.
>
> Or I can leave the "r" prefix and replace something like
> r'[ \u3000]' with r'[  ]'.  But that is confusing because
> one can't distinguish between the space character and
> the ideographic space character.  It is also a problem if a
> reader of the code doesn't have a font that can display
> the character.
>
> Was there a reason for dropping the lexical processing of
> \u escapes in raw strings in python3 (other than to add another
> annoyance in a long list of python3 annoyances?)
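
A minimal sketch of what is going on (not part of the original thread): in Python 3 a raw string keeps `\u3000` as six literal characters, so the escape reaches the `re` module unprocessed. The re module of that time apparently treated the unknown `\u` as a plain `u`, which is why the split failed. Since Python 3.3 `re` expands `\uXXXX` in a pattern itself, and since 3.8 it also accepts `\N{name}`, so on a current interpreter the raw pattern behaves as it did in Python 2:

```python
import re
import sys

raw = r'[\u3000]'     # 8 chars: '[', '\\', 'u', '3', '0', '0', '0', ']'
cooked = '[\u3000]'   # 3 chars: '[', U+3000, ']'
print(len(raw), len(cooked))  # 8 3

# Python 3.3+: the regex parser expands \uXXXX, so both patterns match U+3000.
print(re.split(raw, 'A\u3000A'))
print(re.split(cooked, 'A\u3000A'))

# Python 3.8+: re also understands \N{name}, which keeps a raw pattern
# readable without pasting the invisible character into the source.
if sys.version_info >= (3, 8):
    named = r'[ \N{IDEOGRAPHIC SPACE}]'
    print(re.split(named, 'A B\u3000C'))
```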

Probably it is more consistent. Alas, it makes the whole thing incompatible with Py2.

But if you think about it: why should \u be processed in a raw string when \r, \n etc. are not processed either?
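
A short sketch of that uniformity (an illustration, not from the thread): a raw string suppresses every backslash escape at the lexical level, `\u` included. Regexes written with `r'\n'` still work only because the `re` module interprets `\n` on its own:

```python
import re

# (raw literal, normal literal) pairs for a few escape sequences
pairs = [(r'\n', '\n'), (r'\r', '\r'), (r'\u3000', '\u3000')]
for raw, cooked in pairs:
    # The raw form keeps the backslash; the normal form is one character.
    print(repr(raw), len(raw), repr(cooked), len(cooked))

# The regex engine, not the lexer, turns r'\n' into "match a newline".
print(bool(re.search(r'\n', 'a\nb')))  # True
```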


> And is there no choice for me but to choose between the two
> poor choices I mention above to deal with this problem?

There is a 3rd one: use r'[ ' + '\u3000' + ']'. Not very nice to read, but should do the trick...
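
The third option as a runnable sketch. Python also joins adjacent string literals at compile time, and the parts may mix raw and non-raw prefixes, so the `+` signs can be dropped; the variable names here are only for illustration:

```python
import re

# Raw part for regex syntax, normal part so the lexer expands \u3000.
pat_plus = r'[ ' + '\u3000' + ']'
pat_adjacent = r'[ ' '\u3000' ']'   # implicit concatenation, no '+'

assert pat_plus == pat_adjacent == '[ \u3000]'
print(re.split(pat_adjacent, 'A B\u3000C'))  # ['A', 'B', 'C']
```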


Thomas
--
http://mail.python.org/mailman/listinfo/python-list
