New submission from rurpy the second <ru...@yahoo.com>:

PEP 414 proposes restoring the "u" string prefix (semantically a no-op) to 
make porting from Python2 easier.  I would like to propose that "ru"-strings 
also interpret embedded "\uxxxx" unicode literals in the Python2 fashion (as a 
single unicode character) rather than in the Python 3.2 fashion (as 6 
characters).
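
To make the difference concrete, here is a minimal sketch of how Python 3 
treats the same escape in a raw and a non-raw string.  The variable names are 
illustrative only; the Python2 behaviour is described in a comment because 
ur"..." is not valid syntax in Python 3.2.

```python
# Python 3: a raw string leaves \u3042 uninterpreted (backslash, 'u', 4 digits).
raw = r"\u3042"
# Python 3: a non-raw string turns \u3042 into the single character U+3042.
cooked = "\u3042"

print(len(raw))     # 6
print(len(cooked))  # 1

# In Python2, ur"\u3042" produced the one-character result even though the
# string was raw; that is the behaviour this report asks to restore.
```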

Many Python2 programs use unicode literals in strings because they can be 
represented and displayed in source code with the ascii character set.  For 
example, I often write ur" \u3000\u3042\t" rather than ur"  あ    " because the 
former is much clearer in source code than the latter and does not require the 
viewer to have a Japanese font installed.

However such a string must be manually converted for Python3 because the former 
string has a very different meaning in Python3 than Python2.  The equivalent in 
Python3 is " \u3000\u3042\\t".  AFAIK, 2to3 does not fix this.  Because there 
are no longer unicode literals in Python3 raw strings, any string with a 
unicode literal *has* to be a non-raw string (AFAICT).  This means that strings 
used as regexes, that have a lot of backslashes and have unicode literals, must 
have the backslashes doubled.  Doubling the backslashes in the above example is 
trivial, but it is not trivial in more realistic regexes.  Avoiding such doubling 
was, I thought, one of the main reasons for having raw strings in Python2.  It is 
unfortunate that one loses this ability (in the presence of unicode literals) in 
Python3.
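
As a rough illustration of the doubling described above, the sketch below 
uses a small hypothetical regex and sample text (neither comes from a real 
program).  The Python2 spelling appears only in a comment, since it is not 
valid Python 3.2 syntax.

```python
import re

# Python2 spelling (raw, with a unicode literal): ur"(\w+)\u3000(\d+)"
# Python 3 non-raw equivalent: every regex backslash is doubled, while the
# \u3000 escape keeps its single backslash so that it is still interpreted.
pattern = "(\\w+)\u3000(\\d+)"

# Hypothetical sample: a word, an ideographic space (U+3000), then digits.
text = "abc\u3000123"

match = re.match(pattern, text)
print(match.groups())  # ('abc', '123')
```

Even in this short pattern the regex escapes and the unicode escape have to 
be written differently, and the gap widens as the number of backslashes grows.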

When I raised this issue on the Python users' list [*1], Terry Reedy suggested 
that, since the "u" string prefix is being reintroduced for Python 3.3, having 
the prefix also restore the Python2 unicode literal handling would not 
introduce any incompatibilities and would greatly ease porting to Python3 for 
some programs.[*2]  He subsequently raised the issue on the dev list.[*3]

An argument might be made that this is an extra feature that would encourage 
use of the "u"-prefix beyond easing porting from Python2.  Perhaps so, but 
there is currently a hole in Python's capability that is difficult to work 
around, and I've seen no other proposals to fix it.  So it seems to me that 
the benefits of this proposal greatly outweigh that somewhat purist argument.
----
[*1] http://mail.python.org/pipermail/python-list/2012-May/1292870.html
[*2] http://mail.python.org/pipermail/python-list/2012-May/1292887.html
[*3] http://mail.python.org/pipermail/python-dev/2012-May/119760.html

----------
components: Unicode
messages: 162025
nosy: ezio.melotti, rurpy2
priority: normal
severity: normal
status: open
title: restore python2 unicode literals in "ru" strings
type: enhancement
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14973>
_______________________________________