On 8/8/2019 5:31 AM, Dima Tisnek wrote:
These two ought to be converted to raw strings, shouldn't they?

For the first example, yes or no. It depends ;-)  See below.

The problem is that string literals in python code are, by default, half-baked. The interpretation of '\' by the python parser, and the resulting string object, depends on the next char. I can see how this is sometimes a convenience, but I consider it a design bug. There is no way for a user to say "I intend for this string to be fully baked, so if it cannot be, I goofed." And the convenience gets used when it must not be.
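Here is a minimal sketch of the "half-baked" behaviour. The helper name `bake` is hypothetical. It compiles a literal the way the parser would and returns the resulting string object. A recognised escape such as `\n` collapses to one character. An unrecognised one such as `\d` passes through as two characters, with only a compile-time warning to say so. The warning category has varied between Python versions.

```python
import warnings

def bake(literal_source: str) -> str:
    """Compile and evaluate a string-literal expression, as the parser would.

    Invalid-escape warnings are suppressed so that only the resulting
    string object is inspected (hypothetical helper for illustration).
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(compile(literal_source, "<literal>", "eval"))

# The r"..." prefixes here only build the *source text*; bake() interprets it.
fully_baked = bake(r"'\n'")  # recognised escape: backslash + 'n' becomes a newline
half_baked = bake(r"'\d'")   # unrecognised escape: backslash and 'd' both survive

print(len(fully_baked), len(half_baked))  # 1 2
```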

On Thu, 8 Aug 2019 at 08:04, <raymond.hettin...@gmail.com> wrote:

For me, these warnings are continuing to arise almost daily.  See two recent 
examples below.  In both cases, the code previously had always worked without 
complaint.

----- Example from yesterday's class ----

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old))
                                          \--------------------o
                       \------------------------------------o
'''

SyntaxWarning: invalid escape sequence \-

For true ASCII-only character art, where one will never want '\' baked, an 'r' prefix is appropriate. It is in fact mandatory when '\' may be followed by a legal escape code.
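One way to check the advice above is to compile a snippet with warnings promoted to errors. CPython then reports an invalid escape as a SyntaxError at compile time. This is a sketch, and `has_invalid_escape` is a hypothetical helper, not a stdlib function. It also returns True for any other syntax error.

In the sketch, the classroom-style art written without a prefix is flagged, and the `r`-prefixed version is not. The last line shows why the prefix is mandatory before a legal escape code: no warning is issued, and the character is silently baked.

```python
import warnings

def has_invalid_escape(source: str) -> bool:
    """Return True if compiling `source` trips an invalid-escape warning.

    With warnings turned into errors, CPython reports an invalid escape as
    a SyntaxError; the warning classes are caught too, just in case.
    Any other syntax error in `source` also yields True.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<art>", "exec")
        except (SyntaxError, DeprecationWarning, SyntaxWarning):
            return True
    return False

# Hypothetical ASCII-art docstrings, as source text; '\\' below is one backslash.
plain_art = "art = '''\n   \\--------o\n'''\n"
raw_art = "art = r'''\n   \\--------o\n'''\n"

print(has_invalid_escape(plain_art))  # True  ('\-' is not a legal escape)
print(has_invalid_escape(raw_art))    # False

# A legal escape triggers no warning at all; it is simply baked.
print(len('\b--o'), len(r'\b--o'))    # 4 5  ('\b' became a backspace)
```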


If one is making Unicode art, with '\u' and '\U' escapes used, one must not use the 'r' prefix, but should instead use '\\' for unbaked backslashes. The Unicode escapes have already thrown off the column alignment of the art in the source.
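Here is a small sketch of such a line. The art itself is hypothetical. `\u2500` must be interpreted, so there is no `r` prefix, and the literal backslash is doubled. Each six-character `\u2500` in the source becomes one character on output, which is what throws off the column alignment.

```python
import unicodedata

# No r prefix: '\u2500' must be baked into a box-drawing character.
# The leading literal backslash is written as '\\'.
# The line renders as a backslash, three horizontal box-drawing dashes, then 'o'.
line = '\\\u2500\u2500\u2500o'

print(len(line))                          # 5 characters when displayed...
print(len(r'\\\u2500\u2500\u2500o'))      # ...but 21 characters in the source
print(unicodedata.name(line[1]))          # BOX DRAWINGS LIGHT HORIZONTAL
```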

----- Example from today's class ----

# Cut and pasted from:
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
  =0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
  Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestg...@example.com
REV:20080424T195243Z
END:VCARD
'''

SyntaxWarning: invalid escape sequence \,

Based on my reading of the Wikipedia vCard page linked above, the vCard protocol mandates the use of '\' chars that must be passed through unbaked to a vCard processor. (I don't know why '\,' is used, but it does not matter.) So vCard strings using '\' should generally have 'r' prefixes, just as for regex and LaTeX strings. For version 2.1, it appears that one can currently (in Python 3.7 and earlier) get away with omitting 'r'. In versions 3.0 and 4.0, an embedded newline is represented by '\n' instead of '=0D=0A'. It must not be baked by Python, but passed on as is. So omitting 'r' becomes a bug for those versions.
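The difference matters once the vCard text reaches a parser. In vCard 3.0 and 4.0 (RFC 2426, RFC 6350), a backslash in a TEXT value escapes a comma, semicolon, backslash, or newline (`\n` or `\N`). Commas are escaped because a comma also separates multiple values.

The sketch below uses a hypothetical `unescape_vcard_text` helper, not a real library function. Its LABEL value is adapted from the example above into 3.0-style form. With an `r` prefix, the `\n` reaches the decoder intact. Without it, Python bakes a real newline into what should be one content line.

```python
import re

# vCard 3.0/4.0 TEXT escapes: \n or \N -> newline; \, \; \\ -> the literal character.
_VCARD_ESCAPES = {"n": "\n", "N": "\n", ",": ",", ";": ";", "\\": "\\"}

def unescape_vcard_text(value: str) -> str:
    """Decode backslash escapes in a vCard TEXT value (hypothetical helper)."""
    return re.sub(r"\\([nN,;\\])", lambda m: _VCARD_ESCAPES[m.group(1)], value)

# r prefix: the two-character sequences '\n' and '\,' survive for the vCard parser.
raw_label = r"LABEL:100 Waters Edge\nBaytown\, LA 30314"
# No r prefix: Python bakes '\n' into a real newline before any parser sees it
# ('\\,' is spelled out here only to avoid the invalid-escape warning).
baked_label = "LABEL:100 Waters Edge\nBaytown\\, LA 30314"

print(len(raw_label.splitlines()))    # 1: still a single content line
print(len(baked_label.splitlines()))  # 2: the content line has been split
print(unescape_vcard_text(raw_label.split(":", 1)[1]))  # address on two lines
```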

To me, this is one of the major problems with the half-baked default. People who want string literals left as is sometimes get away with omitting explicit mention of that fact, but sometimes don't.

Note: when we added '\u' and '\U' escapes, we broke working code that had Windows paths like "C:\Users\Terry". But we did it anyway.
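Compiling the path literal directly shows that breakage. This is a sketch, and `compiles` is a hypothetical helper. Because `\U` introduces an eight-hex-digit escape, `"C:\Users\Terry"` is now a compile-time SyntaxError. The raw-string spelling and the doubled-backslash spelling are both accepted.

```python
import warnings

def compiles(source: str) -> bool:
    """Return True if `source` compiles as an expression, ignoring warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            compile(source, "<path>", "eval")
        except SyntaxError:
            return False
    return True

print(compiles('"C:\\Users\\Terry"'))      # False: '\U' must be followed by 8 hex digits
print(compiles('r"C:\\Users\\Terry"'))     # True
print(compiles('"C:\\\\Users\\\\Terry"'))  # True: doubled backslashes
```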

--
Terry Jan Reedy
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NZZ32WFHUMQAKG6O3KDYV5J5NQMWGKSO/
