On Mon, Feb 13, 2017 at 11:29 AM, Erik <pyt...@lucidity.plus.com> wrote:
> On 13/02/17 00:13, Chris Angelico wrote:
>>
>> On Mon, Feb 13, 2017 at 11:11 AM, Chris Angelico <ros...@gmail.com> wrote:
>>>
>>> The string "\t" gets shown in the repr as "\t". It is a string
>>> consisting of one character, U+0009, a tab. The string r"\t" is shown
>>> as "\\t" and consists of two characters, REVERSE SOLIDUS and LATIN
>>> SMALL LETTER T. That might be why you think there's confusing stuff
>>> happening :)
>>
>>
>> Oh, and the other trap you can fall into is the reverse of that:
>>
>>>>> "worl\d"
>>
>> 'worl\\d'
>>
>> This one actually triggers a warning in sufficiently recent Pythons:
>
>
> Fair point, but you're going off at a tangent. I just stuck a backslash on a
> random letter to see which string tokens were/were not being treated as
> "raw" by the parser. Next time I'll use \v or something. You're focusing on
> something that is beside the point I'm trying to make ;)


Except that I'm not. Here, look:

> OK, so please explain one of my examples:
>
>>>>> r"hello \the" "worl\d" "\t"
>> 'hello \\theworl\\d\t'
>>
>> The initial string is raw. The following string adopts that (same as the
>> second example), but the _next_ string does not!
>
> Why is the first string token parsed as a "raw" string, the second
> string token also parsed as a "raw" string (without the 'r' prefix),
> but the third string token NOT?

>>> r"hello \the" "worl\d" "\t"
'hello \\theworl\\d\t'
>>> r"hello \the" "worl\t" "\d"
'hello \\theworl\t\\d'
>>> "hello \the" "worl\d" "\t"
'hello \theworl\\d\t'
>>> "hello \the" "worl\t" "\d"
'hello \theworl\t\\d'

The unit "\t" in a non-raw literal always means U+0009, even when it
follows a raw string literal; and the unit "\d" always means "\\d",
regardless of the rawness of any of the literals involved. The thing
that's biting you here is that unrecognized escapes are kept in the
string as a backslash followed by the letter, which is why they now
produce a warning.
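
A minimal sketch for checking the above yourself, assuming Python 3.6 or
later (3.6 to 3.11 report unrecognized escapes as DeprecationWarning,
3.12+ as SyntaxWarning). The helper eval_literal is hypothetical, written
only for this illustration: it compiles each expression from source text,
so the warning can be captured with the standard warnings module instead
of firing when the script itself is compiled.

```python
import warnings


def eval_literal(source):
    """Hypothetical helper: compile and evaluate the expression in `source`,
    returning (value, categories of any warnings raised while compiling)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, none suppressed
        code = compile(source, "<example>", "eval")
    return eval(code), [w.category for w in caught]


# "\t" is one character (TAB, U+0009); r"\t" is two: a backslash and a "t".
assert len("\t") == 1 and "\t" == "\x09"
assert len(r"\t") == 2 and r"\t" == "\\t"
# ...which is why their reprs differ.
assert repr("\t") == r"'\t'" and repr(r"\t") == r"'\\t'"

# Adjacent literals are decoded one by one and only then joined:
# the r prefix applies solely to the literal it is attached to.
value, _ = eval_literal(r'''r"hello \the" "worl\d" "\t"''')
assert value == "hello \\the" + "worl\\d" + "\t"

value, _ = eval_literal(r'''r"hello \the" "worl\t" "\d"''')
assert value == "hello \\the" + "worl\t" + "\\d"

# "\d" is not a recognized escape, so the backslash is kept:
# the result is "\\d" with or without an r prefix.
plain, categories = eval_literal(r'"\d"')
raw, _ = eval_literal(r'r"\d"')
assert plain == raw == "\\d"

# Only the unrecognized escape in the non-raw literal triggers the warning.
assert categories and categories[0] in (DeprecationWarning, SyntaxWarning)
```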

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
