Re: [Python-3000] Raw strings containing \u or \U

Ron Adam Fri, 18 May 2007 09:18:08 -0700

Georg Brandl wrote:
> Ron Adam schrieb:
>> Guido van Rossum wrote:
>>> That would be great! This will automatically turn \u1234 into 6
>>> characters, right?
>> I'm not exactly clear when the '\uxxxx' characters get converted.  There 
>> isn't any conversion done in tokenizer.c that I can see.  It's primarily 
>> only concerned with finding the beginning and ending of the string at that 
>> point.  It looks like everything between the beginning and end is just 
>> passed along "as is" and it's translated further later in the chain.
> 
> Look at Python/ast.c, which has functions parsestr() and decode_unicode().
> The latter calls PyUnicode_DecodeRawUnicodeEscape() which I think is the
> function you're looking for.
> 
> Georg
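
As a rough illustration of what that decoding step does, here is a small
sketch using the "raw_unicode_escape" codec, which as far as I know is the
Python-level face of PyUnicode_DecodeRawUnicodeEscape().  The sample bytes
are made up for the example; only \uXXXX and \UXXXXXXXX are interpreted,
and every other backslash sequence is passed through untouched.

```python
# Bytes as they would sit in a source file: a \u escape and a \n escape.
raw = b"\\u1234 and \\n"

# raw_unicode_escape converts \u1234 into one character but leaves the
# backslash-n pair alone as two separate characters.
text = raw.decode("raw_unicode_escape")

print(len(text))   # one char for \u1234, five for " and ", two for \n
print(ascii(text))
```

So whether \u1234 ends up as one character or six depends on whether a raw
literal is sent through this decoder at all.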


Thanks, I'll look there.

That should be where I need to look to fix a glitch where the closing quote 
of a raw string is treated as both the end of the string and part of its 
contents.

 >>> r'\'
"\\'"

Interestingly, it works just fine for raw byte strings.  (I wish the letters 
were reversed; saying bytes-raw-string is awkward.)

 >>> br'\'
b'\\'
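
For comparison, here is a small sketch of how a stock Python 3 interpreter
treats the same kind of literal.  This is the ordinary released behavior,
not the patched build shown in the transcripts above: a backslash before
the quote keeps the quote from closing the literal, the backslash stays in
the value, and a literal consisting of a lone backslash is rejected.

```python
# The backslash keeps the second quote from ending the literal, and both
# the backslash and the quote end up in the value.
s = r'\''
b = br'\''
print(len(s), ascii(s))
print(len(b), b)

# A raw literal that ends in a single backslash never terminates, so the
# compiler refuses it.  The source text handed to compile() is:  r'\'
try:
    compile("r'\\'", "<example>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False
print(rejected)
```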

Anyway, I've made the corresponding modifications to tokenize.py and 
tokenize_tests.txt.

The tests for tokenize.py need to be updated.  They do a round trip test, 
but I've found that doesn't mean it's the correct round trip!
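
To show what I mean, here is a minimal sketch against the standard
tokenize module (not the actual test code in tokenize_tests.txt; the helper
name and the sample source line are made up).  A round-trip check only
shows that untokenize() undoes whatever the tokenizer produced, so it makes
sense to also pin down the tokens themselves.

```python
import io
import tokenize


def roundtrip(source):
    """Tokenize source text, then rebuild text from the full token tuples."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    return tokens, tokenize.untokenize(tokens)


# Sample line containing a raw string with a \u escape in it.
source = "s = r'\\u1234' + 'x'\n"
tokens, rebuilt = roundtrip(source)

# Round-trip check: passes whenever the tokenizer is self-consistent,
# even if it cut the strings in the wrong place.
print(rebuilt == source)

# Stronger check: look at the STRING tokens the tokenizer actually found.
strings = [tok.string for tok in tokens if tok.type == tokenize.STRING]
print(strings)
```

The second check is the one that would catch a quote being swallowed into
the string body.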

Cheers,
    Ron




