Georg Brandl wrote:
> Ron Adam wrote:
>> Guido van Rossum wrote:
>>> That would be great! This will automatically turn \u1234 into 6
>>> characters, right?
>> I'm not exactly clear on when the '\uxxxx' escapes get converted. There
>> isn't any conversion done in tokenizer.c that I can see. At that point
>> it's only concerned with finding the beginning and end of the string.
>> It looks like everything in between is just passed along "as is" and
>> translated later in the chain.
>
> Look at Python/ast.c, which has functions parsestr() and decode_unicode().
> The latter calls PyUnicode_DecodeRawUnicodeEscape() which I think is the
> function you're looking for.
>
> Georg
Thanks, I'll look there.
That should be where I need to look to fix a glitch where the last quote of
a raw string is both the end of the string and part of the string.
>>> r'\'
"\\'"
Interestingly, it works just fine for raw byte strings. (I wish the letters
were reversed; saying bytes-raw-string is awkward.)
>>> br'\'
b'\\'
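For comparison, here is a small sketch of how a stock CPython 3 release
treats the same literals. It assumes an unpatched interpreter rather than
the build that produced the output above, and the compiles() helper is just
something made up for the illustration.

```python
# Assumes a stock CPython 3 interpreter, not the patched build shown above.

def compiles(source):
    """Return True if source compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# A raw literal ending in a single backslash: the backslash still keeps the
# quote from closing the literal, so the string is never terminated.
print(compiles("r'\\'"))    # source text: r'\'
print(compiles("br'\\'"))   # source text: br'\'

# When the literal is closed properly, both the backslash and the quote it
# protected stay in the value.
print(len(r'\''))

# A raw string leaves \u1234 alone, so it counts as six characters.
print(len(r'\u1234'))
```

On such an interpreter the first two literals are rejected outright, so the
"quote is both terminator and content" case never comes up there.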
Anyway, I've made the corresponding modifications to tokenize.py and
tokenize_tests.txt.
The tests for tokenize.py need to be updated. They do a round-trip test,
but I've found that a passing round trip doesn't mean it's the correct one!
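To make the round-trip point concrete, here is a rough sketch using the
tokenize module as it exists in current Python 3. The helper names
token_pairs() and round_trip() are made up for the example, and this is not
the code in tokenize_tests.txt.

```python
import io
import tokenize

def token_pairs(source):
    """Return the (type, string) pairs that tokenize produces for source."""
    readline = io.StringIO(source).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]

def round_trip(source):
    """Tokenize, untokenize, and tokenize again; return both sets of pairs."""
    readline = io.StringIO(source).readline
    tokens = list(tokenize.generate_tokens(readline))
    rebuilt = tokenize.untokenize(tokens)
    return token_pairs(source), token_pairs(rebuilt)

source = 's = r"a\\b"\n'    # source text: s = r"a\b"
before, after = round_trip(source)

# Round-trip check: only shows that the tokenizer agrees with itself.
print(before == after)

# Direct check: pins down what the string token is supposed to be.
expected = (tokenize.STRING, 'r"a\\b"')
print(expected in before)
```

A tokenizer that splits a string in the wrong place can still pass the first
check as long as it splits it the same way both times; only the second kind
of check would catch that.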
Cheers,
Ron