Re: [Python-3000] Invalid \U escape in source code gives hard-to-trace error

Georg Brandl Wed, 18 Jul 2007 14:45:47 -0700

Guido van Rossum wrote:
> On 7/17/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> > When a source file contains a string literal with an out-of-range \U
>> > escape (e.g. "\U12345678"), instead of a syntax error pointing to the
>> > offending literal, I get this, without any indication of the file or
>> > line:
>> >
>> > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
>> > position 0-9: illegal Unicode character
>> >
>> > This is quite hard to track down.
>>
>> I think the fundamental flaw is that a codec is used to implement
>> the Python syntax (or, rather, lexical rules).
>>
>> Not quite sure what the rationale for this design was; doing it on
>> the lexical level is (was) tricky because \u escapes were allowed
>> only for Unicode literals, and the lexer had no knowledge of the
>> prefix preceding a literal. (In 3k, it's still similar, because
>> \U escapes have no effect in bytes and raw literals.)
>>
>> Still, even if it is "only" handled at the parsing level, I
>> don't see why it needs to be a codec. Instead, implementing
>> escapes in the compiler would still allow for proper diagnostics
>> (notice that in the AST the original lexical form of the string
>> literal is gone).
> 
> I guess because it was deemed useful to have a codec for this purpose
> too, thereby exposing the algorithm to Python code that needs the same
> functionality (e.g. the compiler package, RIP).
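A minimal sketch of the difference Martin describes, assuming a modern Python 3. Decoding the escape through the `unicode_escape` codec yields a `UnicodeDecodeError` that knows only offsets inside the literal. A decoder living in the compiler can instead raise a `SyntaxError` carrying file, line and column. `decode_escapes` and `codec_error` below are hypothetical helpers, not CPython's implementation. The decoder handles only a subset of Python's escapes and assumes the caller has already stripped the quotes and any prefix.

```python
import string

# One-character escapes handled by this sketch (a subset of Python's).
SIMPLE_ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "r": "\r", "t": "\t"}
# Number of hex digits expected after \x, \u and \U.
HEX_WIDTHS = {"x": 2, "u": 4, "U": 8}


def codec_error(body: bytes) -> UnicodeDecodeError:
    """Return the error the unicode_escape codec raises for `body`."""
    try:
        body.decode("unicode_escape")
    except UnicodeDecodeError as exc:
        return exc  # only .start/.end offsets: no file name, no line number
    raise AssertionError("expected a decode error")


def decode_escapes(body: str, filename: str, lineno: int, col: int) -> str:
    """Hypothetical compiler-side decoder for a literal body (quotes stripped).

    `col` is the 1-based column of the body's first character, so an error
    can point at the offending escape in the source file.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        # Location info attached to any SyntaxError raised for this escape.
        where = (filename, lineno, col + i, body)
        if i + 1 == len(body):
            raise SyntaxError("trailing backslash in literal", where)
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in HEX_WIDTHS:
            width = HEX_WIDTHS[kind]
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or any(c not in string.hexdigits for c in digits):
                raise SyntaxError(f"truncated \\{kind} escape", where)
            code_point = int(digits, 16)
            if code_point > 0x10FFFF:
                raise SyntaxError(f"illegal Unicode character \\{kind}{digits}", where)
            out.append(chr(code_point))
            i += 2 + width
        else:
            # Unknown escapes are kept verbatim, as Python does for "\q".
            out.append("\\" + kind)
            i += 2
    return "".join(out)
```

For example, `decode_escapes(r"ab\U12345678", "example.py", 7, 10)` raises a `SyntaxError` with `filename`, `lineno` and `offset` set, so a traceback can point at the escape. By contrast, `codec_error(b"\\U12345678")` reports only byte positions within the literal.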

And it still is useful. If you want to convert a string into a printable
representation, you can use repr(), but for the inverse you need this
codec (or eval()...).
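Georg's point can be illustrated with a short round trip, assuming Python 3. `str.encode("unicode_escape")` turns a string into an escaped, pure-ASCII form, and `bytes.decode("unicode_escape")` inverts it. For the repr() direction, `ast.literal_eval` appears here as a safer stand-in for the eval() Georg mentions, since it accepts only literals and does not run arbitrary code.

```python
import ast

original = "tab\there, caf\u00e9, snowman \u2603, backslash \\"

# Escape to ASCII bytes with the codec, then invert with the same codec.
escaped = original.encode("unicode_escape")
restored = escaped.decode("unicode_escape")

# repr() yields a quoted Python literal; literal_eval parses it back
# without evaluating arbitrary expressions (unlike eval()).
literal = repr(original)
from_repr = ast.literal_eval(literal)

print(escaped)
print(restored == original, from_repr == original)  # True True
```

Note that the codec produces bytes without surrounding quotes, whereas repr() produces a quoted str, so the two are not drop-in replacements for each other.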

Georg

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com