Guido van Rossum wrote:
> On 7/17/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> > When a source file contains a string literal with an out-of-range \U
>> > escape (e.g. "\U12345678"), instead of a syntax error pointing to the
>> > offending literal, I get this, without any indication of the file or
>> > line:
>> >
>> > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
>> > position 0-9: illegal Unicode character
>> >
>> > This is quite hard to track down.
>>
>> I think the fundamental flaw is that a codec is used to implement
>> the Python syntax (or, rather, lexical rules).
>>
>> Not quite sure what the rationale for this design was; doing it on
>> the lexical level is (was) tricky because \u escapes were allowed
>> only for Unicode literals, and the lexer had no knowledge of the
>> prefix preceding a literal. (In 3k, it's still similar, because
>> \U escapes have no effect in bytes and raw literals).
>>
>> Still, even if it is "only" handled at the parsing level, I
>> don't see why it needs to be a codec. Instead, implementing
>> escapes in the compiler would still allow for proper diagnostics
>> (notice that in the AST the original lexical form of the string
>> literal is gone).
>
> I guess because it was deemed useful to have a codec for this purpose
> too, thereby exposing the algorithm to Python code that needs the same
> functionality (e.g. the compiler package, RIP).

And it is still useful. If you want to convert a string into a printable
representation, you can use repr(), but for the inverse you need this
codec (or eval()...).

Georg

_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
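
A minimal sketch of the round trip mentioned above, assuming a current
Python 3 in which the codec is available under the name "unicode_escape".
The helper names escape_text, unescape_text and try_out_of_range are made
up for illustration and are not part of any library. The sketch also shows
the out-of-range \U case from the original report raising a
UnicodeDecodeError when the codec is called directly.

```python
import ast


def escape_text(text):
    """Return an ASCII-only, backslash-escaped form of a str.

    Hypothetical helper: it only wraps the built-in "unicode_escape" codec.
    """
    return text.encode("unicode_escape").decode("ascii")


def unescape_text(escaped):
    """Inverse of escape_text: let the codec interpret backslash escapes.

    The input is assumed to be ASCII-only, as produced by escape_text.
    """
    return escaped.encode("ascii").decode("unicode_escape")


original = "caf\u00e9\tbar\n"

# Forward direction: a printable, escaped representation.
escaped = escape_text(original)

# Inverse direction via the codec.
restored = unescape_text(escaped)

# Inverse direction via evaluation of the quoted form; ast.literal_eval
# is used here as a restricted stand-in for eval().
via_eval = ast.literal_eval(ascii(original))


def try_out_of_range():
    """Decode an escape beyond U+10FFFF and return the resulting error."""
    try:
        unescape_text("\\U12345678")
    except UnicodeDecodeError as exc:
        return exc
    return None


error = try_out_of_range()
```

Note that repr() and ascii() add surrounding quotes, so their output has to
go through an evaluator such as ast.literal_eval, whereas the codec works
on the bare escaped text.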
