Random832 writes:

 > "Stephen J. Turnbull" <step...@xemacs.org> writes:

 > > I don't see any good reason for allowing non-ASCII-compatible
 > > encodings in the reference CPython interpreter.

 > There might be a case for having the tokenizer not care about encodings
 > at all and just operate on a stream of unicode characters provided by a
 > different layer.
That's exactly what the PEP 263 implementation does in Python 2 (with the caveat that Python 2's tokenizer doesn't know anything about Unicode: it sees a UTF-8 stream, and the non-ASCII characters are treated as bytes of unknown semantics, so they can't be used in syntax).

I don't know about Python 3; I haven't looked at the decoding of source programs. But I would assume it still implements PEP 263, except that since str is now either widechars or the PEP 393 representation (i.e., flexible widechars), that representation is now used instead of UTF-8.

I'm sure that there are plenty of ASCII-isms in the tokenizer, in the sense that it assumes the ASCII *character* (not byte) repertoire. But I'm not sure why Serhiy thinks the tokenizer cares about the on-disk representation. As I say, though, I haven't looked at the code, so he might be right.

Steve
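
P.S. For concreteness, here's a minimal sketch of the "separate decoding layer" idea Random832 describes, using the Python 3 stdlib as I understand it (this says nothing about what the C tokenizer actually does internally): tokenize.detect_encoding() implements the PEP 263 BOM/coding-cookie logic over raw bytes, after which anything layered on top can work with plain str. The filename is hypothetical.

    import io
    import tokenize

    # Detect the source encoding from the BOM and/or PEP 263 coding
    # cookie; detect_encoding() reads at most the first two lines.
    with open("example.py", "rb") as f:   # hypothetical source file
        encoding, first_lines = tokenize.detect_encoding(f.readline)
        f.seek(0)
        # Decode the whole file. From here on, a tokenizer consuming
        # 'text' need not know how the source was represented on disk.
        text = io.TextIOWrapper(f, encoding=encoding).read()

    print(encoding)

(For what it's worth, tokenize.open() wraps the same dance up in one call.)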