Random832 writes:

 > "Stephen J. Turnbull" <step...@xemacs.org> writes:

 > > I don't see any good reason for allowing non-ASCII-compatible
 > > encodings in the reference CPython interpreter.

 > There might be a case for having the tokenizer not care about encodings
 > at all and just operate on a stream of unicode characters provided by a
 > different layer.
That's exactly what the PEP 263 implementation does in Python 2 (with the caveat that Python 2's tokenizer doesn't know anything about Unicode: it sees a UTF-8 stream, and the non-ASCII characters are treated as bytes of unknown semantics, so they can't be used in syntax).

I don't know about Python 3; I haven't looked at the decoding of source programs. But I would assume it still implements PEP 263, except that since str is now either widechars or the PEP 393 representation (i.e., flexible widechars), that representation is now used instead of UTF-8.

I'm sure that there are plenty of ASCII-isms in the tokenizer, in the sense that it assumes the ASCII *character* (not byte) repertoire. But I'm not sure why Serhiy thinks the tokenizer cares about the on-disk representation. As I say, though, I haven't looked at the code, so he might be right.

Steve
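
P.S. For concreteness, here's a minimal sketch of the "separate decoding layer" idea Random832 describes, using the Python 3 stdlib as I understand it (this says nothing about what the C tokenizer actually does internally): tokenize.detect_encoding() implements the PEP 263 BOM/coding-cookie logic over raw bytes, after which anything layered on top can work with plain str. The filename is hypothetical.

    import io
    import tokenize

    # Detect the source encoding from the BOM and/or PEP 263 coding
    # cookie; detect_encoding() reads at most the first two lines.
    with open("example.py", "rb") as f:   # hypothetical source file
        encoding, first_lines = tokenize.detect_encoding(f.readline)
        f.seek(0)
        # Decode the whole file. From here on, a tokenizer consuming
        # 'text' need not know how the source was represented on disk.
        text = io.TextIOWrapper(f, encoding=encoding).read()

    print(encoding)

(For what it's worth, tokenize.open() wraps the same dance up in one call.)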