Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:
> I expect sys.stdout to have utf-8 encoding inside the redirect because
> the buffer accepts unicode code points (not bytes)

And the buffer stores unicode code points, not bytes, so why would there be an encoding?

Just to get this out of the way, in case you are thinking along these lines: Python strings are not arrays of UTF-8 bytes, the way Go strings conventionally are. Python strings are arrays of abstract code points. The specific details will vary from interpreter to interpreter, and from version to version, but current versions of CPython use a flexible in-memory representation where the width of each code point (1, 2 or 4 bytes) depends on the widest code point in the string. This is not UTF-8: the internal storage is Latin-1, UCS-2, or UTF-32 depending on the string.

> For some reason, the encoding of a StringIO object is None

Because StringIO objects store strings, not bytes. There is no encoding involved. The inputs are strings, and the storage is strings.

> which is inconsistent with its semantics: it should be 'utf-8'.

It is completely consistent: the encoding should be None, because there is no encoding.

> I expect the 'encoding' attribute of sys.stdout to have the same value
> inside and outside this redirect.

Why? If you redirect to an actual file using, let's say, Mac-Roman encoding, or ASCII, or UTF-32, or any one of dozens of other encodings, you should expect the encoding inside the block to reflect the actual encoding used inside the block.

The encoding outside the block is the encoding used by the original stdout; the encoding inside the block is the encoding used by the replacement stdout. Why would you expect them to always be the same?

>>> import sys
>>> from contextlib import redirect_stdout
>>> print("outside:", sys.stdout.encoding)
outside: utf-8
>>> with open("/tmp/junk.txt", "w", encoding="ascii") as f:
...     with redirect_stdout(f):
...         print("inside:", sys.stdout.encoding)
...
>>> with open("/tmp/junk.txt", encoding="ascii") as f:
...     print(f.read())
...
inside: ascii

> It so happens that sys.stdout is an io.StringIO() object inside the
> redirect. The getvalue() method on this object returns a string (not
> a bytes), i.e. a sequence of unicode points.

Exactly. And that is why there is no encoding involved. It is purely a sequence of Unicode code points, not bytes, and at no point was a Unicode string encoded to bytes to go to the filesystem.

> StringIO inherits from TextIOBase, which has an 'encoding' attribute.

And StringIO has an encoding attribute because of inheritance, and it is set to None because no codec is actually used.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44774>
_______________________________________
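For anyone who wants to check the StringIO case directly, here is a minimal sketch, assuming CPython 3 and only the standard io and contextlib modules. The variable names are illustrative and do not come from the original report. It shows that the replacement stdout reports no encoding, and that an encoding only enters the picture when the captured string is explicitly converted to bytes.

```python
import io
import sys
from contextlib import redirect_stdout

buffer = io.StringIO()

# Encoding of whatever stdout is in use before the redirect
# (getattr guards against replacement streams that lack the attribute).
outer_encoding = getattr(sys.stdout, "encoding", None)

with redirect_stdout(buffer):
    # Inside the block, sys.stdout *is* the StringIO object.
    inner_encoding = sys.stdout.encoding
    print("caf\u00e9")

# The captured value is a str: code points, not bytes.
text = buffer.getvalue()

# Bytes appear only when we pick a codec ourselves, and the result
# depends on the codec chosen, not on the StringIO.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
```

The same text yields different byte sequences under the two codecs, which is the reason a single fixed 'encoding' value on the StringIO would not describe anything real.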