On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On Apr 11, 2016 11:11 PM, "Ethan Furman" wrote:
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>> "SE string" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?
>
>> Trying to encode a surrogate to ascii, latin1 or utf8 raise an encoding
>> error.
>
> latin1? I thought latin1 had a code point for 0-255, so how could using it
> raise an encoding error?

Latin-1 / ISO-8859-1 defines a character for every byte value, so any byte
string will *decode*. It only covers those 256 characters (code points
U+0000 through U+00FF), though, so *encoding* fails for anything outside
that range, and a lone surrogate such as U+DC80 is outside it.

ChrisA
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
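
To make the decode/encode asymmetry concrete, here is a minimal sketch in Python 3. The helper name `encodes_as_latin1` and the sample byte string are made up for illustration; the codec names and the `surrogateescape` error handler are the standard ones.

```python
# Every possible byte value decodes under Latin-1: byte N becomes U+00NN.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
decoded_code_points = [ord(ch) for ch in text]


def encodes_as_latin1(s):
    """Hypothetical helper: True if s can be encoded as Latin-1."""
    try:
        s.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Encoding only works for code points U+0000 through U+00FF.
in_range = encodes_as_latin1("\xff")      # last character Latin-1 defines
euro = encodes_as_latin1("\u20ac")        # EURO SIGN, not in Latin-1
surrogate = encodes_as_latin1("\udc80")   # lone surrogate, not in Latin-1

# surrogateescape round trip: a byte that is not valid UTF-8 is smuggled
# through str as a lone surrogate and restored when encoding back.
raw = b"caf\xe9"                          # sample bytes, invalid as UTF-8
escaped = raw.decode("utf-8", "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")
escaped_is_latin1_encodable = encodes_as_latin1(escaped)
```

The round trip only works when the same handler is used in both directions; handing the escaped string to a strict encoder (ascii, latin1 or utf8) is what raises the encoding error discussed above.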