On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On Apr 11, 2016 11:11 PM, "Ethan Furman" wrote:
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE strings" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?
>
>
>> Trying to encode a surrogate to ascii, latin1 or utf8 raises an encoding
>> error.
>
>
> latin1?  I thought latin1 had a code point for 0-255, so how could using it
> raise an encoding error?

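Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
string will *decode*. It only defines 256 characters (U+0000 through
U+00FF) as having equivalent bytes, though, so *encoding* fails for
anything outside that range. That includes the lone surrogates (U+DC80
through U+DCFF) that surrogateescape produces.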
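
To make the asymmetry concrete, here is a minimal sketch, assuming
CPython 3. The byte string b"caf\xe9" and the helper try_encode are
made up for illustration and are not from the thread.

```python
# Every byte value decodes under Latin-1, and the result encodes back.
raw = bytes(range(256))
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw

# A byte that is invalid UTF-8 becomes a lone surrogate when decoded
# with the surrogateescape error handler (0xE9 -> U+DCE9).
name = b"caf\xe9"
se = name.decode("utf-8", "surrogateescape")


def try_encode(s, encoding):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None


# Strict encoding of the surrogate fails for all three codecs.
failures = [enc for enc in ("ascii", "latin-1", "utf-8")
            if try_encode(se, enc) is None]

# Encoding with surrogateescape again recovers the original bytes.
roundtrip = se.encode("utf-8", "surrogateescape")
```

So the string only survives a trip back to bytes if the encoding side
also uses surrogateescape.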

ChrisA
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev