On Fri, 6 May 2022 21:19:48 +0100, MRAB <pyt...@mrabarnett.plus.com>
declaimed the following:

>Is the file UTF-8? That's a variable-width encoding, so are any of the 
>characters > U+007F?
>
>Which OS? On Windows, it's common/normal for UTF-8 files to start with a 
>BOM/signature, which is 3 bytes/1 codepoint.

        Windows also uses <cr><lf> for the EOL marker, but Python's I/O system
condenses that to just <lf> internally (for TEXT mode) -- so using the
length of a string so read to compute a file position may be off-by-one for
each EOL in the string.

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
"""
In text mode, the default when reading is to convert platform-specific line
endings (\n on Unix, \r\n on Windows) to just \n. When writing in text
mode, the default is to convert occurrences of \n back to platform-specific
line endings. This behind-the-scenes modification to file data is fine for
text files, but will corrupt binary data like that in JPEG or EXE files. Be
very careful to use binary mode when reading and writing such files.
"""



-- 
        Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to