On 2014-01-11 05:36, Steven D'Aprano wrote:
[snip]
Latin-1 has the nice property that every byte decodes into the character
with the same code point, and vice versa. So:
    for i in range(256):
        assert bytes([i]).decode('latin-1') == chr(i)
        assert chr(i).encode('latin-1') == bytes([i])
passes. It seems to me that your problem goes away if you use Unicode
text with embedded binary data, rather than binary data with embedded
ASCII text. Then when writing the file to disk, of course you encode it
to Latin-1, either explicitly:
    pdf = ...  # Unicode string containing the PDF contents
    with open("outfile.pdf", "wb") as f:
        f.write(pdf.encode("latin-1"))
or implicitly:
    with open("outfile.pdf", "w", encoding="latin-1") as f:
        f.write(pdf)
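A minimal sketch of the explicit (binary-mode) approach. It assumes a toy payload: the `binary_stream` bytes and the "%PDF-1.4 ... stream ... endstream" text are hypothetical stand-ins, not a valid PDF. The point is that every byte, including 0x0A, 0x0D and bytes above 0x7F, survives the round trip:

```python
import os
import tempfile

# Hypothetical stand-in for PDF content: ASCII text with embedded binary bytes.
binary_stream = bytes([0x00, 0x0A, 0x0D, 0x80, 0xFF])
pdf = "%PDF-1.4\nstream\n" + binary_stream.decode("latin-1") + "\nendstream\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "outfile.pdf")
    # Binary mode: encode explicitly; no newline translation takes place.
    with open(path, "wb") as f:
        f.write(pdf.encode("latin-1"))
    with open(path, "rb") as f:
        written = f.read()

# Latin-1 maps each character to exactly one byte, so nothing is lost.
assert written == b"%PDF-1.4\nstream\n" + binary_stream + b"\nendstream\n"
assert len(written) == len(pdf)
```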
[snip]
The second example won't work because you're forgetting about the
handling of line endings in text mode.
Suppose you have some binary data bytes([10]).
You convert it into a Unicode string using Latin-1, giving '\n'.
You write it out to a file opened in text mode.
On Windows, that string '\n' will be written to the file as b'\r\n', not as the original b'\n'.
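The following sketch reproduces the problem on any platform. It passes newline="\r\n" to open() to force the translation that Windows applies by default (with newline=None, '\n' becomes os.linesep, which is "\r\n" on Windows). For what it's worth, newline="" disables the translation on output, so the byte is written unchanged:

```python
import os
import tempfile

# One byte of binary data, 0x0A, which Latin-1 decodes to '\n'.
data = bytes([10])
text = data.decode("latin-1")
assert text == "\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.bin")

    # Simulate the Windows default: '\n' is translated to '\r\n' on write.
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(text)
    with open(path, "rb") as f:
        translated = f.read()

    # newline="" turns off translation, so the original byte is preserved.
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(text)
    with open(path, "rb") as f:
        untranslated = f.read()

print(translated)    # b'\r\n'
print(untranslated)  # b'\n'
```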
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev