Paul Moore <p.f.mo...@gmail.com> added the comment:

A better example in terms of "intended to be text" might be ChangeLog files. 
These are clearly text files, but of sufficiently standard format that they can 
be manipulated programmatically.

Consider a program to get a list of all authors who changed a particular file. 
Scan the file for date lines, then scan the block of text below for the 
filename you care about. Extract the author from the date line, add it to a 
set, then sort and print.
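
A minimal sketch of the scan described above, assuming GNU-style ChangeLog 
entries where a date line looks like "YYYY-MM-DD  Author Name  <email>" and the 
entries below it mention the files changed. The function name, the regular 
expression and the plain substring match on the filename are illustrative 
assumptions, not taken from any particular project.

```python
import re

# Assumed GNU-style date line: "YYYY-MM-DD  Author Name  <email>".
DATE_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.*?)\s+<[^>]*>\s*$")


def authors_for_file(lines, filename):
    """Return the sorted authors whose entries mention `filename`.

    `lines` is any iterable of already-decoded text lines.
    """
    authors = set()
    current_author = None
    for line in lines:
        match = DATE_LINE.match(line)
        if match:
            # A new entry starts here; remember who wrote it.
            current_author = match.group(2)
        elif current_author is not None and filename in line:
            # Naive substring test; a real tool would parse the "* file:" syntax.
            authors.add(current_author)
    return sorted(authors)
```

The function only needs text lines, so how the file was decoded is a separate 
decision from the scanning logic.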

All of this can be done assuming the file is ASCII-compatible, but requires 
non-trivial text processing that would be a pain to do on bytes. But author 
names are quite likely to be non-ASCII, especially if it's an international 
project. And the changelog file is manually edited by people on different 
machines, so the possibility of inconsistent encodings is definitely there. (I 
have seen this happen - it's not theoretical!)

For my code, all I care about is that the names round-trip, so that I'm not 
damaging people's names any more than has already happened.

encoding="ascii", errors="surrogateescape" sounds like precisely the right 
answer here.
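
As a rough illustration of that round trip, the sketch below decodes bytes 
containing two differently encoded names and encodes them back unchanged. The 
sample names and variable names are made up for the example.

```python
# Hypothetical sample: one name stored as UTF-8, another as Latin-1,
# as might happen when a file is edited on different machines.
raw = "José".encode("utf-8") + b"\n" + "Müller".encode("latin-1") + b"\n"

# ASCII bytes decode normally; each non-ASCII byte becomes a lone
# surrogate code point in the range U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")

# Ordinary str operations still work on the ASCII-compatible parts.
lines = text.splitlines()

# Encoding with the same settings restores the original bytes exactly.
restored = text.encode("ascii", errors="surrogateescape")
assert restored == raw
```

The same encoding and errors arguments can be passed to open() for both 
reading and writing. Note that the escaped characters cannot be encoded with a 
strict error handler, so writing such strings out with, say, plain UTF-8 
raises UnicodeEncodeError.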

(If it's hard to find a good answer in Python 3, it's very easy to decide to 
use Python 2 which "just works", or even other tools like awk which also take 
Python 2's naive approach - and dismiss Python 3's Unicode model as "too hard").

My mental model here is text editors, which let you open any file, do their 
best to display as much as they can and allow you to manipulate it without 
damaging the bits you don't change. I don't see any reason why people shouldn't 
be able to write Python 3 code that way if they need to.

----------
nosy: +pmoore

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13997>
_______________________________________