Jason R. Coombs <jar...@jaraco.com> added the comment:

The problem you've encountered is that previously the file was assumed to be 
in a single encoding and would simply fail if it was not, so it was possible 
to lazy-load the file and process it line by line.
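
For illustration, because lazy iteration decodes as it reads, a mismatched 
encoding only surfaces when the offending line is reached (hypothetical 
names throughout):

with open('example.pth', encoding='utf-8') as f:
    for line in f:       # raises UnicodeDecodeError at the bad line
        process(line)    # stand-in for the real per-line logic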

In the new model, where you need to determine which of two candidate 
encodings the file is actually in, you necessarily have to read the entire 
file once before processing its contents.

Therefore, I recommend one of these options:

1. Always read the file in binary mode, ascertain the "best" encoding, then 
rewind the file and wrap it in a TextIOWrapper for that encoding. Presumably 
this logic is common; perhaps there's already a routine that does just that.
2. In a try/except block, read the entire content, decoded, into another 
iterable, and then have the downstream logic rely on that content, i.e. 
`f = list(f)` (see the sketch after this list).
3. Always assume UTF-8 instead of the system encoding. This change would be 
backward incompatible, so it probably isn't acceptable without at least an 
interim release with a deprecation warning.
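
A minimal sketch of option (2), assuming UTF-8 is tried first with the 
locale encoding as the fallback (read_pth_lines is a hypothetical helper, 
not an existing API):

def read_pth_lines(path):
    # Read the whole file up front so a decode error can't surface
    # mid-iteration; try UTF-8 first, then the locale default.
    try:
        with open(path, encoding='utf-8') as f:
            return list(f)
    except UnicodeDecodeError:
        with open(path) as f:   # locale/system default encoding
            return list(f)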

I recommend a combination of (1) and then (3) in the future. That is:

import io
import sys
import warnings

def determine_best_encoding(f, encodings=('utf-8', sys.getdefaultencoding())):
    """
    Attempt to read and decode all of stream f using the encodings
    and return the first one that succeeds. Rewinds the file.
    """
    for encoding in encodings:
        try:
            f.read().decode(encoding)
        except UnicodeDecodeError:
            continue
        finally:
            f.seek(0)  # rewind for the next attempt or the caller
        return encoding
    raise UnicodeError("no candidate encoding could decode the file")

f = open(..., 'rb')
encoding = determine_best_encoding(f)
if encoding != 'utf-8':
    warnings.warn("Detected pth file with unsupported encoding",
                  DeprecationWarning)
f = io.TextIOWrapper(f, encoding)
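
Note this relies on f being a seekable binary stream: 
determine_best_encoding rewinds with seek(0) after each attempt, so the 
TextIOWrapper sees the file from the beginning.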


Then, in a future version, dropping support for local encodings, all of that 
code can be replaced with `f = open(..., encoding='utf-8')`.
