On Sat, Jan 23, 2021 at 03:24:12PM +0000, Barry Scott wrote:

> I think that you are going to create a bug magnet if you attempt to auto
> detect the encoding.
> 
> First problem I see is that the file may be a pipe and then you will block
> until you have enough data to do the auto detect.

Can you use `open('filename')` to read a pipe?

Is blocking a problem in practice? If you try to open a network file, 
that could block too if there are network issues. And since you're 
likely to follow the open with a read, the read is likely to block 
anyway. So overall I don't think that blocking is an issue.
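For what it's worth, sniffing a pipe need not lose data. Below is a minimal sketch. The `sniff()` helper and the 512-byte sample size are hypothetical and are not part of any proposal here. `io.BufferedReader.peek()` returns buffered bytes without advancing the stream position, so the sample is still there for the real read. It does still block until the writer supplies data or closes its end, which is Barry's point.

```python
import io
import os

SNIFF_SIZE = 512  # hypothetical sample size, for illustration only

def sniff(stream: io.BufferedReader, size: int = SNIFF_SIZE) -> bytes:
    # peek() does at most one raw read and does not advance the position,
    # so it may return fewer than `size` bytes, and the bytes it returns
    # are still available to the next read(). On a pipe, that raw read
    # blocks until the writer sends data or closes its end.
    return stream.peek(size)[:size]

read_fd, write_fd = os.pipe()
os.write(write_fd, "héllo, wörld\n".encode("utf-8"))
os.close(write_fd)  # close the writer so the reader sees EOF rather than waiting

with os.fdopen(read_fd, "rb") as reader:
    sample = sniff(reader)
    rest = reader.read()  # the sniffed bytes were not consumed

print(sample == rest)  # True
```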


> Second problem is that the first N bytes are all in ASCII and only later
> do you see a Windows code page signature (oddly, with no UTF-8 signature).

UTF-8 is a strict superset of ASCII, so if the file is actually 
ASCII, there is no harm in using UTF-8.
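A quick check of that claim: every byte in the ASCII range decodes to the same character under both codecs. The converse does not hold, because UTF-8 text outside ASCII is not valid ASCII.

```python
# All 128 ASCII byte values decode identically as ASCII and as UTF-8.
ascii_bytes = bytes(range(128))
same = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")
print(same)  # True

# Non-ASCII UTF-8 text (here "ï") cannot be decoded as ASCII.
encoded = "naïve".encode("utf-8")
try:
    encoded.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print(is_ascii)  # False
```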

The bigger issue is if you have N bytes of pure ASCII followed by text in 
some non-UTF-8 superset of ASCII, such as one of the ISO-8859-* 
encodings. Then you end up detecting what you think is ASCII/UTF-8 but 
is actually some legacy encoding. But if N is large, say 512 bytes, 
that's unlikely in practice.
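To make that failure mode concrete, here is a sketch of a deliberately naive detector. The function name `guess_encoding`, the 512-byte sample and the Latin-1 fallback are all assumptions for illustration. None of them come from the proposal.

```python
import codecs

SAMPLE_SIZE = 512  # "say 512 bytes", as suggested above

def guess_encoding(data: bytes, sample_size: int = SAMPLE_SIZE) -> str:
    """Hypothetical naive detector: inspects only the first sample_size bytes."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the sample boundary.
        decoder.decode(data[:sample_size], final=False)
    except UnicodeDecodeError:
        return "latin-1"  # assumed legacy fallback, chosen for illustration only
    return "utf-8"

# 600 bytes of pure ASCII, then Latin-1 (ISO-8859-1) text.
data = b"x" * 600 + "café\n".encode("latin-1")

guess = guess_encoding(data)
print(guess)  # utf-8

try:
    data.decode(guess)
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start
print(failed_at)  # 603: the Latin-1 "é" byte lies beyond the 512-byte sample
```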


> > That auto-detection behaviour could be enough to differentiate it from 
> > the regular open(), thus solving the "but in ten years time it will be 
> > redundant and will need to be deprecated" objection.
> > 
> > Having said that, I can't say I'm very keen on the name "open_text", but 
> > I can't think of any other bikeshed colour I prefer.
> 
> Given that the function's purpose is to open Unicode text, use a name that
> reflects that it is the encoding that is set, not the mode (binary vs. text).
> 
> open_unicode maybe?

I guess that depends on whether the auto-detection is intended to 
support non-Unicode legacy encodings or not.
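If detection were restricted to Unicode encodings, which would fit a name like open_unicode, one plausible approach is to look for a byte-order mark and otherwise assume UTF-8. This is a hypothetical sketch: `detect_unicode_encoding` is not an existing API. By design, it cannot recognise legacy encodings.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32-LE BOM begins with
# the UTF-16-LE BOM. (A UTF-16-LE file starting with U+0000 is ambiguous.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def detect_unicode_encoding(sample: bytes) -> str:
    """Hypothetical: pick a codec from a BOM, else assume UTF-8."""
    for bom, encoding in _BOMS:
        if sample.startswith(bom):
            # The "utf-16"/"utf-32" codecs read the BOM to determine byte
            # order; "utf-8-sig" strips the UTF-8 BOM.
            return encoding
    return "utf-8"

sample = "hé".encode("utf-16")
print(sample.decode(detect_unicode_encoding(sample)))  # hé
```

Note that a legacy file such as `b"caf\xe9"` has no BOM, so this would report UTF-8 and decoding would fail. That is exactly the question of whether non-Unicode encodings are in scope.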

> If you are teaching open_text, then do you also need to have open_binary?

No. There are no frustrating, difficult, platform-specific encoding 
issues when reading binary files. Bytes are bytes.
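For comparison, here is a small illustration of binary mode. The bytes come back exactly as written, and `open()` rejects an `encoding` argument in binary mode with a `ValueError`. The file name and payload are arbitrary.

```python
import os
import tempfile

payload = bytes([0x00, 0xE9, 0xFF, 0x0A])  # arbitrary bytes, not meant as text

error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode returns the stored bytes unchanged; no codec is involved.
    with open(path, "rb") as f:
        roundtrip = f.read()

    # Binary mode does not even accept an encoding argument.
    try:
        open(path, "rb", encoding="utf-8")
    except ValueError as exc:
        error = str(exc)

print(roundtrip == payload)  # True
print(error)  # binary mode doesn't take an encoding argument
```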


-- 
Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MVX5PNZM7W4I42XDSACOQTW3YRJPRQHI/
Code of Conduct: http://python.org/psf/codeofconduct/
