On Sat, Jan 23, 2021 at 03:24:12PM +0000, Barry Scott wrote:

> I think that you are going to create a bug magnet if you attempt to auto
> detect the encoding.
>
> First problem I see is that the file may be a pipe and then you will block
> until you have enough data to do the auto detect.

Can you use `open('filename')` to read a pipe?

Is blocking a problem in practice? If you try to open a network file,
that could block too, if there are network issues. And since you're
likely to follow the open with a read, the read is likely to block. So
overall I don't think that blocking is an issue.

> Second problem is that the first N bytes are all in ASCII and only later
> do you see Windows code page signature (odd lack of utf-8 signature).

UTF-8 is a strict superset of ASCII, so if the file is actually ASCII,
there is no harm in using UTF-8.

The bigger issue is if you have N bytes of pure ASCII followed by bytes
in some other ASCII superset that is not UTF-8, such as one of the
ISO-8859-* encodings. Then you end up detecting what you think is
ASCII/UTF-8 when the file is actually in a legacy encoding. But if N is
large, say 512 bytes, that's unlikely in practice.

> > That auto-detection behaviour could be enough to differentiate it from
> > the regular open(), thus solving the "but in ten years time it will be
> > redundant and will need to be deprecated" objection.
> >
> > Having said that, I can't say I'm very keen on the name "open_text", but
> > I can't think of any other bikeshed colour I prefer.
>
> Given the function's purpose is to open unicode text, use a name that
> reflects that it is the encoding that is set not the mode (binary vs. text).
>
> open_unicode maybe?

I guess that depends on whether the auto-detection is intended to
support non-Unicode legacy encodings or not.

> If you are teaching open_text then do you also need to have open_binary?

No. There are no frustrating, difficult, platform-specific encoding
issues when reading binary files. Bytes are bytes.

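A minimal sketch of the "N bytes of pure ASCII, then a legacy encoding"
case discussed above. The helper name `sniff_encoding`, the 512-byte
window and the Latin-1 fallback are all assumptions made for
illustration; this is not the proposed implementation of `open_text`,
just one plausible way prefix-based detection could behave.

```python
import codecs


def sniff_encoding(data: bytes, n: int = 512, fallback: str = "latin-1") -> str:
    """Guess an encoding by looking only at the first n bytes.

    Hypothetical helper for illustration; not an existing or proposed API.
    """
    prefix = data[:n]
    # An incremental decoder with final=False tolerates a multi-byte
    # sequence that happens to be cut off at the end of the prefix.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


# Pure ASCII is valid UTF-8, so guessing UTF-8 here is harmless.
ascii_only = b"hello world\n"

# 600 ASCII bytes, then a Latin-1 encoded "café": the non-ASCII byte
# sits beyond the 512-byte window, so the sniffer never sees it.
late_legacy = b"a" * 600 + "café\n".encode("latin-1")

for sample in (ascii_only, late_legacy):
    guess = sniff_encoding(sample)
    try:
        sample.decode(guess)
        outcome = "decodes cleanly"
    except UnicodeDecodeError:
        outcome = "fails later in the file"
    print(guess, outcome)
```

Both samples are guessed as UTF-8, but only the first one decodes all
the way through. A larger window makes the second case rarer without
ruling it out.
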
--
Steve