> On 23 Jan 2021, at 11:00, Steven D'Aprano <st...@pearwood.info> wrote:
> 
> On Sat, Jan 23, 2021 at 12:40:55AM -0500, Random832 wrote:
>> On Fri, Jan 22, 2021, at 20:34, Inada Naoki wrote:
>>> * Default encoding is "utf-8".
>> 
>> it might be worthwhile to be a little more sophisticated than this.
>> 
>> Notepad itself uses character set detection [it might not be 
>> reasonable to do this on the whole file as notepad does, but maybe the 
>> first 512 bytes, or the result of read1(512)?] when opening a file of 
>> unknown encoding, and msvcrt's "ccs=UTF-8" option to fopen will at 
>> least detect the presence of UTF-8 and UTF-16 BOMs [and treat the 
>> file as UTF-16 in the latter case].
> 
> 
> I like Random's idea. If we add a new "open text file" builtin function, 
> we should seriously consider having it attempt to auto-detect the 
> encoding. It need not be as sophisticated as `chardet`.

I think you are going to create a bug magnet if you attempt to auto-detect
the encoding.

The first problem I see is that the file may be a pipe, in which case you
will block until you have enough data to do the auto-detection.
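
To illustrate the pipe case, here is a minimal sketch. The 512-byte window
is taken from the suggestion quoted above; the sniff() helper and the
threading harness are only assumptions for the demonstration, not a proposed
API. A buffered read(512) on a pipe does not return until it has 512 bytes
or sees EOF, so a detector built on it waits for as long as the writer
stays quiet.

```python
import os
import threading

r_fd, w_fd = os.pipe()
reader = open(r_fd, "rb")          # buffered binary reader over the pipe
os.write(w_fd, b"hello")           # the writer has produced only 5 bytes so far

result = {}

def sniff():
    # Hypothetical detector step: read(512) keeps reading until it has
    # 512 bytes or reaches EOF, so it cannot return on 5 bytes alone.
    result["head"] = reader.read(512)

t = threading.Thread(target=sniff, daemon=True)
t.start()
t.join(timeout=0.5)
still_blocked = t.is_alive()       # the detector is still waiting for data

os.close(w_fd)                     # EOF (or more data) is what releases it
t.join()
reader.close()
```

Using read1(512) instead would return whatever is available after a single
underlying read, but then the detector may be working from very little data.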

The second problem is that the first N bytes may all be ASCII, and only later
do you see a Windows code page signature (odd lack of UTF-8 signature).
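
A minimal sketch of that second failure mode follows. The sniff_encoding()
helper, its 512-byte window and the cp1252 fallback are assumptions made up
for this example; they are not part of any proposal in this thread.

```python
import io

def sniff_encoding(stream, window=512):
    """Hypothetical helper: guess an encoding from the first `window` bytes only."""
    head = stream.read(window)
    try:
        # Simplification: a real detector would also have to cope with a
        # multi-byte sequence being cut in half at the window boundary.
        head.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"

# 600 bytes of pure ASCII, then a single cp1252-encoded e-acute (byte 0xE9).
data = b"a" * 600 + "\u00e9".encode("cp1252")

guess = sniff_encoding(io.BytesIO(data))   # the window only ever saw ASCII

try:
    data.decode(guess)
    failed_later = False
except UnicodeDecodeError:
    failed_later = True                    # the guess breaks past the window
```

The detector reports UTF-8 because the window is pure ASCII, and the decode
error only shows up once the reader gets past byte 600.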

> That auto-detection behaviour could be enough to differentiate it from 
> the regular open(), thus solving the "but in ten years time it will be 
> redundant and will need to be deprecated" objection.
> 
> Having said that, I can't say I'm very keen on the name "open_text", but 
> I can't think of any other bikeshed colour I prefer.

Given that the function's purpose is to open Unicode text, use a name that
reflects that it is the encoding that is set, not the mode (binary vs. text).

open_unicode maybe?

If you are teaching open_text, do you then also need to have open_binary?

Barry

> 
> 
> -- 
> Steve
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/VAWFPIAA4WIVLIF4LFJ4OATJK6JDJS2N/
> Code of Conduct: http://python.org/psf/codeofconduct/
> 
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4LHLZ5QIBOCLIZUVYQ2UXAU6MEX6VMJH/
