> On 23 Jan 2021, at 11:00, Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Sat, Jan 23, 2021 at 12:40:55AM -0500, Random832 wrote:
>> On Fri, Jan 22, 2021, at 20:34, Inada Naoki wrote:
>>> * Default encoding is "utf-8".
>>
>> it might be worthwhile to be a little more sophisticated than this.
>>
>> Notepad itself uses character set detection [it might not be
>> reasonable to do this on the whole file as notepad does, but maybe the
>> first 512 bytes, or the result of read1(512)?] when opening a file of
>> unknown encoding, and msvcrt's "ccs=UTF-8" option to fopen will at
>> least detect the presence of UTF-8 and UTF-16 BOMs [and treat the
>> file as UTF-16 in the latter case].
>
>
> I like Random's idea. If we add a new "open text file" builtin function,
> we should seriously consider having it attempt to auto-detect the
> encoding. It need not be as sophisticated as `chardet`.
I think you are going to create a bug magnet if you attempt to
auto-detect the encoding.
The first problem I see is that the file may be a pipe, in which case you
will block until enough data has arrived to do the auto-detection.
The second problem is that the first N bytes may all be ASCII, and only
later do you see signs of a Windows code page (bytes that, oddly, are not
valid UTF-8).
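
To illustrate the second problem, here is a rough sketch. The helper
sniff_encoding is hypothetical (made up for this mail, not a proposed
API), and the 512-byte window and the cp1252 fallback are assumptions
chosen only to match the numbers mentioned above:

```python
def sniff_encoding(prefix: bytes) -> str:
    """Hypothetical detector: guess an encoding from a prefix only."""
    try:
        # Note: a real detector would also have to cope with a
        # multi-byte sequence cut in half at the end of the prefix.
        prefix.decode("utf-8")
    except UnicodeDecodeError:
        return "cp1252"  # assumed fallback, purely for illustration
    return "utf-8"


# 600 bytes of pure ASCII, then a word with one cp1252-encoded
# e-acute (byte 0xE9), which is not valid UTF-8 on its own.
data = b"x" * 600 + "caf\u00e9".encode("cp1252")

# Only ASCII is visible in the first 512 bytes, so the guess is UTF-8.
guess = sniff_encoding(data[:512])

# Decoding the whole file with that guess fails later on.
try:
    data.decode(guess)
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

The guess looks fine on the prefix and only blows up once the reader
reaches the non-ASCII byte, which may be long after open() returned.
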
> That auto-detection behaviour could be enough to differentiate it from
> the regular open(), thus solving the "but in ten years time it will be
> redundant and will need to be deprecated" objection.
>
> Having said that, I can't say I'm very keen on the name "open_text", but
> I can't think of any other bikeshed colour I prefer.
Given that the function's purpose is to open Unicode text, use a name
that reflects that it is the encoding that is being set, not the mode
(binary vs. text).
open_unicode, maybe?
If you are teaching open_text, then do you also need to have open_binary?
Barry
>
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/VAWFPIAA4WIVLIF4LFJ4OATJK6JDJS2N/
> Code of Conduct: http://python.org/psf/codeofconduct/
>