[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-22 Thread Random832
On Fri, Jan 22, 2021, at 20:34, Inada Naoki wrote:
> * Default encoding is "utf-8".

It might be worthwhile to be a little more sophisticated than this.

Notepad itself uses character set detection when opening a file of unknown
encoding [it might not be reasonable to do this on the whole file as Notepad
does, but maybe on the first 512 bytes, or the result of read1(512)?], and
msvcrt's "ccs=UTF-8" option to fopen will at least detect the presence of
UTF-8 and UTF-16 BOMs [and treat the file as UTF-16 in the latter case].
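
A minimal sketch of the BOM-detection part of this idea, assuming only the
UTF-8 and UTF-16 BOMs are checked and that UTF-8 is the fallback. The names
`sniff_encoding` and `open_sniffed` are hypothetical and are not part of any
proposal in this thread:

```python
import codecs

# Hypothetical helper: map a leading BOM to a Python codec name.
# Only UTF-8 and UTF-16 BOMs are checked here; a UTF-32 file
# would need its own entries, placed before the UTF-16 ones.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # codec strips the BOM on read
    (codecs.BOM_UTF16_LE, "utf-16"),  # codec picks byte order from the BOM
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(head, default="utf-8"):
    """Return a codec name based on the BOM in `head`, else `default`."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def open_sniffed(path, default="utf-8"):
    """Open `path` for reading as text after peeking at its first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)  # enough to hold any of the BOMs above
    return open(path, "r", encoding=sniff_encoding(head, default))
```

This only looks at BOMs; real character set detection on the first 512 bytes
would be a separate and much fuzzier step.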
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7TUNPIXTWSWKTFD2LE4UBV5SOOEUBGMY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-22 Thread Chris Angelico
On Sat, Jan 23, 2021 at 12:37 PM Inada Naoki wrote:
> ## 1. Add `io.open_text()`, builtin `open_text()`, and
> `pathlib.Path.open_text()`.
>
> All functions are the same as `io.open()` or `Path.open()`, except:
>
> * Default encoding is "utf-8".
> * "b" is not allowed in the mode option.

I *really* don't like this, because it implies that open() will open
in binary mode.

> What do you think about this idea? Is it worthwhile to add a new
> built-in function?

Highly dubious. I'd rather focus on just moving to UTF-8 as the
default, rather than bringing in a new function - especially with such
a confusing name.

What exactly are the blockers on making open(fn) use UTF-8 by default?
Can the proposals be written with that as the ultimate goal (even if
it's going to take X versions and multiple deprecation phases), rather
than aiming for a messy goal where people aren't sure which function
to use?

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/46RCX23FGYZY7YN4EOUL5QXYTQO6OO2H/


[Python-ideas] Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-22 Thread Inada Naoki
Hi, all.

I am rewriting PEP 597 to introduce a new EncodingWarning, which is a
subclass of DeprecationWarning and is used to warn about a future change
to the default encoding.
But I don't think we can change the default encoding of
`io.TextIOWrapper` and built-in `open()` anytime soon. It is a
disruptive change. It may take 10 or more years.

To ease the pain caused by "default encoding is not UTF-8 (almost)
only on Windows" (*), I came up with another idea. This idea is not
mutually exclusive with PEP 597, but I want to include it in the PEP
because both ideas use EncodingWarning.

(*) Imagine that a new Python user writes a text file with notepad.exe
(whose default encoding is already UTF-8 without BOM) or VS Code, and
tries to read it in Jupyter Notebook. They will see a UnicodeDecodeError.
They might not even know what an encoding is yet.
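
The failure described in (*) can be reproduced on any platform with a small
sketch. It assumes cp1252 as a stand-in for a Windows locale encoding; the
actual locale encoding on a given machine may differ, and some encodings
would silently produce wrong text instead of raising:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")

    # Write UTF-8 without a BOM, as a modern editor would.
    # "\u3042" (HIRAGANA LETTER A) is encoded as b"\xe3\x81\x82".
    with open(path, "w", encoding="utf-8") as f:
        f.write("\u3042")

    # Assumption: cp1252 plays the role of the locale encoding.
    # Byte 0x81 is undefined in Python's cp1252 codec.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        failed = False
    except UnicodeDecodeError:
        failed = True

    # Passing the encoding explicitly reads the file correctly.
    with open(path, encoding="utf-8") as f:
        text = f.read()
```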


## 1. Add `io.open_text()`, builtin `open_text()`, and
`pathlib.Path.open_text()`.

All of these functions are the same as `io.open()` or `Path.open()`, except:

* Default encoding is "utf-8".
* "b" is not allowed in the mode option.

These functions have two benefits:

* `open_text(filename)` is shorter than `open(filename,
encoding="utf-8")`. It's easy to type, especially with autocompletion.
* The type annotation for the returned value is simpler than for `open`.
It is always TextIOWrapper.
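
A rough sketch of what such a function could look like in pure Python. This
is only an illustration of the two differences listed above, assuming the
remaining parameters simply mirror `io.open()`; it is not an actual
implementation:

```python
import io


def open_text(file, mode="r", buffering=-1, encoding="utf-8",
              errors=None, newline=None, closefd=True, opener=None):
    """Sketch of the proposed open_text(): io.open() for text only."""
    # Difference 1: the default encoding is "utf-8" (see the signature).
    # Difference 2: binary mode is rejected up front.
    if "b" in mode:
        raise ValueError("binary mode is not allowed in open_text()")
    return io.open(file, mode, buffering, encoding,
                   errors, newline, closefd, opener)
```

Because binary mode is rejected, the returned object is a text stream in
every case, which is what makes the return type simple to annotate.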


## 2. Change the default encoding of `pathlib.Path.read_text()`.

For convenience and consistency with `Path.open_text()`, change the
default encoding of `Path.read_text()` to "utf-8" with a regular
deprecation period.

* Python 3.10: `Path.read_text()` emits EncodingWarning when the
encoding option is omitted.
* Python 3.13: `Path.read_text()` changes the default encoding to "utf-8".
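
The warning phase could behave roughly like the sketch below. The class
`ProposedEncodingWarning` and the module-level `read_text()` wrapper are
hypothetical stand-ins used only for illustration; the real change would
live inside `pathlib` and use the EncodingWarning from the PEP:

```python
import locale
import warnings
from pathlib import Path


class ProposedEncodingWarning(DeprecationWarning):
    """Hypothetical stand-in for the proposed EncodingWarning."""


def read_text(path, encoding=None, errors=None):
    """Read `path` as text, warning when the encoding is omitted."""
    if encoding is None:
        warnings.warn(
            "encoding omitted; the default is proposed to become 'utf-8'",
            ProposedEncodingWarning,
            stacklevel=2,  # point at the caller, not at this wrapper
        )
        # Until the default changes, keep using the locale encoding.
        encoding = locale.getpreferredencoding(False)
    return Path(path).read_text(encoding=encoding, errors=errors)
```

Callers that pass `encoding="utf-8"` explicitly see no warning and are
unaffected when the default changes.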

If PEP 597 is accepted, users can pass `encoding="locale"` instead of
`encoding=locale.getpreferredencoding(False)` when they need to use
locale encoding.

We might change more places where the default encoding is used. But it
should be done slowly and carefully.

---

What do you think about this idea? Is it worthwhile to add a new
built-in function?

Regards,

-- 
Inada Naoki
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PZUYJ5XDY3WDUSBFW7BAFHP3QRYES2GZ/