[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Eryk Sun
On 1/26/21, Eryk Sun  wrote:
>
> The process active code page for GetACP() and GetOEMCP() is changed to
> UTF-8 (65001). The C runtime also overrides the user locale to UTF-8
> if GetACP() returns UTF-8, i.e. setlocale(LC_CTYPE, "") will return
> "utf8" as the encoding.

One concern is what to do for the special "ansi" and "oem" encodings.
If scripts rely on them for IPC, such as with subprocess.Popen(), then
it could be frustrating if they're just synonyms for UTF-8 (code page
65001). I've tested that it's possible for Python to peg "ansi" and
"oem" to the system ANSI and OEM code pages via GetLocaleInfoEx() with
LOCALE_NAME_SYSTEM_DEFAULT and the LCType constants
LOCALE_IDEFAULTANSICODEPAGE and LOCALE_IDEFAULTCODEPAGE (OEM). But
then they're no longer accurate within the current process, for which
ANSI and OEM are UTF-8.
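
A minimal ctypes sketch of the lookup described above, for illustration only. The helper name system_code_page() is hypothetical, the constant values are taken from the Windows SDK headers, and the function simply returns None on non-Windows platforms. It is not the code that was actually tested.

```python
import ctypes
import sys

# Constants from the Windows SDK (winnls.h).
LOCALE_NAME_SYSTEM_DEFAULT = "!x-sys-default-locale"
LOCALE_IDEFAULTANSICODEPAGE = 0x1004
LOCALE_IDEFAULTCODEPAGE = 0x000B  # OEM code page
LOCALE_RETURN_NUMBER = 0x20000000


def system_code_page(lctype):
    """Return the system-locale code page for lctype, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    value = ctypes.c_uint32()
    # With LOCALE_RETURN_NUMBER the buffer receives a DWORD; its size is
    # still expressed in WCHAR units.
    size = ctypes.sizeof(value) // ctypes.sizeof(ctypes.c_wchar)
    written = kernel32.GetLocaleInfoEx(
        ctypes.c_wchar_p(LOCALE_NAME_SYSTEM_DEFAULT),
        ctypes.c_uint32(lctype | LOCALE_RETURN_NUMBER),
        ctypes.byref(value),
        ctypes.c_int(size),
    )
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return value.value


if __name__ == "__main__":
    print("system ANSI:", system_code_page(LOCALE_IDEFAULTANSICODEPAGE))
    print("system OEM: ", system_code_page(LOCALE_IDEFAULTCODEPAGE))
```

Because these values come from the system locale rather than the process, they would stay fixed even when the manifest switches GetACP() to 65001, which is exactly the trade-off mentioned above.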
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5RCA3LVRBWVAHGDRGMR5RVAGP647NGDJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Inada Naoki
On Tue, Jan 26, 2021 at 4:01 PM Eryk Sun  wrote:
>
> > * Windows team needs to maintain more versions.
>
> I suppose the installer could install both sets of binaries, and copy
> to "python[w][_d].exe" based on an installer option. But then the
> UTF-8 selection statistics wouldn't be tracked, unless the installer
> phones home.

Can pip send `locale.getpreferredencoding(False)` to PyPI?

If so, we can set the `PYTHONUTF8` environment variable from the installer too.
Or we can provide a small tool to set/unset the `PYTHONUTF8` environment variable.
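
For reference, a small sketch of the values such a report (or a small tool) might look at. The function name encoding_report() is made up for this example; it only reads state and does not send anything anywhere.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide the default text encoding."""
    return {
        # The value discussed above; False means "do not call setlocale()".
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when UTF-8 mode is active (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The raw environment variable, or None when it is unset.
        "PYTHONUTF8": os.environ.get("PYTHONUTF8"),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value!r}")
```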

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/35FW2SAYH5JR7FLNZGMSPDMUE2NNVHQN/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Inada Naoki
On Tue, Jan 26, 2021 at 3:07 PM Guido van Rossum  wrote:
>
>>
>> I agree. But until we switch the default encoding of open(),
>> we must recommend avoiding `open(filename)` anyway.
>> The default encoding of VS Code, Atom, Notepad is already UTF-8.
>>
>> Maybe, we need to update the tutorial (*) to use `encoding="utf-8"`.
>
>
> Telling people to always add `encoding='utf8'` makes much more sense to me 
> than introducing a new function and telling them to do that.
>

Ok, I will not add open_utf8() to PEP 597, and update the tutorial to
recommend `encoding="utf-8"`.
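
The recommended spelling, as a short round-trip sketch. The file name and sample text are arbitrary and chosen only for illustration.

```python
import tempfile
from pathlib import Path

text = "naïve café, 日本語"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    # Pass the encoding explicitly on both the write and the read side, so
    # the result does not depend on the locale of the machine.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()
```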

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/HXJKDIZUF6TMMHHPDZWQ3PYPFLXX6C66/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Eryk Sun
On 1/25/21, Inada Naoki  wrote:
>
> Microsoft provides a UTF-8 code page for a process. It can be enabled
> through a manifest file.
>
> How about providing both "UTF-8 version" and "ANSI version" Python
> binaries?

I experimented with this manifest setting several months ago. To try
it out, simply export the manifest from "python.exe", edit it to add
the "activeCodePage" setting, and then replace it in "python.exe".

The process active code page for GetACP() and GetOEMCP() is changed to
UTF-8 (65001). The C runtime also overrides the user locale to UTF-8
if GetACP() returns UTF-8, i.e. setlocale(LC_CTYPE, "") will return
"utf8" as the encoding.

The console is hosted in a separate conhost.exe or openconsole.exe
process, so it still defaults to the system OEM code page for its
input and output code pages. This pertains only to low-level os.read()
and os.write(). High-level console I/O uses io._WindowsConsoleIO for
console files, which is internally UTF-16 and outwardly UTF-8.
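
To see the difference between the two layers on a given machine, something like the following sketch can be used. The function name console_encodings() is hypothetical. os.device_encoding() reports the encoding of the device behind a file descriptor when it is a terminal, and returns None otherwise.

```python
import os
import sys


def console_encodings():
    """Report the text-layer and device-level encodings of the std streams."""
    report = {}
    streams = (("stdin", sys.stdin), ("stdout", sys.stdout), ("stderr", sys.stderr))
    for name, stream in streams:
        try:
            fd = stream.fileno()
        except (AttributeError, OSError, ValueError):
            fd = None  # stream is missing or has no real file descriptor
        report[name] = {
            # Encoding used by the high-level text wrapper.
            "text_layer": getattr(stream, "encoding", None),
            # Encoding of the underlying terminal device, if any.
            "device": os.device_encoding(fd) if fd is not None else None,
        }
    return report


if __name__ == "__main__":
    for name, info in console_encodings().items():
        print(name, info)
```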

> * Windows team needs to maintain more versions.

I suppose the installer could install both sets of binaries, and copy
to "python[w][_d].exe" based on an installer option. But then the
UTF-8 selection statistics wouldn't be tracked, unless the installer
phones home.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/76TJ4CMMR2FXQGMKWOQCSBGVBG5DSN3K/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Random832
On Mon, Jan 25, 2021, at 22:49, William Pickard wrote:
> Looks like that's available for Microsoft Store apps only, so it
> might not be viable for Python.

I think the "Fusion manifest for an unpackaged Win32 app" part applies to 
non-store apps.

[English version of the page: 
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
 ]
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FIIXK5DISHHDMUMA4UEZYOV2UBMZ2MEF/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Guido van Rossum
Aren't there too many different Windows installers already? I worry that
it's too hard to choose which one to use (I know I had to ask another
expert :-).

On Mon, Jan 25, 2021 at 7:05 PM Inada Naoki  wrote:

> Sorry for posting multiple threads so quickly.
>
> Microsoft provides a UTF-8 code page for a process. It can be enabled
> through a manifest file.
>
> https://docs.microsoft.com/ja-jp/windows/uwp/design/globalizing/use-utf8-code-page
>
> How about providing both "UTF-8 version" and "ANSI version" Python
> binaries?
> This idea can provide a smoother transition of the default encoding.
>
> 1. Provide UTF-8 version since Python 3.10
> 2. (Some years later) Recommend UTF-8 version
> 3. (Some years later) Provide only UTF-8 version
> 4. (Some years later, maybe) Change the default encoding
>
> The upsides of this idea are:
>
> * We don't need to emit a warning for `open(filename)`.
> * We can see the download stats.
>
> The last point in particular is a huge advantage compared to the current
> UTF-8 mode (e.g. PYTHONUTF8=1).
> We can know how many users need legacy behavior in new Python
> versions. That is very important information for us.
>
> Of course, there are some downsides:
>
> * Windows team needs to maintain more versions.
> * More divisions for "Python on Windows" environment.
>
> Regards,
> --
> Inada Naoki  


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MNVQWQEYUYKLIQOXVOFLFOPWY63QRLXN/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Guido van Rossum
On Mon, Jan 25, 2021 at 5:49 PM Inada Naoki  wrote:

> On Tue, Jan 26, 2021 at 10:22 AM Guido van Rossum 
> wrote:
> >
> >
> > Older Pythons may be easy to drop, but I'm not so sure about older
> unofficial docs. The open() function is very popular and there must be
> millions of blog posts with examples using it, most of them reading text
> files (written by bloggers naive in Python but good at SEO).
> >
> > I would be very sad if the official recommendation had to become "[for
> the most common case] avoid open(filename), use open_text(filename)".
> >
>
> I agree. But until we switch the default encoding of open(),
> we must recommend avoiding `open(filename)` anyway.
> The default encoding of VS Code, Atom, Notepad is already UTF-8.
>
> Maybe, we need to update the tutorial (*) to use `encoding="utf-8"`.
>

Telling people to always add `encoding='utf8'` makes much more sense to me
than introducing a new function and telling them to do that.

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZT66Q2UMDYJBOKM7GAMTLTPIXFVXZMBG/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Inada Naoki
As I understand it, "Fusion manifest for an unpackaged Win32 app" (*)
works for non-Store apps too.
(*) 
https://docs.microsoft.com/ja-jp/windows/uwp/design/globalizing/use-utf8-code-page#examples
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MMZBS2QUXP73S6H6YDFUCW5HY2S7RADQ/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread William Pickard
Looks like that's available for Microsoft Store apps only, so it might not
be viable for Python.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/H6J6NAG6L6SOT656DIKRMSSFTPNQSZ6V/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Matt Wozniski
On Mon, Jan 25, 2021 at 8:51 PM Inada Naoki  wrote:

> On Tue, Jan 26, 2021 at 10:22 AM Guido van Rossum 
> wrote:
> > Older Pythons may be easy to drop, but I'm not so sure about older
> unofficial docs. The open() function is very popular and there must be
> millions of blog posts with examples using it, most of them reading text
> files (written by bloggers naive in Python but good at SEO).
> >
> > I would be very sad if the official recommendation had to become "[for
> the most common case] avoid open(filename), use open_text(filename)".
>
> I agree. But until we switch the default encoding of open(),
> we must recommend avoiding `open(filename)` anyway.
> The default encoding of VS Code, Atom, Notepad is already UTF-8.


Maybe we're overthinking this - do we really need to recommend avoiding
`open(filename)` in all cases? Isn't it just fine to use if
`locale.getpreferredencoding(False)` is UTF-8, since in that case there
won't be any change in behavior when `open` switches from the old,
locale-specific default to the new, always UTF-8 default?

If that's the case, then it would be less of a backwards incompatibility
issue, since most production environments will already be using UTF-8 as
the locale (by virtue of it being the norm on Unix systems and servers).

And if that's the case, all we need is a warning that is raised
conditionally when open() is called for text mode without an explicit
encoding when the system locale is not UTF-8, and that warning can say
something like:

Your system is currently configured to use shift_jis for text files.
Beginning in Python 3.13, open() will always use utf-8 for text files
instead.
For compatibility with future Python versions, pass open() the extra
argument:
encoding="shift_jis"
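
A rough sketch of that conditional check, written as a wrapper so it can be tried today. The name open_checked() and the choice of DeprecationWarning are assumptions made for illustration; a real implementation would live inside open() itself and might use a different warning category.

```python
import codecs
import locale
import warnings


def locale_is_utf8():
    """True when the locale's preferred encoding is an alias of UTF-8."""
    try:
        return codecs.lookup(locale.getpreferredencoding(False)).name == "utf-8"
    except LookupError:
        return False


def open_checked(file, mode="r", encoding=None, **kwargs):
    """Like open(), but warn when a non-UTF-8 locale default would be used."""
    if "b" not in mode and encoding is None and not locale_is_utf8():
        current = locale.getpreferredencoding(False)
        warnings.warn(
            f"Your system is currently configured to use {current} for text "
            f"files. For compatibility with future Python versions, pass "
            f'open() the extra argument: encoding="{current}"',
            DeprecationWarning,
            stacklevel=2,
        )
    return open(file, mode, encoding=encoding, **kwargs)
```

Binary mode and calls with an explicit encoding never warn, and neither does any call on a system whose locale is already UTF-8.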

~Matt
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6C2Y3RELB7PQYNNV5GS2D3H65SOXVD3N/


[Python-ideas] Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Inada Naoki
Sorry for posting multiple threads so quickly.

Microsoft provides a UTF-8 code page for a process. It can be enabled
through a manifest file.
https://docs.microsoft.com/ja-jp/windows/uwp/design/globalizing/use-utf8-code-page

How about providing both "UTF-8 version" and "ANSI version" Python binaries?
This idea can provide a smoother transition of the default encoding.

1. Provide UTF-8 version since Python 3.10
2. (Some years later) Recommend UTF-8 version
3. (Some years later) Provide only UTF-8 version
4. (Some years later, maybe) Change the default encoding

The upsides of this idea are:

* We don't need to emit a warning for `open(filename)`.
* We can see the download stats.

The last point in particular is a huge advantage compared to the current
UTF-8 mode (e.g. PYTHONUTF8=1).
We can know how many users need legacy behavior in new Python
versions. That is very important information for us.

Of course, there are some downsides:

* Windows team needs to maintain more versions.
* More divisions for "Python on Windows" environment.

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KMYPF7RKDUHHXLPELA2RZC7TSPUWSHNU/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Inada Naoki
On Tue, Jan 26, 2021 at 10:22 AM Guido van Rossum  wrote:
>
>
> Older Pythons may be easy to drop, but I'm not so sure about older unofficial 
> docs. The open() function is very popular and there must be millions of blog 
> posts with examples using it, most of them reading text files (written by 
> bloggers naive in Python but good at SEO).
>
> I would be very sad if the official recommendation had to become "[for the 
> most common case] avoid open(filename), use open_text(filename)".
>

I agree. But until we switch the default encoding of open(),
we must recommend avoiding `open(filename)` anyway.
The default encoding of VS Code, Atom, Notepad is already UTF-8.

Maybe, we need to update the tutorial (*) to use `encoding="utf-8"`.

(*)  
https://docs.python.org/3.10/tutorial/inputoutput.html#reading-and-writing-files


> BTW remind me what open_text() would do? How would it differ from open() with 
> the same arguments? That's too many messages back.
>

The current proposal is "open_utf8()". The differences from open() are:

* There is no encoding parameter. It uses "utf-8" always. (*)
* "b" is not allowed for mode.

(*) Another option is to use "utf-8-sig" for reading and "utf-8" for
writing. But it has some drawbacks. utf-8-sig has overhead because it
is a wrapper implemented in Python. And TextIOWrapper has fast-paths
for utf-8, but not for utf-8-sig. "utf-8-sig" may also be less well
tested than "utf-8".
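
A sketch of what the two differences could look like in pure Python, plus a demonstration of how the two codecs treat a BOM. The parameter list of open_utf8() shown here is an assumption (it mirrors open() minus encoding), not the signature from the PEP.

```python
import codecs


def open_utf8(file, mode="r", buffering=-1, errors=None, newline=None,
              closefd=True, opener=None):
    """Text-only open() with the encoding fixed to UTF-8 (illustrative)."""
    if "b" in mode:
        raise ValueError("open_utf8() is text-only; binary mode is not allowed")
    return open(file, mode, buffering, "utf-8", errors, newline, closefd, opener)


# How the two codecs differ when the data starts with a BOM:
data = codecs.BOM_UTF8 + "hello".encode("utf-8")
as_utf8 = data.decode("utf-8")          # BOM survives as U+FEFF
as_utf8_sig = data.decode("utf-8-sig")  # BOM is stripped
```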

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BCMUOSHJOA36AKOWKQINNJZYAC2WIBUF/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Guido van Rossum
On Mon, Jan 25, 2021 at 4:42 PM Steven D'Aprano  wrote:

> On Sat, Jan 23, 2021 at 09:11:27PM +1100, Chris Angelico wrote:
>
> > > On the other hand, if we add `open_text()`:
> > >
> > > * Replacing open with open_text is easier than adding `,
> encoding="utf-8"`.
> > > * Teachers can teach to use `open_text` to open text files. Students
> > > can use "utf-8" by default without knowing about what encoding is.
> > >
> > > So `open_text()` can provide better developer experience, without
> > > waiting 10 years.
> >
> > But this has a far worse end goal - two open functions with subtly
> > incompatible defaults, and a big question of "why should I choose this
> > over that".
>
> It has an easy answer:
>
> - Are you opening a text file and you don't know about or want to deal
>   with encodings? Use `open_text`.
>
> - Otherwise, use `open`.
>
> I think that if we moved to an open_text() builtin, it should have the
> simplest possible signature:
>
> open_text(filename, mode='r')
>
> If you care about anything beyond that, use `open`.
>
>
> > And if you start using open_text, suddenly your code won't
> > work on older Pythons.
>
> "Using older Pythons" is mostly a concern for library maintainers, not
> beginners. A few years from now, Python 3.10 will be the oldest version
> the great majority of beginners will care about, and 3.9 will be as
> irrelevant to them as 3.4 is to us today.
>
> Library maintainers always have to deal with the issue of not being able
> to use the newest functionality, it doesn't prevent us from adding new
> functionality.
>

Older Pythons may be easy to drop, but I'm not so sure about older
unofficial docs. The open() function is very popular and there must be
millions of blog posts with examples using it, most of them reading text
files (written by bloggers naive in Python but good at SEO).

I would be very sad if the official recommendation had to become "[for the
most common case] avoid open(filename), use open_text(filename)".

BTW remind me what open_text() would do? How would it differ from open()
with the same arguments? That's too many messages back.

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/QPKA3SOCHMFMGZXW7YBCTSDMVQ6B6BHW/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Steven D'Aprano
On Sat, Jan 23, 2021 at 09:11:27PM +1100, Chris Angelico wrote:

> > On the other hand, if we add `open_text()`:
> >
> > * Replacing open with open_text is easier than adding `, encoding="utf-8"`.
> > * Teachers can teach to use `open_text` to open text files. Students
> > can use "utf-8" by default without knowing about what encoding is.
> >
> > So `open_text()` can provide better developer experience, without
> > waiting 10 years.
> 
> But this has a far worse end goal - two open functions with subtly
> incompatible defaults, and a big question of "why should I choose this
> over that".

It has an easy answer:

- Are you opening a text file and you don't know about or want to deal 
  with encodings? Use `open_text`.

- Otherwise, use `open`.

I think that if we moved to an open_text() builtin, it should have the 
simplest possible signature:

open_text(filename, mode='r')

If you care about anything beyond that, use `open`.


> And if you start using open_text, suddenly your code won't
> work on older Pythons.

"Using older Pythons" is mostly a concern for library maintainers, not 
beginners. A few years from now, Python 3.10 will be the oldest version 
the great majority of beginners will care about, and 3.9 will be as 
irrelevant to them as 3.4 is to us today.

Library maintainers always have to deal with the issue of not being able 
to use the newest functionality, it doesn't prevent us from adding new 
functionality.



-- 
Steve
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/4K7U5KEXEIURFB36ML2GSMJD4HEQ7ZZL/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Steven D'Aprano
Thanks Matt for the detailed explanation for why we cannot change `open` 
to do encoding detection by default. I think that should answer Guido's 
question.

It still leaves open the possibility of:

- a new mode to open() that opts in to encoding detection;

- a new built-in function that is only used for opening text files (not 
  pipes) with encoding detection by default;

- or a new function that attempts the detection:

    enc = io.guess_encoding(FILENAME) or 'UTF-8'
    with open(FILENAME, encoding=enc) as f:
        ...
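
Note that io.guess_encoding() does not exist in the stdlib; it is the hypothetical function being floated. One very conservative way to sketch it is to look only for a byte order mark and give up otherwise. This is an assumption about what "detection" could mean, not a proposal for the real heuristics.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32-LE BOM starts with the
# same two bytes as the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(filename):
    """Return a codec name if the file starts with a BOM, else None."""
    with open(filename, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Returning None for the no-BOM case is what makes the `or 'UTF-8'` fallback in the snippet above work.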


These may be useful, but I don't think that they are very helpful for 
solving the problem of naive programmers who don't know anything about 
encodings trying to open files which are encoded differently from the 
system encoding. Such users aren't knowledgeable enough to know that they
should opt-in to encoding detection. If they were, they would probably 
just set the encoding to "utf-8" in the first place.

-- 
Steve
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IUNLC2JQYSAQ3IC6DWPGMWKQS5FWQDEK/


[Python-ideas] Re: dataclass: __init__ kwargs and Optional[type]

2021-01-25 Thread Paul Bryan via Python-ideas
Added myself to the nosy list.

Seems like the suggested kw_only: str variant—perhaps it should also
support Iterable—would allow continued support for positional
arguments. Is your idea around positional-only arguments different? 

Another thought I had was around Optional and default-None semantics:
Optional[type] should safely default to None in __init__; I'm thinking
the opposite could also be supported (default of None can imply
Optional type in annotation). 

Given various objectives, would it make sense to codify in a PEP to
build consensus?


On Mon, 2021-01-25 at 11:07 -0500, Eric V. Smith wrote:
> See https://bugs.python.org/issue33129. I've not done much with this
> issue, because I'm not crazy about the API, and I'd like to do
> something with positionally-only arguments at the same time. But I
> think in general it's a good idea.
> Eric
> On 1/24/2021 3:07 PM, Paul Bryan via Python-ideas wrote:
> 
> The main benefits of this proposal:
> 
> - the order of fields (those with defaults, those without) is
> irrelevant
> - don't need to pedantically add default = None for Optional values
> 
> On Sun, 2021-01-24 at 19:46 +, Paul Bryan via Python-ideas wrote:
> 
> > I've created a helper class in my own library that enhances the
> > existing dataclass:
> > 
> > a) __init__ accepts keyword-only arguments,
> > b) Optional[...] attribute without a specified default value would
> > default to None in __init__.
> > 
> > I think this could be useful in stdlib. I'm thinking a dataclass
> > decorator parameter like "init_kwonly" (default=False to provide
> > backward compatibility) that if True would implement this behavior.
> > 
> > Thoughts?
> > 

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IUEZPYGYVBQAGV6RJTLAT3WAK62ASIYR/


[Python-ideas] Re: Changing the default text encoding of pathlib

2021-01-25 Thread Paul Moore
On Mon, 25 Jan 2021 at 20:02, Christopher Barker  wrote:
> using a system setting as a default is a really bad idea in this day of 
> interconnected computers.

I'd mildly dispute this. There are (significant) downsides with the
default behaviour being system-dependent, yes, but there are *also*
disadvantages in having Python not behave consistently with other
tools/programs on the same system.

However, on POSIX, things are generally consistent, and *already*
default to UTF-8. So the proposal is mostly going to affect Windows.
And on Windows, there's not much consistency even on a single machine
at the moment. Between OEM and ANSI codepages, and other tools that
default to UTF-8 "because that's the future", there's not much
platform consistency for Python to conform to anyway...

> But back to PEP 597, and how to get there:
>
> 1) We need to start with a consensus about where we want Python to be in N 
> versions. That is not specifically laid out in the PEP but it does imply that 
> in the sometime-long-in-the-future:
>
> - TextIOWrapper will have utf-8 as the default, rather than
>   `locale.getpreferredencoding(False)`. This behaviour will then be
>   inherited by:
>     - `open()` without a binary flag in the mode
>     - `Path.read_text`
>     (and any other utility functions that use TextIOWrapper)
> - there will be a string that can be passed to encoding that will indicate
>   that the system default should be used.
>
> Forgive me if there is already a consensus on this -- but this discussion has 
> brought up some thoughts.

There's a fundamental assumption here that I think needs to be made
explicit. Which is that we're assuming that whatever N happens to be,
we anticipate that `locale.getpreferredencoding(False)` will still be
something other than UTF-8. That's *already* false on most POSIX
systems, and TBH I get the impression that Microsoft is pushing quite
hard to move Windows 10 to a UTF-8 by default position (although
"fast" in Microsoft terms may still be slow to the rest of us ;-))

So I think that the real question here is "do we want to move Python
to 'UTF8-by-default' faster than the OS vendors are going?" And I think
that the answer to that is much less obvious. It probably also depends
heavily on your locale - I doubt it's an accident that Inada-san¹ is
proposing this, and he's from Japan :-) Personally, as an English
speaker based in the UK, I'll be happy when UTF-8 is the default
everywhere, but I can live with the status quo until that happens. But
I'm not the main target for this change.

> 1) As TextIOWrapper is an "implementation detail" for most Python developers, 
> maybe it shouldn't have a default encoding at all, and leave the default 
> implementation(s) up to the helper functions, like open() and 
> Path.read_text() -- that would mean changes in more places, but would allow 
> different utility functions to make different choices.

*shrug*. That sounds plausible, but it's a backward compatibility
break that doesn't offer any significant benefits, so I suspect it's
not worth doing in practice.

> 2) Inada proposed an open_text() function be introduced as a stepping stone, 
> with the new behaviour. This led to one person asking if that would imply a 
> open_binary() function as well. An answer to that was no -- as no one is 
> suggesting any changes to open()'s behavior for binary files.
> However, I kind of like the idea. We now have two (at least) different file 
> objects potentially returned by open(): TextIOWrapper, and 
> BufferedReader/Writer. And the TextIOWrapper has some pretty different 
> behavior. I *think* that in virtually all cases, when the code is written, 
> the author knows whether they want a binary or text file, so it may make 
> sense to have two different open() functions, rather than having the Type 
> returned be a function of what mode flags are passed.
>
> This would make it easier for people (and tools) to reason about the code 
> with static analysis:
>
> e.g.:
>
> open_text().read() would return a string
> open_binary().read() would return bytes

These are good arguments for having explicit open_text and open_binary
functions. I don't *like* the idea, because they feel unnecessarily
verbose to me, but I can accept that this might just be because I'm
used to open().

I do think that having open_text, but *not* having open_binary, would
be a bit confusing. Particularly as pathlib has read_text and
read_bytes, so it would be inconsistent as well.

> This would also make the path to a future with different defaults smoother -- 
> plain "open" gets deprecated -- any new code uses one of the open_* 
> functions, and that new code will never need to be changed again.
>
> Back in the day, a single open() function made more sense. After all, the 
> only difference in the result for binary mode was that linefeed translation 
> was turned off (and the C legacy of course). In fact, this did lead to 
> errors, when folks accidentally left off the 'b', and tested only

[Python-ideas] Re: Changing the default text encoding of pathlib

2021-01-25 Thread Christopher Barker
On Sun, Jan 24, 2021 at 6:33 PM Inada Naoki  wrote:

> My previous thread is hijacked about "auto guessing" idea,


Yes -- I'm a bit confused by that -- are folks advocating for making some
sort of encoding detection the default? Or available as an option in the
stdlib? -- in any case, I think that could be an independent proposal.

First: I really want to see this get pushed forward and get done, one way
or another -- using a system setting as a default is a really bad idea in
this day of interconnected computers.

But back to PEP 597, and how to get there:

1) We need to start with a consensus about where we want Python to be in N
versions. That is not specifically laid out in the PEP but it does imply
that in the sometime-long-in-the-future:

- TextIOWrapper will have utf-8 as the default, rather than
  `locale.getpreferredencoding(False)`. This behaviour will then be
  inherited by:
  - `open()` without a binary flag in the mode
  - `Path.read_text`
  (and any other utility functions that use TextIOWrapper)
- there will be a string that can be passed to encoding that will indicate
  that the system default should be used.
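To make the two behaviours above concrete, here is a minimal sketch (not something the PEP spells out). Passing `encoding="utf-8"` explicitly gives today what the future default would give, and passing `locale.getpreferredencoding(False)` explicitly keeps the system default. The helper name `open_locale` is made up for this example.

```python
import locale
import os
import tempfile

def open_locale(path, mode="r"):
    """Hypothetical helper: open a text file with the system default encoding."""
    return open(path, mode, encoding=locale.getpreferredencoding(False))

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Explicit UTF-8: the same bytes are written on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write("naïve café")
        utf8_name = f.encoding
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
    # Explicit system default: the encoding used depends on the machine.
    with open_locale(path) as f:
        locale_name = f.encoding
finally:
    os.remove(path)

print("explicit:", utf8_name, "| system default:", locale_name)
```

Code written the first way would not need to change if the default ever moves to utf-8; code written the second way keeps its current meaning.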

Forgive me if there is already a consensus on this -- but this discussion
has brought up some thoughts.

1) As TextIOWrapper is an "implementation detail" for most Python
developers, maybe it shouldn't have a default encoding at all, and leave
the default implementation(s) up to the helper functions, like open() and
Path.read_text() -- that would mean changes in more places, but would allow
different utility functions to make different choices.

2) Inada proposed an open_text() function be introduced as a stepping
stone, with the new behaviour. This led to one person asking if that would
imply an open_binary() function as well. An answer to that was no -- as no
one is suggesting any changes to open()'s behavior for binary files.
However, I kind of like the idea. We now have two (at least) different file
objects potentially returned by open(): TextIOWrapper, and
BufferedReader/Writer. And the TextIOWrapper has some pretty different
behavior. I *think* that in virtually all cases, when the code is written,
the author knows whether they want a binary or text file, so it may make
sense to have two different open() functions, rather than having the Type
returned be a function of what mode flags are passed.

This would make it easier for people (and tools) to reason about the code
with static analysis:

e.g.:

open_text().read() would return a string
open_binary().read() would return bytes

This would also make the path to a future with different defaults smoother
-- plain "open" gets deprecated -- any new code uses one of the open_*
functions, and that new code will never need to be changed again.

Back in the day, a single open() function made more sense. After all, the
only difference in the result for binary mode was that linefeed translation
was turned off (and the C legacy of course). In fact, this did lead to
errors, when folks accidentally left off the 'b', and tested only on *nix
systems. That, at least, is less of an issue now; as the text and binary
objects are more different, you are far more likely to get errors right
away -- but still at run time -- static analysis is still tricky.
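A rough sketch of what such a pair could look like, written as thin wrappers around the existing open(). The names `open_text` and `open_binary` come from this thread, but the signatures and the utf-8 default are assumptions of this sketch, not a settled design. The point is only that each function has a single return type that a type checker can see.

```python
import os
import tempfile
from typing import BinaryIO, TextIO

def open_text(path, mode="r", encoding="utf-8", **kwargs) -> TextIO:
    """Hypothetical text-only open(): always returns a text file object."""
    if "b" in mode:
        raise ValueError("open_text() does not accept binary modes")
    return open(path, mode, encoding=encoding, **kwargs)

def open_binary(path, mode="r", **kwargs) -> BinaryIO:
    """Hypothetical binary-only open(): always returns a binary file object."""
    if "b" not in mode:
        mode += "b"
    return open(path, mode, **kwargs)

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_text(path, "w") as f:
        f.write("héllo")
    with open_text(path) as f:
        text = f.read()   # always a str
    with open_binary(path) as f:
        raw = f.read()    # always bytes
finally:
    os.remove(path)
```

With plain open(), the return type depends on the contents of the mode string, which is harder for tools to follow when the mode is not a literal.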


On to:

> Path.open() was added in Python 3.4. Path.read_text() and
> Path.write_text() were added in Python 3.5.
> Their history is shorter than built-in open(). Changing its default
> encoding should be easier than built-in open and TextIOWrapper.
> New default encodings are:
>
> * read_text() default encoding is "utf-8-sig"
> * write_text() default encoding is "utf-8"
> * open() default encoding is "utf-8-sig" when mode is "r" or None,
>   "utf-8" otherwise.
>
> What do you think of this idea?

+1 there is a lot less legacy with Path -- we can move faster. And I
honestly still wonder if making utf-8 the default will cause or fix more
bugs :-)
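For anyone wondering about the read/write asymmetry in the quoted proposal, a small sketch using only the stdlib codecs: "utf-8-sig" tolerates a BOM when decoding, but always adds one when encoding, which is presumably why it is proposed for reading only.

```python
import codecs

text = "héllo"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

# Decoding: "utf-8-sig" strips a leading BOM if present,
# and otherwise behaves like plain "utf-8".
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = without_bom.decode("utf-8-sig")

# Plain "utf-8" keeps the BOM as U+FEFF at the start of the string.
decoded_plain = with_bom.decode("utf-8")

# Encoding: "utf-8-sig" always prepends a BOM.
encoded_sig = text.encode("utf-8-sig")
```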

A thought on that -- there are currently both kinds of code "in the wild":
 (A) code that uses the default, when they really want utf-8 -- currently a
bug, won't be a bug in the future.
 (B) code that uses the default when it really does want the system
encoding. -- currently correct, will become a bug in the future

It's anyone's guess which of these is more common, but one thing to
consider is that (A) is a hidden bug that might reveal itself in the hands
of end users who knows when in the future. Whereas (B) will be a bug that
is likely to reveal itself fairly quickly (though perhaps also in the
(confused) hands of end users as well)
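A quick sketch of why (A) stays hidden for so long. It assumes cp1252 as the legacy system default, which is only one example; the actual default varies by machine.

```python
# Bytes as written by a UTF-8 aware tool.
ascii_only = "cafe".encode("utf-8")
non_ascii = "café".encode("utf-8")

# ASCII-only data decodes identically under both encodings,
# so a test suite that uses only ASCII never notices the bug.
same_for_ascii = ascii_only.decode("cp1252") == ascii_only.decode("utf-8")

# Non-ASCII data read with the legacy default is garbled silently;
# no exception is raised for these bytes.
garbled = non_ascii.decode("cp1252")
```

Here `garbled` is "cafÃ©" rather than "café", and nothing fails until a user actually looks at the output.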

-Chris B

-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

[Python-ideas] Re: dataclass: __init__ kwargs and Optional[type]

2021-01-25 Thread Eric V. Smith
See https://bugs.python.org/issue33129. I've not done much with this 
issue, because I'm not crazy about the API, and I'd like to do something 
with positional-only arguments at the same time. But I think in 
general it's a good idea.


Eric

On 1/24/2021 3:07 PM, Paul Bryan via Python-ideas wrote:

The main benefits of this proposal:

- the order of fields (those with defaults, those without) is irrelevant
- don't need to pedantically add default = None for Optional values

On Sun, 2021-01-24 at 19:46 +, Paul Bryan via Python-ideas wrote:
I've created a helper class in my own library that enhances the 
existing dataclass:


a) __init__ accepts keyword-only arguments,
b) Optional[...] attribute without a specified default value would 
default to None in __init__.


I think this could be useful in stdlib. I'm thinking a dataclass 
decorator parameter like "init_kwonly" (default=False to provide 
backward compatibility) that if True would implement this behavior.


Thoughts?

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2NLPDOV2XJBQU5LX3SA3XEQ6CTOQEZA7/ 

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T2BBB4LA3LLPRZRF4TRR6YBROTX7CGBD/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LP6LCIZ6UOPMWSVALPWF67TC7CA5BZME/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Matt Wozniski
On Mon, Jan 25, 2021, 4:25 AM Steven D'Aprano  wrote:

> On Sun, Jan 24, 2021 at 10:43:54PM -0500, Matt Wozniski wrote:
> > And
> > `f.read(1)` needs to pick one of those and return it immediately. It can't
> > wait for more information. The contract of `read` is "Read from underlying
> > buffer until we have n characters or we hit EOF."
>
> In text mode, reads are always buffered:
>
> https://docs.python.org/3/library/functions.html#open
>
> so `f.read(1)` will read as much as needed, so long as it only returns a
> single character.
>

Text mode files are always backed by a buffer, yes, but that's not
relevant. My point is that `f.read(1)` must immediately return a character
if one exists in the buffer. It can't wait for more data to get buffered if
there is already a buffered character, as that would be a backwards
incompatible change that would badly break line based protocols like FTP,
SMTP, and POP.

Up until now, `f.read(1)` has always read bytes from the underlying file
descriptor into the buffer until it has one full character, and immediately
returned it. And this is user facing behavior. Imagine an echo server that
reads 1 character at a time and echoes it back, forever. The client will
only ever send 1 character at a time, so if an eight bit locale encoding is
in use the client will only send one byte before waiting for a response. As
things stand today this works. If encoding detection were added and the
server's call to `f.read(1)` could decide it doesn't know how to decode the
first byte it gets and to block until more data comes in, that would be a
deadlock, since the client isn't sending more.
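A small sketch of the behaviour described above, using an anonymous pipe in place of a network connection (that substitution is an assumption of the example). The second half uses an incremental decoder to show what "until it has one full character" means for a multi-byte encoding.

```python
import codecs
import os

# An 8-bit encoding: one byte is one character, so read(1) can return at once.
r, w = os.pipe()
os.write(w, b"a")                         # the "client" sends a single byte
reader = open(r, "r", encoding="latin-1")
first = reader.read(1)                    # returns "a" without waiting for more data
os.close(w)
reader.close()

# A multi-byte encoding: the decoder holds back until the character is complete.
decoder = codecs.getincrementaldecoder("utf-8")()
partial = decoder.decode(b"\xc3")         # first byte of "é": nothing to return yet
complete = decoder.decode(b"\xa9")        # second byte arrives: now "é" is returned
```

If the first read had to wait for extra bytes in order to guess an encoding, the echo server above would never get to send its reply.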

> A typical buffer size is 4096 bytes, or more.


Sure, but that doesn't mean that much data is always available. If
something has written less than that, it's not reasonable to block until
more data can be buffered in places where up until now no blocking would
have occurred. Not least because no more data will necessarily ever come.

And if it were to instead make its decisions based on what has been
buffered already, without ever blocking, then the behavior becomes
nondeterministic: it could return a different character based on how much
data the OS returned in the first read syscall.

> In any case, I believe the intention of this proposal is for *open*, not
> read, to perform the detection.


If that's the case, named pipes are a perfect example of why that's
impossible. It's perfectly normal to open a named pipe that contains no
data, and that won't until you trigger some action (say, spawning a child
process that will write to it). You can't auto detect the encoding of an
empty pipe, and you can't make open block until data arrives because it's
entirely possible data will never arrive if open blocks.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GUL5VOYGDEE3MSC2KDWZ7RNDP2ZMJGAS/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Steven D'Aprano
On Sun, Jan 24, 2021 at 10:43:54PM -0500, Matt Wozniski wrote:
> On Sun, Jan 24, 2021 at 9:53 AM <2qdxy4rzwzuui...@potatochowder.com> wrote:
> 
> > On 2021-01-25 at 00:29:41 +1100,
> > Steven D'Aprano  wrote:
> >
> > > On Sat, Jan 23, 2021 at 03:24:12PM +, Barry Scott wrote:
> > > > First problem I see is that the file may be a pipe and then you will
> > > > block until you have enough data to do the auto detect.
> > >
> > > Can you use `open('filename')` to read a pipe?
> >
> > Yes.  Named pipes are files, at least on POSIX.
> >
> > And no.  Unnamed pipes are identified by OS-level file descriptors, so
> > you can't open them with open('filename'),
> >
> 
> The `open` function takes either a file path as a string, or a file
> descriptor as an integer. So you can use `open` to read an unnamed pipe or
> a socket.

Okay, but I was asking about using open with a filename string. In any 
case, the existence of named pipes answers my question.


[...]
> It's possible to do a `f.read(1)` on a file opened in text mode. If the
> first two bytes of the file are 0xC2 0x99, that's either the C1 control
> character U+0099 if the file is UTF-8, or 슙 if the file is UTF-16BE, or
> 駂 if the file is UTF-16LE.

Or Â followed by the SGCI control code in Latin-1. Or Â™ in Windows-1252, 
or ¬ô in MacRoman. Etc.
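The ambiguity is easy to check. A small sketch that decodes the same two bytes under each encoding mentioned here and prints the resulting code points (the codec names are the ones Python's stdlib uses):

```python
data = b"\xc2\x99"
encodings = ("utf-8", "utf-16-be", "utf-16-le", "latin-1", "cp1252", "mac_roman")

# Every one of these decodes the two bytes without raising an error.
decoded = {enc: data.decode(enc) for enc in encodings}

for enc, text in decoded.items():
    # Print code points rather than the characters themselves,
    # since some of the results are invisible control characters.
    print(enc, [f"U+{ord(c):04X}" for c in text])
```

All six decodes succeed, so the bytes alone give a detector nothing to choose between them.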


> And
> `f.read(1)` needs to pick one of those and return it immediately. It can't
> wait for more information. The contract of `read` is "Read from underlying
> buffer until we have n characters or we hit EOF."

In text mode, reads are always buffered:

https://docs.python.org/3/library/functions.html#open

so `f.read(1)` will read as much as needed, so long as it only returns a 
single character.

A typical buffer size is 4096 bytes, or more.

In any case, I believe the intention of this proposal is for *open*, not 
read, to perform the detection.



-- 
Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OCMXGX7RY3EMKBNM6HMF72INK7K7FNVJ/