[Python-ideas] Re: Enum: determining if a value is valid

2021-03-16 Thread Matt Wozniski
That's a problem with any attempt to find an enum member by value, since
values aren't guaranteed to be unique. With either proposal, we'd just need
to pick one - probably the one that appears first in the class dict.
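For reference, the existing `enum` module already resolves duplicate values this way: a later member with the same value becomes an alias of the first-defined member, so a value lookup returns that first member. A minimal illustration (the `Color` members are only an example):

```python
import enum


class Color(enum.Enum):
    RED = 1
    CRIMSON = 1  # same value as RED, so CRIMSON becomes an alias of RED
    GREEN = 2


# Value lookup returns the first-defined member for that value.
by_value = Color(1)
# Name lookup of the alias also returns the canonical member.
by_alias_name = Color["CRIMSON"]
# Iteration skips aliases; __members__ still lists them in definition order.
canonical = list(Color)
all_names = list(Color.__members__)
```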

On Tue, Mar 16, 2021, 2:39 PM Marco Sulla 
wrote:

> On Tue, 16 Mar 2021 at 05:38, Matt Wozniski  wrote:
> > Color.from_value(1)  # returns Color.RED
>
> What if I have an alias?
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/NQ535PUFCWRBBN5QVTGB7QOBNJNJJEPO/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/B7PEYLOVWD7KOI4SUUXLU224N6XAJ2JR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Enum: determining if a value is valid

2021-03-15 Thread Matt Wozniski
I find the idea of having the constructor potentially return something
other than an instance of the class to be very... off-putting. Maybe it's
the best option, but my first impression of it isn't favorable, and I can't
think of any similar case that exists in the stdlib today off the top of my
head. It seems like we should be able to do better.

If I might propose an alternative before this gets set in stone: what if
`Enum` provided classmethods `from_value` and `from_name`, each with a
`default=`, so that you could do:

Color.from_value(1)  # returns Color.RED
Color.from_value(-1)  # raises ValueError
Color.from_value(-1, None)  # returns None

Color.from_name("RED")  # returns Color.RED
Color.from_name("BLURPLE")  # raises ValueError
Color.from_name("BLURPLE", None)  # returns None

That still allows each concept to be expressed in a single line, and
remains explicit about whether the lookup is happening by name or by value.
It allows spelling `default=None` as just `None`, as we desire. And instead
of being a `__contains__` with unusual semantics coupled with a constructor
with unusual semantics, it's a pair of class methods that each have fairly
unsurprising semantics.
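A minimal sketch of how the proposed classmethods could be layered on the existing lookups, assuming a hypothetical `LookupEnum` base class (neither `from_value` nor `from_name` exists in `enum` today). A private sentinel is used so that `None` can be passed positionally as the default, as in the examples above:

```python
import enum

# Sentinel so that None can be passed as an explicit default.
_MISSING = object()


class LookupEnum(enum.Enum):
    """Hypothetical base class sketching the proposed from_value/from_name."""

    @classmethod
    def from_value(cls, value, default=_MISSING):
        try:
            return cls(value)  # existing by-value lookup; aliases resolve to the first member
        except ValueError:
            if default is _MISSING:
                raise
            return default

    @classmethod
    def from_name(cls, name, default=_MISSING):
        member = cls.__members__.get(name, _MISSING)  # by-name lookup, aliases included
        if member is _MISSING:
            if default is _MISSING:
                raise ValueError(f"{name!r} is not a valid {cls.__name__} name")
            return default
        return member


class Color(LookupEnum):
    RED = 1
    GREEN = 2
    BLUE = 3
```

Using `__members__.get(name)` for the name lookup, rather than `getattr(Color, name, None)`, avoids accidentally returning non-member class attributes: `getattr(Color, "from_value")` is a bound method, not a member.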

~Matt

On Mon, Mar 15, 2021 at 3:55 PM Guido van Rossum  wrote:

> +1
>
> On Mon, Mar 15, 2021 at 12:48 PM Ethan Furman  wrote:
>
>> On 3/15/21 11:27 AM, Guido van Rossum wrote:
>> > On Mon, Mar 15, 2021 at 10:53 AM Ethan Furman wrote:
>>
>> >> Part of the reason is that there are really two ways to identify an
>> >> enum -- by name, and by value -- which should `__contains__` work with?
>> >
>> > The two sets don't overlap, so we could allow both. (Funny
>> > interpretations of `__contains__` are not unusual, e.g.
>> > substring checks are spelled 'abc' in 'fooabcbar'.)
>>
>> They could overlap if the Enum is a `str`-subclass -- although having the
>> name of one member match the value of a different member seems odd.
>>
>> >> I think I like your constructor change idea, with a small twist:
>> >>
>> >>   Color(value=, name=, default=)
>> >>
>> >> This would make it possible to search for an enum by value or by name,
>> >> and also specify a default return value (raising an exception if the
>> >> default is not set and a member cannot be found).
>> >
>> >
>> > So specifically this would allow (hope my shorthand is clear):
>> > ```
>> > Color['RED'] --> Color.RED or raises
>> > Color(1) -> Color.RED or raises
>> > Color(1, default=None) -> Color.RED or None
>> > Color(name='RED', default=None) -> Color.RED or None
>> > ```
>> > This seems superficially reasonable. I'm not sure what
>> > Color(value=1, name='RED') would do -- insist that both value and
>> > name match? Would that have a use case?
>>
>> I would enforce that both match, or raise.  Also not sure what the
>> use-case would be.
>>
>> > My remaining concern is that it's fairly verbose -- assuming we don't
>> > really need the name argument, it would be attractive if we could
>> > write Color(1, None) instead of Color(1, default=None).
>> >
>> > Note that instead of Color(name='RED') we can already write this:
>> > ```
>> > getattr(Color, 'RED') -> Color.RED or raises
> getattr(Color, 'RED', None) -> Color.RED or None
> ```
>>
>> Very good points.
>>
>> Everything considered, I think I like allowing `__contains__` to verify
>> both names and values, adding `default=` to the constructor for
>> the value-based "gimme an Enum or None" case, and recommending  `getattr`
>> for the name-based "gimme an Enum or None" case.
>>
>> --
>> ~Ethan~
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/UQBSDZQJWBKMOVSUES7HEDJTYR76Y5N2/
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
> *Pronouns: he/him **(why is my pronoun here?)*
> 
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZK7KKABFNSFC4UY763262O2VIPZ5YDPQ/
>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LJ3UQT3JDKFO4F2YBEPL6DFLPADQ4ESR/


[Python-ideas] Re: dataclasses: position-only and keyword-only fields

2021-03-13 Thread Matt Wozniski
Oops, sent a reply too soon.

On Sat, Mar 13, 2021 at 3:14 PM Eric V. Smith  wrote:

> The thing is, even without being able to switch back and forth within a
> single dataclass, you could achieve the same thing with inheritance:
>
...
>
> In both cases, you'd get re-ordered fields in __init__, and nowhere else:
>
> def __init__(c, d, *, a, b, e, f):
>
> repr, comparisons, etc. would still treat them in today's order: a, b, c,
> d, e, f.
>
...
>
> And the same logic would apply to positional argument fields
>
This seems like another disadvantage of allowing positional-only arguments.
If positional-only fields show up just like keyword fields in an arbitrary
position in the repr, the repr will cease to be a representation of a call
to the dataclass's constructor suitable for passing to `eval`, as it is
today when init-only parameters are not in use.

~Matt
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/NZ7E5NKZLBK6EW243D34F3YGAMFPHUTO/


[Python-ideas] Re: dataclasses: position-only and keyword-only fields

2021-03-13 Thread Matt Wozniski
On Sat, Mar 13, 2021 at 3:14 PM Eric V. Smith  wrote:

> On 3/13/2021 2:51 PM, Abdulla Al Kathiri wrote:
>
> I don’t like the idea of going back and forth for positional and keyword
> arguments. The positional arguments have to stay at the top of some
> variable (be it anything, e.g., __pos: Any). The mixed stays in between the
> two markers. And the keyword arguments come after yet another predefined
> variable with leading dunder (e.g., __kw:Any). Please if you are going to
> do this for dataclasses, it has to match the function signature. This will
> be much easier for us to use, read, and teach. Going back and forth will
> cause a lot of confusion, as the order of the arguments in the init method
> will not follow the same order as the arguments defined in the dataclass.
> Thanks to whoever mentioned this in this email chain.
>
> The thing is, even without being able to switch back and forth within a
> single dataclass, you could achieve the same thing with inheritance: ... In
> both cases, you'd get re-ordered fields in __init__, and nowhere else:
>
> def __init__(c, d, *, a, b, e, f):
>
> repr, comparisons, etc. would still treat them in today's order: a, b, c,
> d, e, f.
>
> Other than putting in logic to call that an error, which I wouldn't want
> to do, it would be allowable. Why not allow the shortcut version if that's
> what people want to do? Again, I'm not saying this needs to be a day one
> feature using "__kw_only__: ArgumentMarker" (or however it's ultimately
> spelled). I just don't want to rule it out in case we come up with some
> reason it's important.
>
> I just checked and attrs allows that last case:
>
> @attr.s
> class A:
>     a = attr.ib(kw_only=True)
>     b = attr.ib(kw_only=True)
>     c = attr.ib()
>     d = attr.ib()
>     e = attr.ib(kw_only=True)
>     f = attr.ib(kw_only=True)
>
> Which generates help like:
>
> class A(builtins.object)
>  |  A(c, d, *, a, b, e, f) -> None
>
> The main reason to allow the switching back and forth is to support
> subclassing dataclasses that already have normal and keyword-only fields.
> If you didn't allow this, you'd have to say that the MostDerived class
> above would be an error because the __init__ looks like the parameters have
> been rearranged.
>
> And the same logic would apply to positional argument fields.
>
> I just don't see the need to prohibit it in general. Any tutorial would
> probably show the fields in the order you describe above: positional,
> normal, keyword-only.
>
> Eric
>
>
> Abdulla
>
> Sent from my iPhone
>
> On 13 Mar 2021, at 9:30 PM, Paul Bryan  
> wrote:
>
> 
> +1 to Matt's points here. I get the desire for symmetry with / and * in
> params, but I'm not convinced it's useful enough to warrant the complexity
> of the approaches being proposed. I think a @dataclass(..., kwonly=True)
> would solve > 90% of the issues with dataclass usability today.
>
> On Sat, 2021-03-13 at 06:41 -0500, Matt Wozniski wrote:
>
> On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith  wrote:
>
> There have been many requests to add keyword-only fields to dataclasses.
> These fields would result in __init__ parameters that are keyword-only.
> As long as I'm doing this, I'd like to add positional-only fields as well.
>
>
> Have there also been requests for positional-only fields?
>
> The more I digest this idea, the more supporting positional-only fields
> sounds like a bad idea to me. The motivation for adding positional-only
> arguments to the language was a) that some built-in functions take only
> positional arguments, and there was no consistent way to document that and
> no way to match their interface with pure Python functions, b) that some
> parameters have no semantic meaning and making their names part of the
> public API forces library authors to maintain backwards compatibility on
> totally arbitrary names, and c) that functions like `dict.update` that take
> arbitrary keyword arguments must have positional-only parameters in order
> to not artificially reduce the set of keyword arguments that may be passed
> (e.g., `some_dict.update(self=5)`).
>
> None of these cases seem to apply to dataclasses. There are no existing
> dataclasses that take positional-only arguments that we need consistency
> with. Dataclasses' constructors don't take arbitrary keyword arguments in
> excess of their declared fields. And most crucially, the field names become
> part of the public API of the class. Dataclass fields can never be renamed
> without a risk of breaking existing users. Taking your example from the
> other thread:
>
> 

[Python-ideas] Re: dataclasses: position-only and keyword-only fields

2021-03-13 Thread Matt Wozniski
On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith  wrote:

> There have been many requests to add keyword-only fields to dataclasses.
> These fields would result in __init__ parameters that are keyword-only.
> As long as I'm doing this, I'd like to add positional-only fields as well.
>

Have there also been requests for positional-only fields?

>
The more I digest this idea, the more supporting positional-only fields
sounds like a bad idea to me. The motivation for adding positional-only
arguments to the language was a) that some built-in functions take only
positional arguments, and there was no consistent way to document that and
no way to match their interface with pure Python functions, b) that some
parameters have no semantic meaning and making their names part of the
public API forces library authors to maintain backwards compatibility on
totally arbitrary names, and c) that functions like `dict.update` that take
arbitrary keyword arguments must have positional-only parameters in order
to not artificially reduce the set of keyword arguments that may be passed
(e.g., `some_dict.update(self=5)`).

None of these cases seem to apply to dataclasses. There are no existing
dataclasses that take positional-only arguments that we need consistency
with. Dataclasses' constructors don't take arbitrary keyword arguments in
excess of their declared fields. And most crucially, the field names become
part of the public API of the class. Dataclass fields can never be renamed
without a risk of breaking existing users. Taking your example from the
other thread:

```
@dataclasses.dataclass
class Comparator:
    a: Any
    b: Any
    _: dataclasses.KEYWORD_ONLY
    key: Optional[Callable[whatever]] = None
```

The names `a` and `b` seem arbitrary, but they're not used only in the
constructor, they're also available as attributes of the instances of
Comparator, and dictionary keys in the `asdict()` return. Even if they were
positional-only arguments to the constructor, that would forbid calling

comp = Comparator(a=1, b=2, key=operator.lt)

but it would still be possible to call

comp = Comparator(1, 2, key=operator.lt)
print(comp.a, comp.b)

Preventing them from being passed by name to the constructor seems to be
adding an inconsistency, not removing one.

Perhaps it makes sense to be able to make init-only variables be
positional-only, since they don't become part of the class's public API,
but in that case it seems it could just be a flag passed to `InitVar`.
Outside of init-only variables, positional-only arguments seem like a
misfeature to me.
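The existing `InitVar` shows why init-only variables are a different case: they reach `__init__` (and `__post_init__`) but are not fields, so they do not appear in `fields()`, `asdict()`, or the repr. A small sketch; `Scaled` and `factor` are illustrative names, and no positional-only flag for `InitVar` exists today:

```python
import dataclasses
from dataclasses import InitVar, dataclass


@dataclass
class Scaled:
    value: float
    factor: InitVar[float] = 1.0  # init-only: passed to __post_init__, not a field

    def __post_init__(self, factor):
        self.value *= factor


s = Scaled(2.0, 3.0)
field_names = [f.name for f in dataclasses.fields(s)]
```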

~Matt

>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LPUHXVDHW7QQR4KNOQQTNWT6UJ2CHDBI/


[Python-ideas] Re: dataclasses: position-only and keyword-only fields

2021-03-12 Thread Matt Wozniski
On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith  wrote:

> I should mention another idea that showed up on python-ideas, at
>
> https://mail.python.org/archives/list/python-ideas@python.org/message/WBL4X46QG2HY5ZQWYVX4MXG5LK7QXBWB/
> . It would allow you to specify the flag via code like:
>
> @dataclasses.dataclass
> class Parent:
>     with dataclasses.positional():
>         a: int
>         c: bool = False
>     with dataclasses.keyword():
>         e: list
>
> I'm not crazy about it, and it looks like it would require stack
> inspection to get it to work, but I mention it here for completeness.


I think stack inspection could be avoided if we did something like:

```
@dataclasses.dataclass
class Parent:
    class pos(dataclasses.PositionalOnly):
        a: int
        c: bool = False
    class kw(dataclasses.KeywordOnly):
        e: list
```

Like your proposal, the names for the two inner classes can be anything,
but they must be unique. The dataclass decorator would check whether an
attribute in the new class's namespace was a subclass of PositionalOnly or
KeywordOnly, and if so recurse into its annotations to collect more fields.

This still seems hacky, but it seems to read reasonably nicely, and behaves
obviously in the presence of subclassing.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IFE35VNDZH5YUNXY23I53QBDCUFB7GRQ/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Matt Wozniski
On Mon, Jan 25, 2021 at 8:51 PM Inada Naoki  wrote:

> On Tue, Jan 26, 2021 at 10:22 AM Guido van Rossum 
> wrote:
> > Older Pythons may be easy to drop, but I'm not so sure about older
> unofficial docs. The open() function is very popular and there must be
> millions of blog posts with examples using it, most of them reading text
> files (written by bloggers naive in Python but good at SEO).
> >
> > I would be very sad if the official recommendation had to become "[for
> the most common case] avoid open(filename), use open_text(filename)".
>
> I agree with that. But until we switch the default encoding of open(),
> we must recommend avoiding `open(filename)` anyway.
> The default encoding of VS Code, Atom, Notepad is already UTF-8.


Maybe we're overthinking this - do we really need to recommend avoiding
`open(filename)` in all cases? Isn't it just fine to use if
`locale.getpreferredencoding(False)` is UTF-8, since in that case there
won't be any change in behavior when `open` switches from the old,
locale-specific default to the new, always UTF-8 default?

If that's the case, then it would be less of a backwards incompatibility
issue, since most production environments will already be using UTF-8 as
the locale (by virtue of it being the norm on Unix systems and servers).

And if that's the case, all we need is a warning that is raised
conditionally when open() is called for text mode without an explicit
encoding when the system locale is not UTF-8, and that warning can say
something like:

Your system is currently configured to use shift_jis for text files.
Beginning in Python 3.13, open() will always use utf-8 for text files
instead.
For compatibility with future Python versions, pass open() the extra
argument:
encoding="shift_jis"
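The conditional check could look roughly like this. `encoding_warning_message` and `checked_open` are hypothetical helpers, not existing APIs. The preferred encoding is passed in as a parameter (in practice `locale.getpreferredencoding(False)`) so the logic is easy to test, and names are normalized with `codecs.lookup` so spellings like `UTF8` count as UTF-8. (Python 3.10 did later add an opt-in `EncodingWarning` via PEP 597, but that warning is not conditional on the locale.)

```python
import codecs
import locale
import warnings


def encoding_warning_message(mode, encoding, preferred):
    """Return the proposed warning text, or None when no warning is needed."""
    if "b" in mode or encoding is not None:
        return None  # binary mode, or the caller already chose an encoding
    if codecs.lookup(preferred).name == "utf-8":
        return None  # behavior would not change when the default becomes UTF-8
    return (
        f"Your system is currently configured to use {preferred} for text files.\n"
        "A future version of Python will always use utf-8 for text files instead.\n"
        "For compatibility with future Python versions, pass open() the extra argument:\n"
        f'    encoding="{preferred}"'
    )


def checked_open(file, mode="r", buffering=-1, encoding=None, **kwargs):
    """Hypothetical wrapper: warn only on non-UTF-8 locales, then call open()."""
    message = encoding_warning_message(mode, encoding, locale.getpreferredencoding(False))
    if message is not None:
        warnings.warn(message, FutureWarning, stacklevel=2)
    return open(file, mode, buffering, encoding, **kwargs)
```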

~Matt
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6C2Y3RELB7PQYNNV5GS2D3H65SOXVD3N/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Matt Wozniski
On Mon, Jan 25, 2021, 4:25 AM Steven D'Aprano  wrote:

> On Sun, Jan 24, 2021 at 10:43:54PM -0500, Matt Wozniski wrote:
> > And `f.read(1)` needs to pick one of those and return it immediately. It
> > can't wait for more information. The contract of `read` is "Read from
> > underlying buffer until we have n characters or we hit EOF."
>
> In text mode, reads are always buffered:
>
> https://docs.python.org/3/library/functions.html#open
>
> so `f.read(1)` will read as much as needed, so long as it only returns a
> single character.
>

Text mode files are always backed by a buffer, yes, but that's not
relevant. My point is that `f.read(1)` must immediately return a character
if one exists in the buffer. It can't wait for more data to get buffered if
there is already a buffered character, as that would be a backwards
incompatible change that would badly break line-based protocols like FTP,
SMTP, and POP.

Up until now, `f.read(1)` has always read bytes from the underlying file
descriptor into the buffer until it has one full character, and immediately
returned it. And this is user facing behavior. Imagine an echo server that
reads 1 character at a time and echoes it back, forever. The client will
only ever send 1 character at a time, so if an eight-bit locale encoding is
in use the client will only send one byte before waiting for a response. As
things stand today this works. If encoding detection were added and the
server's call to `f.read(1)` could decide it doesn't know how to decode the
first byte it gets and to block until more data comes in, that would be a
deadlock, since the client isn't sending more.
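The "return as soon as one full character is available" behavior can be seen with the incremental decoders that text-mode files are built on (`io.TextIOWrapper` uses one internally). With a known encoding, the decoder emits a character as soon as its bytes are complete and holds back only an incomplete multibyte sequence. `feed` is an illustrative helper:

```python
import codecs


def feed(encoding, chunks):
    """Decode byte chunks one at a time, as they might arrive from a socket."""
    decoder = codecs.getincrementaldecoder(encoding)()
    return [decoder.decode(chunk) for chunk in chunks]


# Eight-bit encoding: every single byte is immediately a complete character.
latin1_output = feed("latin-1", [b"h", b"\xe9"])
# UTF-8: the first byte of a two-byte sequence yields nothing until the second arrives.
utf8_output = feed("utf-8", [b"\xc3", b"\xa9"])
```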

> A typical buffer size is 4096 bytes, or more.


Sure, but that doesn't mean that much data is always available. If
something has written less than that, it's not reasonable to block until
more data can be buffered in places where up until now no blocking would
have occurred. Not least because no more data will necessarily ever come.

And if it were to instead make its decisions based on what has been
buffered already, without ever blocking, then the behavior becomes
nondeterministic: it could return a different character based on how much
data the OS returned in the first read syscall.

> In any case, I believe the intention of this proposal is for *open*, not
> read, to perform the detection.


If that's the case, named pipes are a perfect example of why that's
impossible. It's perfectly normal to open a named pipe that contains no
data, and that won't until you trigger some action (say, spawning a child
process that will write to it). You can't auto detect the encoding of an
empty pipe, and you can't make open block until data arrives because it's
entirely possible data will never arrive if open blocks.
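The empty-pipe case is easy to demonstrate with an anonymous pipe from `os.pipe()`, which in this respect behaves like a freshly opened FIFO: at open time there may simply be no bytes to inspect. `peek_available` is a hypothetical helper that uses a non-blocking read so the demonstration itself cannot hang; it assumes a POSIX platform where `os.set_blocking` works on pipes:

```python
import os


def peek_available(fd, n=4):
    """Return up to n bytes readable right now without blocking (b'' if none)."""
    was_blocking = os.get_blocking(fd)
    os.set_blocking(fd, False)
    try:
        return os.read(fd, n)
    except BlockingIOError:
        return b""  # nothing written yet: no bytes to detect an encoding from
    finally:
        os.set_blocking(fd, was_blocking)


read_fd, write_fd = os.pipe()
before_write = peek_available(read_fd)  # the writer has not produced anything yet
os.write(write_fd, b"\xc2\x99")
after_write = peek_available(read_fd)
os.close(read_fd)
os.close(write_fd)
```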
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GUL5VOYGDEE3MSC2KDWZ7RNDP2ZMJGAS/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-24 Thread Matt Wozniski
On Sun, Jan 24, 2021 at 9:53 AM <2qdxy4rzwzuui...@potatochowder.com> wrote:

> On 2021-01-25 at 00:29:41 +1100,
> Steven D'Aprano  wrote:
>
> > On Sat, Jan 23, 2021 at 03:24:12PM +, Barry Scott wrote:
> > > First problem I see is that the file may be a pipe and then you will
> > > block until you have enough data to do the auto detect.
> >
> > Can you use `open('filename')` to read a pipe?
>
> Yes.  Named pipes are files, at least on POSIX.
>
> And no.  Unnamed pipes are identified by OS-level file descriptors, so
> you can't open them with open('filename'),
>

The `open` function takes either a file path as a string, or a file
descriptor as an integer. So you can use `open` to read an unnamed pipe or
a socket.
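For example, `open` accepts the integer file descriptors returned by `os.pipe()` for both ends of an unnamed pipe (by default the descriptor is closed when the file object is closed):

```python
import os

read_fd, write_fd = os.pipe()

# Integer file descriptors are accepted wherever a path would be.
with open(write_fd, "w", encoding="utf-8") as writer:
    writer.write("h\u00e9llo\n")  # small enough to fit in the pipe buffer

with open(read_fd, encoding="utf-8") as reader:
    line = reader.readline()
```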

> > Is blocking a problem in practice? If you try to open a network file,
> > that could block too, if there are network issues. And since you're
> > likely to follow the open with a read, the read is likely to block. So
> > over all I don't think that blocking is an issue.
>
> If open blocks too many bytes, then my application never gets to respond
> unless enough data comes through the pipe.


It's possible to do an `f.read(1)` on a file opened in text mode. If the
first two bytes of the file are 0xC2 0x99, that's either U+0099 (an invisible
C1 control character) if the file is UTF-8, or 슙 if the file is UTF-16BE, or
駂 if the file is UTF-16LE. And
`f.read(1)` needs to pick one of those and return it immediately. It can't
wait for more information. The contract of `read` is "Read from underlying
buffer until we have n characters or we hit EOF." A call to `read(1)`
cannot keep blocking after the first character was received to decide what
encoding to decode it as; that would be backwards incompatible, and it
might block forever if the sender only sends one character before waiting
for a response.
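The ambiguity is easy to check directly: the same two bytes decode to three different characters depending on the assumed encoding.

```python
raw = b"\xc2\x99"

as_utf8 = raw.decode("utf-8")          # U+0099, a C1 control character
as_utf16_be = raw.decode("utf-16-be")  # U+C299, a Hangul syllable
as_utf16_le = raw.decode("utf-16-le")  # U+99C2, a CJK ideograph
```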

~Matt
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BAUQXIMQP4F6DRFQCLJCDV3NUPCDCWSQ/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Matt Wozniski
On Sat, Jan 23, 2021 at 10:51 PM Chris Angelico  wrote:

> On Sun, Jan 24, 2021 at 2:46 PM Matt Wozniski  wrote:
> > 2. At the same time as the deprecation is announced, introduce a new
> __future__ import named "utf8_open" or something like that, to opt into the
> future behavior of `open` defaulting to utf-8-sig or utf-8 when opening a
> file in text mode and no explicit encoding is specified.
> >
> > I think a __future__ import solves the problem better than introducing a
> new function would.
>
> Note that, since this doesn't involve any language or syntax changes,
> a regular module import would work here - something like "from
> utf8mode import open", which would then shadow the builtin. Otherwise
> no change to your proposal - everything else works exactly the same
> way.
>

True - that's an even better idea. That even allows it to be wrapped in a
try/except ImportError, allowing someone to write code that's backwards
compatible to versions before the new function is introduced. Though it
does mean that the new function will need to stick around, even though it
will eventually be identical to the builtin open() function.

That would also allow the option of introducing a locale_open as well,
which would behave as though encoding=locale.getpreferredencoding(False) is
the default encoding for files opened in text mode. I can imagine putting
both functions in io, and allowing the user to silence the deprecation
warning by either opting into the new behavior:

from io import utf8_open as open

or explicitly declaring their desire for the legacy behavior:

from io import locale_open as open
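A sketch of the two opt-in functions and the `try`/`except ImportError` pattern. `utf8_open` and `locale_open` are hypothetical: they do not exist in `io` today, so the import below falls back to the local definition. The split between utf-8-sig (only reading) and utf-8 (anything that writes) follows the default proposed elsewhere in this thread:

```python
import builtins
import locale


def utf8_open(file, mode="r", buffering=-1, encoding=None, errors=None,
              newline=None, closefd=True, opener=None):
    """Hypothetical: open() with the proposed future default for text mode."""
    if encoding is None and "b" not in mode:
        writes = any(flag in mode for flag in "wax+")
        # Tolerate a BOM when only reading; never write one.
        encoding = "utf-8" if writes else "utf-8-sig"
    return builtins.open(file, mode, buffering, encoding, errors,
                         newline, closefd, opener)


def locale_open(file, mode="r", buffering=-1, encoding=None, errors=None,
                newline=None, closefd=True, opener=None):
    """Hypothetical: open() with today's locale-dependent default made explicit."""
    if encoding is None and "b" not in mode:
        encoding = locale.getpreferredencoding(False)
    return builtins.open(file, mode, buffering, encoding, errors,
                         newline, closefd, opener)


try:
    from io import utf8_open as open  # hypothetical future home; absent today
except ImportError:
    open = utf8_open  # shadow the builtin for the rest of this module
```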

~Matt
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ETJ6BADTVM5IICDLICGFIWQDMRDD34XS/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Matt Wozniski
On Sat, Jan 23, 2021 at 9:22 PM Inada Naoki  wrote:

> On Sun, Jan 24, 2021 at 10:17 AM Guido van Rossum 
> wrote:
> >
> > I have definitely seen BOMs written by Notepad on Windows 10.
> >
> > Why can’t the future be that open() in text mode guesses the encoding?
>
> I don't like guessing. As a Japanese, I have seen many mojibake caused
> by the wrong guess.
> I don't think guessing the encoding is a good part of reliable software.
>

I agree that guessing encodings in general is a bad idea and is an avenue
for subtle localization issues - bad things will happen when it guesses
wrong, and it will lead to code that works properly on the developer's
machine and fails for end users. It makes sense for a text editor to try to
guess, because showing the user something is better than nothing (and if it
guesses wrong the user can easily see that, and perhaps take some manual
action to correct it). It does not make sense for a programming language to
guess, because the user cannot easily detect or correct an incorrect guess,
and mistakes will tend to be propagated rather than caught.

> On the other hand, if we add `open_utf8()`, it's easy to ignore BOM:
>

Rather than introducing a new `open_utf8` function, I'd suggest the
following:

1. Deprecate calling `open` for text mode (the default) unless an
`encoding=` is specified, and 3 years after deprecation change the default
encoding for `open` to "utf-8-sig" for reading and "utf-8" for writing (to
ignore a BOM if one exists when reading, but to not create a BOM when
writing).
2. At the same time as the deprecation is announced, introduce a new
__future__ import named "utf8_open" or something like that, to opt into the
future behavior of `open` defaulting to utf-8-sig or utf-8 when opening a
file in text mode and no explicit encoding is specified.

I think a __future__ import solves the problem better than introducing a
new function would. Users who already have a UTF-8 locale (the majority of
users on the majority of platforms) could simply turn on the new __future__
import in any files where they're calling open() with no change in
behavior, suppressing the deprecation warning. Users who have a non-UTF-8
locale and want to keep opening text files in that non-UTF-8 locale by
default can add encoding=locale.getpreferredencoding(False) to retain the
old behavior, suppressing the deprecation warning. And perhaps we could
make a shortcut for that, like encoding="locale".
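The reason for the asymmetric defaults is visible with the existing codecs: `utf-8-sig` strips a leading BOM when decoding (and decodes BOM-less data unchanged), whereas encoding with `utf-8-sig` would add a BOM, which is why plain `utf-8` is proposed for writing. A quick check:

```python
import codecs

with_bom = codecs.BOM_UTF8 + "h\u00e9llo".encode("utf-8")
without_bom = "h\u00e9llo".encode("utf-8")

# Reading: utf-8-sig drops a BOM if present; plain utf-8 keeps it as U+FEFF.
read_sig = with_bom.decode("utf-8-sig")
read_plain = with_bom.decode("utf-8")
read_sig_no_bom = without_bom.decode("utf-8-sig")

# Writing: utf-8-sig would prepend a BOM; plain utf-8 does not.
written_sig = "h\u00e9llo".encode("utf-8-sig")
written_plain = "h\u00e9llo".encode("utf-8")
```

For what it's worth, Python 3.10 later added `encoding="locale"` to `open()` as the shortcut for the locale's encoding.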

~Matt
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UACU527OLD6DLI5URTMALWVOSPEKKADA/


[Python-ideas] Re: pathlib enhancements

2020-11-22 Thread Matt Wozniski
> I suggest adding an "exist_ok" argument to all of these, with
> the default being "True" for backwards-compatibility.  This argument name
> is already in use elsewhere in pathlib.  If this is False and the file is
> not present, a "FileNotFoundError" is raised.

For Path.mkdir, exist_ok=True inhibits an error if a directory already exists.
You're proposing that for Path.is_dir, exist_ok=True should inhibit an error if
the directory does not exist.

A parameter to enable that behavior sounds reasonable to me, but it definitely
shouldn't have the name "exist_ok"; it does the opposite of what exist_ok does.
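The opposite senses are easy to see side by side: `Path.mkdir(exist_ok=True)` suppresses an error when the path exists, while `Path.is_dir()` already returns `False` (rather than raising) when it does not, so a strict variant would have to raise on a missing path. `is_dir_strict` is a hypothetical helper, not a pathlib API, built on `Path.stat()`, which raises `FileNotFoundError` for a missing path. Note that pathlib already uses `missing_ok` in `Path.unlink` for the "don't raise if absent" sense.

```python
import stat
import tempfile
from pathlib import Path


def is_dir_strict(path: Path) -> bool:
    """Hypothetical strict is_dir(): raises FileNotFoundError if path is missing."""
    return stat.S_ISDIR(path.stat().st_mode)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data"
    target.mkdir()
    target.mkdir(exist_ok=True)  # already exists: no error, thanks to exist_ok
    missing = Path(tmp) / "missing"
    lenient = missing.is_dir()  # False, no exception
    try:
        is_dir_strict(missing)
    except FileNotFoundError:
        strict_raised = True
    else:
        strict_raised = False
    existing_is_dir = is_dir_strict(target)
```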
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YRSKCKBSTHKW2WPVUG2VQSCFV4M7X3O3/


[Python-ideas] Re: Global flag for whether a module is __main__

2020-11-12 Thread Matt Wozniski
> Also see PEP 299 for a slightly different approach.

Indeed. This proposal is more limited in scope than PEP 299, and appears
better from a backwards compatibility perspective.

> there's going to be two ways to spell the exact same thing. Both are
> still going to have to be memorized

Sure - though ideally over time the more verbose spelling will become
less idiomatic, and new users will encounter it less, and learners won't
need to memorize it because they won't need it and won't see it. Granted
there would be several years between the new feature being introduced
and when it is supported ubiquitously enough that the more verbose
spelling becomes rare.

> In a huge number of cases, it's actually better to separate out the
> library-like and script-like portions into separate files, or some
> other reorganization.

Other than the note about multiprocessing's spawn mode, I personally
agree with this. But still: this is extremely heavily used.  There are
almost 32 million hits for this idiom on GitHub:
https://github.com/search?q=%22__name__+%3D%3D+__main__&type=code

It seems that many people find value in this, and so lots of new users
continue to be exposed to it.

> Generally, I think scripts in installed packages are better handled
> via setuptools entrypoints nowadays.

In my experience, people reach the point of splitting their own scripts
out into private modules long before they reach the point of packaging
things. I propose this because I think it makes a gentler path for
learners. Pushing them in the direction of setuptools instead gives them
a brand new hurdle to overcome, one that's even more complex than the
rote memorization I'd like to see overcome.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/PDJC3G36SOD6M5TEOKWCCHUASFWTTA5H/


[Python-ideas] Global flag for whether a module is __main__

2020-11-12 Thread Matt Wozniski
Currently, the simplest and most idiomatic way to check whether a module was
run as a script rather than imported is:

if __name__ == "__main__":

People generally learn this by rote memorization, because users often want the
ability to add testing code or command line interfaces to their modules before
they understand enough about Python's data model to have any idea why this
works. Understanding what's actually happening requires you to know that:

  1. the script you ask Python to run is technically a module,
  2. every module has a unique name assigned to it,
  3. a module's `__name__` global stores this unique import name,
  4. and "__main__" is a magic name for the initial script's module.

A new (writable) global attribute called `__main__` would simplify this case,
allowing users to simply test

if __main__:

It would behave as though

__main__ = (__name__ == "__main__")

is executed in each module's namespace before executing it.

Because this would be writable, I don't see any backwards compatibility issues.
It wouldn't negatively impact any modules that might already be defining
`__main__` (for example, by doing `import __main__`). They'd simply redefine it
and go on using the `__main__` module as they always have. And a package with
a `__main__.py` does not have a `__main__` attribute.
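The proposed semantics can be emulated with `exec` to see how they would behave. `run_module_source` is a hypothetical helper that injects the flag into a fresh namespace before running the module body, as described above; it is only a simulation, not how the import system would implement it:

```python
def run_module_source(source, module_name):
    """Run source as if it were a module named module_name, with the proposed flag."""
    namespace = {"__name__": module_name}
    # The proposal: behave as though this assignment runs before the module body.
    namespace["__main__"] = (namespace["__name__"] == "__main__")
    exec(source, namespace)
    return namespace


script = """
if __main__:
    role = "script"
else:
    role = "library"
"""

as_script = run_module_source(script, "__main__")
as_import = run_module_source(script, "mymodule")

# Because the flag is an ordinary writable global, `import __main__`
# simply rebinds the name to the __main__ module, as it does today.
rebinding = run_module_source(
    "import __main__, types\nis_module = isinstance(__main__, types.ModuleType)",
    "mymodule",
)
```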

It would be easier to teach, easier to learn, and easier to memorize, and
a nice simplification for users at the cost of only very slightly more
complexity in the data model.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/CUNE3Y2YSQQSTXFITSXKFRVPO6EM2DV7/