[Python-ideas] Re: Auto assignment of attributes

2022-04-21 Thread Josh Rosenberg
On Wed, Apr 20, 2022 at 3:31 PM Pablo Alcain  wrote:

>
> About dataclasses, the point that Chris mentions, I think that they are in
> a different scope from this, since they do much more stuff. But, beyond
> this, a solution on the dataclass style would face a similar scenario:
> since the `__init__` is autogenerated, you would also be in a tight spot in
> the situation of "how would I bind only one of the items?". Again, now I'm
> talking about my experience, but I think that it's very hard to think that
> we could replace "classes" with "dataclasses" altogether. Here's an example
> of one of the (unexpected for me) things that happen when you try to do
> inheritance on dataclasses: https://peps.python.org/pep-0557/#inheritance.
dataclasses, by default, do four things with the annotated fields defined
in the class:

1. Generate a __init__
2. Generate a reasonable __repr__
3. Generate a reasonable __eq__
4. Automatically support destructuring with match statements
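
A quick illustration of those four defaults (Point is a hypothetical
example; the match destructuring relies on the __match_args__ that
dataclasses generate on Python 3.10+):

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)          # generated __init__
print(p)                 # Point(x=1, y=2) -- generated __repr__
print(p == Point(1, 2))  # True -- generated __eq__
match p:
    case Point(px, py):  # generated __match_args__ enables destructuring
        print(px, py)    # 1 2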

And you can independently disable any/all of them with arguments to the
decorator. They *can* do much more, but I find it pretty unusual to *ever*
write a class that I wouldn't want most of those for. The __init__ it
generates is essentially automatically writing the boilerplate you're
trying to avoid, so it seems entirely reasonable to consider this the same
scope.

As for "how would I bind only one/some of the items?", dataclasses already
support this with dataclasses.InitVar and a custom __post_init__ method; so:

class MyClass:
    def __init__(self, @a, @b, c):
        ... do something with c that doesn't just assign it as self.c ...

where you directly move values from the a and b arguments to self.a and
self.b, but use c for some other purpose, is spelled (using typing.Any as a
placeholder annotation when there's no better annotation to use):

from dataclasses import InitVar, dataclass
from typing import Any

@dataclass
class MyClass:
    a: Any
    b: Any
    c: InitVar[Any]

    def __post_init__(self, c):
        ...  # do something with c that doesn't just assign it as self.c;
        # self.a and self.b already exist by this point

The only name repeated is c (because you're not doing trivial assignment
with it), and it's perfectly readable. I'm really not seeing how this is
such an unwieldy solution that it's worth adding dedicated syntax to avoid
a pretty trivial level of boilerplate that is already avoidable with
dataclasses anyway.

-Josh


[Python-ideas] Re: Deprecate/change the behaviour of ~bool

2021-02-22 Thread Josh Rosenberg
> You could write it as a ^ (not b), as long as you don't mind it giving
> back an integer rather than a bool.

Actually, that'll give back a bool if a is a bool (and (not b) produces a
bool); ^ is overridden for bool/bool operations and itself returns a bool.
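
A quick REPL check:

>>> a, b = True, False
>>> a ^ (not b)            # XNOR; bool ^ bool stays bool
False
>>> type(a ^ (not b))
<class 'bool'>
>>> 1 ^ (not b)            # int ^ bool falls back to plain int
0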

On Tue, Feb 23, 2021 at 1:48 AM Chris Angelico  wrote:

> On Tue, Feb 23, 2021 at 12:14 PM Soni L.  wrote:
> >
> > Currently ~False is -1 and ~True is -2. Would be nicer if ~bool was the
> > same as not bool. Hopefully nobody actually relies on this but
> > nevertheless, it would be a backwards-incompatible change so the whole
> > deprecation warnings and whatnot would be required.
>
> There are quite a few ways in which bitwise operators are not the same
> as boolean operators. What would be the advantage of having them be
> the same in just this one case?
>
> > In particular, this is nice for xnor operator: a ^~ b. This currently
> > works on ints, but not on bools, while most other operators, including
> > xor, do successfully work on bools.
>
> You could write it as a ^ (not b), as long as you don't mind it giving
> back an integer rather than a bool. Fundamentally, you're doing
> bitwise operations on integers, and expecting them to behave as if
> they have only a single bit each, so another way to resolve this might
> be to mask it off at the end with "& 1".
>
> If this makes your code horrendously ugly, perhaps it would be better
> to create your own "one-bit integer" class, which responds to all
> bitwise operators by automatically masking down to one bit?
>
> ChrisA


[Python-ideas] Re: Method to efficiently advance iterators for sequences that support random access

2020-10-06 Thread Josh Rosenberg
This:

def advance(it, n):
    try:
        return it[n:]
    except TypeError:
        return itertools.islice(it, n, None)

has the disadvantages of:

1. Requiring a temporary copy of the data sliced (if len(it) is 1_000_000
and n is 500_000, you're stuck choosing between 500_000 pointless __next__
calls, a 500_000-element temporary list, or inefficiently looking up
500_000 elements by index)
2. Not working with sequence iterators, only sequences themselves (so if
you want to read 1000, skip 1000, read 1000, skip 1000, over and over, you
can't just use a single stateful iterator for the whole process without
using the consume recipe and calling __next__ 1000 times for each skip)
3. Not working with all sorts of things where a dedicated advance
implementation would allow efficient skipping. For example, the new
insertion-ordered dicts, when no deletions have been performed (a state we
already track for other reasons), could advance the iterator (for the raw
dict, keys, values or items) with the same efficiency as sequences do (and
could save a lot of work building and tossing tuples when iterating
.items(), even if it can't assume no dummy entries); the various itertools
functions like combinations, permutations and combinations_with_replacement
(and in some cases, product) could also be advanced efficiently without
the expense of generating the intermediate states.

Point is, the OP's case (a single sequence, advanced exactly once) is the
only one that implementation addresses, and it still has scaling issues
even then.
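
For reference, here's how the read-1000/skip-1000 pattern from point 2 has
to be spelled today (read_then_skip is a hypothetical helper using the
consume/islice approach); it works, but every skip still pays for one
iteration step per skipped item:

import itertools

def read_then_skip(it, read=1000, skip=1000):
    """Yield `read`-item chunks, skipping `skip` items between chunks."""
    it = iter(it)
    while chunk := list(itertools.islice(it, read)):
        yield chunk
        # consume recipe: advance by `skip` via an empty slice at that offset
        next(itertools.islice(it, skip, skip), None)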

On Tue, Oct 6, 2020 at 6:21 PM Chris Angelico  wrote:

> On Wed, Oct 7, 2020 at 4:53 AM Christopher Barker 
> wrote:
> >
> > On Tue, Oct 6, 2020 at 10:28 AM Alex Hall  wrote:
> >>
> >>
> >> if you want to iterate through items N to the end, then how do you do
> that without either iterating through the first N and throwing them away,
> or making a slice, which copies the rest of the sequence?
> >>
> >> ```python
> >> for i in range(start, stop):
> >>     x = lst[i]
> >>     process(x)
> >> ```
> >
> >
> > well yes, of course. but it's kind of a principle in Python that you
> don't use indices to iterate through a sequence :-)
> >
> > And I still like the sequence view idea, because then the creating of
> the "subview" could be done outside the iteration code, which would not
> have to know that it was working with a sequence, rather than any arbitrary
> iterable.
> >
>
> Slices returning views would be elegant, but backward incompatible, so
> it might have to be something like lst.view[start:stop] to create a
> view. I'm not a fan of an "advance this iterator" feature (it'd need a
> dunder on the iterator to make it possible, and I don't see a lot of
> use for it), but perhaps some iterators could themselves support
> slicing. That would be completely backward compatible, since currently
> none of the iterators concerned have any support for indexing. If you
> had that, you could build your own advance() or consume() function,
> something like:
>
> def advance(it, n):
>     try:
>         return it[n:]
>     except TypeError:
>         return itertools.islice(it, n, None)
>
> Still unconvinced of its value, but a good use-case would sway me, and
> the C implementation for list_iterator wouldn't be too hard (it'd
> construct a new list_iterator with the same underlying list and an
> incremented index).
>
> ChrisA


[Python-ideas] Re: Curious : Why staticmethod if classmethods can do everything a static method can?

2020-09-13 Thread Josh Rosenberg
The immediate use case I can think of for this is to make it possible to
just do:

__len__ = instancemethod(operator.attrgetter('_length'))
__hash__ = instancemethod(operator.attrgetter('_cached_hash'))

and stuff like that. It's mostly just a minor performance optimization to
avoid the overhead of Python-level function calls (and in the case of
stuff like __hash__, to reduce the probability of the GIL releasing during
a dict/set operation, though you'd still need some way to move __eq__ to
the C layer to allow stuff like setdefault to be truly atomic).

This was possible in Python 2 with types.MethodType (since it could make
unbound methods of a class for you; unbound methods no longer exist). Now
the solution is... considerably uglier (requires ctypes):
https://stackoverflow.com/q/40120596/364696
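
For context, a minimal pure-Python version of such a descriptor (along the
lines Random832 sketches below; the class name and Sized example are
illustrative, and it lacks the C-level benefits that motivate the ctypes
hack) would be:

from operator import attrgetter
from types import MethodType

class instancemethod:
    """Bind an arbitrary callable as an instance method via __get__."""
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.wrapped
        return MethodType(self.wrapped, obj)

class Sized:
    def __init__(self, length):
        self._length = length
    __len__ = instancemethod(attrgetter('_length'))

print(len(Sized(5)))  # 5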

On Sun, Sep 13, 2020 at 5:24 AM Steven D'Aprano  wrote:

> On Sun, Sep 13, 2020 at 12:32:54AM -0400, Random832 wrote:
>
> > This isn't what I was suggesting - I meant something like this:
> >
> > class instancemethod:
> >     def __init__(self, wrapped):
> >         self.wrapped = wrapped
> >     def __get__(self, obj, objtype):
> >         if obj is None: return self.wrapped
> >         else: return MethodType(self.wrapped, obj)
> >
> > this wouldn't be useful for functions, but would give other callables
> > the same functionality as functions, automatically creating the bound
> > method object, e.g.:
> >
> > class D:
> >     def __init__(self, obj, *args): ...
> >
> > class C:
> >     foo = instancemethod(D)
>
> You want a method which does absolutely nothing at all but delegate to a
> class constructor or callable object (but not a function), with no
> docstring and no pre-processing of arguments or post-processing of the
> result.
>
> Seems like an awfully small niche for this to be in the stdlib.
>
> But having said that, I might have a use for that too. Except... I would
> need a docstring. And pre- and post-processing. Hmmm.
>
> *shrug*
>
> Seems to me that this might be useful in theory, but in practice we
> might never use it, preferring this instead:
>
> class C:
>     def foo(self, *args):
>         """Doc string."""
>         return D(self, *args)
>
> with appropriate pre- and post-processing as needed.
>
> Interesting suggestion though, I may have to play around with it.
>
>
>
> --
> Steve


[Python-ideas] Re: str(obj) not calling obj.__str__?

2020-02-22 Thread Josh Rosenberg
This is explained in "Special Method Lookup":
https://docs.python.org/3/reference/datamodel.html#special-method-lookup

Short version: for both correctness and performance, special methods
(those whose names begin and end with double underscores) are typically
looked up on the class, not the instance. If you want to override on a
per-instance level, have __str__ invoke a non-special method, which can
then be overridden on a per-instance basis.
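
A minimal sketch of that workaround (Widget is a hypothetical example):

class Widget:
    def __str__(self):
        return self._describe()  # delegate to an ordinary method

    def _describe(self):
        return "generic widget"

w = Widget()
w.__str__ = lambda: "ignored"          # per-instance __str__; str() never sees it
print(str(w))                          # generic widget
w._describe = lambda: "custom widget"  # ordinary methods CAN be shadowed per-instance
print(str(w))                          # custom widget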

On Sun, Feb 23, 2020 at 2:30 AM Jérôme Carretero 
wrote:

> Hello,
>
>
> I just noticed that calling `str(x)` is actually doing (in CPython
> `PyObject_Str`) `type(x).__str__(x)` rather than `x.__str__()`.
>
> Context: I wanted to override __str__ for certain objects in order to
> “see them better”.
>
> I'm wondering why not do `x.__str__()`
>
>
> Best regards,
>
> --
> Jérôme


[Python-ideas] Re: Compound statement colon (Re: Re: Improve SyntaxError for obvious issue:)

2020-01-17 Thread Josh Rosenberg
The colon remains syntactically necessary in some cases, particularly to
disambiguate cases involving one-lining (no block involved). Stupid
example: If the colon is optional, what does:

if d +d

mean? Is it a test of the value of d, followed by invoking the unary plus
operator as a one-liner (that is, was it "if d: +d")? Or is it testing d +
d, and opening a block on the next line ("if d + d:")?

On Thu, Jan 16, 2020 at 6:15 PM Random832  wrote:

> On Tue, Jan 14, 2020, at 18:15, David Mertz wrote:
> > For what it's worth, after 20+ years of using Python, forgetting the
> > colon for blocks remains the most common error I make by a fairly wide
> > margin. Of course, once I see the error message—even being not all that
> > descriptive of the real issue—I immediately know what to fix too.
>
> What if the colon were made optional, with an eye to perhaps eventually no
> longer using it as the preferred style for new code?
>
> We had a post a while ago about the possibility of using the lack of a
> colon as an implicit line continuation (like with parentheses, e.g. "if
> a\nand b:"), and this was (reasonably) rejected. But if a line beginning
> as a compound statement and ending without a colon is *never* going to
> have a valid meaning as something else... what's the point of the colon,
> otherwise? Seems like just grit on the screen.


[Python-ideas] Re: PEP 584: Add + and += operators to the built-in dict class.

2019-10-17 Thread Josh Rosenberg
Notes on new PEP:

The section on

{**d1, **d2}

claims "It is only guaranteed to work if the keys are all strings. If the
keys are not strings, it currently works in CPython, but it may not work
with other implementations, or future versions of CPython[2]."

That's 100% wrong. You're mixing up the unpacking generalizations for dict
literals with the limitations on keyword arguments to functions. {**d1,
**d2} is guaranteed to accept dicts with any keys, on any implementation of
Python. I suspect you may actually be talking about the behavior of
dict(d1, **d2), which behaved the way you described back in the Python 2
days, but that behavior has been long since disabled (in Python 3, if d2
keys are non-string, it immediately dies with a TypeError).
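
A quick demonstration of the difference on Python 3:

>>> d1, d2 = {1: 'a'}, {2: 'b'}
>>> {**d1, **d2}    # non-string keys are fine in unpacking literals
{1: 'a', 2: 'b'}
>>> dict(d1, **d2)  # it's keyword arguments that are restricted
Traceback (most recent call last):
  ...
TypeError: keywords must be strings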

Less critical, but still wrong, is the contention that "collections.Counter
is a dict subclass that supports the + operator. There are no known
examples of people having performance issues due to adding large numbers of
Counters."

A couple examples of Counter merge issues:
https://bugs.python.org/issue36380   This is more about adding small
Counters to large Counters (not a problem that would directly affect dict
addition), regardless of the number of times you do it, but it gets *much*
worse when combining many small Counters.
https://stackoverflow.com/q/34407128/364696  Someone having the exact
problem of "performance issues due to adding large numbers of Counters."

On "lossiness", it says "Integer addition and concatenation are also lossy,
in the sense of not being reversable: you cannot get back the two addends
given only the sum. Two numbers add to give 356; what are the two numbers?"

The argument in the original thread was that, for c = a + b, on all
existing types in Python (modulo floating point imprecision issues),
knowing any two of a, b, and c is enough to determine the value of the
remaining variable; there were almost no cases (again, floating point
terribleness excepted) in which there existed some value d != a for which
d + b == c. Dict addition breaks that pattern, however arbitrary some
people believe it to be. The only example I'm aware of where this is
already violated is collections.Counter, since addition strips zero values
from the result, so Counter(a=0) and Counter() are equivalent in the end
result of an add (which is not necessarily a good thing, see
https://bugs.python.org/issue36380 , but we're stuck with it).
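
For instance:

>>> from collections import Counter
>>> Counter(a=0) + Counter(a=1)
Counter({'a': 1})
>>> Counter() + Counter(a=1)  # a different left addend, but the same sum
Counter({'a': 1})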

Lastly, it seems a tad odd to deny that the Zen encourages "one way to do
it", as if it were a calumny against Python invented by Perl folks (Perl
folks take pride in TIMTOWTDI, but it always felt like a bit of pride in
the perversity of it). Finely parsing the Zen to say it's only preferable,
not a rule, is kind of missing the point of the Zen: none of it is
prescriptive; it's a philosophy. Minimizing unnecessary "multiple ways to
do it" to avoid kitchen-sink syndrome is a reasonable goal. It's not an
argument by itself if the new way to do it is strictly better, but
pretending Python doesn't set a higher bar for features which already
exist or are easily doable with existing tools is a little strange. Point
is, if you're going to mention this in the PEP at all, justify this as
something worth yet one more way to do it; don't argue that preferring one
way to do it isn't a goal of Python.


On Thu, Oct 17, 2019 at 5:35 AM Brandt Bucher 
wrote:

> At long last, Steven D'Aprano and I have pushed a second draft of PEP 584
> (dictionary addition):
>
> https://www.python.org/dev/peps/pep-0584/
>
> The accompanying reference implementation is on GitHub:
>
> https://github.com/brandtbucher/cpython/tree/addiction
>
> This new draft incorporates much of the feedback that we received during
> the first round of debate here on python-ideas. Most notably, the
> difference operators (-/-=) have been dropped from the proposal, and the
> implementations have been updated to use "new = self.copy();
> new.update(other)" semantics, rather than "new = type(self)();
> new.update(self); new.update(other)" as proposed before. It also includes
> more background information and summaries of major objections (with
> rebuttals).
>
> Please let us know what you think – we'd love to hear any *new* feedback
> that hasn't yet been addressed in the PEP or the related discussions it
> links to! We plan on updating the PEP at least once more before review.
>
> Thanks!
>
> Brandt

[Python-ideas] Re: For-expression/throwaway comprehension

2019-07-26 Thread Josh Rosenberg
On Fri, Jul 26, 2019 at 10:06 PM Kyle Stanley  wrote:

> From my understanding, consume() effectively provides the functionality the
> author was looking for. Also, between the options of `for _ in iter:` vs
> `colllections.deque(it, maxlen=0)`, how significant is the performance
> difference?
>
> I had assumed that the performance of `for _ in iter` would be
> significantly better, due to the overhead cost of creating and filling a
> double-ended queue, which provides optimization for insertion at the
> beginning and end. Wouldn't a one-directional iterator provide better
> performance and have a lower memory cost if there is no modification
> required?
>

collections.deque with an explicit maxlen of 0 doesn't actually populate
the queue at all; it has a special case for maxlen 0 that just pulls items
and immediately throws away the reference to what it pulled, without
storing it in the deque at all. That special case is split off into its
own function at the C layer, consume_iterator:
https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c#L368

It's basically impossible to beat that in CPython in the general case. By
contrast, for _ in iterable would need to execute at least three bytecodes
per item (advance iterator, store, jump), which is *way* more expensive per
item. collections.deque(maxlen=0) can lose for small inputs (because it
does have to call a constructor, create a deque, then throw it away;
precreating a singleton for consume with `consumer =
collections.deque(maxlen=0).extend` can save on some of that though), but
for any meaningful length input, the reduced cost per item makes up for it.
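
For reference, the itertools docs' "consume" recipe packages exactly this
trade-off:

import collections
import itertools

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        # feed the entire iterator into a zero-length deque (the C fast path)
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(itertools.islice(iterator, n, n), None)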


Re: [Python-ideas] More alternate constructors for builtin type

2019-05-06 Thread Josh Rosenberg
bytes.ord is a bad name, given the behavior would be the opposite of ord
(ord converts length one str to int, not int to length one str).

PEP 467 (currently deferred to 3.9 or later) does have proposals for this
case: either bytes.byte (old proposal:
https://legacy.python.org/dev/peps/pep-0467/#addition-of-explicit-single-byte-constructors
) or bytes.fromord/a top-level built-in named bchr in the new version of
the PEP (
https://www.python.org/dev/peps/pep-0467/#addition-of-bchr-function-and-explicit-single-byte-constructors
). So if that's the way we want to go, we could just push forward on
PEP 467. It's only a subset of Serhiy's broader proposal, though admittedly
one of the cases where the existing design is unusually weak and
improvements would better fill niches currently occupied by non-obvious
solutions.

On Tue, May 7, 2019 at 12:23 AM Steven D'Aprano  wrote:

> On Tue, May 07, 2019 at 09:54:03AM +1000, Cameron Simpson wrote:
> > On 06May2019 18:47, Antoine Pitrou  wrote:
> [...]
> > >The main constructors for built-in types are used so pervasively that
> > >there is no hope of actually removing such deprecated behavior.
> >
> > I don't find that compelling. I for one would welcome a small suite of
> > unambiguous factories that can't be misused. bytes() can easily be
> > misused by accident, introducing bugs and requiring debug work. I'd be
> > very happy for my own future code to be able to take advantage of hard
> > to misuse constructors.
>
> There is a difference between *adding* new constructor methods, and what
> Antoine is saying: that we cannot realistically remove existing uses of
> the current constructors.
>
> I think that Antoine is right: short of another major 2to3 backwards-
> incompatible version, the benefit of actually removing any of the
> built-in constructor behaviours is too small and the cost is too great.
> So I think removal of existing behaviour should be off the table.
>
> Or at least, taken on a case-by-case basis. Propose a specific API you
> want to remove, and we'll discuss that specific API.
>
>
> As for adding *new* constructors:
>
> > Of course we could all write tiny factories for these modes but (a) we'd
> > all have to write and debug them and (b) they'd all have different
> > spellings and signatures
>
> Probably because everyone will want them to do something different.
>
> We've already seen two different semantics for the same desired
> constructor call:
>
> bytes(10) -> b'10'    # like str(), but returning bytes
> bytes(10) -> b'\x0A'  # like ord(), but returning a byte
>
> That suggests a possible pair of constructors:
>
> bytes.from_int(n)  -> equivalent to b'%d' % n
> bytes.ord(n)   -> equivalent to bytes((n,))
>
>
> The proposal in this thread seems to me to be a blanket call to add new
> constructors everywhere, and I don't think that's appropriate. I think
> that each proposed new constructor should live or die on its own merits.
> The two above for bytes seem like simple, obvious APIs that do something
> useful which is otherwise a small pain point. Both are syntactic sugar
> for something otherwise ugly or hard to discover.
>
> I think that, if somebody is willing to do the work (it can't be me,
> sorry) adding two new class methods to bytes for the above two cases
> would be a nett win, and they should be minor enough that it doesn't
> need a PEP.
>
> Thoughts?
>
>
>
> --
> Steven


Re: [Python-ideas] More alternate constructors for builtin type

2019-05-06 Thread Josh Rosenberg
The other bytes object constructor I often find myself in need of, without
being able to remember how to do it, is creating a length-1 bytes object
from a known ordinal. The "obvious":

someordinal = ...
bytes(someordinal)

creates a zeroed bytes of that length, which is clearly wrong. I eventually
remember that wrapping it in a tuple (or list) before passing to the bytes
constructor works, but it's far from intuitive:

bytes((someordinal,))
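
Concretely, using 65 as the ordinal:

>>> bytes(65)[:4]  # a 65-byte zeroed buffer, not b'A'
b'\x00\x00\x00\x00'
>>> bytes((65,))
b'A'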

Unfortunately, the most obvious name for the alternate constructor to fill
this niche is *also* bytes.fromint, which conflicts with Guido's use case.

On Mon, May 6, 2019 at 2:40 PM Guido van Rossum  wrote:

> 20-25 years ago this might have been a good idea. Unfortunately there's so
> much code (including well-publicized example code) that I'm not sure it's a
> good use of anyone's time to try and fix this.
>
> Exception: I am often in need of a constructor for a bytes object from an
> integer using the decimal representation, e.g. bytes.fromint(42) == b"42".
> (Especially when migrating code from Python 2, where I've found a lot of
> str(n) that cannot be translated to bytes(n) but must instead be written as
> b"%d" % n, which is ugly and unintuitive when coming from Python 2.)
>
> On Mon, May 6, 2019 at 2:50 AM Serhiy Storchaka 
> wrote:
>
>> Constructors for builtin types is too overloaded.
>>
>> For example, int constructor:
>>
>> * Converts a number (with truncation) to an integer.
>> * Parses human readable representation of integer from string or
>> bytes-like object. Optional base can be specified. Note that there is an
>> alternate constructor for converting bytes to int using another way:
>> int.from_bytes().
>> * Without arguments returns 0.
>>
>> str constructor:
>>
>> * Converts an object to human-readable representation.
>> * Decodes a bytes-like object using the specified encoding.
>> * Without arguments returns an empty string.
>>
>> bytes constructor:
>>
>> * Converts a bytes-like object to a bytes object.
>> * Creates a bytes object from an iterable of integers.
>> * Encodes a string using the specified encoding. The same as str.encode().
>> * Creates a bytes object of the specified length consisting of zeros.
>> Equivalent to b'\0' * n.
>>
>> dict constructor:
>>
>> * Creates a dict from a mapping.
>> * Creates a dict from an iterable of key-value pairs.
>> * Without arguments returns an empty dict.
>>
>> The problem with supporting many different types of input is that we
>> can get a wrong result instead of an error, or that we can get the
>> error later, far from the place where we handled the input.
>>
>> For example, if our function should accept an arbitrary bytes-like
>> object, and we call bytes() on the argument because we need the length
>> and indexing, and an integer is passed instead, we will get an
>> unexpected result. If our function expects a number, and we call int()
>> on the argument, we may prefer to get an error if passed a string.
>>
>> I suggest to add limited versions of constructors as named constructors:
>>
>> * int.parse() -- parses string or bytes to integer. I do not know
>> whether separate int.parsestr() and int.parsebytes() are needed. I think
>> round(), math.trunc(), math.floor() and math.ceil() are enough for
>> lossily converting numbers to integers. operator.index() should be used
>> for lossless conversion.
>> * bytes.frombuffer() -- accepts only bytes-like objects.
>> * bytes.fromvalues() -- accepts only an iterable of integers.
>> * dict.frommapping() -- accepts only a mapping, but not key-value pairs.
>> Uses __iter__() instead of keys() for iterating keys, and can take an
>> optional iterable of keys. Equivalent to {k: m[k] for k in m} or {k: m[k]
>> for k in keys}.
>> * dict.fromitems() -- accepts only key-value pairs. Equivalent to {k: v
>> for k, v in iterable}.
>>
> --
> --Guido van Rossum (python.org/~guido)
> Pronouns: he/him/his (why is my pronoun here?)


Re: [Python-ideas] PEP: Dict addition and subtraction

2019-03-06 Thread Josh Rosenberg
On Wed, Mar 6, 2019 at 10:31 PM Greg Ewing 
wrote:

>
> You might as well say that using the + operator on vectors is
> nonsense, because len(v1 + v2) is not in general equal to
> len(v1) + len(v2).
>
> Yet mathematicians are quite happy to talk about "addition"
> of vectors.
>
>
Vector addition is *actual* addition, not concatenation. You're so busy
loosening the definition of + to make it make sense for dicts that you've
forgotten that + is, first and foremost, about addition in the
mathematical sense, where vector addition is just one type of addition.
Concatenation is already a minor abuse of +, but one commonly accepted by
programmers, thanks to it having some similarities to addition and a
single, unambiguous set of semantics to avoid confusion.

You're defending + on dicts by pointing out that vector addition isn't
concatenation either, which only shows how muddled things get when you try
to use + to mean multiple concepts that are at best loosely related.

The closest I can come to a thorough definition of what + does in Python
(and most languages) right now is that it:

1. Returns a new thing of the same type (or a shared coerced type for
number weirdness)
2. Combines the information of the input operands
3. Is associative ((a + b) + c produces the same thing as a + (b + c))
(modulo floating point weirdness)
4. Is "reversible": knowing the end result and *one* of the inputs is
sufficient to determine the value of the other input; that is, for c = a +
b, knowing any two of a, b and c allows you to determine a single
unambiguous value for the remaining value (numeric coercion and floating
point weirdness make this not 100%, but you can at least know a value
equal to the other value; e.g. for c = a + b, knowing c is 5.0 and a is
1.0 is sufficient to say that b equals 4, even if you can't know whether
it was an int or a float). For numbers, reversal is done with -; for
sequences, it's done by slicing c using the length of a or b to "subtract"
the elements that came from a/b.
5. (Actual addition only) Is commutative (modulo floating point
weirdness): a + b == b + a
6. (Concatenation only) Is order preserving (really a natural consequence
of #4, but a property that people expect)

Note that these rules are consistent across most major languages that allow
+ to mean combine collections (the few that disagree, like Pascal, don't
support | as a union operator).

Concatenation is missing element #5, but otherwise aligns with actual
addition. dict merges (and set unions, for that matter) violate #4 and #6;
for c = a + b, knowing c and either a or b still leaves a literally
infinite set of possible inputs for the other input (it's not infinite for
sets, where the options would be a subset of the result, but for dicts
there would be no such limitation; keys from b could exist with any
possible value in a). dicts' order-preserving aspect *almost* satisfies
#6, but not quite (if 'x' comes after 'y' in b, there is no guarantee that
it will do so in c, because a gets first say on ordering, and b gets the
final word on value).
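
A small illustration of property #4, and of how dict merging breaks it:

a, b = [1, 2], [3]
c = a + b
assert c[:len(a)] == a and c[len(a):] == b  # c plus either input recovers the other

d1, d2 = {'x': 1}, {'x': 2}
m = {**d1, **d2}  # {'x': 2}
# knowing m and d2 tells you nothing about d1: 'x' could have mapped to anything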

Allowing dicts to get involved in + means:

1. Fewer consistent rules apply to +;
2. The particular idiosyncrasies of Python dict ordering and "which value
wins" rules are now tied to +. For concatenation there is only one set of
possible rules AFAICT, so every language naturally agrees on behavior, but
dict merging obviously has many possible rules that would be unlikely to
match the exact rules of any other language except by coincidence. a
winning on order and b winning on value is a historical artifact of how
Python's dict developed; I doubt any other language would intentionally
choose to split responsibility like that if it weren't handcuffed by
history.

Again, there's nothing wrong with making dict merges easier. But it
shouldn't be done by (further) abusing +.

-Josh Rosenberg


Re: [Python-ideas] PEP: Dict addition and subtraction

2019-03-06 Thread Josh Rosenberg
On Wed, Mar 6, 2019 at 11:52 AM Rhodri James  wrote:

> On 06/03/2019 10:29, Ka-Ping Yee wrote:
> > len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the +
> > operator is nonsense.
>
> I'm sorry, but you're going to have to justify why this identity is
> important.  Making assumptions about length where any dictionary
> manipulations are concerned seems unwise to me, which makes a nonsense
> of your claim that this is nonsense :-)
>

It's not "nonsense" per se. If we were inventing programming languages in a
vacuum, you could say + can mean "arbitrary combination operator" and it
would be fine. But we're not in a vacuum; every major language that uses +
with general purpose containers uses it to mean element-wise addition or
concatenation, not just "merge". Concatenation is what imposes that
identity (and all the others people are defending, like no loss of input
values); you're taking a sequence of things, and shoving another sequence
of things on the end of it, preserving order and all values.

The argument here isn't that you *can't* make + do arbitrary merges that
don't adhere to these semantics. It's that adding yet a third meaning to +
(and it is a third meaning; it has no precedent in any existing type in
Python, nor in any other major language; even in the minor languages that
allow it, they use + for sets as well, so Python using + is making Python
itself internally inconsistent with the operators used for set), for
limited benefit.

- Josh Rosenberg


Re: [Python-ideas] PEP: Dict addition and subtraction

2019-03-05 Thread Josh Rosenberg
On Wed, Mar 6, 2019 at 12:08 AM Guido van Rossum  wrote:

> On Tue, Mar 5, 2019 at 3:50 PM Josh Rosenberg <
> shadowranger+pythonid...@gmail.com> wrote:
>
>>
>> On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano 
>> wrote:
>>
>>> On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
>>>
>>> > I propose that the + sign merge two python dictionaries such that if
>>> > there are conflicting keys, a KeyError is thrown.
>>>
>>> This proposal is for a simple, operator-based equivalent to
>>> dict.update() which returns a new dict. dict.update has existed since
>>> Python 1.5 (something like a quarter of a century!) and never grown a
>>> "unique keys" version.
>>>
>>> I don't recall even seeing a request for such a feature. If such a
>>> unique keys version is useful, I don't expect it will be useful often.
>>>
>>
>> I have one argument in favor of such a feature: It preserves
>> concatenation semantics. + means one of two things in all code I've ever
>> seen (Python or otherwise):
>>
>> 1. Numeric addition (including element-wise numeric addition as in
>> Counter and numpy arrays)
>> 2. Concatenation (where the result preserves all elements, in order,
>> including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 +
>> seq2))
>>
>> dict addition that didn't reject non-unique keys wouldn't fit *either*
>> pattern; the main proposal (making it equivalent to left.copy(), followed
>> by .update(right)) would have the left hand side win on ordering and the
>> right hand side win on values, and wouldn't preserve the length invariant of
>> concatenation. At least when repeated keys are rejected, most concatenation
>> invariants are preserved; order is all of the left elements followed by all
>> of the right, and no elements are lost.
>>
>
> I must by now have seen dozens of post complaining about this aspect of
> the proposal. I think this is just making up rules (e.g. "+ never loses
> information") to deal with an aspect of the design where a *choice* must be
> made. This may reflect the Zen of Python's "In the face of ambiguity,
> refuse the temptation to guess." But really, that's a pretty silly rule
> (truly, they aren't all winners). Good interface design constantly makes
> choices in ambiguous situations, because the alternative is constantly
> asking, and that's just annoying.
>
> We have a plethora of examples (in fact, almost all alternatives
> considered) of situations related to dict merging where a choice is made
> between conflicting values for a key, and it's always the value further to
> the right that wins: from d[k] = v (which overrides the value when k is
> already in the dict) to d1.update(d2) (which lets the values in d2 win),
> including the much lauded {**d1, **d2} and even plain {'a': 1, 'a': 2} has
> a well-defined meaning where the latter value wins.
>
Yeah. And I'm fine with the behavior for update because the name itself is
descriptive; we're spelling out, in English, that we're update-ing the
thing it's called on, so it makes sense to have the thing we're sourcing
for updates take precedence.

Similarly, for dict literals (and by extension, unpacking), it's following
an existing Python convention which doesn't contradict anything else.

Overloading + lacks the clear descriptive aspect of update that describes
the goal of the operation, and contradicts conventions (in Python and
elsewhere) about how + works (addition or concatenation, and a lot of
people don't even like it doing the latter, though I'm not that pedantic).

A couple "rules" from C++ on overloading are "*Whenever the meaning of an
operator is not obviously clear and undisputed, it should not be
overloaded.* *Instead, provide a function with a well-chosen name.*"
and "*Always
stick to the operator’s well-known semantics".* (Source:
https://stackoverflow.com/a/4421708/364696 , though the principle is
restated in many other places). Obviously the C++ community isn't perfect
on this (see iostream and <> operators), but they're otherwise pretty
consistent. + means addition, and in many languages including C++ strings,
concatenation, but I don't know of any languages outside the "esoteric"
category that use it for things that are neither addition nor
concatenation. You've said you don't want the whole plethora of set-like
behaviors on dicts, but dicts are syntactically and semantically much more
like sets than sequences, and if you add + (with semantics differing from
both sets and sequences), the language becomes less consistent.

I'm not against making it easier to merge dictionaries. But people seem to
be

Re: [Python-ideas] PEP: Dict addition and subtraction

2019-03-05 Thread Josh Rosenberg
On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano  wrote:

> On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
>
> > I propose that the + sign merge two python dictionaries such that if
> > there are conflicting keys, a KeyError is thrown.
>
> This proposal is for a simple, operator-based equivalent to
> dict.update() which returns a new dict. dict.update has existed since
> Python 1.5 (something like a quarter of a century!) and never grown a
> "unique keys" version.
>
> I don't recall even seeing a request for such a feature. If such a
> unique keys version is useful, I don't expect it will be useful often.
>
>
I have one argument in favor of such a feature: It preserves concatenation
semantics. + means one of two things in all code I've ever seen (Python or
otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter
and numpy arrays)
2. Concatenation (where the result preserves all elements, in order,
including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 +
seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either*
pattern; the main proposal (making it equivalent to left.copy(), followed
by .update(right)) would have the left hand side win on ordering and the
right hand side win on values, and wouldn't preserve the length invariant of
concatenation. At least when repeated keys are rejected, most concatenation
invariants are preserved; order is all of the left elements followed by all
of the right, and no elements are lost.
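
A concrete check of that invariant:

a, b = [1, 2], [2]
assert len(a + b) == len(a) + len(b)     # concatenation preserves total length

d1, d2 = {'x': 1}, {'x': 2}
merged = {**d1, **d2}                    # {'x': 2}
assert len(merged) != len(d1) + len(d2)  # the merge silently dropped an entry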


>
> > This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.
>
> One of the reasons for preferring + is that it is an obvious way to do
> something very common, while {**d1, **d2} is as far from obvious as you
> can get without becoming APL or Perl :-)
>
>
From the moment PEP 448 was published, I've been using unpacking as a more
composable/efficient form of concatenation, merging, etc. I'm sorry you
don't find it obvious, but a couple of e-mails back you said:

"The Zen's prohibition against guessing in the face of ambiguity does not
mean that we must not add a feature to the language that requires the
user to learn what it does first."

Learning to use the unpacking syntax in the case of function calls is
necessary for tons of stuff (writing general function decorators, handling
initialization in class hierarchies, etc.), and as PEP 448 is titled, this
is just a generalization combining the features of unpacking arguments with
collection literals.

> > The second syntax makes it clear that a new dictionary is being
> > constructed and that d2 overrides keys from d1.
>
> Only because you have learned the rule that {**d, **e) means to
> construct a new dict by merging, with the rule that in the event of
> duplicate keys, the last key seen wins. If you hadn't learned that rule,
> there is nothing in the syntax which would tell you the behaviour. We
> could have chosen any rule we liked:
>
>
No, because we learned the general rule for dict literals that {'a': 1,
'a': 2} produces {'a': 2}; the unpacking generalizations were very good
about adhering to the existing rules, so it was basically zero learning
curve if you already knew dict literal rules and less general unpacking
rules. The only part to "learn" is that when there is a conflict between
dict literal rules and function call rules, dict literal rules win.

To be clear: I'm not supporting + raising an error on non-unique keys.
Even if it makes dict + dict adhere to the rules of concatenation, I don't
think it's common or useful functionality. My order of preferences is
roughly:

1. Do nothing (even if you don't like {**d1, **d2}, .copy() followed by
.update() is obvious, and we don't need more than one way to do it)
2. Add a new method to dict, e.g. dict.merge (whether it's a class method
or an instance method is irrelevant to me)
3. Use | (because dicts are *far* more like sets than they are like
sequences, and the semi-lossy rules of unioning make more sense there); it
would also make - make sense, since + is only matched by - in numeric
contexts, while on collections | and - are paired. And I consider the -
functionality the most useful part of this whole proposal, because I
*have* wanted to drop a collection of known blacklisted keys from a dict;
while it's obvious you can do it by looping, I always wanted to be able to
do something like d1.keys() -= badkeys, and remain disappointed that
nothing like it is available (the loop-based spellings are sketched below).
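
For reference, the loop-based spellings (names are illustrative):

badkeys = {'password', 'ssn'}
d1 = {'user': 'alice', 'password': 'hunter2', 'ssn': '000-00-0000'}

cleaned = {k: v for k, v in d1.items() if k not in badkeys}  # new dict
for k in badkeys & d1.keys():  # or remove in place; dict.keys() is set-like
    del d1[k]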

-Josh Rosenberg