On Thu, Aug 27, 2020 at 03:28:07AM +1200, Greg Ewing wrote:
> On 27/08/20 12:53 am, Steven D'Aprano wrote:
> 
> >Presumably the below method is provided by `object`. If not, what
> >provides it?
> >
> >>     def __getindex__(self, *args, **kwds):
> >>         if kwds:
> >>             raise TypeError("Object does not support keyword indexes")
> >>         if not args:
> >>             raise TypeError("Object does not accept empty indexes")
> 
> It's not literally a method, I just wrote it like that to
> illustrate the semantics. It would be done by the interpreter
> as part of the process of translating indexing operations into
> dunder calls.

Okay, so similar to my suggestion that this would be better implemented 
in the byte-code rather than as a method of object.
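
To make the semantics concrete, here's a rough pure-Python sketch of
the dispatch as I understand the proposal; `subscript` and its
fallback details are my own guesses for illustration, not anything
from the actual proposal:

    def subscript(obj, *args, **kwds):
        # Hypothetical: roughly what the interpreter would do for obj[...]
        getindex = getattr(type(obj), "__getindex__", None)
        if getindex is not None:
            # New protocol: subscripts arrive like function arguments.
            return getindex(obj, *args, **kwds)
        # Fall back to the existing item protocol.
        if kwds:
            raise TypeError("Object does not support keyword indexes")
        if not args:
            raise TypeError("Object does not accept empty indexes")
        # Pack multiple subscripts back into a tuple, as now.
        index = args[0] if len(args) == 1 else args
        return type(obj).__getitem__(obj, index)

(Note that `obj[1,]` would reach the fallback as a single argument and
so lose its current tuple-ness; more on that below.)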


> >What is your reasoning behind prohibiting keywords, when we're right in
> >the middle of a discussion over PEP 472 which aims to allow keywords?
> 
> We're falling back to __getitem__ here, which doesn't currently allow
> keywords, and would stay that way. The point of this proposal is to
> not change __getitem__. If you want to get keywords, you provide
> __getindex__.

Point of order: *the getitem dunder* already allows keywords, and 
always has, and always will. It's just a method.

It's the *subscript (pseudo-)operator* which doesn't support keywords. 
This is a syntax limitation, not a limitation of the dunder method. If 
the interpreter supports the syntax, it's neither here nor there to the 
interpreter whether it calls `__getitem__` or `__getindex__` or 
`__my_hovercraft_is_full_of_eels__` for that matter.

So if you want to accept keywords, you just add keywords to your 
existing dunder method. If you don't want them, don't add them. We don't 
need a new dunder just for the sake of keywords.
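
To spell that out: calling the dunder directly with keywords is
already legal, because it's just a method. Only the bracket syntax is
in the way. (The class and the `reverse` parameter below are invented
for the example.)

    class Demo:
        def __getitem__(self, index, *, reverse=False):
            # An ordinary method: keyword parameters already work.
            return (index, reverse)

    d = Demo()
    d[1]                            # the syntax calls __getitem__(1)
    d.__getitem__(1, reverse=True)  # legal today: just a method call
    # d[1, reverse=True]            # SyntaxError: the *syntax* is the limit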


> >This is going to slow down the most common cases of subscripting: the
> >interpreter has to follow the entire MRO to find `__getindex__` in
> >object, which then dispatches to the `__getitem__` method.
> 
> No, it would be done by checking type slots, no MRO search involved.

Okay, I didn't think of type slots.
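
For what it's worth, the type-based lookup is visible from pure
Python: subscripting ignores a dunder attached to the instance,
because the implicit lookup goes through the type, which is roughly
what the slot check does at the C level. (Toy class for illustration
only.)

    class C:
        def __getitem__(self, i):
            return "class"

    c = C()
    c.__getitem__ = lambda i: "instance"
    c[0]              # "class": implicit lookup skips the instance
    c.__getitem__(0)  # "instance": ordinary attribute lookup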

But type slots are expensive in other ways. Every new type slot 
increases the size of type objects, and I've seen proposals for new 
dunders knocked back for that reason, so presumably the people who care 
about the C level care about the increase in memory and complexity from 
adding new type slots.

Looking here:

https://docs.python.org/3/c-api/typeobj.html

I see that `__setitem__` and `__delitem__` are handled by the same type 
slot, so presumably `__setindex__` and `__delindex__` would likewise 
share a slot. Still, that means adding two new type slots to both the 
sequence and the mapping protocols.

(I assume that it's not *all* objects that carry these slots. If it is 
all objects, that makes the cost of this proposal correspondingly 
higher.)

So if I understand it correctly, we have some choices when it comes to 
sequence/mapping types:

1. the existing `__*item__` methods keep their slots, and new `__*index__` 
   slots are created, which makes both the item and index dunders fast 
   but increases the size of every object which uses any of those 
   methods;

2. the existing item slots stay as they are, there are no new index 
   slots, which keeps objects the same size but the new index protocol 
   will be slow;

3. the existing item slots are repurposed for index, which keeps objects 
   the same size, and the new protocol fast, but makes calling item 
   dunders slow;

4. and just for completeness, because of course this is not going to 
   happen: we could remove the existing item slots and not add index 
   slots, so that both protocols are equally slow;

5. alternatively, we could leave the existing C-level sequence and 
   mapping objects alone, and create *four* brand new C-level objects:

   - a sequence object that supports only the new index protocol;
   - a sequence object that supports both index and item protocols;
   - and likewise two new mapping objects.

Do I understand this correctly? Have I missed any options?

Assuming I do, 4 is never going to happen, and each of the others has 
some fairly large disadvantages and costs in speed, memory, and 
complexity. Without a correspondingly large advantage to this new 
`__*index__` protocol, I don't see this going anywhere.



> >In your earlier statement, you said that it would be possible for
> >subscripting to mean something different depending on whether the
> >comma-separated subscripts had parentheses around them or not:
> >
> >     obj[(2, 3)]
> >     obj[2, 3]
> >
> >How does that happen?
> 
> If the object has a __getindex__ method, it gets whatever is between
> the [] the same way as a normal function call, so comma-separated
> expressions become separate positional arguments.

The compiler doesn't know whether the object has the `__getindex__` 
method at compile time, so any process that relies on that knowledge 
isn't going to work. There can only be one set of parsing rules that 
applies regardless of whether the object defines the item dunders or the 
index dunders or neither.
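
You can see this with the `dis` module: the compiler emits the same
bytecode no matter what `obj` turns out to be at run time (output
abridged, from CPython 3.8; details vary between versions):

    >>> import dis
    >>> dis.dis("obj[2, 3]")
      1           0 LOAD_NAME                0 (obj)
                  2 LOAD_CONST               0 ((2, 3))
                  4 BINARY_SUBSCR
                  ...

The tuple is baked in at compile time; there is no point at which the
compiler could ask whether `obj` would prefer separate arguments.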

Right now, if you call `obj[1,]` the dunder receives the tuple (1,) as 
the index. If the subscript were treated as function call syntax, the 
dunder would receive a single argument 1 instead. But if it is treated 
as a tuple, as required by backwards compatibility, that's an 
inconsistency between subscripts and function calls, and the whole 
point of your proposal is to remove that inconsistency.
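
A quick probe shows the existing divergence (the `Probe` class is just
for illustration):

    class Probe:
        def __getitem__(self, index):
            return index
        def __call__(self, *args):
            return args

    p = Probe()
    p[1,]   # the dunder receives the tuple (1,)
    p(1,)   # the call receives one argument, 1; the comma changes nothing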

Rock (you are here) Hard Place.

Do you break existing code, or fail in your effort to remove the 
inconsistencies?

I don't care two hoots about the inconsistencies, I just want to use 
keywords in my subscripts, so for me the answer is obvious: keep 
backwards compatibility, and there is no need to add new dunders to only 
partially fix something which isn't a problem.


Another inconsistency: function call syntax looks like this:

    call ::=  primary "(" [argument_list [","] | comprehension] ")"

which means we can write generator comprehensions inside function 
calls without additional parentheses:

    func(expr for x in items)  # unambiguously a generator comprehension

This is nice because the round brackets of the function call match the 
round brackets used in generator comprehensions, so it is perfectly 
consistent and unambiguous.

But if you do that in a subscript, we currently get a syntax error. If 
we allowed it, it would be pretty weird for the square brackets of the 
subscript to create a *generator* comprehension rather than a list 
comprehension. But we surely don't want a list comprehension by default:

    obj[(expr for x in items)]  # unambiguously a generator comprehension
    obj[[expr for x in items]]  # unambiguously a list comprehension
    obj[expr for x in items]    # and this is... what?

It looks like it should be a list comprehension (it has square brackets, 
right?) but we probably don't want it to be a list comp, we'd prefer it 
to be a generator comp because they are more flexible. Only that would 
look weird and would lead to all sorts of questions about why list 
comprehension syntax sometimes gives a list and sometimes a generator.

But if we leave it out, we have an inconsistency between subscripting 
and function calls, and for those who are motivated by removing that 
inconsistency, that's a Bad Thing.

For me, again, the answer is obvious: we don't have to support this for 
the sake of consistency, because consistency isn't the motivation. I 
just want keywords.



-- 
Steve