[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

Andrew Barnert via Python-ideas Sun, 10 May 2020 12:50:17 -0700

On May 10, 2020, at 11:09, Christopher Barker <[email protected]> wrote:

Is there any way you can fix the reply quoting on your mail client, or manually 
work around it? I keep reading paragraphs and saying “why is he saying the same 
thing I said” only to realize that you’re not, that’s just a quote from me that 
isn’t marked, up until the last line where it isn’t…

> On Sat, May 9, 2020 at 9:11 PM Andrew Barnert <[email protected]> wrote:
> 
> > That’s no more of a problem for a list slice view than for any of the 
> > existing views. The simplest way to implement a view is to keep a reference 
> > to the underlying object and delegate to it, which is effectively what the 
> > dict views do.
> 
> Fair enough. Though you still could get potentially surprising behavior if 
> the original sequence's length is changed.

I don’t think it’s surprising. When you go out of your way to ask for a dynamic 
view instead of the default snapshot copy, and then you change the list, you’d 
expect the view to change.

If you don’t keep views around, because you’re only using them for more 
efficient one-shot iteration, you might never think about that, but then you’ll 
never notice it to be surprised by it. The dynamic behavior of dict views 
presumably hasn’t ever surprised you in the 12 years it’s worked that way.
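
A minimal sketch of that difference, using only the builtin dict (the variable names are made up for illustration):

```python
d = {"a": 1, "b": 2}

keys_view = d.keys()       # dynamic view: keeps a reference to d
keys_snapshot = list(d)    # snapshot copy: independent of d from here on

d["c"] = 3                 # change the underlying dict

view_contents = list(keys_view)      # the view reflects the new key
snapshot_contents = keys_snapshot    # the snapshot does not
```

The view sees the new key and the snapshot does not, which is the behavior you asked for in each case.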

> And you probably don't want to lock the "host" anyway -- that could be very 
> confusing if the view is kept alive somewhere far from the code trying to 
> change the sequence. 

Yes. I think memoryview’s locking behavior is a special case, not something 
we’d want to emulate here. I’m guessing many people just never use memoryview 
at all, but when you do, you’re generally thinking about raw buffers rather 
than abstract behavior. (It’s right there in the name…) And when you need 
something more featureful than an invisible hard lock on the host, it’s time 
for numpy. :)
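
For reference, a short sketch of the locking behavior being discussed, as observed in CPython with a bytearray host (the variable names are illustrative only):

```python
buf = bytearray(b"abc")
view = memoryview(buf)     # exporting a buffer "locks" the host's size

try:
    buf.append(0x64)       # resizing while a view is exported
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()             # drop the export explicitly
buf.append(0x64)           # resizing works again
```

Nothing on the bytearray itself tells you that it is currently locked, which is what makes the lock "invisible".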

> I'm still a bit confused about what a dict.* view actually is

The docs explain it reasonably well. See 
https://docs.python.org/3/glossary.html#term-dictionary-view for the basic 
idea, https://docs.python.org/3/library/stdtypes.html#dict-views for the 
details on the concrete types, and I think the relevant ABCs and data model 
entries are linked from there.

> -- for instance, a dict_keys object pretty much acts like a set, but it isn't 
> a subclass of set, and it has an isdisjoint() method, but not .union or any 
> of the other set methods. But it does have what at a glance looks like pretty 
> complete set of dunders....

The point of collections.abc.Set, and ABCs in general, and the whole concept of 
protocols, is that the set protocol can be implemented by different concrete 
types—set, frozenset, dict_keys, third-party types like 
sortedcontainers.SortedSet or pyobjc.Foundation.NSSet, etc.—that are generally 
completely unrelated to each other, and implemented in different ways—a 
dict_keys is a link to the keys table in a dict somewhere, a set or frozenset 
has its own hash table, a SortedSet has a wide-B-tree-like structure, an NSSet 
is a proxy to an ObjC object, etc. If they all had to be subclasses of set, 
they’d be carrying around a set’s hash table but never using it; they’d have to 
be careful to override every method to make sure it never accidentally got used 
(and what would frozenset or dict_keys override add with?), etc.

And if you look at the ABC, union isn’t part of the protocol, but __or__ is, 
and so on.
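
A small sketch of what that looks like in practice. TupleSet is a hypothetical class invented here for illustration; it is unrelated to the builtin set and implements only the three abstract methods, with the Set ABC supplying __or__ and the other operators as mixins:

```python
from collections.abc import Set

keys = {"a": 1, "b": 2}.keys()

is_abstract_set = isinstance(keys, Set)    # follows the Set protocol
is_concrete_set = isinstance(keys, set)    # but is not a subclass of set
merged = keys | {"c"}                      # __or__ is part of the protocol
has_union = hasattr(keys, "union")         # union() is not


class TupleSet(Set):
    """Hypothetical set backed by a tuple, with no hash table at all."""

    def __init__(self, items=()):
        unique = []
        for item in items:
            if item not in unique:
                unique.append(item)
        self._items = tuple(unique)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# __or__ comes from the ABC mixin; it builds the result via TupleSet(iterable).
combined = TupleSet("ab") | TupleSet("bc")
```

Neither dict_keys nor TupleSet carries around a set’s hash table, yet both work anywhere the Set protocol is all that is required.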

> Anyway, a Sequence view is simpler, because it could probably simply be an 
> immutable sequence -- not much need for contemplating every bit of the API.

It’s really the same thing, it’s just the Sequence protocol rather than the Set 
protocol.

If anything, it’s _less_ simple, because for sequences you have to decide 
whether indexing should work with negative indices, extended slices, etc., 
which the protocol is silent about. But the answer there is pretty easy—unless 
there’s a good reason not to support those things, you want to support them. 
(The only open question is when you’re designing a sequence that you expect to 
be subclassed, but I don’t think we’re designing for subclassing here.)
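
As a rough sketch of what such a view could look like (this is not the code from the slices repo or from the draft PEP; SliceView and its attributes are hypothetical names), the following keeps a reference to the underlying sequence, delegates to it, and supports negative indices and extended slices:

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical read-only, dynamic view of seq[start:stop:step]."""

    def __init__(self, seq, start=None, stop=None, step=None):
        self._seq = seq                          # reference, not a copy
        self._slice = slice(start, stop, step)

    def _range(self):
        # Recomputed on every access so the view tracks length changes.
        return range(*self._slice.indices(len(self._seq)))

    def __len__(self):
        return len(self._range())

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a view gives a view of the view (nested delegation).
            return SliceView(self, index.start, index.stop, index.step)
        # range handles negative indices and raises IndexError when needed.
        return self._seq[self._range()[index]]


data = list(range(10))
view = SliceView(data, 2, 8, 2)      # corresponds to data[2:8:2]

initial = list(view)                 # iteration comes from the Sequence mixin
last = view[-1]                      # negative index
backward = list(view[::-1])          # extended slice of the view

data[4] = 40                         # mutate the host
after_change = list(view)
```

Sequence supplies __iter__, __contains__, __reversed__, index and count on top of the two methods defined here, so the sketch stays short; a real implementation would presumably want faster paths for some of those.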

> I do see a possible objection here though. Making a small view of a large 
> sequence would keep that sequence alive, which could be a memory issue. Which 
> is one reason why slices don't do that by default.

Yes. When you just want to iterate something once, non-lazily, you don’t care 
whether it’s a view or a snapshot, but when you want to keep it around, you do 
care, and you have to decide which one you want. So we certainly can’t change 
the default; that would be a huge but subtle change that would break all kinds 
of code.

But I don’t think it’s a problem for offering an alternative that people have 
to explicitly ask for.

Also, notice that this is true for all of the existing views, and none of them 
try to be un-featureful to avoid it.
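
A small illustration of an existing view keeping its host alive. BigDict is a hypothetical subclass, used only because plain dicts cannot be weakly referenced:

```python
import weakref


class BigDict(dict):
    """Hypothetical dict subclass, so that weakref.ref() can target it."""


big = BigDict.fromkeys(range(1000))
alive = weakref.ref(big)      # lets us observe whether the dict still exists

view = big.keys()             # the view holds a strong reference to the dict
del big                       # drop our own reference

kept_alive_by_view = alive() is not None
```

As long as the keys view is reachable, the whole dict stays in memory, even though the original name is gone.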

> And it could simply be a buyer beware issue. But the more featureful you make 
> a view, the more likely it is that they will get used and passed around and 
> kept alive without the programmer realizing the implications of that.

I think it is worth mentioning in the docs.

> Now I need to think about how to write this all up -- which is why I wasn't 
> sure I was ready to bring this up but now I have, so more to do!

Feel free to borrow whatever you want (and discard whatever you don’t want) 
from the slices repo I posted. (It’s MIT-licensed, but I can relicense it to 
remove the copyright notice if you want.)

I think the biggest question is actually the API. Making this a function (or a 
class that most people think of as a function, like most of itertools) is easy, 
but as soon as you say it should be a method or property of sequences, that’s 
trickier. You can add it to all the builtin sequence types, but should other 
sequences in the stdlib have it? Should Sequence provide it as a mixin? Should 
it be part of the sequence protocol, and therefore checked by Sequence as an 
ABC (even though that could be a breaking change)?

> PR's accepted on my draft!
> 
> https://github.com/PythonCHB/islice-pep/blob/master/islice.py
> 
>     >>> d[7] = 8
>     >>> next(i1)
>     RuntimeError: dictionary changed size during iteration
>     >>> i3 = iter(k)
>     >>> next(i3)
> 
> That's probably a feature we'd want to emulate.
> 
> > Basically, views are not like iterators at all, except in that they save 
> > time and space by being lazy.
> 
> Well, this is a vocabulary issue -- an "iterable" and "iterator" is anything 
> that follows the protocol, so yes, they very much ARE iterables (and 
> iterators) even though they also have some additional behavior.

> Which is why it's not wrong to say that a range object is an iterator, but it 
> IS wrong to say that it's just an iterator ...

No, they’re not iterators. You’ve got it backward—every iterator is an 
iterable, but most iterables are not iterators.

An iterator is an iterable that has a __next__ method and returns self from 
__iter__. Lists, tuples, dicts, etc. are not iterators, and neither are ranges, 
or the dict views.

You can test this easily:

    >>> import collections.abc
    >>> isinstance(range(10), collections.abc.Iterator)
    False
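
A slightly longer version of the same check, as a sketch using only the standard ABCs (the variable names are illustrative):

```python
from collections.abc import Collection, Iterable, Iterator

r = range(10)
it = iter(r)

range_is_iterable = isinstance(r, Iterable)        # has __iter__
range_is_iterator = isinstance(r, Iterator)        # no __next__, so not an iterator
range_is_collection = isinstance(r, Collection)    # sized, iterable container

iterator_is_iterator = isinstance(it, Iterator)
iter_returns_self = iter(it) is it                 # an iterator returns itself
iter_of_range_is_new = iter(r) is not r            # a range hands out a new object

keys_is_iterator = isinstance({}.keys(), Iterator)
```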

A lot of people get this confused. I think the problem is that we don’t have a 
word for “iterable that’s not an iterator”, or for the refinement “iterable 
that’s not an iterator and is reusable”, much less the further refinement 
“iterable that’s reusable, providing a distinct iterator that starts from the 
head each time, and allows multiple such iterators in parallel”. But that last 
thing is exactly the behavior you expect from “things like list, dict, etc.”, 
and it’s hard to explain, and therefore hard to document. The closest word for 
that is “collection”, but Collection is also a protocol that adds being a 
Container and being Sized on top of being Iterable, so it’s misleading unless 
you’re really careful. So the docs don’t clearly tell people that range, 
dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are 
confused about what they are. People know they’re lazy, they know iterators are 
lazy, so they think they’re a kind of iterator, and the docs don’t ever make it 
clear why that’s wrong.

> > Such a resettable-iterator thing (which would have some precedent in file 
> > objects, I suppose) would actually be harder to implement, on top of being 
> > less powerful and potentially confusing. And the same is true for slices.
> 
> but the dict_keys iterator does seem to do that ...
> 
> In [48]: dk
> Out[48]: dict_keys(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
> 
> In [49]: list(dk)
> Out[49]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
> 
> In [50]: list(dk)
> Out[50]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

You just picked an example where “resettable iterator” and “collection” would 
do the same thing. Try the same test with list and it also passes, because list 
is a collection. You can only distinguish the two cases by partially using an 
iterator and then asking for another one. And if you do that, you will see 
that, just like list, dict_keys gives you a brand new, completely independent 
iterator, initialized from the start, every time you call iter() on it. 
Because, like list, dict_keys is a collection, not an iterator. There are no 
types in Python’s stdlib that have the behavior you suggested of being an 
iterator but resetting each time you iterate. (The closest thing is file 
objects, but you have to manually reset them with seek(0).)
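
The distinguishing test can be sketched like this. io.StringIO stands in for a real file object here, which is an assumption made only to keep the example self-contained:

```python
import io

keys = dict.fromkeys("abc").keys()

first = iter(keys)
first_item = next(first)          # partially consume one iterator
second = iter(keys)               # ask for another one
second_item = next(second)        # starts from the head again
first_continues = next(first)     # the first iterator is unaffected

f = io.StringIO("x\ny\n")         # file-like object: it is its own iterator
file_is_own_iterator = iter(f) is f
first_pass = list(f)
second_pass = list(f)             # exhausted, no automatic reset
f.seek(0)                         # manual reset
third_pass = list(f)
```

With dict_keys, each call to iter() gives an independent iterator; with the file-like object there is only one iteration state, and it restarts only when you seek back yourself.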

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/4Y4I3N5Z6VDHE2ZTVTN3OHF5ED74GWWN/
Code of Conduct: http://python.org/psf/codeofconduct/
