[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

Andrew Barnert via Python-ideas Sat, 09 May 2020 21:14:08 -0700

On May 9, 2020, at 19:43, Christopher Barker <[email protected]> wrote:
> 
> On Sat, May 9, 2020 at 1:03 PM Andrew Barnert <[email protected]> wrote:
> > https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
> 
> I haven’t read the whole thing yet, but one thing immediately jumped out at 
> me:
> 
> > and methods on containers, such as dict.keys return iterators in Python 3, 
> 
> No they don’t. They return views—objects that are collections in their own 
> right (in particular, they’re not one-shot; they can be iterated over and 
> over) but just delegate to another object rather than storing the data.
> 
> Thanks -- that's the kind of thing that led me to say that this is probably 
> not ready for a PEP.
> 
> but I don't think that invalidates the idea at all -- there is debate about 
> what an "islice" should return, but an iterable view would be a good option.


I don’t think it invalidates the basic idea at all, just that it suggests the 
design should be different.

Originally, dict returned lists for keys, values, and items. In 2.2, iterator 
variants were added. In 3.0, the list and iterator variants were both replaced 
with view versions, which were enough of an improvement that they were 
backported to 2.x. Because a view does cover almost all of the uses of both a 
sequence copy and an iterator. And I think the same is true here.
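To make that concrete, here is a small stdlib-only sketch of a keys view doing the jobs of both the old list copy and the 2.2-era iterator. It has a length and a membership test, it can be iterated repeatedly, it makes no copy, and it stays live as the dict changes.

```python
d = {"a": 1, "b": 2}
keys = d.keys()  # a view: no copy of the keys is made

# Things the old list copy was used for:
print(len(keys))    # -> 2
print("a" in keys)  # -> True (membership is delegated to the dict)

# Repeated iteration works, unlike a one-shot iterator:
print(list(keys))   # -> ['a', 'b']
print(list(keys))   # -> ['a', 'b']

# The view is live: it reflects later changes to the dict.
d["c"] = 3
print(list(keys))   # -> ['a', 'b', 'c']
```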

> I'm inclined to think that it would be a bad idea to have it return a full 
> sequence view object, and not sure it should do anything other than be 
> iterable.

Why? What’s the downside to being able to do more with them for the same 
performance cost and only a little more up-front design work?

> > And this is important here, because a view is what you ideally _want_. The 
> > reason range, key view, etc. are views rather than iterators isn’t that 
> > it’s easier to implement or explain or anything, it’s that it’s a little 
> > harder to implement and explain but so much more useful that it’s worth it. 
> > It’s something people take advantage of all the time in real code.
> 
> Maybe -- but "all the time?" I'd venture to say that absolutely the most 
> common thing done with, e.g., dict.keys() is to iterate over it.

Really? When I just want to iterate over a dict’s keys, I iterate the dict 
itself. 

> > For prior art specifically on slicing as a view, rather than just views in 
> > general, see memoryview (which only works on buffers, not all sequences) 
> > and NumPy (which is weird in many ways, but people rely on slicing giving 
> > you a storage-sharing view)
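For anyone who hasn't used it, here is a minimal sketch of the storage-sharing behavior with memoryview (stdlib only, so NumPy is left out). Slicing the view gives another view onto the same bytes, so writes through either side are visible in both, while slicing the bytearray itself makes an independent copy.

```python
buf = bytearray(b"abcdef")
view = memoryview(buf)[2:4]  # a view of the bytes at positions 2 and 3

view[0] = ord("X")           # write through the slice view...
print(buf)                   # -> bytearray(b'abXdef')

buf[3] = ord("Y")            # ...or write to the original
print(bytes(view))           # -> b'XY'

# By contrast, slicing the bytearray copies the data.
copy = buf[2:4]
copy[0] = ord("Z")
print(buf)                   # -> bytearray(b'abXYef'), untouched by the copy
```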
> 
> I am a long-time numpy user, and yes, I very much take advantage of the 
> memory sharing view.
> 
> But I do not think that that would be a good idea for the standard library. 
> numpy slices return a full-fledged numpy array, which shares a data view with 
> its "host" -- this is really helpful for performance reasons -- moving 
> large blocks of data around is expensive, but it's also pretty confusing. And 
> it would be a lot more problematic with, e.g., lists, as the underlying buffer 
> can be reallocated -- numpy arrays are mutable, but not resizable: once 
> you've made one, its data buffer does not change.

That’s no more of a problem for a list slice view than for any of the existing 
views. The simplest way to implement a view is to keep a reference to the 
underlying object and delegate to it, which is effectively what the dict views 
do.
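Here is a rough sketch of that delegation approach applied to a slice view. The class name SliceView is hypothetical, and this is only an illustration of the idea, not necessarily how any real implementation would do it. It keeps a reference to the sequence plus a slice object and recomputes the indices with slice.indices() on every access, so a list growing (and reallocating its buffer) underneath it is harmless.

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical read-only view of seq[start:stop:step].

    Stores only a reference to the underlying sequence and the slice;
    nothing is copied and no raw buffer pointer is held.
    """

    def __init__(self, seq, s):
        self._seq = seq
        self._slice = s

    def _range(self):
        # slice.indices() clamps start/stop/step to the *current* length,
        # so the view always reflects the sequence as it is right now.
        return range(*self._slice.indices(len(self._seq)))

    def __len__(self):
        return len(self._range())

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing a view returns a plain list to keep the sketch short.
            return [self._seq[j] for j in self._range()[i]]
        # range indexing handles negative indices and raises IndexError.
        return self._seq[self._range()[i]]


data = [0, 1, 2, 3, 4, 5]
v = SliceView(data, slice(1, None, 2))
print(list(v))       # -> [1, 3, 5]
data.extend([6, 7])  # the list may reallocate; the view doesn't care
print(list(v))       # -> [1, 3, 5, 7]
```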

(Well, did from 2.x to 3.5. The dict improvements in 3.6 opened up an 
optimization opportunity, because in the split layout a dict is effectively a 
wrapper around a keys view and a separate table, so the keys view can refer 
directly to that thing that already exists. But that isn’t relevant here.)

(You _could_ instead refuse to allow expanding a sequence when there’s a live 
view, as bytearray does with memoryview, but I don’t think that’s necessary 
here. It’s only needed there as a consequence of the fact that the buffer protocol 
is provided in C rather than in Python. For a slice view, it would just make 
things more complicated and less functional for no good reason.)
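For comparison, this is what the bytearray/memoryview restriction looks like in practice (a small stdlib-only sketch). Resizing is refused with BufferError while a view is exported, because the view holds a pointer into the buffer. Same-size writes still work, and resizing is allowed again once the view is released.

```python
buf = bytearray(b"abc")
view = memoryview(buf)  # exports buf's buffer

refused = False
try:
    buf.append(ord("d"))  # would resize the buffer under the live view
except BufferError as exc:
    refused = True
    print("refused:", exc)

buf[0] = ord("z")       # same-size mutation is still fine
view.release()          # drop the export...
buf.append(ord("d"))    # ...and resizing works again
print(buf)              # -> bytearray(b'zbcd')
```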

> > But just replacing islice is a much simpler task (mainly because the input 
> > has to be a sequence and the output is always a sequence, so the only 
> > complexity that arises is whether you want to allow mutable views into 
> > mutable sequences), and it may well be useful on its own.
> 
> Agreed. And while yes, dict_keys and friends are not JUST iterators, they 
> also aren't very functional views, either. They are not sequences, 

That’s not true. They are very functional—as functional as reasonably makes 
sense. The only reason they’re not Sequences is that they’re views on dicts, so 
indexing makes little sense, but set operations do—and they are in fact Sets. 
(Except for values.)
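A quick stdlib-only illustration of what "they are in fact Sets" means: keys views support the set operators directly and register as collections.abc.Set, items views register as Set too, and values views do not, since values can repeat and need not be hashable.

```python
from collections.abc import Set

d = {1: "a", 2: "b", 3: "c"}

# Set operators work directly on keys views and return ordinary sets.
print(d.keys() & {2, 3, 4})         # -> {2, 3}
print(d.keys() - {1})               # -> {2, 3}

print(isinstance(d.keys(), Set))    # -> True
print(isinstance(d.items(), Set))   # -> True ((key, value) pairs)
print(isinstance(d.values(), Set))  # -> False
```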

> certainly not mutable sequences.

Well, yes, but mutating a dict through its views wouldn’t make sense in the 
first place:

    >>> d = {1: 2}
    >>> k = d.keys()
    >>> k |= {3}

You’ve told it to add an item with key 3 without telling it what the value is, 
and there’s no reasonable thing that could mean. A slice view would have no 
such problem, so mutation is sensible.

That being said, mutation could easily be added later without breaking 
anything, and it does raise some nontrivial design issues (most obviously, 
notice that my implementation only allows non-size-changing mutations, because 
otherwise you have to decide whether it remains a view over seq[3:5] or becomes 
a view over seq[3:6]; all three options seem reasonable there, so I just went 
with the simplest, and have no good argument for why it’s the best…). So I 
think it might be better to leave mutation out of the original version anyway 
unless someone has a need for it (at which point we can use the examples to 
think through the best answers to the design issues).
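To show what "only non-size-changing mutations" might look like, here is a minimal sketch of a writable slice view. The class name MutableSliceView is hypothetical and this is not the implementation referred to above. It supports integer item assignment, which writes through to the underlying list and can never change its length, and it deliberately leaves out slice assignment, insertion, and deletion, so the seq[3:5] versus seq[3:6] question never comes up.

```python
from collections.abc import Sequence


class MutableSliceView(Sequence):
    """Hypothetical writable view of seq[start:stop:step].

    Only integer item assignment is supported; it writes through to the
    underlying sequence and never changes its size.
    """

    def __init__(self, seq, s):
        self._seq = seq
        self._slice = s

    def _range(self):
        # Recompute indices against the sequence's current length.
        return range(*self._slice.indices(len(self._seq)))

    def __len__(self):
        return len(self._range())

    def __getitem__(self, i):
        if isinstance(i, slice):
            raise TypeError("slicing a view is left out of this sketch")
        return self._seq[self._range()[i]]

    def __setitem__(self, i, value):
        if not isinstance(i, int):
            # Slice assignment could change the length: unsupported here.
            raise TypeError("only integer indices can be assigned")
        self._seq[self._range()[i]] = value


data = list(range(8))
v = MutableSliceView(data, slice(3, 5))
v[0] = "x"     # writes through to data[3]
print(data)    # -> [0, 1, 2, 'x', 4, 5, 6, 7]
print(list(v)) # -> ['x', 4]
```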

> And:
> 
> > (in particular, they’re not one-shot; they can be iterated over and over) 
> 
> yes, but they are only a single iterator -- if you call iter() on one you 
> always get the same one back, and its state is preserved.

No, that’s not true. Each call to iter() returns a completely independent 
iterator each time, with its own independent state that starts at the head of 
the view. It works exactly the same way as a set, a tuple, or any other normal 
collection:

    >>> d = {1: 2, 3: 4, 5: 6}
    >>> k = d.keys()
    >>> i1 = iter(k)
    >>> next(i1)
    1
    >>> i2 = iter(k)
    >>> next(i2)
    1
    >>> list(i1)
    [3, 5]
    >>> next(i2)
    3

(This was a bit harder to see, and to explain, before 3.6, because that order 
was intentionally arbitrary, but it was guaranteed to be consistent until you 
mutated the dict.)

Also notice, while the views’ iterators are just like dict iterators, and list 
iterators for that matter, in that they can’t handle the dict being resized 
during iteration, the views themselves have no such trouble:

    >>> d[7] = 8
    >>> next(i2)
    RuntimeError: dictionary changed size during iteration
    >>> i3 = iter(k)
    >>> next(i3)
    1

Basically, views are not like iterators at all, except in that they save time 
and space by being lazy.

> So yes, you can iterate over it more than once, but iter() only resets after 
> it has been exhausted.

Such a resettable-iterator thing (which would have some precedent in file 
objects, I suppose) would actually be harder to implement, on top of being less 
powerful and potentially confusing. And the same is true for slices.

> In short -- not having thought about it deeply at all, but I'm thinking that 
> making a SliceIterator very similar to dict_keys and friends would make a 
> lot of sense.

Yes, as long as that means being a full-featured normal collection (in this case a 
Sequence rather than a Set), not a resettable iterator.
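A small sketch of the difference, using a hypothetical SliceView class that writes only __len__ and __getitem__ (integer indices only). itertools.islice gives a one-shot iterator, while a collections.abc.Sequence subclass can be iterated repeatedly and picks up `in`, index(), count(), and reversed() as mixin methods for free.

```python
from collections.abc import Sequence
from itertools import islice

data = [10, 20, 30, 40, 50]

# islice is a one-shot iterator: a second pass sees nothing.
it = islice(data, 1, 4)
print(list(it))  # -> [20, 30, 40]
print(list(it))  # -> []


class SliceView(Sequence):
    """Hypothetical minimal slice view; integer indices only."""

    def __init__(self, seq, s):
        self._seq, self._slice = seq, s

    def _range(self):
        return range(*self._slice.indices(len(self._seq)))

    def __len__(self):
        return len(self._range())

    def __getitem__(self, i):
        return self._seq[self._range()[i]]


v = SliceView(data, slice(1, 4))
print(list(v), list(v))  # -> [20, 30, 40] [20, 30, 40]
# Mixin methods supplied by collections.abc.Sequence:
print(30 in v, v.index(40), v.count(20), list(reversed(v)))
# -> True 2 1 [40, 30, 20]
```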

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/POCUH7IXXY4HB6GJ2KZVBXV3AF4AUMA6/
Code of Conduct: http://python.org/psf/codeofconduct/
