On May 10, 2020, at 11:09, Christopher Barker <python...@gmail.com> wrote:
Is there any way you can fix the reply quoting on your mail client, or manually
work around it? I keep reading paragraphs and saying “why is he saying the same
thing I said” only to realize that you’re not, that’s just a quote from me that
isn’t marked, up until the last line where it isn’t…
> On Sat, May 9, 2020 at 9:11 PM Andrew Barnert <abarn...@yahoo.com> wrote:
>
> > That’s no more of a problem for a list slice view than for any of the
> > existing views. The simplest way to implement a view is to keep a reference
> > to the underlying object and delegate to it, which is effectively what the
> > dict views do.
>
> Fair enough. Though you still could get potentially surprising behavior if
> the original sequence's length is changed.
I don’t think it’s surprising. When you go out of your way to ask for a dynamic
view instead of the default snapshot copy, and then you change the list, you’d
expect the view to change.
If you don’t keep views around, because you’re only using them for more
efficient one-shot iteration, you might never think about that, but then you’ll
never notice it to be surprised by it. The dynamic behavior of dict views
presumably hasn’t ever surprised you in the 12 years it’s worked that way.
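For anyone who hasn't watched it happen, here is a minimal sketch of that dynamic behavior, shown next to the snapshot you get by default from slicing a list. The variable names are only for illustration.

```python
d = {"a": 1}
keys = d.keys()        # a dynamic view of d's keys
d["b"] = 2             # mutate the dict after creating the view
print(list(keys))      # ['a', 'b']: the view sees the new key

lst = [1, 2, 3]
head = lst[:2]         # slicing a list makes a snapshot copy
lst[0] = 100
print(head)            # [1, 2]: the copy is unaffected by later changes
```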
> And you probably don't want to lock the "host" anyway -- that could be very
> confusing if the view is kept alive somewhere far from the code trying to
> change the sequence.
Yes. I think memoryview’s locking behavior is a special case, not something
we’d want to emulate here. I’m guessing many people just never use memoryview
at all, but when you do, you’re generally thinking about raw buffers rather
than abstract behavior. (It’s right there in the name…) And when you need
something more featureful than an invisible hard lock on the host, it’s time
for numpy. :)
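For comparison, here is a sketch of the memoryview locking behavior I mean. While a memoryview export of a bytearray exists, the bytearray can't be resized, and releasing the view lifts the lock. Nothing like this happens with dict views.

```python
buf = bytearray(b"abc")
view = memoryview(buf)        # exporting the buffer "locks" buf's size
locked = False
try:
    buf.append(ord("d"))      # resizing while an export exists fails
except BufferError as exc:
    locked = True
    print("locked:", exc)
view.release()                # drop the export...
buf.append(ord("d"))          # ...and resizing works again
print(buf)                    # bytearray(b'abcd')
```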
> I'm still a bit confused about what a dict.* view actually is
The docs explain it reasonably well. See
https://docs.python.org/3/glossary.html#term-dictionary-view for the basic
idea, https://docs.python.org/3/library/stdtypes.html#dict-views for the
details on the concrete types, and I think the relevant ABCs and data model
entries are linked from there.
> -- for instance, a dict_keys object pretty much acts like a set, but it isn't
> a subclass of set, and it has an isdisjoint() method, but not .union or any
> of the other set methods. But it does have what at a glance looks like a pretty
> complete set of dunders....
The point of collections.abc.Set, and ABCs in general, and the whole concept of
protocols, is that the set protocol can be implemented by different concrete
types—set, frozenset, dict_keys, third-party types like
sortedcontainers.SortedSet or pyobjc.Foundation.NSSet, etc.—that are generally
completely unrelated to each other, and implemented in different ways—a
dict_keys is a link to the keys table in a dict somewhere, a set or frozenset
has its own hash table, a SortedSet has a wide-B-tree-like structure, an NSSet
is a proxy to an ObjC object, etc. If they all had to be subclasses of set,
they’d be carrying around a set’s hash table but never using it; they’d have to
be careful to override every method to make sure it never accidentally got used
(and what would frozenset or dict_keys override add with?), etc.
And if you look at the ABC, union isn’t part of the protocol, but __or__ is,
and so on.
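To make that concrete, here is a sketch. dict_keys passes an isinstance check against collections.abc.Set without being a set. A toy class that implements only the three abstract methods (__contains__, __iter__, __len__) gets __or__ and isdisjoint from the ABC's mixins, while union never shows up. ListBackedSet is a made-up name, not a real library type.

```python
from collections.abc import Set

d = {"a": 1, "b": 2}
k = d.keys()
print(isinstance(k, Set))            # True: dict_keys implements the Set protocol
print(isinstance(k, set))            # False: it is not a subclass of set
print(hasattr(k, "union"))           # False: union() is a set method, not protocol
print(k | {"c"} == {"a", "b", "c"})  # True: __or__ is part of the protocol


class ListBackedSet(Set):
    """Toy Set that stores its items in a list (illustration only)."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    # The three abstract methods that Set requires:
    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = ListBackedSet([1, 2]) | ListBackedSet([2, 3])  # __or__ comes from the mixin
print(sorted(s), s.isdisjoint([4]))                 # [1, 2, 3] True
```

The mixin __or__ builds its result by calling the class with an iterable, which is why the constructor here accepts one.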
> Anyway, a Sequence view is simpler, because it could probably simply be an
> immutable sequence -- not much need for contemplating every bit of the API.
It’s really the same thing, it’s just the Sequence protocol rather than the Set
protocol.
If anything, it’s _less_ simple, because for sequences you have to decide
whether indexing should work with negative indices, extended slices, etc.,
which the protocol is silent about. But the answer there is pretty easy—unless
there’s a good reason not to support those things, you want to support them.
(The only open question is when you’re designing a sequence that you expect to
be subclassed, but I don’t think we’re designing for subclassing here.)
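As a rough sketch of "just support them", here is a minimal lazy slice view built on collections.abc.Sequence. SliceView is a hypothetical name, not a stdlib type, and the sketch assumes the host supports len() and integer indexing. Delegating to slice.indices() and range gives negative indices and extended slices almost for free. Because the indices are recomputed on each access, the view also tracks changes to the host's length.

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical lazy view of seq[start:stop:step] (not a stdlib type)."""

    def __init__(self, seq, start=None, stop=None, step=None):
        self._seq = seq                        # keep a reference, don't copy
        self._slice = slice(start, stop, step)

    def _indices(self):
        # Recomputed on every access, so the view follows the host's current length.
        return range(*self._slice.indices(len(self._seq)))

    def __len__(self):
        return len(self._indices())

    def __getitem__(self, index):
        if isinstance(index, slice):
            # An extended slice of a view is simply a view of the view.
            return SliceView(self, index.start, index.stop, index.step)
        # range indexing handles negative indices and raises IndexError.
        return self._seq[self._indices()[index]]

    def __repr__(self):
        return f"SliceView({list(self)!r})"


lst = list(range(10))
v = SliceView(lst, 2, 8, 2)
print(list(v), v[-1], list(v[::-1]))  # [2, 4, 6] 6 [6, 4, 2]
lst[4] = 40                           # changes to the host show through
del lst[5:]                           # and so do changes to its length
print(list(v))                        # [2, 40]
```

Iteration, `in`, reversed(), index() and count() all come from the Sequence mixins on top of __len__ and __getitem__.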
> I do see a possible objection here though. Making a small view of a large
> sequence would keep that sequence alive, which could be a memory issue. Which
> is one reason why slices don't do that by default.
Yes. When you just want to iterate something once, non-lazily, you don’t care
whether it’s a view or a snapshot, but when you want to keep it around, you do
care, and you have to decide which one you want. So we certainly can’t change
the default; that would be a huge but subtle change that would break all kinds
of code.
But I don’t think it’s a problem for offering an alternative that people have
to explicitly ask for.
Also, notice that this is true for all of the existing views, and none of them
try to be un-featureful to avoid it.
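Here is a sketch of the keep-alive issue with an existing view, contrasted with a plain slice copy. Built-in dict and list don't support weak references, so the example assumes trivial subclasses (named Dict and List here just for illustration) as stand-ins that can be watched with weakref.

```python
import gc
import weakref


class Dict(dict):   # subclasses support weakrefs; the builtins don't
    pass


class List(list):
    pass


host = Dict.fromkeys(range(100_000))
keys = host.keys()                  # the view holds a reference to host
dict_ref = weakref.ref(host)
del host
gc.collect()
alive_while_viewed = dict_ref() is not None
print(alive_while_viewed)           # True: the view keeps the whole dict alive
del keys
gc.collect()
print(dict_ref() is None)           # True: once the view is gone, so is the dict

big = List(range(100_000))
small = big[:3]                     # a plain-list copy of three items
list_ref = weakref.ref(big)
del big
gc.collect()
print(list_ref() is None)           # True: the small copy doesn't keep big alive
```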
> And it could simply be a buyer beware issue. But the more featureful you make
> a view, the more likely it is that they will get used and passed around and
> kept alive without the programmer realizing the implications of that.
I think it is worth mentioning in the docs.
> Now I need to think about how to write this all up -- which is why I wasn't
> sure I was ready to bring this up but now I have, so more to do!
Feel free to borrow whatever you want (and discard whatever you don’t want)
from the slices repo I posted. (It’s MIT-licensed, but I can relicense it to
remove the copyright notice if you want.)
I think the biggest question is actually the API. Making this a function (or a
class that most people think of as a function, like most of itertools) is easy,
but as soon as you say it should be a method or property of sequences, that’s
trickier. You can add it to all the builtin sequence types, but should other
sequences in the stdlib have it? Should Sequence provide it as a mixin? Should
it be part of the sequence protocol, and therefore checked by Sequence as an
ABC (even though that could be a breaking change)?
> PR's accepted on my draft!
>
> https://github.com/PythonCHB/islice-pep/blob/master/islice.py
>
> >>> d[7] = 8
> >>> next(i1)
> RuntimeError: dictionary changed size during iteration
> >>> i3 = iter(k)
> >>> next(i3)
>
> That's probably a feature we'd want to emulate.
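The quoted snippet is abbreviated. Here is a self-contained version, with the dict contents made up for illustration. It shows both halves: the old iterator fails once the dict changes size, and a fresh iterator over the same view sees the updated dict.

```python
d = {0: 1, 1: 2}
k = d.keys()
i1 = iter(k)
next(i1)                   # start iterating the view
d[7] = 8                   # change the dict's size mid-iteration
error = None
try:
    next(i1)
except RuntimeError as exc:
    error = str(exc)
print(error)               # dictionary changed size during iteration
fresh = list(iter(k))      # a new iterator over the same view...
print(fresh)               # [0, 1, 7] ...sees the updated dict
```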
>
> > Basically, views are not like iterators at all, except in that they save
> > time and space by being lazy.
>
> Well, this is a vocabulary issue -- an "iterable" and "iterator" is anything
> that follows the protocol, so yes, they very much ARE iterables (and
> iterators) even though they also have some additional behavior.
> Which is why it's not wrong to say that a range object is an iterator, but it
> IS wrong to say that it's just an iterator ...
No, they’re not iterators. You’ve got it backward—every iterator is an
iterable, but most iterables are not iterators.
An iterator is an iterable that has a __next__ method and returns self from
__iter__. Lists, tuples, dicts, etc. are not iterators, and neither are ranges,
or the dict views.
You can test this easily:
>>> import collections.abc
>>> isinstance(range(10), collections.abc.Iterator)
False
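Here is the same check as a slightly broader sketch, run over a few types. The last column is the practical test. An iterator hands back itself from iter(), and a collection hands out a new iterator each time.

```python
from collections.abc import Iterable, Iterator

d = {"a": 1}
examples = {
    "list": [1, 2, 3],
    "range": range(3),
    "dict_keys": d.keys(),
    "list_iterator": iter([1, 2, 3]),
}
results = {}
for name, obj in examples.items():
    # (is Iterable, is Iterator, iter(obj) returns obj itself)
    results[name] = (isinstance(obj, Iterable), isinstance(obj, Iterator), iter(obj) is obj)
    print(name, results[name])
# list (True, False, False)
# range (True, False, False)
# dict_keys (True, False, False)
# list_iterator (True, True, True)
```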
A lot of people get this confused. I think the problem is that we don’t have a
word for “iterable that’s not an iterator”, or for the refinement “iterable
that’s not an iterator and is reusable”, much less the further refinement
“iterable that’s reusable, providing a distinct iterator that starts from the
head each time, and allows multiple such iterators in parallel”. But that last
thing is exactly the behavior you expect from “things like list, dict, etc.”,
and it’s hard to explain, and therefore hard to document. The closest word for
that is “collection”, but Collection is also a protocol that adds being a
Container and being Sized on top of being Iterable, so it’s misleading unless
you’re really careful. So the docs don’t clearly tell people that range,
dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are
confused about what they are. People know they’re lazy, they know iterators are
lazy, so they think they’re a kind of iterator, and the docs don’t ever make it
clear why that’s wrong.
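One way to see the "reusable, with independent iterators in parallel" property without any new vocabulary is to loop over the same object twice, nested. Each inner loop calls iter() on the object again. A collection gives every loop its own iterator starting from the head, while an iterator is shared, so the inner loop drains it. all_pairs is just a made-up helper for this sketch.

```python
def all_pairs(iterable):
    # Hypothetical helper: the inner `for` calls iter(iterable) on every outer step.
    return [(x, y) for x in iterable for y in iterable]


print(all_pairs([1, 2]))                    # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(all_pairs(range(2)))                  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(all_pairs({"a": 0, "b": 0}.keys()))   # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
print(all_pairs(iter([1, 2])))              # [(1, 2)]: one shared iterator gets drained
```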
> > Such a resettable-iterator thing (which would have some precedent in file
> > objects, I suppose) would actually be harder to implement, on top of being
> > less powerful and potentially confusing. And the same is true for slices.
>
> but the dict_keys iterator does seem to do that ...
>
> In [48]: dk
>
> Out[48]: dict_keys(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
>
> In [49]: list(dk)
>
> Out[49]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>
> In [50]: list(dk)
>
> Out[50]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
You just picked an example where “resettable iterator” and “collection” would
do the same thing. Try the same test with list and it also passes, because list
is a collection. You can only distinguish the two cases by partially using an
iterator and then asking for another one. And if you do that, you will see
that, just like list, dict_keys gives you a brand new, completely independent
iterator, initialized from the start, every time you call iter() on it.
Because, like list, dict_keys is a collection, not an iterator. There are no
types in Python’s stdlib that have the behavior you suggested of being an
iterator but resetting each time you iterate. (The closest thing is file
objects, but you have to manually reset them with seek(0).)
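Here is a sketch of the test that does distinguish the two: partially consume one iterator, then ask for another. io.StringIO stands in for a real text file here, since it has the same iterate-yourself behavior.

```python
import io

dk = dict.fromkeys(map(str, range(10))).keys()
it1 = iter(dk)
next(it1)
next(it1)                     # partially consume the first iterator
it2 = iter(dk)                # then ask the view for another one
first_of_fresh = next(it2)    # '0': the new iterator starts from the head
next_of_old = next(it1)       # '2': the old one kept its own position

f = io.StringIO("a\nb\nc\n")  # in-memory stand-in for a text file
first_line = next(f)          # 'a\n'
same_object = iter(f) is f    # True: a file object is its own iterator
no_reset = next(iter(f))      # 'b\n': calling iter() again doesn't reset it
f.seek(0)                     # the manual reset
after_reset = next(f)         # 'a\n'
print(first_of_fresh, next_of_old, same_object, repr(no_reset), repr(after_reset))
```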
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/4Y4I3N5Z6VDHE2ZTVTN3OHF5ED74GWWN/
Code of Conduct: http://python.org/psf/codeofconduct/