[Python-ideas] Re: Argumenting in favor of first()

Andrew Barnert via Python-ideas Mon, 09 Dec 2019 15:40:25 -0800

On Dec 9, 2019, at 12:08, Wes Turner <wes.tur...@gmail.com> wrote:
> 
> 
>> On Sat, Dec 7, 2019, 11:30 PM Andrew Barnert <abarn...@yahoo.com> wrote:
>>> On Dec 7, 2019, at 18:09, Wes Turner <wes.tur...@gmail.com> wrote:
>>> 
>>>> On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <abarn...@yahoo.com> wrote:
> 
> I would argue that there could be subclasses of ValueError for .one() that 
> would also be appropriate for .first() (and/or .take(iterable, count=1, 
> default=_default)


...

> The names are less important than being able to distinguish the difference 
> between the cases.

But again, the need to be able to distinguish is, while not nonexistent, pretty 
rare. And cases where you need to distinguish them but don’t care what the 
types are otherwise are even less common. So, is that common enough to be worth 
adding two more exception types to Python (or just to itertools) that aren’t 
used anywhere else? Just saying that they might be useful somewhere doesn’t 
answer that question.
> 
>> That’s a common issue in Python. When you can’t use None as a sentinel 
>> because it could be a valid user input or return value, you just create a 
>> private module or class attribute that can’t equal anything the user could 
>> pass in, like this:
>> 
>>     _sentinel = object()
>> 
>> And then:
>> 
>>     def spam(stuff, default=_sentinel):
>>         if default is _sentinel:
>>             ...  # do single-argument stuff here
>>         else:
>>             ...  # do default-value stuff here
> 
> `None` is not a good default value for .first() (or .one()) because None may 
> be the first item in the iterable.

Yes. And, as I said, this is a common case in Python, with a standard idiom 
(which more-itertools uses) to deal with it.
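
To make that concrete, here is a minimal sketch of how the idiom could look for first(). The name _marker and the error message are chosen just for illustration; I am not claiming this is the exact more-itertools source.

```python
# Hypothetical private sentinel: a fresh object() is identical to nothing a
# caller could pass in, so it can mean "no default was given".
_marker = object()


def first(iterable, default=_marker):
    """Return the first item of iterable, or default if it is empty."""
    for item in iterable:
        return item
    # Only reached when the iterable produced nothing.
    if default is _marker:
        raise ValueError("first() was called on an empty iterable")
    return default
```

With this, first([None, 1]) returns None because None really is the first item, while first([], default=None) returns None because the caller asked for it, and first([]) raises.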

>> This seems like the kind of thing that should be explained somewhere in 
>> every tutorial (including the official one), but most people end up finding 
>> it only by accident, reading some code that uses it and trying to figure out 
>> what it does and why. The same way people figure out how useful two-argument 
>> iter is, and a couple other things.
> 
> I'll second a recommendation to note the existence of two-argument iter() and 
> two-argument next() in the docstring for itertools.first()

I don’t think 2-arg iter belongs anywhere near first, just that it belongs 
somewhere in itertools tutorials, and maybe the module docs.

As for 2-arg next, notice that the existing docs for more_itertools.first cover 
that by saying “It is marginally shorter than next(iter(iterable), default)”. I 
think maybe a stdlib version of first should be a bit less dismissive of its 
own value, but noting this relationship is really all you need to teach people 
2-arg next.
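
For anyone who has not run into them, here is a small sketch of both two-argument forms. The StringIO stream and the chunk size of 3 are just stand-ins picked for the example.

```python
import io

# Two-argument next: the second argument is returned instead of raising
# StopIteration when the iterator is exhausted.
fallback = next(iter([]), "fallback")

# Two-argument iter: call the function repeatedly until it returns the
# sentinel value (here the empty string that read() gives at end of stream).
stream = io.StringIO("abcdefgh")
chunks = list(iter(lambda: stream.read(3), ""))
```

Here fallback ends up as "fallback" and chunks as ['abc', 'def', 'gh'].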

>>>> Though, .first() (or .one()) on an unordered iterable is effectively 
>>>> first(shuffle(iterable)), which *could* raise an annotation exception at 
>>>> compile time. 
>> 
>> I’m not sure what you mean by an “annotation exception”. You mean an error 
>> from static type checking in mypy or something? I’m not sure why it would be 
>> an error, unless you got the annotation wrong. It should be Iterable, and 
>> that will work for iterators and sequences and sets and so on just fine.
>> 
>> Also, it’s not really like shuffle, because even most “unordered iterables” 
>> in Python, like sets, actually have an order. It’s not guaranteed to be a 
>> meaningful one, but it’s not guaranteed to be meaningless either. If you 
>> need that (e.g., you’re creating a guessing game where you don’t want the 
>> answer to be the same every time anyone runs the game, or for security 
>> reasons), you really do need to explicitly randomize. For example, if s = 
>> set(range(10,0,-1)), it’s not guaranteed anywhere that next(iter(s)) will be 
>> 1, but it is still always 1 in any version of CPython. Worse, whatever 
>> next(iter(s)) is, if you call next(iter(s)) again (without mutating s in 
>> between), you’ll get the same value from the new Iterator in any version of 
>> any Python implementation.
> 
> 
> Taking first(unordered_sequence) *is* like shuffle. Sets merely seem to be 
> ordered when the items are integers that hash to said integer:

It’s not a matter of when they seem to have some specific order. It’s that they 
always do have an order, even if it often isn’t a meaningful one. If you need 
to actually guarantee not having a meaningful order, you need to ask for that 
explicitly (whether with shuffle or something else).
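
A quick sketch of the difference, assuming a small set of strings picked only for illustration:

```python
import random

s = {"spam", "eggs", "ham"}

# Two fresh iterators over the same unmutated set start at the same element,
# whatever that element happens to be in this run.
a = next(iter(s))
b = next(iter(s))
same_each_time = (a == b)

# If you actually need an unpredictable pick, ask for one explicitly.
pick = random.choice(list(s))
```

The value of a may differ from one interpreter run to the next (string hashing is randomized by default), but within a run same_each_time is True, which is exactly why it is not a substitute for shuffling.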

> Does .first() need to solve for this with type annotations and/or just a 
> friendly docstring reminder?

Solve for what? People should know that sets have no guarantee about the 
meaningfulness of their order, but the right place to teach that is on sets, 
not on every function that works with iterables.

>> Right, but a sequence isn’t just an ordered iterable, it’s also 
>> random-access indexable (plus a few other things). An itertools.count(), a 
>> typical sorteddict type, a typical linked list, etc. are all ordered but not 
>> sequences.
> 
> 
> In terms of math, itertools.count() is an infinite ordered sequence (for 
> which there is no implementation of lookup by subscript)

Right, and? Sorted dicts and linked lists are ordered but not sequences despite 
being generally finite, and even Sized. That isn’t the key distinction that 
makes a Sequence; it’s being random-access indexable (which, e.g., a linked 
list usually isn’t, because it can’t do it in constant time).
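
A rough illustration with types that ship with Python (a dict keys view stands in here for the sorted dict or linked list, since those are not in the stdlib):

```python
import itertools
from collections.abc import Iterator, Sequence, Sized

counter = itertools.count()
keys = {"x": 1, "y": 2}.keys()  # insertion-ordered, finite, has len()

count_is_iterator = isinstance(counter, Iterator)  # True
count_is_sequence = isinstance(counter, Sequence)  # False
keys_is_sized = isinstance(keys, Sized)            # True
keys_is_sequence = isinstance(keys, Sequence)      # False: no keys[0]

# Ordered does not require indexing: islice walks the order from the front.
first_three = list(itertools.islice(counter, 3))
```

Both objects have a perfectly well-defined first element, and neither is a Sequence.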

> In terms of Python, the generator returned by itertools.count() is an 
> Iterable (hasattr('__iter__')) that does not implement __getitem__ ( doesn't 
> implement the Mapping abstract type interface ).

Not quite. The Mapping interface is not for everything that’s indexable, it’s 
for everything that’s subscriptable by keys rather than indexes. Things that 
are subscriptable by indexes are Sequences, not Mappings. Almost nothing is 
both. (The fact that Python’s type system can’t distinguish those—e.g., you can 
use ints as keys and as indexes—is why neither of these can be an implicit 
structural ABC like Iterable, and instead they need to register types manually.)
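
Here is a sketch of what that means in practice. Deck is a made-up class for the example; the point is only that it has the right methods without inheriting from anything.

```python
from collections.abc import Iterable, Sequence


class Deck:
    """Hypothetical index-subscriptable type that does not inherit Sequence."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __getitem__(self, index):
        return self._cards[index]

    def __len__(self):
        return len(self._cards)

    def __iter__(self):
        return iter(self._cards)


deck = Deck("AKQ")
is_iterable = isinstance(deck, Iterable)            # True: detected by __iter__
is_sequence_before = isinstance(deck, Sequence)     # False: no structural check
Sequence.register(Deck)                             # explicit opt-in
is_sequence_after = isinstance(deck, Sequence)      # True once registered
```

Iterable recognizes the class from its methods alone, while Sequence only recognizes it after the explicit register call (or after inheriting from it).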

> https://github.com/python/typeshed/blob/master/stdlib/3/itertools.pyi : 
> 
> _N = TypeVar('_N', int, float)
> def count(start: _N = ...,
>           step: _N = ...) -> Iterator[_N]: ...  # more general types?
> 
> A collections.abc.Ordered type might make sense if Reversible does not imply 
> Ordered.
> A hasattr('__iter_ordered__') might've made sense.

But what would ordered mean here? Just that there is some ordering? That the 
ordering is consistent between iterations if nothing is mutated? That it’s 
consistent even after mutations except for the mutated bits? Something even 
more strict?

If you don’t have any code that needs to switch on any of those distinctions, 
there’s no need for an ABC.

> hasattr('__getitem__') !=> Sequence
> Sequence => hasattr('__getitem__')

Yes. Mappings also have __getitem__ and they’re not Sequences. And 
not-quite-Mapping types. And “old-style sequence protocol” types (which can be 
consistently indexed from 0 up to the smallest int that raises IndexError, but 
don’t necessarily have __len__, or even __iter__). And so on.
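
A minimal sketch of such an old-style type, using a made-up Squares class:

```python
from collections.abc import Iterable, Sequence


class Squares:
    """Hypothetical old-style sequence: only __getitem__, nothing else."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index * index


sq = Squares()
# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
values = list(sq)
has_iter = hasattr(Squares, "__iter__")      # False
is_iterable = isinstance(sq, Iterable)       # False: the ABC looks for __iter__
is_sequence = isinstance(sq, Sequence)       # False
```

So values is [0, 1, 4, 9, 16] even though the ABCs say the object is neither Iterable nor a Sequence.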

>> The more-itertools functions that require sequences (and name them seq) 
>> usually require indexing or slicing.
> 
> 
> That may be a good convention.
> But in terms of type annotations
> - https://docs.python.org/3/library/collections.abc.html
>   - [x] Iterable (__iter__)
>   - [x] Collection (__contains__, __iter__, __len__)
>   - [x] Mapping / MutableMapping (Collection)
>   - [x] Sequence  / MutableSequence (Sequence, Reversible, Collection 
> (Iterable))
>   - [x] Reversible 
>   - [ ] Ordered
> 
> Does 'Reversible' => (imply) Ordered; which would then be redundant?

Which more-itertools functions require testing for Reversible, or Ordered, but 
not Sequence? There might be some of the former, but I doubt there are any of 
the latter. Most take Iterable, the rest take Sequence or Iterator, and I don’t 
think anything is left out, or had to be crammed into either of those as a 
hacky workaround or anything. So what are you trying to fix here?

> Math definition (setting aside a Number sequence-element type restriction):
> 
>   Sequence = AllOf(Iterable, Ordered)

So your Ordered implies Sized and Container?

> More_itertools convention, AFAIU?: 
> 
>   seq => AllOf(Iterable, Mapping, Ordered)
>   seq => all(hasattr(x) for x in (' __iter__', '__getitem__'))

I think it’s a lot simpler. seq => Sequence. There may be a bit of looseness in 
that some functions can take an old-style half-sequence or various other 
things, but no more than any other code annotated with Sequence in Python.

> How does this apply to .first()?
> 
> If I call .first() on an unordered Iterable like a set, I may not get the 
> first item; this can/may/will sometimes fail:
> 
>   assert first({'a', 'b'}) == 'a'

But 'a' isn’t the first element in the set just because it came first in the 
display. Consider this:

    assert first(sortedlist('zyx')) == 'z'

Clearly that should fail, because the first item in a sorted list of those 
letters is x, not z. The fact that you constructed it with z first isn’t 
relevant; they’re kept in sorted order, and x sorts before z. But surely you 
wouldn’t say that a sorted list isn’t ordered?

Meanwhile, notice that in either case, first(it) always returns the same thing 
that list(it)[0] would (except for a different exception if it is empty). 
That’s guaranteed by the way iteration works. In that sense, all iterables are 
ordered. There are other senses in which that’s not true, but without having a 
specific sense in mind that you’re trying to distinguish, the word doesn’t help 
anything.
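
A small check of that claim over a few re-iterable containers (the samples are arbitrary, and sorted() stands in for a sorted list type):

```python
samples = [
    {"a", "b"},            # set: some order, not a meaningful one
    sorted("zyx"),         # kept in sorted order: ['x', 'y', 'z']
    "zyx",                 # str: positional order
    range(5, 0, -1),       # range: 5, 4, 3, 2, 1
    {"k": 1, "j": 2},      # dict: insertion order of keys
]

# For each one, a fresh iterator starts where list() starts.
agrees = [next(iter(it)) == list(it)[0] for it in samples]

first_sorted = next(iter(sorted("zyx")))
```

Every entry of agrees is True, and first_sorted is 'x' rather than 'z', regardless of the order the letters were supplied in.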

> If there was an Ordered ABC (maybe unnecessarily in addition to Reversible), 
> we could specify:
> 
>   # collections.abc
>   class OrderedIterable(Iterable, Ordered):
>       pass

So your Ordered doesn’t imply Iterable? What kinds of things are ordered but 
not Iterable?

Again, what are you actually trying to solve with this distinction?

> Implicit in a next() call is a hasattr(obj, '__iter__') check

No, there isn’t. It’s almost always true, because the only things that normally 
have __next__ are iterators, and they always have __iter__ as well. But there’s 
no need to check for that. If you create a type that has __next__ but not 
__iter__ for some reason, you expect that it can’t be used in a for loop, but 
why shouldn’t it be usable in a next call? Why would we want to go out of our 
way to block that when nobody ever does it, and it would be a clear “consenting 
adults” case if anyone ever did?
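
To illustrate, here is a sketch with a made-up Countdown class that defines __next__ and deliberately nothing else:

```python
class Countdown:
    """Hypothetical type with __next__ but no __iter__."""

    def __init__(self, start):
        self.remaining = start

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining + 1


c = Countdown(2)
results = [next(c), next(c), next(c, "done")]  # next() only needs __next__

# iter() (and therefore a for loop) is what needs __iter__.
try:
    iter(c)
    iter_failed = False
except TypeError:
    iter_failed = True
```

Here results is [2, 1, 'done'], and iter_failed is True because iter(), not next(), is the step that requires __iter__.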

> ; but a user calling .first() may or may not be aware that there is no check 
> that the passed Iterable is ordered. Type annotations could catch that 
> mistake.

Only with some meaningful (and universally meaningful) definition of “ordered”. 
And I don’t know what definition you have in mind, or even could have in mind, 
that would alleviate potential confusion.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KT4IAL7JZIH4XL6A5V24L2TUPZD3SFKA/
Code of Conduct: http://python.org/psf/codeofconduct/
