[Python-ideas] Re: Argumenting in favor of first()

Oscar Benjamin Sun, 08 Dec 2019 05:47:55 -0800

On Sat, 7 Dec 2019 at 00:43, Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Fri, Dec 06, 2019 at 09:11:44AM -0400, Juancarlo Añez wrote:
>
> [...]
> > > Sure, but in this case, it isn't a fragment of a larger function, and
> > > that's not what it looks like. If it looked like what you wrote, I would
> > > understand it. But it doesn't, so I didn't really understand what it was
> > > supposed to do, until I read the equivalent version using first/next.
> > >
> >
> > Exactly my point.
>
> Indeed, and I agree with that. But I still don't see what advantage
> there is to having a `first` builtin which does so little. It's a really
> thin wrapper around `next` that:
>
>     calls iter() on its iterable argument
>     supplies a default
>     and then calls next() with two arguments
>
> I guess my question is asking you to justify adding a builtin rather
> than just educating people how to use next effectively.


The real problem with next is that, when called without a default, it
raises StopIteration. That can be useful when you are *implementing*
iterators, but it is very much not what you want when you are just
*using* iterators. That makes next something of a footgun, because it's
tempting to write something like

first = next(iter(iterable))

but if there is no applicable default value, that should really be

try:
    first = next(iter(iterable))
except StopIteration:
    raise ValueError

There is a PEP that attempted to solve this problem:
PEP 479 -- Change StopIteration handling inside generators
https://www.python.org/dev/peps/pep-0479/

However PEP 479 (wrongly IMO) attributed the problem to generators
rather than iterators. Consequently the fix does nothing for users of
itertools-style functions such as map. The root of the problem it
attempted to fix is the fact that bare next raises StopIteration and
so is not directly suitable in situations where you just want to get
the next/first element of an iterable.
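A minimal sketch of that asymmetry, assuming Python 3.7 or later (where
PEP 479 behaviour is always on): the same next(iter(...)) call becomes a
RuntimeError when it runs inside a generator, but silently truncates the
iteration when it runs inside a function passed to map. The helper names
gen_first_items and first_items_via_map are hypothetical, for
illustration only.

```python
def gen_first_items(iterables):
    # Generator: under PEP 479 a StopIteration escaping the body is
    # converted into RuntimeError("generator raised StopIteration").
    for it in iterables:
        yield next(iter(it))

def first_items_via_map(iterables):
    # map() is not a generator, so PEP 479 does not apply: a
    # StopIteration raised by the mapped function just ends iteration.
    return map(lambda it: next(iter(it)), iterables)

data = [[1, 2], [], [3, 4]]

try:
    list(gen_first_items(data))
except RuntimeError as exc:
    print("generator:", type(exc).__name__)  # generator: RuntimeError

print("map:", list(first_items_via_map(data)))  # map: [1]
```

The map version loses the third element without any error at all, which
is exactly the case PEP 479 leaves uncovered.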

So you can have something like this:

csvfiles = [
    ['header', '1', '2', '3'],
    [], # <-----    file has no header
    ['header', '4', '5', '6'],  # This file is skipped
]

def total_csvfile(lines):
    lines = iter(lines)
    header = next(lines) # Skip header
    return sum(int(row) for row in lines)

for total in map(total_csvfile, csvfiles):
    print(total)

This prints out the total of the first csv file. Then a StopIteration
is raised when attempting to skip the missing header of the second
csvfile. That StopIteration leaks out of map.__next__ and is "caught"
by the enclosing for loop, so the remaining files are silently skipped.

If you change the end of the script to

totals = map(total_csvfile, csvfiles)
for total in totals:
    print(total)
for total in totals:
    print(total)

then you will see the totals for the files after the empty file,
showing that it was the first for loop that caught the StopIteration.

The reason this is particularly pernicious is that it leads to silent
action-at-a-distance failure and can be hard to debug. This was
considered enough of a problem for PEP 479 to attempt to solve in the
case of generators (but not iterators in general).
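As a sketch of the try/except pattern from earlier applied to this
script, total_csvfile can convert the missing header into a ValueError so
the failure is loud instead of silent (the error message text is made up
for illustration):

```python
csvfiles = [
    ['header', '1', '2', '3'],
    [],  # file has no header
    ['header', '4', '5', '6'],
]

def total_csvfile(lines):
    lines = iter(lines)
    try:
        next(lines)  # skip the header row
    except StopIteration:
        # Re-raise as ValueError so map() and the for loop cannot
        # mistake this for normal exhaustion of the iterator.
        raise ValueError('file has no header') from None
    return sum(int(row) for row in lines)

try:
    for total in map(total_csvfile, csvfiles):
        print(total)  # prints 6 for the first file
except ValueError as exc:
    print('error:', exc)  # error: file has no header
```

The cost is four extra lines around every bare next call, which is the
boilerplate a first-style helper would absorb.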

> This is how I would implement the function in Python:
>
>     def first(iterable, default=None):
>         return next(iter(iterable), default)

I agree that that doesn't need to be a builtin. However I would
advocate for a function like this:

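def first(iterable, *default):
    iterator = iter(iterable)
    if default:
        # An explicit default was given: behave like next(it, default).
        (default,) = default
        return next(iterator, default)
    else:
        try:
            return next(iterator)
        except StopIteration:
            # Convert to ValueError so an empty iterable cannot silently
            # terminate an enclosing loop.
            raise ValueError('Empty iterable') from None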

This has the following behaviour:

>>> first({1, 2, 3})
1
>>> first(x for x in [1, 2])
1
>>> first([])
Traceback (most recent call last):
   ...
ValueError: Empty iterable
>>> first([], 2)
2
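The *default signature is there so that first([], None) can return None
while first([]) still raises. A common alternative idiom, sketched here
as an assumption rather than part of the proposal, uses a private
sentinel object (the name _MISSING is hypothetical) so the signature
reads as an ordinary optional parameter:

```python
_MISSING = object()  # private sentinel meaning "no default supplied"

def first(iterable, default=_MISSING):
    """Return the first item of iterable.

    Raise ValueError if iterable is empty and no default was given.
    """
    iterator = iter(iterable)
    if default is _MISSING:
        try:
            return next(iterator)
        except StopIteration:
            raise ValueError('Empty iterable') from None
    return next(iterator, default)

print(first([10, 20]))     # 10
print(first([], None))     # None
print(first(iter([]), 0))  # 0
```

One practical difference: with the sentinel, a call such as
first(x, 1, 2) is rejected with a TypeError by normal argument checking,
rather than with a ValueError from the tuple unpacking.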

You can use it safely with map e.g. to get the first element of a
bunch of iterables:

# raises ValueError if any of csvfiles is empty
for header in map(first, csvfiles):
    print(header)

With next that would be

# silently aborts if any of csvfiles is empty
for header in map(lambda e: next(iter(e)), csvfiles):
    print(header)

> But there's a major difference in behaviour depending on your input, and
> one which is surely going to lead to bugs from people who didn't realise
> that iterator arguments and iterable arguments will behave differently:
>
>     # non-iterator iterable
>     py> obj = [1, 2, 3, 4]
>     py> [first(obj) for __ in range(5)]
>     [1, 1, 1, 1, 1]
>
>     # iterator
>     py> obj = iter([1, 2, 3, 4])
>     py> [first(obj) for __ in range(5)]
>     [1, 2, 3, 4, None]
>
> We could document the difference in behaviour, but it will still bite
> people and surprise them.

This kind of confusion between iterators and iterables comes up all the
time. I can see that the name "first" is potentially confusing.
Another possible name is "take", which might make more sense in the
context of partially consumed iterators. Essentially the idea is just
that this is next for users rather than implementers of iterables.

--
Oscar
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TI2IVQTXMTO2O3LK2GUO3YBC2IDLPDV6/
Code of Conduct: http://python.org/psf/codeofconduct/
