On Mon, 4 May 2020 at 12:41, Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Sun, May 03, 2020 at 11:13:58PM -0400, David Mertz wrote:
>
> > It seems to me that a Python implementation of zip_equals() shouldn't do
> > the check in a loop like a version shows (I guess from more-itertools).
> > More obvious is the following, and this has only a small constant speed
> > penalty.
> >
> > def zip_equal(*its):
> >     yield from zip(*its)
> >     if any(_sentinel == next(o, _sentinel) for o in its):
> >         raise ZipLengthError
>
> Alas, that doesn't work, even with your correction of `any` to
> `not all`.
>
>     py> list(zip_equal("abc", "xy"))
>     [('a', 'x'), ('b', 'y')]
>
>
> The problem here is that zip consumes the "c" from the first iterator,
> exhausting it, so your check at the end finds that all the iterators are
> exhausted.
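
A minimal illustration of the problem described above, using plain iterators (the variable names are only for illustration). It relies on `zip` advancing its arguments left to right and stopping at the first one that runs out, which is how CPython's `zip` behaves:

```python
first = iter("abc")
second = iter("xy")

pairs = list(zip(first, second))
# On its third round zip() pulled 'c' from `first`, then found `second`
# empty and stopped; 'c' is simply discarded, not put back.
leftover = next(first, None)

# So a check that only runs after zip() has finished sees every
# iterator as exhausted and cannot detect the length mismatch.
sentinel = object()
all_exhausted = all(next(it, sentinel) is sentinel for it in (first, second))
```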

This got me thinking: what if we were to wrap (or, as it turned out,
`chain` on to the end of) each of the individual iterables instead,
thereby performing the relevant check before `zip` fully exhausted
them? Something like the following:

```python
import itertools

def zip_equal(*iterables):
    return zip(*_checked_simultaneous_exhaustion(*iterables))

def _checked_simultaneous_exhaustion(*iterables):
    if len(iterables) <= 1:
        return iterables

    def check_others():
        # first iterable exhausted, check the others are too
        sentinel = object()
        if any(next(i, sentinel) is not sentinel for i in iterators):
            raise ValueError('unequal length iterables')
        if False: yield

    def throw():
        # one of iterables[1:] exhausted first, therefore it must be shorter
        raise ValueError('unequal length iterables')
        if False: yield

    iterators = tuple(map(iter, iterables[1:]))
    return (
        itertools.chain(iterables[0], check_others()),
        *(itertools.chain(it, throw()) for it in iterators),
    )
```
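The helper above depends on a small trick: a function containing an unreachable `if False: yield` is still a generator, so its body does not run when it is called, only when `itertools.chain` reaches it, i.e. exactly when the iterable in front of it is exhausted. A stripped-down sketch of just that mechanism (`on_exhausted` and `reached` are illustrative names, not part of the proposal):

```python
import itertools

reached = []


def on_exhausted(label):
    # The body runs only once chain() reaches this generator,
    # i.e. after everything placed before it has been consumed.
    reached.append(label)
    if False:
        yield  # never executed; merely makes this function a generator


chained = itertools.chain("ab", on_exhausted("end of 'ab'"))
head = [next(chained), next(chained)]
reached_after_head = list(reached)  # still empty: the trap has not run yet
tail = list(chained)                # runs the trap, which yields nothing
```

In `_checked_simultaneous_exhaustion`, `check_others()` and `throw()` play the role of `on_exhausted`, raising instead of recording.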

This has the advantage that, if desired, the
`_checked_simultaneous_exhaustion` function could also be reused to
implement the length-checking version of `map` mentioned earlier.
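As a rough sketch of that reuse: the block below repeats a condensed copy of the helper under a hypothetical public name, `checked_simultaneous_exhaustion`, and builds an equally hypothetical `map_equal` on top of it. It assumes `map` advances its iterators one at a time in argument order, as `zip` does (this holds for CPython's `map`):

```python
import itertools


def checked_simultaneous_exhaustion(*iterables):
    # Condensed copy of the helper above, under a hypothetical public name.
    if len(iterables) <= 1:
        return iterables
    sentinel = object()
    iterators = tuple(map(iter, iterables[1:]))

    def check_others():
        # The first iterable ran out: every other iterator must be empty too.
        if any(next(it, sentinel) is not sentinel for it in iterators):
            raise ValueError('unequal length iterables')
        if False:
            yield

    def throw():
        # A later iterable ran out while the first still had items.
        raise ValueError('unequal length iterables')
        if False:
            yield

    return (
        itertools.chain(iterables[0], check_others()),
        *(itertools.chain(it, throw()) for it in iterators),
    )


def map_equal(func, *iterables):
    # Hypothetical length-checking map built on the same wrapping.
    return map(func, *checked_simultaneous_exhaustion(*iterables))
```

For example, `list(map_equal(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))` gives `[11, 22, 33]`, while dropping the `30` makes the call raise `ValueError` once the third round is attempted.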

Going further, if `checked_simultaneous_exhaustion` were to become a
public function (with a better name), it could be used to impose
same-length checking on the iterable arguments of any function,
provided those iterables are consumed in a compatible way.

Additionally, it would allow one to be specific about which iterables
are checked, rather than being forced to check either all of them
(`zip_equal`) or none of them (`zip`). This lets us have our cake and
eat it when mixing infinite and checked-length finite iterables, e.g.

```python
zip(i_am_infinite, *checked_simultaneous_exhaustion(*but_we_are_finite))
# or, if they aren't contiguous
checked1, checked2 = checked_simultaneous_exhaustion(it1, it2)
zip(checked1, infinite, checked2)
```

However, as I alluded to previously, this relies on the assumption
that the given iterators are advanced in turn, in the order they were
provided to `checked_simultaneous_exhaustion`. So, while this function
would be suitable for use with `zip`, `map`, and any other function
that does the same, a more general `checked_equal_length` function
(one that also covers iterable-consuming functions which may advance
the iterables in some haphazard order) would need something more
involved, such as keeping a running tally of the current length of
each iterable. Even then, we could only guarantee raising on unequal
lengths if the said function advanced every given iterator by at
least the length of the shortest.
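A rough sketch of the running-tally idea, using the hypothetical `checked_equal_length` name from the paragraph above, with behaviour assumed here purely for illustration: each wrapper counts the items it has produced, the first iterable to run out fixes the expected length, and any wrapper that goes past that length, or finishes at a different one, raises. As noted, it can only catch a mismatch if the consumer advances the iterators far enough:

```python
def checked_equal_length(*iterables):
    """Hypothetical: wrap iterables so that unequal lengths raise
    ValueError, whatever order the wrappers are advanced in."""
    counts = [0] * len(iterables)  # items produced so far, per iterable
    finished_length = None         # length of the first iterable to run out

    def wrap(index, iterable):
        nonlocal finished_length
        for item in iterable:
            counts[index] += 1
            # Another iterable already ended: this one is longer.
            if finished_length is not None and counts[index] > finished_length:
                raise ValueError('unequal length iterables')
            yield item
        if finished_length is None:
            # First to run out: any iterable that already produced more is longer.
            finished_length = counts[index]
            if any(count > finished_length for count in counts):
                raise ValueError('unequal length iterables')
        elif counts[index] != finished_length:
            # Ran out before reaching the length fixed earlier: it is shorter.
            raise ValueError('unequal length iterables')

    return tuple(wrap(i, it) for i, it in enumerate(iterables))
```

For instance, with `a, b = checked_equal_length("abc", "xy")`, fully consuming `b` first and then `a` raises `ValueError` when `a` reaches its third item, even though `zip`-style round-robin consumption was never used.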
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4D3FIYTOJSROIS3S3SYU752RTOJV27IZ/
