On Sat, May 02, 2020 at 07:43:44PM +0200, Alex Hall wrote:
> On Sat, May 2, 2020 at 6:09 PM Steven D'Aprano <st...@pearwood.info> wrote:
> 
> > On Sat, May 02, 2020 at 04:58:43PM +0200, Alex Hall wrote:
> >
> > > I didn't think carefully about this implementation and thought that there
> > > was only a performance cost in the error case. That's obviously not true -
> > > there's an `if` statement executed in Python for every item in every
> > > iterable.
> >
> > Sorry, this does not demonstrate that the performance cost is
> > significant.
> >
> > This adds one "if" per loop, terminating on (one more than) the shortest
> > input. So O(N) on the length of the input. That's usually considered
> > reasonable, provided the per item cost is low.
> >
> > The test in the "if" is technically O(N) on the number of input
> > iterators, but since that's usually two, and rarely more than a handful,
> > it's close enough to a fixed cost.
> >
> > On my old and slow PC `sentinel in combo` is quite fast:
> >
> 
> `sentinel in combo` is problematic if some values have overridden `__eq__`.
> I referred to this problem in a previous email to you, saying that people
> had copied this buggy implementation from SO and that it still hadn't been
> fixed after being pointed out. The fact that you missed this helps to prove
> my point. Getting this right is hard.

I didn't miss it, I ignored it as YAGNI.

Seriously, if some object defines a weird `__eq__` then half the 
standard library, including builtins, stops working "correctly". See for 
example the behaviour of float NANs in lists.
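
For concreteness, here is a small sketch of the NAN behaviour referred to above. It relies only on documented behaviour: containment and list comparison check identity before equality, and a NAN never compares equal to anything, itself included. The variable names are just for illustration.

```python
nan = float("nan")
other_nan = float("nan")   # a second, distinct NAN object

# A NAN is not equal to itself.
nan_equals_itself = (nan == nan)

# `in` tests identity first, so the very same object is found...
same_object_found = nan in [nan]

# ...but a different NAN object is not, because equality fails.
other_object_found = other_nan in [nan]

# List equality uses the same identity shortcut per element.
same_object_lists_equal = ([nan] == [nan])
distinct_object_lists_equal = ([nan] == [other_nan])

# list.count follows the same rule.
count_same = [nan].count(nan)
count_other = [nan].count(other_nan)
```

So whether a list "contains" a NAN depends on which NAN object you ask about, which is the kind of surprise any unusual `__eq__` can cause.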

My care factor for this is negligible, until such time that it is proven 
to be an issue for real objects in real code. Until then, YAGNI.


> Fortunately, more_itertools avoids this bug by not using `in`, which you
> seem to not have noticed even though I copied its implementation in the
> email you're responding to.

Which by my testing on my machine is nearly ten times slower than the 
more obvious use of `in`.
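
To make the two variants under discussion concrete, here is a rough sketch. The names `zip_equal_in`, `zip_equal_is`, `_SENTINEL` and `AlwaysEqual` are hypothetical, and neither function is copied from more_itertools; they only illustrate an `in`-based check versus an identity-based one on top of `itertools.zip_longest`.

```python
from itertools import zip_longest

_SENTINEL = object()  # hypothetical private fill value


def zip_equal_in(*iterables):
    """Strict zip using `in`, which compares with == after an identity check."""
    for combo in zip_longest(*iterables, fillvalue=_SENTINEL):
        if _SENTINEL in combo:
            raise ValueError("iterables have different lengths")
        yield combo


def zip_equal_is(*iterables):
    """Strict zip using an explicit identity test on each value."""
    for combo in zip_longest(*iterables, fillvalue=_SENTINEL):
        for value in combo:
            if value is _SENTINEL:
                raise ValueError("iterables have different lengths")
        yield combo


class AlwaysEqual:
    """Deliberately odd object: claims to be equal to everything."""

    def __eq__(self, other):
        return True


def raises_value_error(func, *iterables):
    """Helper: report whether consuming func(*iterables) raises ValueError."""
    try:
        list(func(*iterables))
    except ValueError:
        return True
    return False
```

With equal-length inputs containing an `AlwaysEqual` instance, the `in` version reports a length mismatch that isn't there, while the identity version does not. Both behave the same for ordinary values.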

> > Without actual measurements, this is a classic example of premature
> > micro-optimization.
> >
> > Let's see some real benchmarks proving that a Python version is
> > too slow in real-life code first.
> >
> 
> Here is a comparison of the current zip with more-itertools' zip_equal:
[...]

> my_timeit("consume(zip_equal(x1, x2))")
> ``` <http://python.org/psf/codeofconduct/>

Huh, there's that weird link to the CoC again.



> So the Python version is about 13 times slower, and 10 million iterations
> (quite plausible) adds about 2 seconds.

Adds two seconds to *what* though? That's why I care more about real 
benchmarks than micro benchmarks. In real-world code, you are going to 
be processing the data somehow. Adds two seconds to an hour's processing 
time? I couldn't care less. Adds two seconds to a second? Now I'm 
interested.
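
One way to measure that, sketched under assumptions: time the whole loop, including some per-item work, rather than the bare iteration. Here `zip_equal` is a hypothetical pure-Python strict zip, `process` is a stand-in for whatever the real code does with each pair, and `time_both` is an illustrative helper. No timings are claimed; the numbers depend entirely on the machine and on how heavy `process` is.

```python
import timeit
from itertools import zip_longest

_SENTINEL = object()  # hypothetical private fill value


def zip_equal(*iterables):
    """Hypothetical pure-Python strict zip, for timing purposes only."""
    for combo in zip_longest(*iterables, fillvalue=_SENTINEL):
        if _SENTINEL in combo:
            raise ValueError("iterables have different lengths")
        yield combo


def process(pairs):
    """Stand-in for real per-item work; replace with the actual workload."""
    return sum(a * b for a, b in pairs)


def time_both(n=100_000, repeat=3):
    """Return (seconds with zip, seconds with zip_equal) for one full pass."""
    xs = list(range(n))
    ys = list(range(n))
    t_zip = min(timeit.repeat(lambda: process(zip(xs, ys)),
                              number=1, repeat=repeat))
    t_equal = min(timeit.repeat(lambda: process(zip_equal(xs, ys)),
                                number=1, repeat=repeat))
    return t_zip, t_equal


if __name__ == "__main__":
    t_zip, t_equal = time_both()
    print(f"zip: {t_zip:.4f}s  zip_equal: {t_equal:.4f}s  "
          f"difference: {t_equal - t_zip:.4f}s")
```

The interesting figure is the difference relative to the total, not the ratio between two empty loops.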

To be clear here, I'm not arguing *against* a C accelerated version. I'm 
arguing against the *necessity* of a C version, based only on micro 
benchmarks. If the PEP is accepted, and this goes into itertools, then 
whether it is implemented in C or Python should be a matter for the 
implementer.

We shouldn't argue that this *must* be a builtin because otherwise it 
will be too slow. That's a bogus argument.


> That's not disastrous, but I think
> it's significant enough that someone working with large amounts of data and
> concerned about performance might choose to risk accidental malformed input.

That's their choice to make, not ours. If they are worried about unequal 
input lengths, they can always truncate the data to make them equal 
*wink*


[Oh no, I have a sudden image in my head of people using zip to truncate 
their data to equal lengths, before passing it on to zip_strict "to be 
sure".]


-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/E324GYS3RZDVXW7PHYX7LE4Q6W4TMHQF/
Code of Conduct: http://python.org/psf/codeofconduct/
