On Mon, Sep 23, 2019 at 12:21 AM Richard Higginbotham <higgi...@gmail.com>
wrote:

> I really appreciate the time and thought you have put into it (and others
> here as well), and it's been educational / fun for me too. One of the
> concerns I have about using timeit is that it puts a lot of focus on the
> exact statements that are being executed and all the internals underneath.
> Looking at "list(set(a).difference(b))"; I suspect what I am seeing is 3 C
> calls from Python. Anything after that is a comparison between C (here) and
> Python (in list_relcomp). I use Python for prototyping a lot, so I tend to
> look for those situations more.
>

I believe that your "suspicion" is correct and the set variant is executed
mostly in C code (though I do not have any intimate knowledge of the Python
implementation, so I may be wrong), while your code is executed as bytecode;
I also believe no one here claimed otherwise. This, however, is not a
problem with `timeit`, whose main strength is that it is designed for
benchmarking and is available in the standard library, so anyone can use it
and the results are (somewhat) comparable. If you design your own
benchmarking tool (and I do not think datetime is a good basis for one),
whatever you gain you lose in reproducibility and comparability. Besides,
saying that a benchmark does not give the results you expect would also
require explaining what you expect and why.
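
For reference, here is a minimal `timeit` sketch of the kind of comparison
being made in this thread. The random data and the pure-Python fallback are
just my own placeholders, not the list_relcomp implementation:

import random
import timeit

random.seed(0)
a = random.sample(range(1_000_000), 10_000)
b = random.sample(range(1_000_000), 10_000)
sb = set(b)

# The set variant: nearly all of the work happens inside C code.
t_set = timeit.timeit("list(set(a).difference(b))", number=200,
                      globals=globals())

# A pure-Python loop: the 'x not in sb' test runs in C, but the loop
# itself is interpreted bytecode.
t_list = timeit.timeit("[x for x in a if x not in sb]", number=200,
                       globals=globals())

print(f"set.difference:     {t_set:.3f}s")
print(f"list comprehension: {t_list:.3f}s")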


> When I first wrote the algorithms (2000 or so) I compared them to [x for x
> in a if x in b] and I found that the latter was faster. It didn't make sense
> at first, but it was because the 'x in b' is done in C (I had just graduated
> and started using Python). If the data set was large enough I could see that
> the big-O growth was greater for the C version versus the Python version.
> Similar to what we see here in the difference between the set operations
> and listex. I implemented listex in C and it was much faster at all data
> sizes over maybe 10 elements. I don't expect the same speedup compared to
> sets because they are more efficient. I also can't guarantee that the trend
> of set operations having a higher time growth will hold true for all data
> sizes. I only point it out to say that for some use cases set relation
> operations aren't optimal, but they may be good enough.
>

So far the benchmarks have shown the list implementation to be slower than
the sets in every non-trivial case, sometimes by quite a wide margin.
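
For readers who have not followed the whole thread: list_relcomp itself is
not quoted here, so the sketch below is only my own illustration of a
merge-style relative complement over two already-sorted lists, which is how
I understand the approach being benchmarked; it is not Richard's actual
code:

def list_difference(a, b):
    """Return the items of sorted list a that are not in sorted list b."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])  # every remaining item of b is larger
            i += 1
        elif a[i] > b[j]:
            j += 1               # advance b until it catches up with a[i]
        else:
            i += 1               # a[i] occurs in b, drop it
    result.extend(a[i:])         # the tail of a cannot occur in b
    return result

The point of such an approach, as I read it, is that given sorted inputs it
makes a single O(len(a) + len(b)) pass, which is where the growth-rate
argument comes from.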

I am not saying that a C implementation cannot be faster; in fact, I will
probably be eager to test it if you come up with one. But after what I have
read (and posted) in this thread, I am also reluctant to accept that a C
implementation _must_ be faster (or must be significantly faster). Maybe
your code, even when run in Python, is already well optimized and won't get
a significant speedup, or maybe it will be 5x faster; honestly, I do not
dare even to guess. The only way to prove it is to actually implement it in
C.

Considering your use case, however, I wonder if you would not be better off
going with the iterator approach (as Andrew has been hinting for some time
in his posts).
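
To make that hint a bit more concrete (again just my own sketch, not code
from Andrew's posts): the same merge can be written as a generator, so the
caller never has to materialize the whole result at once:

_DONE = object()  # sentinel: the second iterable is exhausted

def iter_difference(a, b):
    """Lazily yield the items of sorted iterable a not occurring in
    sorted iterable b."""
    it_b = iter(b)
    cur = next(it_b, _DONE)
    for x in a:
        # Advance b until its current item is >= x (or b runs out).
        while cur is not _DONE and cur < x:
            cur = next(it_b, _DONE)
        if cur is _DONE or cur != x:
            yield x

The caller can then consume the results one at a time, e.g.
"for x in iter_difference(sorted(a), sorted(b)): ...", without ever building
the full list.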

Richard M.
