On Mon, Sep 23, 2019 at 12:21 AM Richard Higginbotham <higgi...@gmail.com> wrote:
> I really appreciate the time and thought you have put into it (and others
> here as well), and it's been educational / fun for me too. One of the
> concerns I have about using timeit is that it puts a lot of focus on the
> exact statements that are being executed and all the internals underneath.
> Looking at "list(set(a).difference(b))", I suspect what I am seeing is 3 C
> calls from Python. Anything after that is a comparison between C (here) and
> Python (in list_relcomp). I use Python for prototyping a lot, so I tend to
> look for those situations more.

I believe that your "suspicion" is correct and the set variant is executed mostly in C code (though I do not have any intimate knowledge of the Python implementation, so I may be wrong), while your code is executed in bytecode, and I also believe no one here has claimed otherwise.

This, however, is not a problem with `timeit`, whose main feature is that it is designed for benchmarking and is available in the standard library. So anyone can use it, and the results are (somewhat) comparable. If you design your own benchmarking tool (and I do not think datetime is a good candidate), anything you gain you lose in reproducibility and comparability. Besides, saying that this benchmark does not give the results you expect would probably also require explaining what you expect and why.

> When I first wrote the algorithms (2000 or so), I compared them to
> [x for x in a if x in b] and found that the latter was faster. It didn't
> make sense at first, but it was because the 'x in b' is done in C (I had
> just graduated and started using Python). If the data set was large enough,
> I could see that the big-O growth was greater for the C version than for
> the Python version, similar to what we see here in the difference between
> the set operations and listex. I implemented listex in C, and it was much
> faster at all data sizes over maybe 10 elements. I don't expect the same
> speedup compared to sets, because they are more efficient.
> I also can't guarantee that the trend of set operations having a higher
> time growth will hold true for all data sizes. I only point it out and say
> that for some use cases set relation operations aren't optimal, but they
> may be good enough.

So far the benchmarks have shown that the list implementation was slower than the sets in every non-trivial case, sometimes by quite a margin. I am not saying that the C implementation cannot be faster; in fact, I will probably be eager to test it if you come up with one. But after what I have read (and posted) in this thread, I am also reluctant to accept that a C implementation _must_ be faster (or must be significantly faster). Maybe your code, even when run in Python, is already well optimized and won't get a significant speedup, or maybe it will be 5x faster; honestly, I do not dare even to guess. The only way to prove it is to actually implement it in C.

Considering your use case, however, I wonder if you would not be better off going with the iterator approach (as Andrew has been hinting for some time in his posts).

Richard M.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7FDHFQB65EOX47AT65ZGJO6RA5SMOE35/
Code of Conduct: http://python.org/psf/codeofconduct/