On Thu, Sep 19, 2019 at 10:18 PM Richard Higginbotham <higgi...@gmail.com> wrote:
> I'm sorry I haven't used the mail list before and didn't send my comments
> to the list. Please see my response to Chris for some example times. I'm
> reluctant to use anything less than about a ms for timing purposes. There's
> too many interactions at the us level that can skew results. I'd prefer to
> expand the sample size until it's more clear. Here is an example
>
> For A not in B:
> trying A 10000, B 50000
> naive test 0:00:10.319032
> set test 0:00:00.009001
> listex test 0:00:00.005000
>
> For A subset of B:
> trying A 100000, B 500000
> set test 0:00:00.180018
> listex test 0:00:00.054006

Richard, I took your listex.py and modified it to use the timeit module for
benchmarking and put it here
(https://gist.github.com/risa2000/c44c1a06e89f348d82efcce5ec722774).

The results I got from the modified code were:

For A not in B:
trying A 1000, B 5000
best naive_relcomp test time (msec): 38.125
best relcomp_set test time (msec): 0.034
best relcomp_set with set init test time (msec): 0.218
best list_relcomp test time (msec): 0.211

For A not in B:
trying A 10000, B 50000
best relcomp_set test time (msec): 0.545
best relcomp_set with set init test time (msec): 3.093
best list_relcomp test time (msec): 2.101

For A subset of B:
trying A 100000, B 500000
best relcomp_set test time (msec): 8.508
best relcomp_set with set init test time (msec): 55.001
best list_relcomp test time (msec): 29.242

Now, what is interesting about this is the benchmark I added: calculating the
set difference while including the set setup (basically the set(list) call).
This is the line with "best relcomp_set with set init".

For those not familiar with Richard's listex.py:

1) 'relcomp_set' calculates the set difference by using the subtraction
operator (a - b) on two sets built from the two unsorted lists.
2) 'relcomp_set with set init' is the same function, but with the set setup
included, so it is basically measuring set(list_a) - set(list_b).
3) 'list_relcomp' is the function using Richard's implementation of the
difference on the (again unsorted) lists.

I did not verify the results (I hope they are correct), but the timings
suggest that the set operation itself is always faster. When, however, we add
the set setup to the mix, the build-up time, which is O(n + m), becomes more
significant than actually sorting and comparing the lists, which is
technically O(n*log(n) + m*log(m)). This would also suggest that for some
large lists the list implementation _can_ actually be faster if we factor in
an eventual conversion of the lists to sets, and possibly even faster still if
the lists are already sorted.

What I cannot explain is why the setup time becomes so significant only at
A 10000, B 50000 and does not make much of a difference in the A 1000, B 5000
case. It would suggest that there is some other factor whose impact depends on
the set size.

Richard M.
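(For anyone who wants to reproduce the shape of the comparison without the
gist: below is a minimal sketch, not Richard's actual listex.py. The name
sorted_list_diff and the sample sizes are my own; it contrasts the prebuilt-set
difference, the difference including the set(list) build cost, and a
merge-walk difference over pre-sorted lists.)

```python
import random
import timeit

def sorted_list_diff(a, b):
    """Elements of sorted list a not present in sorted list b (merge walk)."""
    out = []
    i = j = 0
    while i < len(a):
        if j >= len(b) or a[i] < b[j]:
            out.append(a[i])   # a[i] is smaller than anything left in b
            i += 1
        elif a[i] > b[j]:
            j += 1             # advance b until it catches up
        else:
            i += 1             # a[i] == b[j]: drop it from the result
    return out

# Unsorted input lists of distinct values (sizes are illustrative).
a = random.sample(range(1_000_000), 10_000)
b = random.sample(range(1_000_000), 50_000)
sa, sb = set(a), set(b)
a_sorted, b_sorted = sorted(a), sorted(b)

# 'relcomp_set': sets already built, measure only the subtraction.
print(timeit.timeit(lambda: sa - sb, number=20))
# 'relcomp_set with set init': include the O(n + m) set build.
print(timeit.timeit(lambda: set(a) - set(b), number=20))
# list variant on already-sorted input, analogous to 'list_relcomp'.
print(timeit.timeit(lambda: sorted_list_diff(a_sorted, b_sorted), number=20))
```

The merge walk is O(n + m) once both inputs are sorted, which is why the
pre-sorted list case can compete with building two hash sets from scratch.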
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5DZJGQHKODXPKEXSFC6I3GFSKYCCD6HQ/
Code of Conduct: http://python.org/psf/codeofconduct/