On Fri, Sep 20, 2019 at 1:52 AM Andrew Barnert <abarn...@yahoo.com> wrote:
> On Sep 19, 2019, at 15:18, Richard Musil <risa20...@gmail.com> wrote: > > > > Ok, I misread the original code, the lists were not sorted in the > previous results (and their should be). So with the correction to have them > sorted, > > I think to be fair, you want to show it _both_ ways, just as you’re > showing sets both with and without creating the sets. Because sometimes > you’re already going to have sorted lists, just as sometimes you’re already > going to have sets. > Right. It just was giving wrong results and I was not sure if the algorithm on "particularly unsorted" list would be representative (because it would produce the wrong results), so I removed it. I added a new test `list_relcomp[presorted]` with pre-sorted (just to have the results correct) to see if "making it correct" has any impact. So there are four things to compare: > > * set operation on existing sets > * set operation including time to build the sets > * step-compare operation on pre-sorted lists > * step-compare including time to sort the lists > > (Also, for the set operation, there’s no need to build _two_ sets. Just > build a set of one and intersect it with the other list.) > Right (again). When I was considering the algorithm on the lists, I thought that technically, I could build just one set on the fly, but the other one has to be at least hashed to test the inclusion, and I also assumed that the hash operation will be dominant in this and basically told myself that the cost of building the other set will not be significant. I just added another implementation to the test (which I believe you had on mind): ``` def relcomp_set_list(a, b): bset = set(b) return [ia for ia in a if ia not in bset] ``` and confirmed that avoiding the 2nd set build actually brings a noticeable improvement :). I also changed the naming as the original code used `relcomp_set` to mark set operation 'a - b', but returned the results as the list, which seemed to me a bit confusing, so I renamed it to `relcomp_set_to_list` (do not confuse it with `relcomp_set_list` mentioned above) and put there `relcomp_set`, which does pure set 'a - b' and returns a set. So one can see the overhead of the resulting list construction in the original `relcomp_set`. The results are: For A not in B: trying A 1000, B 5000 naive_relcomp (msec): 38.105 relcomp_set (msec): 0.025 relcomp_set_to_list (msec): 0.034 relcomp_set with set init (msec): 0.215 relcomp_set_list (msec): 0.192 list_relcomp (msec): 0.329 list_relcomp[presorted] (msec): 0.214 For A not in B: trying A 10000, B 50000 relcomp_set (msec): 0.416 relcomp_set_to_list (msec): 0.545 relcomp_set with set init (msec): 3.167 relcomp_set_list (msec): 2.546 list_relcomp (msec): 3.372 list_relcomp[presorted] (msec): 2.005 For A subset of B: trying A 100000, B 500000 relcomp_set (msec): 8.557 relcomp_set_to_list (msec): 8.519 relcomp_set with set init (msec): 55.783 relcomp_set_list (msec): 48.903 list_relcomp (msec): 143.577 list_relcomp[presorted] (msec): 120.911 There is still one peculiarity in the last test, where relcomp_set and relcomp_set_to_list gave basically the same result. I interpret is as that the result is of a small size, basically insignificant to the other operations in play. Richard M.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CU7DVSIAAJQVYKQ5PX6ZMPYWVEUL2EID/ Code of Conduct: http://python.org/psf/codeofconduct/