[Python-ideas] Re: Set operations with Lists

Richard Musil Fri, 20 Sep 2019 02:45:07 -0700

On Fri, Sep 20, 2019 at 1:52 AM Andrew Barnert <abarn...@yahoo.com> wrote:

> On Sep 19, 2019, at 15:18, Richard Musil <risa20...@gmail.com> wrote:
> >
> > Ok, I misread the original code, the lists were not sorted in the
> previous results (and their should be). So with the correction to have them
> sorted,
>
> I think to be fair, you want to show it _both_ ways, just as you’re
> showing sets both with and without creating the sets. Because sometimes
> you’re already going to have sorted lists, just as sometimes you’re already
> going to have sets.
>

Right. It just was giving wrong results and I was not sure if the algorithm
on "particularly unsorted" list would be representative (because it would
produce the wrong results), so I removed it. I added a new test
`list_relcomp[presorted]` with pre-sorted (just to have the results
correct) to see if "making it correct" has any impact.

So there are four things to compare:
>
>     * set operation on existing sets
>     * set operation including time to build the sets
>     * step-compare operation on pre-sorted lists
>     * step-compare including time to sort the lists
>
> (Also, for the set operation, there’s no need to build _two_ sets. Just
> build a set of one and intersect it with the other list.)
>

Right (again). When I was considering the algorithm on the lists, I thought
that technically, I could build just one set on the fly, but the other one
has to be at least hashed to test the inclusion, and I also assumed that
the hash operation will be dominant in this and basically told myself that
the cost of building the other set will not be significant. I just added
another implementation to the test (which I believe you had on mind):
```
def relcomp_set_list(a, b):
    bset = set(b)
    return [ia for ia in a if ia not in bset]
```
and confirmed that avoiding the 2nd set build actually brings a noticeable
improvement :).

I also changed the naming as the original code used `relcomp_set` to mark
set operation 'a - b', but returned the results as the list, which seemed
to me a bit confusing, so I renamed it to `relcomp_set_to_list` (do not
confuse it with `relcomp_set_list` mentioned above) and put there
`relcomp_set`, which does pure set 'a - b' and returns a set. So one can
see the overhead of the resulting list construction in the original
`relcomp_set`. The results are:

For A not in B:
trying A 1000, B 5000
naive_relcomp (msec): 38.105
relcomp_set (msec): 0.025
relcomp_set_to_list (msec): 0.034
relcomp_set with set init (msec): 0.215
relcomp_set_list (msec): 0.192
list_relcomp (msec): 0.329
list_relcomp[presorted] (msec): 0.214

For A not in B:
trying A 10000, B 50000
relcomp_set (msec): 0.416
relcomp_set_to_list (msec): 0.545
relcomp_set with set init (msec): 3.167
relcomp_set_list (msec): 2.546
list_relcomp (msec): 3.372
list_relcomp[presorted] (msec): 2.005

For A subset of B:
trying A 100000, B 500000
relcomp_set (msec): 8.557
relcomp_set_to_list (msec): 8.519
relcomp_set with set init (msec): 55.783
relcomp_set_list (msec): 48.903
list_relcomp (msec): 143.577
list_relcomp[presorted] (msec): 120.911

There is still one peculiarity in the last test, where relcomp_set and
relcomp_set_to_list gave basically the same result. I interpret is as that
the result is of a small size, basically insignificant to the other
operations in play.

Richard M.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CU7DVSIAAJQVYKQ5PX6ZMPYWVEUL2EID/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Set operations with Lists

Reply via email to