Let me go back to the top here.

On Sep 18, 2019, at 12:29, Richard Higginbotham <higgi...@gmail.com> wrote:
> 
> I have frequently come across cases where I would like to compare items in 
> one list in another similar to relational algebra.

I’ve put together an itertools-style implementation at 
https://github.com/abarnert/iterset if anyone’s interested. It includes merge, 
union, intersection, difference, symmetric_difference, issubset, and 
issuperset, all taking two sorted iterables and a key function.
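For anyone who doesn’t want to click through, the core idea is just a classic two-pointer merge over two already-sorted iterables. Here’s a minimal sketch of the intersection case (illustrative only, not the exact iterset code; the name and signature are just for this example):

    def intersection(a, b, key=lambda x: x):
        # Yield items of sorted iterable a whose keys also appear in sorted
        # iterable b: walk b forward only as far as needed for each item of a.
        ib = iter(b)
        sentinel = object()
        y = next(ib, sentinel)
        for x in a:
            kx = key(x)
            while y is not sentinel and key(y) < kx:
                y = next(ib, sentinel)
            if y is sentinel:
                return
            if key(y) == kx:
                yield x

    list(intersection([1, 2, 4, 7, 9], [2, 3, 7, 10]))  # -> [2, 7]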

While all of these things (except merge) can be done by just putting one 
iterable into a set (or Counter, if you need dups) and passing the other to a 
method, there are good reasons to prefer these functions in some cases: 
preserving input order, taking key functions, working on non-hashable values, 
chaining up itertools-style with other iterator transforms, etc.
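For comparison, that set/Counter route looks like this (plain stdlib, nothing iterset-specific); it’s simple and fast, but it needs hashable values, loses input order, and gives you nowhere to hang a key function:

    from collections import Counter

    a = [3, 1, 4, 1, 5, 9, 2, 6]
    b = [9, 7, 1, 3, 1]

    common = set(a) & set(b)            # fast, but input order and dups are gone,
                                        # and every value has to be hashable
    with_dups = Counter(a) & Counter(b) # multiset intersection; .elements()
                                        # gets the items back, dups included

    print(sorted(common), sorted(with_dups.elements()))
    # [1, 3, 9] [1, 1, 3, 9]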

Notice that the C++ standard library has exactly the same set of functions 
(std::merge, std::set_union, std::set_intersection, std::set_difference, 
std::set_symmetric_difference, std::includes) at the same level of flexibility: 
they take iterator ranges and an optional less-than function. It provides these 
despite also having both unordered (hash) and ordered (tree) set containers, for 
much the same reasons. Also notice that itertools already has at least one 
function that expects to work on sorted iterables, groupby, and that in general 
itertools is intended to provide an “iterator algebra” that this fits into 
nicely.
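(To illustrate the groupby precedent: it only groups consecutive runs of equal keys, so just like these functions it assumes input already sorted by the key.)

    from itertools import groupby

    words = sorted(["apple", "bee", "ant", "cat", "bat"], key=lambda w: w[0])
    for letter, group in groupby(words, key=lambda w: w[0]):
        print(letter, list(group))
    # a ['apple', 'ant']
    # b ['bee', 'bat']
    # c ['cat']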

So, I think they’re useful enough to exist somewhere, possibly in itertools, 
possibly in a third-party library (although maybe as part of more-itertools 
and/or toolz rather than on their own). 

It’s barely tested, not at all optimized, not fully documented, and generally 
not ready to turn into a patch to the stdlib, more-itertools, or toolz, or into 
a PyPI package. But if anyone wants it polished up to that level, let me know 
and I could probably find time this week. (If not, I’ll just keep it in my 
toolbox and maybe come back to it next time I need it.)

I haven’t tested performance at all. I expect it to be slower (when used with 
data that’s already sorted and already in a list) than Richard Higginbotham’s 
list implementation, by a potentially hefty multiplier. From a quick&dirty test, 
I think the multiplier can be brought way down (albeit at significant cost to 
readability), but it’ll still be higher than 1. And that’s already slower than 
sets for at least most uses, as Richard Musil has demonstrated. Of course for
huge data where reading everything into memory means swap hell, it’ll be a 
whole lot faster, but if you really need that, I’ll bet pandas/HDF5 has even 
better alternatives. Interleaving the work with the cost of reading or 
constructing the values might help performance in some cases, but that’s hard 
to generalize about. Overall, you probably want them when you need to preserve 
input order, or use a key function, etc., not when sets aren’t fast enough.
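If anyone does want to measure, a rough harness would look something like this (the sorted_intersection here is just the illustrative two-pointer sketch from above, inlined so the snippet stands alone; I haven’t run this as a benchmark, so no numbers attached):

    import random
    import timeit

    def sorted_intersection(a, b):
        # Same two-pointer merge as the earlier sketch, without the key function.
        ib = iter(b)
        sentinel = object()
        y = next(ib, sentinel)
        for x in a:
            while y is not sentinel and y < x:
                y = next(ib, sentinel)
            if y is sentinel:
                return
            if y == x:
                yield x

    a = sorted(random.sample(range(1_000_000), 50_000))
    b = sorted(random.sample(range(1_000_000), 50_000))

    print("set-based :", timeit.timeit(lambda: set(a) & set(b), number=20))
    print("two-pointer:", timeit.timeit(lambda: list(sorted_intersection(a, b)), number=20))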