Let me go back to the top here. On Sep 18, 2019, at 12:29, Richard Higginbotham <higgi...@gmail.com> wrote: > > I have frequently come across cases where I would like to compare items in > one list in another similar to relational algebra.
I’ve put together an itertools-style implementation at https://github.com/abarnert/iterset if anyone’s interested. It includes merge, union, intersection, difference, symmetric_difference, issubset, and issuperset, all taking two sorted iterables and a key function. While all of these things (except merge) can be done by just putting one iterable into a set (or Counter, if you need dups) and passing the other to a method, there are plenty of reasons to prefer these functions in some cases: preserving input order, taking key functions, working on non-hashable values, chaining up itertools-style with other Iterator transforms, etc. Notice that the C++ standard library has exactly the same set of functions at the same level of flexibility (they take C++ iterator ranges, and an optional less-than function), despite having methods on (both unordered hash and ordered tree) sets, for much the same reasons. Also notice that itertools already has at least one function that expects to work on sorted iterables, groupby. And that in general itertools is intended to provide an “iterator algebra” that this fits into nicely. So, I think they’re useful enough to exist somewhere, possibly in itertools, possibly in a third-party library (although maybe as part of more-itertools and/or toolz rather than on their own). It’s barely tested, not at all optimized, not fully documented, and generally not ready to turn into a patch to the stdlib, more-itertools, or toolz or into a PyPI package. But if anyone wants it to be polished up to that level, let me know and I could probably find time this week. (If not, I’ll just keep it in my toolbox; and maybe come back to it next time I need it.) I haven’t tested performance at all. I expect it to be slower (when used with already-sorted data that’s already in a list) than Richard Higginbotham’s list implementation by a potentially hefty multiplier. From a quick&dirty test, I think the multiplier can be brought way down (albeit at significant cost to readability), but it’ll still be higher than 1. And that’s already slower than sets for at least most uses, as Richard Musli has demonstrated. Of course for huge data where reading everything into memory means swap hell, it’ll be a whole lot faster, but if you really need that, I’ll bet pandas/HDF5 has even better alternatives. Interleaving the work with the cost of reading or constructing the values might help performance in some cases, but that’s hard to generalize about. Overall, you probably want them when you need to preserve input order, or use a key function, etc., not when sets aren’t fast enough. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2NZ4HEZQNIBZTYR24J6FTFNF6KNEHUT5/ Code of Conduct: http://python.org/psf/codeofconduct/