The ordering is actually the critical part. If both lists are sorted, I can complete all comparisons in one pass over both lists at the same time. Let's think about "in a and not in b":

    a = [1, 2, 3, 6]
    b = [1, 4, 5, 6, 7]

Start at a[0] and b[0]:

round 0 -- a[i] == b[j], so increment both indexes.
round 1 -- a[i] < b[j], so b cannot contain a[i] (due to the sort order); append a[i] to the result and increment i.
round 2 -- a[i] < b[j], so b cannot contain a[i] (due to the sort order); append a[i] to the result and increment i.
....

Every round of comparisons increments at least i or j, if not both. That makes the run time roughly len(a) + len(b).
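A minimal sketch of that one-pass walk in pure Python (the function name and the handling of leftover elements are my assumptions, not the original implementation):

    def sorted_difference(a, b):
        """Return elements in sorted list a that are not in sorted list b."""
        result = []
        i = j = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                # present in both: skip it, advance both indexes
                i += 1
                j += 1
            elif a[i] < b[j]:
                # sort order guarantees b cannot contain a[i]
                result.append(a[i])
                i += 1
            else:
                # b[j] is smaller: advance j to catch up
                j += 1
        # anything left in a is past the end of b, so it can't be in b
        result.extend(a[i:])
        return result

With the example above, sorted_difference([1, 2, 3, 6], [1, 4, 5, 6, 7]) gives [2, 3]. Each loop iteration advances at least one index, so the loop runs at most len(a) + len(b) times.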
The original use case was detecting new files in various directories. Each directory was expected to have millions of 4KB files and at most maybe 100k new files. This was around 2002, and it was a huge time savings for me. I'm not even sure that sets were available then. Originally it was a C API extension. Computers have gotten fast enough that the speed difference doesn't matter if you convert to and from sets, unless we are talking about millions of elements in a set. The original source was lost, and the pure Python version has been adequate for anything I've needed in the last 10 years. The algorithms are actually very simple and short (maybe 200 lines total). I don't know if something like this would be useful for others; it might be useful for a specific group like scipy with very large datasets. It might, probably, even be adapted for use in the built-in set operations.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WWDAU6LRTYSD6S77BJQUXTQIT6YHUF2D/
Code of Conduct: http://python.org/psf/codeofconduct/