The ordering is actually the critical part. If both lists are ordered, I can 
complete all the comparisons in a single pass over both lists at the same time.
Let's think about "in A and not B":
a = [1,2,3,6]
b = [1,4,5,6,7]
Start at a[0] and b[0]:
0 -- a[i] == b[j], so increment both indexes.
1 -- a[i] < b[j], so b cannot contain a[i] (due to the sort order); append 
a[i] to the result and increment i.
2 -- a[i] < b[j], so b cannot contain a[i] (due to the sort order); append 
a[i] to the result and increment i.
...
Every round of comparisons increments at least one of i and j, if not both. 
That makes the run time roughly len(a) + len(b).
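
Here is a minimal pure-Python sketch of that one-pass difference (the 
function name and details are mine for illustration, not the original code, 
and it assumes both inputs are sorted and duplicate-free):

def difference_sorted(a, b):
    """Elements of sorted list a that are not in sorted list b."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Present in both: skip it and advance both indexes.
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Everything left in b is >= b[j] > a[i], so b cannot
            # contain a[i]; keep it and advance i.
            result.append(a[i])
            i += 1
        else:
            # a[i] > b[j]: advance j until b catches up.
            j += 1
    # Whatever remains in a cannot appear in the exhausted b.
    result.extend(a[i:])
    return result

difference_sorted([1, 2, 3, 6], [1, 4, 5, 6, 7])  # -> [2, 3]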

The original use case was detecting new files in various directories. Each 
directory was expected to have millions of 4KB files and at most maybe 100k 
new files. This was around 2002, and it was a huge time savings for me; I'm 
not even sure that sets were available then. Originally it was a C API 
extension. Computers have gotten fast enough that the speed difference 
doesn't matter if you convert to and from sets, unless we are talking about 
millions of elements in a set. The original source was lost, and the pure 
Python version has been adequate for anything I've needed in the last 10 
years. The algorithms are actually very simple and short (maybe 200 lines 
total). I don't know if something like this would be useful for others, or 
it might be useful for a specific group like scipy with very large datasets. 
It might, or probably could, even be adapted for use in the built-in set 
operations.