On 9/28/2016 3:10 PM, Tim Chase wrote:
I've got several iterators sharing a common key in the same order and would like to iterate over them in parallel, operating on all items with the same key. I've simplified the data a bit here, but it would be something like

  data1 = [ # key, data1
    (1, "one A"),
    (1, "one B"),
    (2, "two"),
    (5, "five"),
    ]

  data2 = [ # key, data1
    (1, "uno"),
    (2, "dos"),
    (3, "tres x"),
    (3, "tres y"),
    (3, "tres z"),
    (4, "cuatro"),
    ]

  data3 = [ # key, data1, data2
    (2, "ii", "extra alpha"),
    (4, "iv", "extra beta"),
    (5, "v", "extra gamma"),
    ]

And I'd like to do something like

  for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3):
    for row in d1:
      process_a(common_key, row)
    for thing in d2:
      process_b(common_key, thing)
    for thing in d3:
      process_c(common_key, thing)

which would yield the common_key, along with enough of each of those iterators (note that gaps can happen, but the sortable order should remain the same). So in the above data, the outer FOR loop would happen 5 times with common_key being [1, 2, 3, 4, 5], and each of [d1, d2, d3] being an iterator that deals with just that data.
You just need d1, d2, d3 to be iterables, such as lists. Write a magic generator that opens the three files and reads one line from each (with next()). Then, in a while True loop, find the minimum key and make 3 lists (up to 2 of them possibly empty) of the items in each file with that key. This will require up to 3 inner loops. The read-ahead makes this slightly messy. If any list is not empty, yield the key and the 3 lists. Otherwise break out of the outer loop.
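A rough sketch of that generator might look like the following. It is only one possible reading of the description above: the name grouped_by_key, the sentinel object, and the key= parameter are made up for illustration, and it takes any key-sorted iterables rather than opening files itself. Instead of testing whether all lists came back empty, it stops as soon as every read-ahead slot is exhausted, which amounts to the same thing.

```python
def grouped_by_key(*sources, key=lambda row: row[0]):
    """Yield (common_key, list1, list2, ...) across key-sorted sources.

    Hypothetical helper: names and signature are illustrative only.
    """
    missing = object()  # sentinel marking an exhausted source
    iterators = [iter(source) for source in sources]
    # Read-ahead: hold the next unconsumed row of each source.
    heads = [next(it, missing) for it in iterators]
    while True:
        live_keys = [key(head) for head in heads if head is not missing]
        if not live_keys:
            break  # every source is exhausted
        smallest = min(live_keys)
        groups = []
        for i, it in enumerate(iterators):
            group = []
            # Drain this source while its head row carries the smallest key.
            while heads[i] is not missing and key(heads[i]) == smallest:
                group.append(heads[i])
                heads[i] = next(it, missing)
            groups.append(group)
        yield (smallest, *groups)
```

With the sample data from the question, the first item yielded would be (1, [(1, "one A"), (1, "one B")], [(1, "uno")], []), and the outer loop would run once for each of the keys 1 through 5. Only the rows for the current key are held in memory at any time.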
My original method was hauling everything into memory and making multiple passes filtering on the data. However, the actual sources are CSV files, some of which are hundreds of megs in size, and my system was taking a bit of a hit. So I was hoping for a way to do this with each iterator making only one complete pass through each source (since they're sorted by common key). It's somewhat similar to the *nix "join" command, only dealing with N files.
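As a hedged illustration of the one-pass idea, the standard library's heapq.merge and itertools.groupby can be combined so that each source is read exactly once and only one key's rows are buffered at a time. The function name merged_groups and the tagging scheme are assumptions made for this sketch, not something from the thread. It also assumes the key is the first field of every row and that keys compare consistently across sources (rows from csv.reader hold strings, so numeric keys would need converting first).

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merged_groups(*sources):
    """Yield (common_key, per_source_lists) in one pass over each source.

    Hypothetical helper: assumes each source is sorted by row[0].
    """
    def tag(index, rows):
        # Pair each row with its key and the index of its source.
        return ((row[0], index, row) for row in rows)

    tagged = [tag(index, rows) for index, rows in enumerate(sources)]
    # heapq.merge lazily merges already-sorted inputs. Comparing only
    # (key, index) avoids comparing the row payloads themselves.
    merged = heapq.merge(*tagged, key=itemgetter(0, 1))
    for common_key, run in groupby(merged, key=itemgetter(0)):
        buckets = [[] for _ in sources]
        for _, index, row in run:
            buckets[index].append(row)
        yield common_key, buckets
```

Here each yielded item is a key plus a list of per-source lists, so the loop would read "for common_key, (d1, d2, d3) in merged_groups(data1, data2, data3):". Passing csv.reader objects instead of lists should work the same way, since the function only iterates over its arguments.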
It is also somewhat similar to a 3-way mergesort.

--
Terry Jan Reedy
