On Thu, Jun 6, 2019 at 7:31 PM Melanie Plageman <melanieplage...@gmail.com> wrote: > I'm not sure I understand why you would need to compare the original > tuples to the unmatched tuples file.
I think I was confused. Actually, I'm still not sure I understand this part: > Then, you iterate again through the outer side a third time to join it > to the unmatched tuples in the unmatched tuples file (from the first > chunk) and write the following to a new unmatched tuples file: > 5 > 9 > 11 > 11 and likewise here > Then you iterate a fifth time through the outer side to join it to the > unmatched tuples in the unmatched tuples file (from the second chunk) > and write the following to a new unmatched tuples file: > 11 > 11 So you refer to joining the outer side to the unmatched tuples file, but how would that tell you which outer tuples had no matches on the inner side? I think what you'd need to do is anti-join the unmatched tuples file to the current inner batch. So the algorithm would be something like: for each inner batch: for each outer tuple: if tuple matches inner batch then emit match if tuple does not match inner batch and this is the first inner batch: write tuple to unmatched tuples file if this is not the first inner batch: for each tuple from the unmatched tuples file: if tuple does not match inner batch: write to new unmatched tuples file discard previous unmatched tuples file and use the new one for the next iteration for each tuple in the final unmatched tuples file: null-extend and emit If that's not what you have in mind, maybe you could provide some similar pseudocode? Or you can just ignore me. I'm not trying to interfere with an otherwise-fruitful discussion by being the only one in the room who is confused... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company