[HACKERS] Hash vs. HashJoin nodes

Neil Conway Wed, 30 Mar 2005 20:06:46 -0800

Is there a reason why the implementation of hash joins uses a separate "hash" child node? AFAICS that node is only used in hash joins. Perhaps the intent was to be able to provide a generic "hashing" capability that could be used by any part of the executor that needs to hash tuples, but AFAICS the hash node is not currently used in that way.

(The reason I ask is that Andrew @ Supernews and I were discussing a potential minor improvement to the hash join implementation. If either of the inputs to an inner hash join is empty, we can avoid building the hash table or reading the other join relation. The existing code works fine if it is the inner hash relation that is empty (since that is read first), but if the outer join relation is empty we do a lot of unnecessary work. We could improve this by first pulling a single tuple from the hash join's inner relation; if it is non-null, then pull a single tuple from the outer relation. If that is also non-null, then go and build the hash table for the inner relation as usual. This isn't easy to implement at present because nodeHash is used to hash the inner relation, and does the whole job at once. Of course, it would be possible to hack nodeHash to detect the first time it is called and then return after a single tuple, so the caller would actually invoke it twice for non-empty input -- but that seems a bit ugly, so I'm wondering if there is any value to maintaining the hash vs. hash join distinction in the first place.)
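The ordering proposed above can be sketched in a few lines of C. This is a toy model under stated assumptions, not PostgreSQL executor code: `Source`, `source_next`, `table_insert`, `table_lookup`, and `hash_join_count` are hypothetical names invented for illustration, join keys are plain ints, and the "hash table" is a fixed-size open-addressing array. The point is only the control flow: pull one inner tuple, then one outer tuple, and build the table only if both exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy tuple source: an array of int join keys plus a read counter. */
typedef struct Source {
    const int *keys;
    size_t     len;
    size_t     pos;    /* index of the next tuple to return */
    size_t     reads;  /* number of tuples pulled so far */
} Source;

/* Pull the next key from the source; returns false at end of input. */
static bool source_next(Source *s, int *key)
{
    assert(s != NULL && key != NULL);
    if (s->pos >= s->len)
        return false;
    *key = s->keys[s->pos++];
    s->reads++;
    return true;
}

/* Toy capacity: this sketch assumes fewer than 64 distinct inner keys. */
#define TABLE_SIZE 64

typedef struct Slot { bool used; int key; size_t count; } Slot;

/* Insert a key with linear probing, counting duplicates. */
static void table_insert(Slot *table, int key)
{
    size_t i = (unsigned int) key % TABLE_SIZE;
    while (table[i].used && table[i].key != key)
        i = (i + 1) % TABLE_SIZE;
    table[i].used = true;
    table[i].key = key;
    table[i].count++;
}

/* Return how many inner tuples carry this key (0 if none). */
static size_t table_lookup(const Slot *table, int key)
{
    size_t i = (unsigned int) key % TABLE_SIZE;
    while (table[i].used && table[i].key != key)
        i = (i + 1) % TABLE_SIZE;
    return table[i].used ? table[i].count : 0;
}

/*
 * Inner hash join that returns the number of matching (inner, outer) pairs.
 * It peeks at one tuple from each input before doing any real work, so an
 * empty outer input costs a single inner read instead of a full hash build.
 */
static size_t hash_join_count(Source *inner, Source *outer)
{
    int first_inner, first_outer, key;

    if (!source_next(inner, &first_inner))
        return 0;               /* inner empty: outer is never read */
    if (!source_next(outer, &first_outer))
        return 0;               /* outer empty: hash table is never built */

    /* Both inputs are non-empty: build the table from the inner input. */
    Slot table[TABLE_SIZE] = {0};
    table_insert(table, first_inner);
    while (source_next(inner, &key))
        table_insert(table, key);

    /* Probe with the outer tuple already pulled, then with the rest. */
    size_t matches = 0;
    key = first_outer;
    do {
        matches += table_lookup(table, key);
    } while (source_next(outer, &key));

    return matches;
}
```

In this sketch the join routine drives the inner input directly, which is what makes the early peek straightforward; with a separate hash node that consumes its whole input in one call, the same ordering needs the "call it twice" workaround described above.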

-Neil


