On Fri, Apr 3, 2009 at 12:55 PM, Greg Stark <st...@enterprisedb.com> wrote:
>>> 1. When the hash is not expected to spill to disk, it preserves the
>>> pathkeys of the outer side of the join. If the optimizer were allowed
>>> to assume that, it could produce significantly more efficient query
>>> plans in some cases.
>>
>> This is definitely possible, but you will have to dynamically modify the
>> execution path if the hash join ends up being more than one batch.
>
> Yeah, this item seems to be another case where having a "conditional"
> branch in the plan would be valuable. What's interesting is that it's
> branching based on whether the hash has spilled into batches rather
> than on whether the number of rows is greater or less than some
> breakeven point. It looks like these branch nodes are going to need to
> know more than just the generic plan parameters. They're going to need
> to know about specifics like "sort has spilled to disk" or "hash has
> spilled into batches", etc.
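The order-preservation claim in the first quoted point can be pictured with a toy sketch (plain Python, nothing like the real executor; all names here are invented). A single-batch hash join probes with a sequential scan of the outer input, so output order follows outer order; once the join spills into batches, outer rows get partitioned by hash value and that ordering is lost.

```python
# Toy single-batch hash join: build on the inner input, then probe with
# the outer input in its original order.  Because probing is a
# sequential scan of the outer side, output rows inherit the outer
# side's ordering.
def hash_join_single_batch(outer, inner, outer_key, inner_key):
    table = {}
    for row in inner:                      # build phase
        table.setdefault(inner_key(row), []).append(row)
    out = []
    for row in outer:                      # probe phase, in outer order
        for match in table.get(outer_key(row), []):
            out.append((row, match))
    return out

outer = [(1, "a"), (3, "b"), (5, "c"), (7, "d")]   # sorted on column 0
inner = [(5, "x"), (1, "y"), (9, "z")]             # unsorted

joined = hash_join_single_batch(outer, inner,
                                outer_key=lambda r: r[0],
                                inner_key=lambda r: r[0])
# The join keys of the result come out as [1, 5] -- still in the outer
# side's sorted order, even though the inner input was unsorted.
```

A multi-batch variant would first split `outer` by hash value and process one partition at a time, which is exactly why the optimizer can only assume the pathkeys survive in the single-batch case.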
In this particular case, the operation that has to be performed is more
specific than "sort": it's "merge this set of sorted tapes". So it's more
subtle than just inserting a sort node.

> I like the idea of coalescing hash tables. I'm not sure the order in
> which the planner decides on things is conducive to being able to make
> good decisions about it though. Even if we did it afterwards without
> adjusting the planner it might still be worthwhile though.

I don't see why hash_inner_and_outer can't walk the outer path looking
for suitable hashes to reuse. I think the question is how aggressive we
want to be in performing that search. If we limit the search to hashes
on base tables without restriction conditions, we'll probably catch most
of the interesting cases, but obviously not all of them. If we look for
hashes on baserels with restriction conditions, or hashes on joinrels,
etc., we might pick up a few more cases, but at the expense of increased
planning time.

The advantage of searching in the executor, I suppose, is that the
checks don't have to be as cheap, since you're only checking the plan
that has already won, rather than lots and lots of potential plans. On
the other hand, your costing will be less accurate, which could lead to
bad decisions elsewhere.

> Incidentally, a similar optimization is possible for materialize or
> even sorts. They may not come up nearly so often, since you would
> normally want to go around adding indexes if you're repeatedly sorting
> on the same columns. But it might not be any harder to get them all
> in one swoop.

At least in my experience, sort and materialize nodes are pretty rare,
so I'm not sure it would be worth the time it would take to search for
them. In most cases it seems to be cheaper to hash the inner and outer
paths than to sort even one of them and then merge-join the result.
When I do get these sorts of paths, they tend to be in larger, more
complex queries where there's less chance of useful reuse.
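The hash-table coalescing being discussed can be sketched in miniature (again a toy in Python with invented names, not the planner or executor code): if two joins probe the same relation on the same key, the build phase can run once and the table can be shared between them.

```python
# Toy sketch of coalescing hash tables: build once, probe twice.
builds = 0

def build_hash(rows, key):
    """Hash the inner side; counts invocations so reuse is visible."""
    global builds
    builds += 1
    table = {}
    for row in rows:
        table.setdefault(key(row), []).append(row)
    return table

def probe(table, rows, key):
    return [(r, m) for r in rows for m in table.get(key(r), [])]

dim = [(1, "red"), (2, "blue")]        # shared inner relation
fact_a = [(10, 1), (11, 2)]
fact_b = [(20, 2), (21, 2)]

# Without coalescing, each join would call build_hash on `dim`; with
# it, the second join simply probes the table built for the first.
shared = build_hash(dim, key=lambda r: r[0])
join_a = probe(shared, fact_a, key=lambda r: r[1])
join_b = probe(shared, fact_b, key=lambda r: r[1])

assert builds == 1     # the inner side was hashed only once
```

The planner-side question raised above is where this match is detected: checking candidate paths during planning gives accurate costs but runs the check many times, while checking the winning plan in the executor runs it once but too late to influence costing.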
But my experience might not be representative...

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers