Are there any TODOs here?

---------------------------------------------------------------------------
Robert Haas wrote:
> On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmh...@gmail.com> writes:
> >> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>> Correct, but you've got the details all wrong.  The real problem is that
> >>> the planner might discard a join path hash(A,B) at level 2 because it
> >>> loses compared to, say, merge(A,B).  But when we get to level three,
> >>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> >>> of the two hashes.  We'll never find that out unless we keep the
> >>> "inferior" hash path around.  We can certainly do that; the question
> >>> is what's it going to cost us to allow more paths to survive to the
> >>> next join level.  (And I'm afraid the answer may be "plenty"; it's a
> >>> combinatorial explosion we're looking at here.)
> >
> >> That would be crazy.  I think doing it the way I suggested is correct,
> >> just not guaranteed to catch every case.  The reality is that even if
> >> we took Greg Stark's suggestion of detecting this situation only in
> >> the executor, we'd still get some benefit out of this.  If we take my
> >> intermediate approach, we'll catch more cases where this is a win.
> >> What you're suggesting here would catch every conceivable case, but at
> >> the expense of what I'm sure would be an unacceptable increase in
> >> planning time for very little benefit.
> >
> > Maybe, maybe not.  I've seen plenty of plans that have several
> > mergejoins stacked up on top of each other with no intervening sorts.
> > There is 0 chance that the planner would have produced that if it
> > thought that it had to re-sort at each level; something else would have
> > looked cheaper.  I think that your proposals will end up getting very
> > little of the possible benefit, because the planner will fail to choose
> > plan trees in which the optimization can be exploited.
>
> Well, I'm all ears if you have suggestions for improvement.
> For sorts, we use PathKeys to represent the ordering of each path and keep
> the paths for each set of pathkeys.  By analogy, we could maintain a
> list of PathHash structures for each path, representing the tables that
> had already been hashed.  add_path() would then have to consider both
> the PathHash structures and the PathKey structures before concluding
> that a path was definitely worse than some path previously found.  At
> each level of the join tree, we'd need to truncate PathHash structures
> that provably have no further use (e.g. on a base table that does not
> appear again above the level of the join already planned), to avoid
> keeping around paths that appeared to be better only because we didn't
> know that the hashes they retained were worthless in practice.  Maybe
> that wouldn't even be that expensive, actually, because there will be
> lots of cases where we know the relevant table doesn't appear
> elsewhere in the query, so we'd never save any extra paths.  But I think
> we'd have to write the code and benchmark it to really know.
>
> I guess the reason I'm not too worked up about this is that my
> experience is that the planner nearly always prefers hash joins on
> small tables, even when an index is present - the queries I'm worried
> about optimizing don't need any additional encouragement to use hash
> joins; they're doing it already.  But certainly it doesn't hurt to see
> how many cases we can pick up.
>
> ...Robert

-- 
Bruce Momjian <br...@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers