Jim C. Nasby wrote:
> On Tue, Aug 22, 2006 at 11:56:17AM -0700, Mark Dilger wrote:
>> I proposed something like this quite a bit up-thread. I was hoping we
>> could have a mode in which the system would run the second, third, fourth,
>> ... best plans rather than just the best looking one, and then determine
>> from actual runtime statistics which was best. (The proposal also included
>> the ability to output the best plan and read that in at a later time in
>> lieu of a SQL query, but that part of it can be ignored if you like.) The
>> posting didn't generate much response, so I'm not sure what people thought
>> of it. The only major problem I see is getting the planner to keep track
>> of alternate plans. I don't know the internals of it very well, but I
>> think the genetic query optimizer doesn't have a concept of "runner-up #1",
>> "runner-up #2", etc., which it would need to have.
> I think the biggest issue is that you'd have to account for varying load
> on the box. If we assume that the database is the only thing running on
> the box, we might be able to do that by looking at things like how much
> IO traffic we generated (though of course OS caching will screw with
> that).
>
> Actually, that's another issue... any plans run after the first one will
> show up as being artificially fast, since there will be a lot of extra
> cached data.
Yes, caching issues prevent you from using wall-clock time. Instead, we
could instrument the executor to count the actual number of rows produced
at each internal join versus the number the planner predicted, and
generate new cost estimates from those counts.
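
Here is a rough sketch of the sort of bookkeeping I mean (Python, purely
illustrative; none of these names or structures exist in the backend):

    # Hypothetical feedback store: join_key -> accumulated row counts.

    def record_join_feedback(stats, join_key, estimated_rows, actual_rows):
        # Accumulate predicted-vs-actual row counts for one join.
        entry = stats.setdefault(join_key, {"est": 0, "act": 0, "runs": 0})
        entry["est"] += estimated_rows
        entry["act"] += actual_rows
        entry["runs"] += 1

    def corrected_selectivity(stats, join_key, base_selectivity):
        # Scale the planner's base selectivity by the observed error
        # ratio, falling back to the base estimate when we have no data.
        entry = stats.get(join_key)
        if entry is None or entry["est"] == 0:
            return base_selectivity
        return base_selectivity * (entry["act"] / entry["est"])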
Perhaps you can check my reasoning for me: I'm imagining a query that joins
AxBxCxD, where A, B, C, and D are actual tables, and a planner that always
chooses AxB first, then joins C, then joins D. (It does so because the
single-table statistics suggest this as the best course of action.) It
might be that AxD is a really small intermediate result, much smaller than
the independent statistics for A and D would lead you to estimate, while
AxB is pretty much the size you would expect given the independent
statistics for A and B. So we need some way for the system to stumble upon
that fact. If we only ever calculate cross-join statistics for the plans
the system actually chooses, we will only ever confirm that AxB is about
the size we expected. And since the actual size of AxB is nearly equal to
its estimate, the system will continue to choose the same plan for future
queries, totally ignorant of the advantage of doing AxD first.
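
To make the failure mode concrete, with invented numbers:

    # Per-table statistics imply a join selectivity of 1/100,000 for
    # AxD, so the planner expects 100,000 rows out of that join.
    rows_a, rows_d = 100_000, 100_000
    est_axd = rows_a * rows_d * (1.0 / 100_000)   # -> 100,000 rows

    # But A and D happen to be strongly correlated, and the join is tiny:
    act_axd = 50

    # No executed plan ever computes AxD, so act_axd is never observed
    # and the planner's estimate is never corrected.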
That last paragraph is my reasoning for suggesting that the system have a
mode in which it runs the "runner-up #1", "runner-up #2", etc. plans. Such
a mode could force the system down alternate paths where it might pick up
interesting statistics that it wouldn't find otherwise.
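
Such a mode might look roughly like this (illustrative Python again;
"plans" is assumed to be the planner's candidate list, best first, and
the executor and plan-node APIs are invented):

    def explore_plans(plans, k, executor, feedback):
        # Run the k best-ranked plans rather than only plans[0].  Every
        # plan produces the same result set; the point is to harvest the
        # actual row count at each join node along the way, reusing
        # record_join_feedback() from the earlier sketch.
        for plan in plans[:k]:
            result = executor.run(plan)
            for node in plan.join_nodes():
                record_join_feedback(feedback, node.key(),
                                     node.estimated_rows, node.actual_rows)
        return result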
This idea could be scaled back somewhat: rather than actually running the
other plans, we could just extract from them which alternate joins they
contain, and consider calculating statistics for those joins as well.
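
For example (same caveat about invented plan-node APIs):

    def candidate_join_pairs(plans, k):
        # Collect every pair of base relations joined anywhere in the
        # top-k plans; these are the joins worth gathering cross-join
        # statistics for, without ever executing the runner-up plans.
        pairs = set()
        for plan in plans[:k]:
            for node in plan.join_nodes():
                pairs.add(frozenset((node.outer_rel, node.inner_rel)))
        return pairs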
mark