On 3/17/14, 3:32 PM, Merlin Moncure wrote:
On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby <j...@nasby.net> wrote:
Just being able to detect that something has possibly gone wrong would be
useful. We could log that to alert the DBA/user of a potentially bad plan. We
could even format it so that it's suitable for emailing to the community: the
query, the plan, the stats, etc. That might make it easier for us to fix the
planner (although at this point it seems like we're hitting
statistics-gathering problems that we simply don't know how to solve).

Again, that's not the case here.  The problem is that the server is
using hard-wired assumptions (like, 10% selective) *instead* of
statistics -- at least in the case discussed above.  That being said,
I think you're on to something: EXPLAIN ANALYZE rowcounts don't
indicate whether the row count was generated from data-based statistics
or from SWAGs.  So maybe you could decorate the plan description with
an indicator that shows when default selectivity rules were hit.
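
For example (a minimal sketch; the table and data are made up), this is
the sort of case where the planner has no statistics to work with and
falls back to a hard-wired default:

    -- made-up table and data, for illustration only
    CREATE TABLE orders (id int, qty int, unit_price numeric);
    INSERT INTO orders
        SELECT g, g % 100, 9.99 FROM generate_series(1, 100000) g;
    ANALYZE orders;

    -- There are per-column stats for qty and unit_price, but none for
    -- the expression (qty * unit_price), so the row estimate comes from
    -- a default selectivity constant rather than from the data.
    EXPLAIN SELECT * FROM orders WHERE qty * unit_price > 500;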

It occurs to me... it should be cheap for us to track actual rowcounts compared 
to the estimate... perhaps it's worth doing that and flagging plans when the 
estimates are off by more than X percent. Though... I suspect that will just 
tell us what we already know. :(
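
With the made-up table above, the mismatch is already visible if you put
the estimated and actual counts side by side (output trimmed, wrapped,
and with illustrative numbers):

    EXPLAIN ANALYZE SELECT * FROM orders WHERE qty * unit_price > 500;

    -- Seq Scan on orders  (cost=0.00..2541.00 rows=33333 width=15)
    --                     (actual time=0.031..58.2 rows=49000 loops=1)
    --   Filter: ((qty::numeric * unit_price) > 500)
    --
    -- rows=33333 (the default-selectivity guess) vs. actual rows=49000;
    -- flagging would just mean comparing those two numbers against a
    -- threshold.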

There is another aspect of this though: plan stability. There are lots of
cases where users couldn't care less about getting an optimal plan, but they
care *greatly* about not getting a brain-dead plan.

Except for the cases I noted above, I don't understand how you could flag
'sub-optimal' or 'brain-dead' plans.  The server always picks the
best plan it can.  The trick is to indicate (in a very simple and
CPU-cheap way) when there isn't a lot of confidence in the plan -- but
that's not the same thing.

Keep in mind that the use case here is critical queries that MUST perform "fast
enough". They do NOT need to be optimal, but they definitely cannot degrade into
something stupid. It's often far better to have a query that's consistently 50-100%
slower than optimal than one that suddenly becomes 100+% slower than it normally is
(regardless of whether normal is optimal or not).

You could possibly do an "anti-hint": Never use this plan, because we know it 
sucks.

Even better would be if the planner could estimate how bad a plan could become
if the assumptions it made turn out to be wrong.

Another option would be tracking normal execution times (and the plans used) for queries
(not completely unreasonable now with pg_stat_statements); if we run a query with a
different plan and it takes noticeably longer than normal, assume the new plan is bad,
mark it as such, and try again with one of the "known good" plans.
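
A rough sketch of the baseline side, using only what pg_stat_statements
exposes today (thresholds made up; the module still has to be in
shared_preload_libraries):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- "Normal" runtime per statement; a checker (external for now) could
    -- compare a fresh execution against avg_ms and flag big regressions.
    SELECT query,
           calls,
           total_time / calls AS avg_ms,
           rows / calls       AS avg_rows
      FROM pg_stat_statements
     WHERE calls > 100
     ORDER BY total_time DESC
     LIMIT 20;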

Worst case, we could allow a means to store an acceptable plan and force (or
strongly suggest) that the planner use it.
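
There's no supported way to pin a stored plan today; the closest thing is
steering the planner per statement with the enable_* GUCs, e.g. (reusing
the made-up table from above):

    BEGIN;
    -- crude "anti-hint": we know the nested-loop plan is the one that blows up
    SET LOCAL enable_nestloop = off;
    SELECT count(*)
      FROM orders o
      JOIN orders o2 ON o.id = o2.qty
     WHERE o.qty * o.unit_price > 500;
    COMMIT;

That's a blunt instrument (it discourages the node type for the whole
statement), which is exactly why real plan pinning would need new
infrastructure.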
--
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

