I asked a while back if there were any plans to allow developers to override the 
optimizer's plan and force certain plans, and received a fairly resounding 
"No".  The general feeling I get is that a lot of work has gone into the 
optimizer, and by God we're going to use it!

I think this is just wrong, and I'm curious whether I'm alone in this opinion.

Over and over, I see questions posted to this mailing list about execution 
plans that don't work out well.  Many times there are good answers - add an 
index, refactor the design, etc. - that yield good results.  But, all too often 
the answer comes down to something like this recent one:

> Right on. Some of these "coerced" plans may perform
> much better. If so, we can look at tweaking your runtime
> config: e.g.
>
> effective_cache_size
> random_page_cost
> default_statistics_target
>
> to see if said plans can be chosen "naturally".

I see this over and over.  Tweak the parameters to "force" a certain plan, because 
there's no formal way for a developer to say, "I know the best plan."
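
For anyone who hasn't been through this exercise, the "tweaking" looks
something like the following.  The values here are purely illustrative; the
right ones depend entirely on your hardware and data:

set random_page_cost = 2;             -- default is 4; lower favors index scans
set effective_cache_size = 100000;    -- in 8KB pages; what the OS likely caches
set default_statistics_target = 100;  -- default is 10; re-run ANALYZE after

Then you re-run EXPLAIN ANALYZE and hope the new plan sticks.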

There isn't a database in the world that is as smart as a developer, or that 
can have insight into things that only a developer can possibly know.  Here's a 
real-life example that caused me major headaches.  It's a trivial query, but 
Postgres totally blows it:

select * from my_table where row_num >= 50000 and row_num < 100000
    and myfunc(foo, bar);

How can Postgres possibly know what myfunc() does?  In this example, my_table 
has about 10 million rows and row_num is indexed.  When the row_num range covers 
fewer than about 30,000 rows, Postgres (correctly) uses a row_num index scan, then 
filters by myfunc().  But beyond that, it chooses a sequential scan and filters by 
myfunc().  This is just wrong.  Postgres can't possibly know that myfunc() is VERY 
expensive.  The correct plan, once the index stops paying off, is to keep filtering 
on row_num first.  Even if 99% of the table is selected by row_num, it should STILL 
apply the cheap row_num filter first, and call myfunc() only as the very last step.

How can a database with no ability to override a plan possibly cope with this?
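
About the only sanctioned answer today is to flip the planner's cost knobs
around the one query.  A blunt sketch (enable_seqscan doesn't truly disable
sequential scans, it just makes them look absurdly expensive, and it affects
the whole session until you restore it):

set enable_seqscan = off;  -- make the row_num index scan win
select * from my_table where row_num >= 50000 and row_num < 100000
    and myfunc(foo, bar);
set enable_seqscan = on;   -- restore the default immediately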

Without the explicit ability to override the plan Postgres generates, these 
problems dominate our development efforts.  Postgres does an excellent job 
optimizing on 90% of the SQL we write, but the last 10% is nearly impossible to 
get right.  We spend huge amounts of time on trial-and-error queries, 
second-guessing Postgres, creating unnecessary temporary tables, sticking in the 
occasional OFFSET in a subquery to prevent merging layers, and so forth.
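
Since I mentioned the OFFSET hack: for the query above it looks something
like this.  The OFFSET 0 acts as an optimization fence, so the planner can't
flatten the subquery, and myfunc() only runs on the rows that survive the
row_num filter:

select * from (
    select * from my_table
    where row_num >= 50000 and row_num < 100000
    offset 0  -- prevents the subquery from being pulled up
) as t
where myfunc(foo, bar);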

This same application also runs on Oracle, and although I've cursed Oracle's 
stupid planner many times, at least I can force it to do it right if I need to.
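
For comparison, a hint in Oracle looks roughly like this.  The index name is
hypothetical, but the mechanism is real: the optimizer will use the named
index whenever it legally can:

select /*+ index(my_table my_table_row_num_idx) */ *
  from my_table
 where row_num >= 50000 and row_num < 100000
   and myfunc(foo, bar);
-- my_table_row_num_idx is a made-up index name for illustration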

The danger of forced plans is that inexperienced developers tend to abuse them. 
So it goes; the documentation should make clear that forced plans are always a 
last resort.
But there's no getting around the fact that Postgres needs a way for a 
developer to specify the execution plan.

Craig

