Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Sailesh Krishnamurthy
>>>>> "Neil" == Neil Conway [EMAIL PROTECTED] writes:

Neil> It occurred to me that these kinds of poor planning
Neil> decisions could easily be detected by PostgreSQL itself:
Neil> after we've finished executing a plan, we can trivially
Neil> compare the # of results produced by each node in the query
Neil> tree with the # of results the planner expected that node to
Neil> produce (look at EXPLAIN ANALYZE, for example). If the [...]

Indeed. This is the approach being followed by the LeO project
(Learning Optimizer) at IBM Almaden. 

http://www.almaden.ibm.com/software/dm/SMART/leo.shtml

There is a VLDB paper that describes it.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh





Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
> It occurred to me that these kinds of poor planning decisions could
> easily be detected by PostgreSQL itself: after we've finished
> executing a plan, we can trivially compare the # of results produced
> by each node in the query tree with the # of results the planner
> expected that node to produce (look at EXPLAIN ANALYZE, for
> example). If the estimate is off by a significant margin (say, 300%),
> we could perhaps emit a HINT suggesting that the user re-run ANALYZE

I think such a thing would have such a low signal-to-noise ratio as to
be useless :-(.  As you note, there are many places where the planner's
estimate is routinely off by more than 3x (or any other threshold you
might pick instead).  In some situations that doesn't really matter,
as the same plan would have gotten picked anyway.  Also, since 7.2 came
out what we've seen more and more is cases where the row count estimate
is acceptably good, but the wrong plan was picked anyway because of
deficiencies in the cost equations.

The question you really want to know about is not whether the row count
estimate is close, it's whether another plan could have done better.

regards, tom lane



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Greg Stark

Neil Conway [EMAIL PROTECTED] writes:

> It occurred to me that these kinds of poor planning decisions could easily
> be detected by PostgreSQL itself: after we've finished executing a plan, we
> can trivially compare the # of results produced by each node in the query
> tree with the # of results the planner expected that node to produce

There's a dual to this as well. If the results were very close but the actual
time taken to run the node doesn't match the calculated cost, then some
optimizer parameter needs to be adjusted: one of the cost_* parameters, or
random_page_cost, or effective_cache_size, or...

I'm not sure it's as obvious what to put in the HINT, though. Ideally these
results would have to be gathered and pumped through linear optimization
algorithms, which is a lot more work.
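
As a rough illustration, here is a minimal C sketch of the per-node
comparison this would imply. Plan.total_cost and Instrumentation.total
are real PostgreSQL fields; the function itself, the 3x threshold, and
the MSEC_PER_COST_UNIT calibration constant are invented for the
example:

    #include "postgres.h"
    #include "executor/instrument.h"
    #include "nodes/plannodes.h"

    /* Invented calibration constant: how many milliseconds one cost
     * unit is supposed to correspond to on this machine. */
    #define MSEC_PER_COST_UNIT 0.01

    /* Hypothetical check: flag nodes whose actual runtime disagrees
     * with the planner's cost estimate by more than 3x either way. */
    static void
    check_cost_accuracy(Plan *plan, Instrumentation *instr)
    {
        double  actual_msec = instr->total * 1000.0;  /* total is in seconds */
        double  expected_msec = plan->total_cost * MSEC_PER_COST_UNIT;

        if (expected_msec > 0 && actual_msec > 0 &&
            (actual_msec / expected_msec > 3.0 ||
             expected_msec / actual_msec > 3.0))
            elog(NOTICE, "cost estimate off: expected ~%.1f ms, got %.1f ms",
                 expected_msec, actual_msec);
    }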

-- 
greg




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Neil Conway
Tom Lane [EMAIL PROTECTED] writes:
> I think such a thing would have such a low signal-to-noise ratio as
> to be useless :-(.  As you note, there are many places where the
> planner's estimate is routinely off by more than 3x (or any other
> threshold you might pick instead).

I wonder, perhaps we could add a "certainty" parameter to the
estimated query plan + result sizes + costs produced by the
planner. That way, when we run into a planner deficiency we can
basically mark the relevant portion of the query tree as a WAG, and
not bother with emitting hints for it.

> In some situations that doesn't really matter, as the same plan
> would have gotten picked anyway.

The hint is NOT "the chosen plan was non-optimal"; the hint is "the
query planner did not produce an accurate row count estimate for this
node in the query tree". The chosen query plan may or may not be
optimal -- we're merely pointing out that we chose the plan we did on
shaky grounds. The hint might just as well indicate a problem with
another query that happens to apply a similar predicate to the column
in question.
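
For concreteness, a minimal sketch of what emitting such a hint might
look like, as a recursive walk over the executor's plan state tree
after the query finishes. plan_rows, ntuples, instrument, lefttree,
and righttree are real PostgreSQL fields; the function and the 3x
threshold are just illustration:

    #include "postgres.h"
    #include "executor/instrument.h"
    #include "nodes/execnodes.h"

    /* Hypothetical post-execution walker: compare each node's estimated
     * row count with what it actually produced, and hint when the two
     * diverge by more than 3x in either direction. */
    static void
    check_row_estimates(PlanState *ps)
    {
        if (ps == NULL)
            return;

        if (ps->instrument != NULL)
        {
            double  est = ps->plan->plan_rows;
            double  act = ps->instrument->ntuples;

            if (est > 0 && act > 0 &&
                (act / est > 3.0 || est / act > 3.0))
                elog(NOTICE, "row estimate off for this node: expected "
                     "%.0f rows, got %.0f -- statistics may be stale",
                     est, act);
        }

        check_row_estimates(ps->lefttree);
        check_row_estimates(ps->righttree);
    }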

> The question you really want to know about is not whether the row
> count estimate is close, it's whether another plan could have done
> better.

Perhaps, but is there a reasonable way to answer the second question?

-Neil




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Alvaro Herrera
On Wed, Nov 26, 2003 at 11:59:33AM -0500, Neil Conway wrote:

> > In some situations that doesn't really matter, as the same plan
> > would have gotten picked anyway.
>
> The hint is NOT "the chosen plan was non-optimal"; the hint is "the
> query planner did not produce an accurate row count estimate for this
> node in the query tree".

Maybe it could only be done for SeqScan and IndexScan nodes, which are
probably the most common source of bad estimates related to poor
statistics.
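
In terms of the walker sketched upthread, that would just be a guard on
the node type (IsA, SeqScan, and IndexScan are the real macro and node
tags; the surrounding check is still hypothetical):

    /* Only consider base-relation scans, where bad statistics are the
     * usual culprit -- a guard inside the hypothetical walker. */
    if ((IsA(ps->plan, SeqScan) || IsA(ps->plan, IndexScan)) &&
        ps->instrument != NULL)
    {
        /* ... same 3x estimate-vs-actual comparison as before ... */
    }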

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
At least to kernel hackers, who really are human, despite occasional
rumors to the contrary (LWN.net)



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Neil Conway
Greg Stark [EMAIL PROTECTED] writes:
> There's a dual to this as well. If the results were very close but
> the actual time taken to run the node doesn't match the calculated
> cost, then some optimizer parameter needs to be adjusted.

I was thinking about this, but I couldn't think of how to get it to
work properly:

 (1) The optimizer's cost metric is somewhat bogus to begin with.
 ISTM that translating a cost of X into an expected runtime of
 Y msecs is definitely not trivial to do.

 (2) The size of a node's result relation does not depend upon
 anything outside of PostgreSQL, whereas the exact time it
 takes to produce that result relation depends on a wide
 collection of external factors. For example, if the system is
 under heavy load, queries will take longer than normal to
 run. Or, if the query invokes a function that happens to
 occasionally block waiting on some resource, the execution
 time of the query could be wildly unpredictable.

 (3) ISTM we couldn't produce a really helpful hint message, even
 if we somehow resolved #1 and #2.

-Neil




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Greg Stark
Neil Conway [EMAIL PROTECTED] writes:

> I was thinking about this, but I couldn't think of how to get it to
> work properly:
>
>  (1) The optimizer's cost metric is somewhat bogus to begin with.
>  ISTM that translating a cost of X into an expected runtime of
>  Y msecs is definitely not trivial to do.

At least for all the possible plans of a given query at a specific point in
time, the intention is that the cost be proportional to the execution time.

> the exact time it takes to produce that result relation depends on a wide
> collection of external factors.

That's a valid point. The ms/cost factor may not be constant over time.
However I think in the normal case this number will tend towards a fairly
consistent value across queries and over time. It will be influenced somewhat
by things like cache contention with other applications though.

On further thought the real problem is that these numbers are only available
when running with explain on. As shown recently on one of the lists, the
cost of the repeated gettimeofday calls can be substantial. It's not really
feasible to suggest running all queries with that profiling.

-- 
greg




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Neil Conway
Greg Stark [EMAIL PROTECTED] writes:
> At least for all the possible plans of a given query at a specific
> point in time, the intention is that the cost be proportional to the
> execution time.

Why is this relevant? 

Given a cost X at a given point in time, the system needs to derive an
expected runtime Y, and compare Y with the actual runtime. I think
that producing Y given an arbitrary X involves so many parameters as
to be practically impossible for us to compute with any degree of
accuracy.

> That's a valid point. The ms/cost factor may not be constant over
> time.  However I think in the normal case this number will tend
> towards a fairly consistent value across queries and over time.

It might be true in the normal case, but that doesn't seem very
helpful to me: in general, the mapping of plan costs to execution time
can vary wildly over time. Spewing hints to the log whenever the
system's workload changes, a checkpoint occurs, or the system's RAID
array hiccups doesn't sound like a useful feature to me.

> On further thought the real problem is that these numbers are only
> available when running with explain on. As shown recently on one
> of the lists, the cost of the repeated gettimeofday calls can be
> substantial.

That sounds more like an implementation detail than the real problem
to me -- I think this proposed feature has more fundamental issues.

-Neil




Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 That's a valid point. The ms/cost factor may not be constant over time.
 However I think in the normal case this number will tend towards a fairly
 consistent value across queries and over time. It will be influenced somewhat
 by things like cache contention with other applications though.

I think it would be interesting to collect the numbers over a long
period of time and try to learn something from the averages.  The real
hole in Neil's original suggestion was that it assumed that comparisons
based on just a single query would be meaningful enough to pester the
user about.

> On further thought the real problem is that these numbers are only available
> when running with explain on. As shown recently on one of the lists, the
> cost of the repeated gettimeofday calls can be substantial. It's not really
> feasible to suggest running all queries with that profiling.

Yeah.  You could imagine a simplified-stats mode that only collects the
total runtime (two gettimeofday's per query is nothing) and the row
counts (shouldn't be impossibly expensive either, especially if we
merged the needed fields into PlanState instead of requiring a
separately allocated node).  Not sure if that's as useful though.
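
A minimal sketch of the cheap end of that spectrum -- bracketing the
whole query with just two gettimeofday() calls instead of per-tuple
timing. gettimeofday() is real; the function names are invented:

    #include <sys/time.h>

    /* Sketch of a "simplified stats" mode: one timestamp at query
     * start, one at query end, no per-node instrumentation. */
    static struct timeval query_start;

    static void
    simple_stats_begin(void)
    {
        gettimeofday(&query_start, NULL);
    }

    static double               /* returns elapsed milliseconds */
    simple_stats_end(void)
    {
        struct timeval  end;

        gettimeofday(&end, NULL);
        return (end.tv_sec - query_start.tv_sec) * 1000.0 +
               (end.tv_usec - query_start.tv_usec) / 1000.0;
    }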

regards, tom lane



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Gavin Sherry
> > On further thought the real problem is that these numbers are only available
> > when running with explain on. As shown recently on one of the lists, the
> > cost of the repeated gettimeofday calls can be substantial. It's not really
> > feasible to suggest running all queries with that profiling.
>
> Yeah.  You could imagine a simplified-stats mode that only collects the
> total runtime (two gettimeofday's per query is nothing) and the row
> counts (shouldn't be impossibly expensive either, especially if we
> merged the needed fields into PlanState instead of requiring a
> separately allocated node).  Not sure if that's as useful though.

How about a PGC_POSTMASTER GUC variable which tells postgres to collect
details on the planner's performance and compare them to actual run times?
Optionally, we could also have the executor run some/all of the possible
plans (presumably only useful for SELECTs) and keep details on the
performance of each. At postmaster shutdown (or some other time?) a report
could be produced profiling all queries.

The reason I suggest this is that it would have zero impact on production
databases but would allow DBAs to profile their databases with real usage
patterns in development environments.
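
For illustration, a sketch of how such a switch might look. Only
PGC_POSTMASTER (the GUC context that requires a restart to change) and
the QueryDesc type are real; the setting name "profile_planner" and the
reporting hook are invented, and the actual guc.c registration is
omitted:

    #include "postgres.h"
    #include "executor/execdesc.h"

    /* Hypothetical switch, registered in guc.c as a bool entry with
     * PGC_POSTMASTER context so it can't change while profiling runs. */
    bool    profile_planner = false;

    /* Invented hook, called where the executor finishes a query, to
     * accumulate estimated-vs-actual stats for a shutdown-time report. */
    static void
    maybe_record_plan_vs_actual(QueryDesc *queryDesc)
    {
        if (!profile_planner)
            return;
        /* ... stash the plan's estimates and the actual row counts /
         * runtimes somewhere for the final report ... */
    }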

Gavin



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Bruce Momjian
Gavin Sherry wrote:
> > > On further thought the real problem is that these numbers are only available
> > > when running with explain on. As shown recently on one of the lists, the
> > > cost of the repeated gettimeofday calls can be substantial. It's not really
> > > feasible to suggest running all queries with that profiling.
> >
> > Yeah.  You could imagine a simplified-stats mode that only collects the
> > total runtime (two gettimeofday's per query is nothing) and the row
> > counts (shouldn't be impossibly expensive either, especially if we
> > merged the needed fields into PlanState instead of requiring a
> > separately allocated node).  Not sure if that's as useful though.
>
> How about a PGC_POSTMASTER GUC variable which tells postgres to collect
> details on the planner's performance and compare them to actual run times?
> Optionally, we could also have the executor run some/all of the possible
> plans (presumably only useful for SELECTs) and keep details on the
> performance of each. At postmaster shutdown (or some other time?) a report
> could be produced profiling all queries.
>
> The reason I suggest this is that it would have zero impact on production
> databases but would allow DBAs to profile their databases with real usage
> patterns in development environments.

Great idea.  Under ideal situations, it shouldn't be needed, but things
are often less than ideal.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
