Re: [HACKERS] detecting poor query plans
>>>>> "Neil" == Neil Conway [EMAIL PROTECTED] writes:

Neil> It occurred to me that these kinds of poor planning decisions
Neil> could easily be detected by PostgreSQL itself: after we've
Neil> finished executing a plan, we can trivially compare the # of
Neil> results produced by each node in the query tree with the # of
Neil> results the planner expected that node to produce (look at
Neil> EXPLAIN ANALYZE, for example).

Indeed. This is the approach being followed by the LeO project
(Learning Optimizer) at IBM Almaden:

    http://www.almaden.ibm.com/software/dm/SMART/leo.shtml

There is a VLDB paper that describes it.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
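Neil's proposal amounts to a simple per-node check, sketched below. This is an illustrative Python sketch, not PostgreSQL source; the node dictionaries mimic the estimated and actual row counts that EXPLAIN ANALYZE reports, and the function name and 3x threshold are assumptions for illustration.

```python
THRESHOLD = 3.0  # "off by 300%", the margin Neil suggests

def misestimated_nodes(plan_nodes, threshold=THRESHOLD):
    """Return (node, ratio) pairs whose actual/estimated row counts
    diverge by more than the threshold in either direction."""
    flagged = []
    for node in plan_nodes:
        est = max(node["est_rows"], 1)     # guard against division by zero
        act = max(node["actual_rows"], 1)
        ratio = max(est / act, act / est)  # symmetric: over- or under-estimate
        if ratio > threshold:
            flagged.append((node["node"], ratio))
    return flagged

# Per-node data as EXPLAIN ANALYZE would expose it
plan = [
    {"node": "Seq Scan on t",   "est_rows": 100, "actual_rows": 98},
    {"node": "Index Scan on i", "est_rows": 10,  "actual_rows": 4200},
]
for name, ratio in misestimated_nodes(plan):
    print(f"HINT: estimate for {name} off by {ratio:.0f}x; "
          f"consider re-running ANALYZE")
```

Only the second node is flagged: a 100-vs-98 discrepancy is well within the threshold, while 10-vs-4200 is off by 420x.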
Re: [HACKERS] detecting poor query plans
Neil Conway [EMAIL PROTECTED] writes:
> It occurred to me that these kinds of poor planning decisions could
> easily be detected by PostgreSQL itself: after we've finished
> executing a plan, we can trivially compare the # of results produced
> by each node in the query tree with the # of results the planner
> expected that node to produce (look at EXPLAIN ANALYZE, for
> example). If the estimate is off by a significant margin (say,
> 300%), we could perhaps emit a HINT suggesting that the user re-run
> ANALYZE.

I think such a thing would have such a low signal-to-noise ratio as to
be useless :-(. As you note, there are many places where the planner's
estimate is routinely off by more than 3x (or any other threshold you
might pick instead). In some situations that doesn't really matter, as
the same plan would have gotten picked anyway.

Also, since 7.2 came out, what we've seen more and more is cases where
the row count estimate is acceptably good, but the wrong plan was
picked anyway because of deficiencies in the cost equations. The
question you really want to know about is not whether the row count
estimate is close, it's whether another plan could have done better.

			regards, tom lane
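Tom's first objection — that a bad estimate often doesn't change which plan wins — can be seen with a toy cost comparison. The cost formulas below are invented for illustration and are not PostgreSQL's real cost equations:

```python
# Two made-up cost models: a sequential scan with high startup cost but
# cheap per-row cost, and an index scan with the opposite profile.
def seqscan_cost(rows):
    return 1000 + 0.01 * rows

def indexscan_cost(rows):
    return 5 + 4.0 * rows

def pick_plan(rows):
    return "seqscan" if seqscan_cost(rows) < indexscan_cost(rows) else "indexscan"

# A 10x row-count underestimate...
estimated, actual = 50_000, 500_000
# ...yet the comparison comes out the same way under either count,
# so the planner's choice is unchanged and a hint would be pure noise.
print(pick_plan(estimated), pick_plan(actual))
```

Both row counts are far past the crossover point between the two cost curves, so the same plan wins either way; the misestimate is real but harmless here.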
Re: [HACKERS] detecting poor query plans
Neil Conway [EMAIL PROTECTED] writes:
> It occurred to me that these kinds of poor planning decisions could
> easily be detected by PostgreSQL itself: after we've finished
> executing a plan, we can trivially compare the # of results produced
> by each node in the query tree with the # of results the planner
> expected that node to produce

There's a dual to this as well. If the results were very close but the
actual time taken to run the node doesn't match the calculated cost,
then some optimizer parameter needs to be adjusted: one of the cost_*
parameters, or random_page_cost, or effective_cache_size, or...

I'm not sure it's as obvious what to put in the HINT, though. Ideally
these results would have to be gathered and pumped through linear
optimization algorithms, which is a lot more work.

--
greg
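A crude first step toward Greg's "linear optimization" idea is fitting a single ms-per-cost-unit factor over observed (cost, runtime) pairs and flagging queries that deviate from the fit. This is a hypothetical sketch; the function name, sample data, and 25% residual threshold are all assumptions:

```python
def fit_ms_per_cost(samples):
    """Least-squares slope through the origin: runtime_ms ~= k * cost."""
    num = sum(cost * ms for cost, ms in samples)
    den = sum(cost * cost for cost, _ in samples)
    return num / den

# Observed (planner cost, actual runtime in ms) pairs
samples = [(100.0, 52.0), (400.0, 197.0), (1000.0, 505.0)]
k = fit_ms_per_cost(samples)

# Queries whose runtime deviates from the fitted line by more than 25%
# are candidates for "some cost parameter is miscalibrated"
outliers = [(c, ms) for c, ms in samples if abs(ms - k * c) / (k * c) > 0.25]
```

With these samples the factor comes out near 0.5 ms per cost unit and nothing is an outlier; a node whose runtime sat far off the fitted line would be the signal that a cost_* parameter, random_page_cost, or effective_cache_size needs tuning.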
Re: [HACKERS] detecting poor query plans
Tom Lane [EMAIL PROTECTED] writes:
> I think such a thing would have such a low signal-to-noise ratio as
> to be useless :-(. As you note, there are many places where the
> planner's estimate is routinely off by more than 3x (or any other
> threshold you might pick instead).

I wonder: perhaps we could add a "certainty" parameter to the
estimated query plan, result sizes, and costs produced by the
planner. That way, when we run into a known planner deficiency we can
mark the relevant portion of the query tree as a WAG, and not bother
emitting hints for it.

> In some situations that doesn't really matter, as the same plan
> would have gotten picked anyway.

The hint is NOT "the chosen plan was non-optimal"; the hint is "the
query planner did not produce an accurate row count estimate for this
node in the query tree." The chosen query plan may or may not be
optimal -- we're merely pointing out that we chose the plan we did on
shaky grounds. The hint might just as well indicate a problem with
another query that happens to apply a similar predicate to the column
in question.

> The question you really want to know about is not whether the row
> count estimate is close, it's whether another plan could have done
> better.

Perhaps, but is there a reasonable way to answer the second question?

-Neil
Re: [HACKERS] detecting poor query plans
On Wed, Nov 26, 2003 at 11:59:33AM -0500, Neil Conway wrote:
> > In some situations that doesn't really matter, as the same plan
> > would have gotten picked anyway.
>
> The hint is NOT "the chosen plan was non-optimal"; the hint is "the
> query planner did not produce an accurate row count estimate for
> this node in the query tree."

Maybe it could only be done for SeqScan and IndexScan nodes, which are
probably the most common source of bad estimates related to poor
statistics.

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)
Re: [HACKERS] detecting poor query plans
Greg Stark [EMAIL PROTECTED] writes:
> There's a dual to this as well. If the results were very close but
> the actual time taken to run the node doesn't match the cost
> calculated then some optimizer parameter needs to be adjusted.

I was thinking about this, but I couldn't think of how to get it to
work properly:

(1) The optimizer's cost metric is somewhat bogus to begin with. ISTM
that translating a cost of X into an expected runtime of Y msec is
definitely not trivial to do.

(2) The size of a node's result relation does not depend upon anything
outside of PostgreSQL, whereas the exact time it takes to produce that
result relation depends on a wide collection of external factors. For
example, if the system is under heavy load, queries will take longer
than normal to run. Or, if the query invokes a function that happens
to occasionally block waiting on some resource, the execution time of
the query could be wildly unpredictable.

(3) ISTM we couldn't produce a really helpful hint message, even if we
somehow resolved #1 and #2.

-Neil
Re: [HACKERS] detecting poor query plans
Neil Conway [EMAIL PROTECTED] writes:
> I was thinking about this, but I couldn't think of how to get it to
> work properly:
>
> (1) The optimizer's cost metric is somewhat bogus to begin with.
> ISTM that translating a cost of X into an expected runtime of Y msec
> is definitely not trivial to do.

At least for all the possible plans of a given query at a specific
point in time, the intention is that the cost be proportional to the
execution time.

> ... the exact time it takes to produce that result relation depends
> on a wide collection of external factors.

That's a valid point. The ms/cost factor may not be constant over
time. However, I think in the normal case this number will tend
towards a fairly consistent value across queries and over time. It
will be influenced somewhat by things like cache contention with other
applications, though.

On further thought, the real problem is that these numbers are only
available when running with EXPLAIN ANALYZE. As shown recently on one
of the lists, the cost of the repeated gettimeofday calls can be
substantial. It's not really feasible to suggest running all queries
with that profiling.

--
greg
Re: [HACKERS] detecting poor query plans
Greg Stark [EMAIL PROTECTED] writes:
> At least for all the possible plans of a given query at a specific
> point in time, the intention is that the cost be proportional to the
> execution time.

Why is this relevant? Given a cost X at a given point in time, the
system needs to derive an expected runtime Y, and compare Y with the
actual runtime. I think that producing Y given an arbitrary X involves
so many parameters as to be practically impossible for us to compute
with any degree of accuracy.

> That's a valid point. The ms/cost factor may not be constant over
> time. However I think in the normal case this number will tend
> towards a fairly consistent value across queries and over time.

It might be true in the normal case, but that doesn't seem very
helpful to me: in general, the mapping of plan costs to execution time
can vary wildly over time. Spewing hints to the log whenever the
system's workload changes, a checkpoint occurs, or the system's RAID
array hiccups doesn't sound like a useful feature to me.

> On further thought the real problem is that these numbers are only
> available when running with explain on. As shown recently on one of
> the lists, the cost of the repeated gettimeofday calls can be
> substantial.

That sounds more like an implementation detail than the real problem
to me -- I think this proposed feature has more fundamental issues.

-Neil
Re: [HACKERS] detecting poor query plans
Greg Stark [EMAIL PROTECTED] writes:
> That's a valid point. The ms/cost factor may not be constant over
> time. However I think in the normal case this number will tend
> towards a fairly consistent value across queries and over time. It
> will be influenced somewhat by things like cache contention with
> other applications though.

I think it would be interesting to collect the numbers over a long
period of time and try to learn something from the averages. The real
hole in Neil's original suggestion was that it assumed that
comparisons based on just a single query would be meaningful enough to
pester the user about.

> On further thought the real problem is that these numbers are only
> available when running with explain on. As shown recently on one of
> the lists, the cost of the repeated gettimeofday calls can be
> substantial. It's not really feasible to suggest running all queries
> with that profiling.

Yeah. You could imagine a simplified-stats mode that only collects the
total runtime (two gettimeofday's per query is nothing) and the row
counts (shouldn't be impossibly expensive either, especially if we
merged the needed fields into PlanState instead of requiring a
separately allocated node). Not sure if that's as useful, though.

			regards, tom lane
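Tom's suggestion of averaging over a long period, rather than alerting on single queries, could look like the following sketch. The class and node-type grouping are illustrative assumptions, not anything in PostgreSQL:

```python
from collections import defaultdict

class EstimationStats:
    """Accumulate actual/estimated row ratios per node type across many
    query executions, so only sustained bias gets reported."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, node_type, est_rows, actual_rows):
        est = max(est_rows, 1)
        act = max(actual_rows, 1)
        self.samples[node_type].append(act / est)

    def average_ratio(self, node_type):
        ratios = self.samples[node_type]
        return sum(ratios) / len(ratios)

# Three executions all underestimating by roughly 3x: a pattern worth
# reporting, where any single sample might just be noise.
stats = EstimationStats()
for est, act in [(100, 310), (200, 580), (50, 160)]:
    stats.record("Index Scan", est, act)

print(f"Index Scan avg actual/estimated: "
      f"{stats.average_ratio('Index Scan'):.1f}x")
```

A hint driven by the running average fires only when misestimation is consistent, which addresses the single-query signal-to-noise objection.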
Re: [HACKERS] detecting poor query plans
> > On further thought the real problem is that these numbers are only
> > available when running with explain on. As shown recently on one
> > of the lists, the cost of the repeated gettimeofday calls can be
> > substantial. It's not really feasible to suggest running all
> > queries with that profiling.
>
> Yeah. You could imagine a simplified-stats mode that only collects
> the total runtime (two gettimeofday's per query is nothing) and the
> row counts (shouldn't be impossibly expensive either, especially if
> we merged the needed fields into PlanState instead of requiring a
> separately allocated node). Not sure if that's as useful though.

How about a PGC_POSTMASTER GUC variable which tells postgres to
collect details on the planner's performance and its comparison to
actual run times? Optionally, we could also have the executor run
some/all of the possible plans (presumably only useful for SELECTs)
and keep details on the performance of each. At postmaster shutdown
(or some other time?) a report could be produced profiling all
queries.

The reason I suggest this is that it would have zero impact on
production databases but would allow DBAs to profile their databases
with real usage patterns in development environments.

Gavin
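Gavin's "run every candidate plan and report later" mode might work roughly as follows. This is a stand-in sketch: the class name is invented, and the plan "executors" are plain callables rather than real executor plans:

```python
import time

class PlanProfiler:
    """Time every candidate plan for a query (not just the chosen one)
    and keep the results for a report at shutdown."""

    def __init__(self):
        self.report = {}   # query text -> {plan name: elapsed seconds}

    def profile(self, query, candidate_plans):
        timings = {}
        for name, run in candidate_plans.items():
            start = time.perf_counter()
            run()                                  # execute this plan
            timings[name] = time.perf_counter() - start
        self.report[query] = timings
        return min(timings, key=timings.get)       # fastest plan name

profiler = PlanProfiler()
# Dummy workloads standing in for two alternative plans of one query
plans = {"seqscan":   lambda: sum(range(200_000)),
         "indexscan": lambda: sum(range(1_000))}
best = profiler.profile("SELECT ...", plans)
```

Running all plans is clearly too expensive for production, which is exactly why this only makes sense behind a development-time switch as Gavin proposes.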
Re: [HACKERS] detecting poor query plans
Gavin Sherry wrote:
> How about a PGC_POSTMASTER GUC variable which tells postgres to
> collect details on the planner's performance and comparison to
> actual run times. Optionally, we could also have the executor run
> some/all of the possible plans (presumably only useful for SELECTs)
> and keep details on the performance of each. At postmaster shutdown
> (some other time?) a report could be produced profiling all queries.
>
> The reason I suggest this is it would have zero impact on production
> databases but would allow DBAs to profile their databases with real
> usage patterns in development environments.

Great idea. Under ideal situations it shouldn't be needed, but things
are often less than ideal.

--
  Bruce Momjian                  | http://candle.pha.pa.us
  [EMAIL PROTECTED]              | (610) 359-1001
  + If your life is a hard drive,| 13 Roberts Road
  + Christ can be your backup.   | Newtown Square, Pennsylvania 19073