On 12/30/2015 10:30 PM, David Rowley wrote:
> On 31 December 2015 at 01:24, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> Personally I'd like to see automatic plan caching occur in
> PostgreSQL. There is a bit of a problem with it in regards to a query
> such as: select * from t where mySkewedColumn = 1; where the value 1
> appears 99% of the time. Initially we may plan to seqscan, where with
> other values we'd likely prefer to index scan. I imagine with my
> unique joins patch, that it could be expanded to test
> baserestrictinfos to see if they contain quals with a unique index.
> This knowledge could later permit plan caching to occur on queries
> which are safe from having any skew in the results. It might sound
> rather restrictive, but likely this would cover 99% of UPDATE and
> DELETE operations in an OLTP workload, and likely a good deal of the
> SELECTs too. The safety of caching could be analyzed during planning,
> and a flag could be set somewhere, perhaps in PlannedStmt to mark if
> the plan is safe to cache. The planner() function could initially
> hash the query string and check if any cached plan exists with that
> hash, if not, plan the statement, and then check if the "safeToCache"
> flag is set, and if so, stick that plan into a hash table. Also plans
> with no baserestrictinfos could be "safeToCache" too.

Yeah, that's what I meant by "non-trivial". I don't know a good approach to this problem, or whether a "good" approach even exists, but I'd say being able to decide whether to rebuild a plan in such cases is a "must-have" feature. Otherwise we could easily lose any gains from the more thorough optimization because of poor plans.
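
Just to make the control flow concrete, the scheme described above might look roughly like the toy sketch below. To be clear, none of this is existing PostgreSQL code: the cache, the hash function and the safe_to_cache flag are all made-up stand-ins, and a real implementation would have to hook into the planner and deal with invalidation, collisions, locking and so on.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct CachedPlan
{
    uint64_t    query_hash;     /* fingerprint of the query string */
    bool        safe_to_cache;  /* planner saw no risk of parameter skew */
    /* the plan tree itself would hang off here */
} CachedPlan;

#define PLAN_CACHE_SIZE 128
static CachedPlan *plan_cache[PLAN_CACHE_SIZE];

/* trivial FNV-1a hash, a stand-in for a proper query fingerprint */
static uint64_t
hash_query(const char *sql)
{
    uint64_t    h = 14695981039346656037ULL;

    for (; *sql; sql++)
        h = (h ^ (unsigned char) *sql) * 1099511628211ULL;
    return h;
}

/*
 * Stand-in for the real planner.  It would set safe_to_cache while planning,
 * e.g. when every qual hits a unique index or there are no quals at all.
 */
static CachedPlan *
plan_query(const char *sql, uint64_t hash)
{
    CachedPlan *plan = calloc(1, sizeof(CachedPlan));

    (void) sql;                 /* a real planner would, of course, use it */
    plan->query_hash = hash;
    plan->safe_to_cache = true; /* hard-coded here; the planner would decide */
    return plan;
}

CachedPlan *
get_plan(const char *sql)
{
    uint64_t    hash = hash_query(sql);
    CachedPlan *hit = plan_cache[hash % PLAN_CACHE_SIZE];

    /* cache hit: skip planning entirely */
    if (hit != NULL && hit->query_hash == hash)
        return hit;

    hit = plan_query(sql, hash);

    /* only keep plans that cannot be hurt by skewed parameter values */
    if (hit->safe_to_cache)
        plan_cache[hash % PLAN_CACHE_SIZE] = hit;

    return hit;
}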

In other words, we'd have to come up with a way to decide whether to use the same plan as before, or try building another plan (for the same query with different parameter values). I can think of two approaches:

(1) Try to measure how "different" the parameter values in the new
    query are from those the existing plan(s) were built for. This
    probably means a difference in terms of probabilities / frequencies
    etc.

(2) Compute the cost of the existing plan for the new parameters, i.e.
    don't redo the whole optimization, just the costing of that single
    plan. If the costs are "close enough", reuse the existing plan (a
    rough sketch of this check follows below).
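
A similarly rough sketch of the check in (2), where the costing and bookkeeping functions are purely hypothetical stand-ins (the point is only the shape of the decision); a comparable check on estimated frequencies could serve for (1):

#include <stdbool.h>

typedef struct CachedPlan CachedPlan;       /* as in the sketch above */
typedef struct ParamValues ParamValues;     /* stand-in: the new bind values */

/* hypothetical: rerun only the costing step of the cached plan */
extern double recost_plan(CachedPlan *plan, const ParamValues *params);
/* hypothetical: the cost estimate the plan was originally cached with */
extern double original_cost(CachedPlan *plan);

#define COST_SLACK 1.2          /* "close" = within 20%, an arbitrary choice */

static bool
cached_plan_still_ok(CachedPlan *plan, const ParamValues *params)
{
    double      new_cost = recost_plan(plan, params);

    /* reuse only if the re-costed estimate is not much worse */
    return new_cost <= original_cost(plan) * COST_SLACK;
}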

Of course, none of this is trivial, and it may still fail in some cases.

I wonder what other databases do here, and whether there are papers on this topic (I'd guess there are, but I haven't looked very hard).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

