On Wed, Sep 13, 2017 at 7:00 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> If we do have an option it won't be using fancy mathematical
> terminology at all, it would be described in terms of its function,
> e.g. recheck_on_update
+1.

> Yes, I'd rather not have an option at all, just some simple code with
> useful effect, like we have in many other places.

I think the question we need to be able to answer is: what is the probability that an update that would otherwise be non-HOT can be made into a HOT update by performing a recheck to see whether the value has changed? It doesn't seem easy to figure that out from any of the statistics we have available today or could easily get, because it depends not only on the behavior of the expression which appears in the index definition but also on the application behavior. For example, consider a JSON blob representing a bank account: b->'balance' is likely to change most of the time, but b->'account_holder_name' only rarely. That's going to be hard for an automated system to determine.

We should clearly check as many of the other criteria for a HOT update as possible before performing a recheck of this type, so that it only gets performed when it might help. For example, if column a is indexed and b->'foo' is indexed, there's no point in checking whether b->'foo' has changed if we know that a has changed. I don't know whether it would be feasible to postpone deciding whether to do a recheck until after we've figured out whether the page seems to contain enough free space to allow a HOT update.

Turning non-HOT updates into HOT updates is really good, so it seems likely that the rechecks will often be worthwhile. If we avoid a non-HOT update in 25% of cases, that's probably easily worth the CPU overhead of a recheck, assuming the function isn't something ridiculously expensive to compute; the extra CPU cost will be repaid by reduced bloat. However, if we avoid a non-HOT update only one time in a million, it's probably not worth the cost of recomputing the expression the other 999,999 times. I wonder where the crossover point is -- it seems like something that could be figured out by benchmarking.
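The break-even intuition in the last paragraph can be sketched as a toy model (this is illustrative arithmetic, not PostgreSQL code; the cost figures and the helper name are invented assumptions):

```python
# Toy model of the recheck trade-off described above.
# All cost figures are invented, in arbitrary units, purely for illustration.

def recheck_is_worthwhile(p_unchanged, recheck_cost, non_hot_penalty):
    """The recheck pays off when the expected savings from updates that
    become HOT outweigh the cost of rechecking on every update.

    p_unchanged:     probability the indexed expression's value is unchanged,
                     i.e. the recheck turns a non-HOT update into a HOT one
    recheck_cost:    cost of recomputing the indexed expression once
    non_hot_penalty: extra cost of a non-HOT update (index maintenance, bloat)
    """
    return p_unchanged * non_hot_penalty > recheck_cost

# Avoiding a non-HOT update 25% of the time easily justifies a cheap recheck...
print(recheck_is_worthwhile(0.25, 1.0, 100.0))  # True
# ...but avoiding one only once in a million updates does not.
print(recheck_is_worthwhile(1e-6, 1.0, 100.0))  # False
```

Solving p_unchanged * non_hot_penalty = recheck_cost for p_unchanged gives the crossover point the paragraph speculates about; benchmarking would pin down the two cost terms.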
While I agree that it would be nice to have this be a completely automatic determination, I am not sure that will be practical. I oppose overloading some other marker (like function_cost > 10000) for this; that's too magical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company