On Tue, 2020-04-07 at 11:20 -0700, Jeff Davis wrote:
> Now that we have Disk-based Hash Aggregation, there are a lot more
> situations where the planner can choose HashAgg. The
> enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
> costing. If false, it only generates a HashAgg path if it thinks it
> will fit in work_mem, similar to the old behavior (though it will now
> spill to disk if the planner was wrong about it fitting in work_mem).
> The current default is true.
>
> I expect this to be a win in a lot of cases, obviously. But as with
> any planner change, it will be wrong sometimes. We may want to be
> conservative and set the default to false depending on the experience
> during beta. I'm inclined to leave it as true for now though, because
> that will give us better information upon which to base any decision.
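
For reference, the GUC behavior quoted above can be sketched roughly as
follows. This is a toy Python model, not planner code: the function name
and the byte-based parameters are made up for illustration, and the real
planner works with its own size estimates.

```python
def hashagg_path_considered(enable_hashagg_disk: bool,
                            est_hash_bytes: int,
                            work_mem_bytes: int) -> bool:
    """Toy model of whether a HashAgg path is generated at all.

    All names here are hypothetical; this only mirrors the prose
    description of enable_hashagg_disk, not the actual C code.
    """
    if enable_hashagg_disk:
        # Path is always generated; costing decides whether it wins.
        return True
    # Old-style behavior: only generate the path if the planner
    # believes the hash table fits in work_mem.
    return est_hash_bytes <= work_mem_bytes
```

Note that in both settings the executor can still spill if the estimate
turns out to be wrong; the GUC only affects which paths are generated.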

A compromise may be to multiply the disk costs for HashAgg by a penalty
of, e.g., 1.5-2X. That would make the plan changes less abrupt, and may
mitigate some of the concerns about I/O patterns that Tomas raised
here:

https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2@development

The issues were improved a lot, but it will take us a while to get
HashAgg's I/O behavior tuned as well as Sort's.

Regards,
	Jeff Davis
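
P.S. To make the penalty idea concrete, here is a toy sketch in Python.
It is not planner code: the cost split into a CPU part and a disk part,
the function names, and all of the numbers are assumptions invented for
illustration.

```python
def hashagg_cost(cpu_cost: float, disk_cost: float,
                 disk_penalty: float = 1.0) -> float:
    """Hypothetical HashAgg cost: only the disk (spill) part is penalized."""
    return cpu_cost + disk_penalty * disk_cost


def choose_plan(sort_agg_cost: float, cpu_cost: float, disk_cost: float,
                disk_penalty: float = 1.0) -> str:
    """Pick the cheaper of HashAgg and a sort-based GroupAggregate."""
    hash_cost = hashagg_cost(cpu_cost, disk_cost, disk_penalty)
    return "HashAgg" if hash_cost < sort_agg_cost else "GroupAggregate"


# Marginal case: HashAgg wins narrowly without a penalty (900 vs. 1000)...
marginal_plain = choose_plan(1000.0, cpu_cost=400.0, disk_cost=500.0)
# ...but loses once the disk cost is multiplied by 1.5 (1150 vs. 1000).
marginal_penalized = choose_plan(1000.0, cpu_cost=400.0, disk_cost=500.0,
                                 disk_penalty=1.5)
# Clear win: HashAgg is still chosen with the penalty (850 vs. 1000).
clear_penalized = choose_plan(1000.0, cpu_cost=400.0, disk_cost=300.0,
                              disk_penalty=1.5)
```

The point is only that a multiplier on the disk component flips the
marginal cases back to the sort-based plan while leaving the cases where
HashAgg is clearly cheaper alone.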