On 2020-04-07 20:20, Jeff Davis wrote:
Now that we have Disk-based Hash Aggregation, there are a lot more
situations where the planner can choose HashAgg. The
enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
costing. If false, it only generates a HashAgg path if it thinks it
will fit in work_mem, similar to the old behavior (though it wlil now
spill to disk if the planner was wrong about it fitting in work_mem).
The current default is true.
I have an anecdote that might be related to this discussion.
I was running an unrelated benchmark suite. With PostgreSQL 12, one
query ran out of memory. With PostgreSQL 13, the same query instead ran
out of disk space. I bisected this to the introduction of disk-based
hash aggregation. Of course, the very point of that feature is to
eliminate the out of memory and make use of disk space instead. But
running out of disk space is likely to be a worse experience than
running out of memory. Also, while it's relatively easy to limit memory
use both in PostgreSQL and in the kernel, it is difficult or impossible
to limit disk space use in a similar way.
I don't have a solution or proposal here, I just want to mention this as
a possibility and suggest that we look out for similar experiences.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services