On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB looks like this:
> ...
>    Planned Partitions: 128  Peak Memory Usage: 6161kB
>    Disk Usage: 2478080kB  HashAgg Batches: 128
> ...
>    Planned Partitions: 128  Peak Memory Usage: 5393kB
>    Disk Usage: 2482152kB  HashAgg Batches: 11456
> ...
> My guess is that this is because the recursive hash aggregation
> misbehaves in a self-similar fashion once a certain tipping point
> has been reached.

It looks like it might be fairly easy to use HyperLogLog as an
estimator for the recursive step. That should reduce the
overpartitioning, which I believe is the cause of this discontinuity.

It's not clear to me that overpartitioning is a real problem in this
case -- but I think the fact that it's causing confusion is enough
reason to see if we can fix it.
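To make the idea concrete, below is a minimal, self-contained C
sketch (not the actual executor code) of feeding each spilled group's
hash into a small HyperLogLog sketch, then using that estimate --
rather than the planner's numbers -- to pick the partition count for
the recursive step. The names (mix64, choose_num_partitions), the
256-partition cap, and the 64-bytes-per-group figure are all
hypothetical; the backend would of course use the existing
implementation in src/backend/lib/hyperloglog.c.

#include <stdint.h>
#include <stdio.h>
#include <math.h>

#define HLL_BITS      6                     /* 2^6 = 64 registers */
#define HLL_REGISTERS (1 << HLL_BITS)

typedef struct
{
	uint8_t		reg[HLL_REGISTERS];
} hll_state;

/* splitmix64 finalizer, standing in for the aggregate's hash function */
static uint64_t
mix64(uint64_t x)
{
	x += 0x9e3779b97f4a7c15ULL;
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
	x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
	return x ^ (x >> 31);
}

/* fold one group hash into the sketch: O(1), a few bytes of state */
static void
hll_add(hll_state *hll, uint64_t hash)
{
	int			idx = (int) (hash >> (64 - HLL_BITS));
	uint64_t	rest = hash << HLL_BITS;
	uint8_t		rank;

	if (rest == 0)
		rank = 64 - HLL_BITS + 1;	/* all remaining bits are zero */
	else
	{
		rank = 1;
		while (!(rest & ((uint64_t) 1 << 63)))
		{
			rank++;
			rest <<= 1;
		}
	}
	if (rank > hll->reg[idx])
		hll->reg[idx] = rank;
}

/* raw HyperLogLog estimate; alpha = 0.709 is the constant for m = 64 */
static double
hll_estimate(const hll_state *hll)
{
	double		sum = 0.0;

	for (int i = 0; i < HLL_REGISTERS; i++)
		sum += pow(2.0, -(double) hll->reg[i]);
	return 0.709 * HLL_REGISTERS * HLL_REGISTERS / sum;
}

/*
 * Choose enough partitions that each one's groups should fit in
 * work_mem, rounded up to a power of two and capped so that a bad
 * estimate can't explode the fanout.
 */
static int
choose_num_partitions(double ngroups, double bytes_per_group,
					  double work_mem_bytes)
{
	double		needed = ngroups * bytes_per_group / work_mem_bytes;
	int			npartitions = 1;

	while (npartitions < needed && npartitions < 256)
		npartitions <<= 1;
	return npartitions;
}

int
main(void)
{
	hll_state	hll = {{0}};

	/* pretend 100,000 distinct group keys were spilled to this partition */
	for (uint64_t k = 0; k < 100000; k++)
		hll_add(&hll, mix64(k));

	double		est = hll_estimate(&hll);

	printf("estimated distinct groups: %.0f\n", est);
	printf("partitions for recursive step: %d\n",
		   choose_num_partitions(est, 64.0, 6.0 * 1024 * 1024));
	return 0;
}

With an accurate cardinality estimate in hand, a partition that turns
out to hold only a few distinct groups would get a small fanout
instead of inheriting the original 128-way split, which is what lets
the batch count balloon in the self-similar way described above.

Regards,
	Jeff Davis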