On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB looks like this:

...

>          Planned Partitions: 128  Peak Memory Usage: 6161kB  Disk Usage: 2478080kB  HashAgg Batches: 128

...

>          Planned Partitions: 128  Peak Memory Usage: 5393kB  Disk Usage: 2482152kB  HashAgg Batches: 11456

...

> My guess is that this is because the recursive hash aggregation
> misbehaves in a self-similar fashion once a certain tipping point
> has been reached.

It looks like it might be fairly easy to use HyperLogLog as an
estimator of the number of distinct groups going into the recursive
step. That should reduce the overpartitioning, which I believe is the
cause of this discontinuity.
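
To make the idea concrete, here is a minimal sketch against the
existing lib/hyperloglog.h API (initHyperLogLog / addHyperLogLog /
estimateHyperLogLog). The SpillPartition struct and function names
below are hypothetical, not an actual patch; the point is just where
the estimator would slot in: feed it the group-key hashes as tuples
are spilled, then size the recursive step from the cardinality
estimate rather than from the raw tuple count.

/* Hypothetical sketch -- not an actual patch. */
#include "postgres.h"
#include "lib/hyperloglog.h"

typedef struct SpillPartition
{
	int64		ntuples;	/* tuples written to this partition */
	hyperLogLogState hll_card;	/* estimator of distinct group-key hashes */
} SpillPartition;

static void
spill_partition_init(SpillPartition *part)
{
	part->ntuples = 0;
	initHyperLogLog(&part->hll_card, 5);	/* 2^5 registers; rough is fine */
}

/* Called for every tuple routed to this spill partition. */
static void
spill_partition_add(SpillPartition *part, uint32 hashvalue)
{
	part->ntuples++;
	addHyperLogLog(&part->hll_card, hashvalue);
}

/*
 * When the partition is re-read, estimate its number of groups from the
 * HLL instead of assuming every spilled tuple is a distinct group, and
 * let that drive whether (and how far) the recursive step repartitions.
 */
static double
spill_partition_ngroups(SpillPartition *part)
{
	return Min(estimateHyperLogLog(&part->hll_card),
			   (double) part->ntuples);
}

With something like that, a partition that turns out to contain only a
few groups wouldn't get split 128 ways again on the recursive pass,
which is what I suspect is driving the batch-count blowup above.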

It's not clear to me that overpartitioning is a real problem in this
case -- but I think the fact that it's causing confusion is enough
reason to see if we can fix it.

Regards,
        Jeff Davis



