On Thu, Jun 25, 2020 at 09:42:33AM -0700, Jeff Davis wrote:
> On Wed, 2020-06-24 at 12:31 -0700, Andres Freund wrote:
>> nodeAgg.c already treats those separately:
>>
>> void
>> hash_agg_set_limits(double hashentrysize, uint64 input_groups, int used_bits,
>>                     Size *mem_limit, uint64 *ngroups_limit,
>>                     int *num_partitions)
>> {
>>     int     npartitions;
>>     Size    partition_mem;
>>
>>     /* if not expected to spill, use all of work_mem */
>>     if (input_groups * hashentrysize < work_mem * 1024L)
>>     {
>>         if (num_partitions != NULL)
>>             *num_partitions = 0;
>>         *mem_limit = work_mem * 1024L;
>>         *ngroups_limit = *mem_limit / hashentrysize;
>>         return;
>>     }
> The reason this code exists is to decide how much of work_mem to set
> aside for spilling (each spill partition needs an IO buffer).
>
> The alternative would be to fix the number of partitions before
> processing a batch, which didn't seem ideal. Or, we could just ignore
> the memory required for IO buffers, like HashJoin.

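To illustrate the trade-off, here is a rough standalone model of that
carve-out. The 8kB per-partition buffer, the partition-count bounds and
the 75% floor below are made-up numbers for the sketch, not the actual
nodeAgg.c constants:

/* model.c: rough model of carving spill IO-buffer memory out of work_mem */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define SPILL_BUFFER_SIZE   (8 * 1024)  /* assumed IO buffer per partition */
#define MIN_PARTITIONS      4
#define MAX_PARTITIONS      256

static void
set_limits(double hashentrysize, uint64_t input_groups, size_t work_mem_bytes,
           size_t *mem_limit, uint64_t *ngroups_limit, int *num_partitions)
{
    int     npartitions;
    size_t  partition_mem;

    /* not expected to spill: use all of work_mem, as in the quoted code */
    if (input_groups * hashentrysize < work_mem_bytes)
    {
        *num_partitions = 0;
        *mem_limit = work_mem_bytes;
        *ngroups_limit = *mem_limit / hashentrysize;
        return;
    }

    /* crude partition choice: enough partitions that each fits in work_mem */
    npartitions = (int) (input_groups * hashentrysize / work_mem_bytes) + 1;
    if (npartitions < MIN_PARTITIONS)
        npartitions = MIN_PARTITIONS;
    if (npartitions > MAX_PARTITIONS)
        npartitions = MAX_PARTITIONS;
    *num_partitions = npartitions;

    /* set aside one IO buffer per partition, but keep most of work_mem */
    partition_mem = (size_t) npartitions * SPILL_BUFFER_SIZE;
    if (work_mem_bytes > 4 * partition_mem)
        *mem_limit = work_mem_bytes - partition_mem;
    else
        *mem_limit = work_mem_bytes * 3 / 4;

    *ngroups_limit = *mem_limit / hashentrysize;
}

int
main(void)
{
    size_t      mem_limit;
    uint64_t    ngroups_limit;
    int         npartitions;

    /* 1M groups of ~200 bytes against 4MB work_mem: expected to spill */
    set_limits(200.0, 1000000, 4 * 1024 * 1024,
               &mem_limit, &ngroups_limit, &npartitions);
    printf("partitions=%d mem_limit=%zu ngroups_limit=%llu\n",
           npartitions, mem_limit, (unsigned long long) ngroups_limit);
    return 0;
}

The point is just that the more partitions we expect, the more of
work_mem goes to IO buffers and the less is left for the hash table
itself.
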
I think the conclusion from the recent HashJoin discussions is that not
accounting for BufFiles is an issue, and we want to fix it. So repeating
that for HashAgg would be a mistake, IMHO.
> Granted, this is an example where an underestimate can give an
> advantage, but I don't think we want to extend the concept into other
> areas.

I agree.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services