On Tue, Mar 24, 2020 at 12:01 PM Kyotaro Horiguchi <horikyota....@gmail.com>
wrote:

> At Mon, 23 Mar 2020 21:13:48 +0800, Richard Guo <guofengli...@gmail.com>
> wrote in
> > Hi all,
> >
> > With the following statements on latest master (c81bd3b9), I find
> > negative cost for plan nodes.
> >
> > create table a (i int, j int);
> > insert into a select i%100000, i from generate_series(1,1000000)i;
> > analyze a;
> >
> > # explain select i from a group by i;
> >                            QUERY PLAN
> > -----------------------------------------------------------------
> >  HashAggregate  (cost=1300.00..-1585.82 rows=102043 width=4)
>
> Good catch!
>
> >    Group Key: i
> >    Planned Partitions: 4
> >    ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4)
> > (4 rows)
> >
> > In function cost_agg, when we add the disk costs of hash aggregation
> > that spills to disk, nbatches is calculated as 1.18 in this case. It is
> > greater than 1, so there will be spill. And the depth is calculated as
> > -1 in this case, with num_partitions being 4. I think this is where
> > thing goes wrong.
>
> The depth is the expected number of iterations of reading the relation.
>
> >       depth = ceil( log(nbatches - 1) / log(num_partitions) );
>

Yes correct.


>
> I'm not sure what the expression based on, but apparently it is wrong
> for nbatches <= 2.0. It looks like a thinko of something like this.
>
>         depth = ceil( log(nbatches) / log(num_partitions + 1) );
>

It seems to me we should use '(nbatches - 1)', without the log function.
Maybe I'm wrong.

I have sent this issue to the 'Memory-Bounded Hash Aggregation' thread.

Thanks
Richard

Reply via email to