Re: Negative cost is seen for plan node
On Tue, Mar 24, 2020 at 12:01 PM Kyotaro Horiguchi wrote: > At Mon, 23 Mar 2020 21:13:48 +0800, Richard Guo > wrote in > > Hi all, > > > > With the following statements on latest master (c81bd3b9), I find > > negative cost for plan nodes. > > > > create table a (i int, j int); > > insert into a select i%10, i from generate_series(1,100)i; > > analyze a; > > > > # explain select i from a group by i; > >QUERY PLAN > > - > > HashAggregate (cost=1300.00..-1585.82 rows=102043 width=4) > > Good catch! > > >Group Key: i > >Planned Partitions: 4 > >-> Seq Scan on a (cost=0.00..14425.00 rows=100 width=4) > > (4 rows) > > > > In function cost_agg, when we add the disk costs of hash aggregation > > that spills to disk, nbatches is calculated as 1.18 in this case. It is > > greater than 1, so there will be spill. And the depth is calculated as > > -1 in this case, with num_partitions being 4. I think this is where > > thing goes wrong. > > The depth is the expected number of iterations of reading the relation. > > > depth = ceil( log(nbatches - 1) / log(num_partitions) ); > Yes correct. > > I'm not sure what the expression based on, but apparently it is wrong > for nbatches <= 2.0. It looks like a thinko of something like this. > > depth = ceil( log(nbatches) / log(num_partitions + 1) ); > It seems to me we should use '(nbatches - 1)', without the log function. Maybe I'm wrong. I have sent this issue to the 'Memory-Bounded Hash Aggregation' thread. Thanks Richard
Re: Negative cost is seen for plan node
At Mon, 23 Mar 2020 21:13:48 +0800, Richard Guo wrote in > Hi all, > > With the following statements on latest master (c81bd3b9), I find > negative cost for plan nodes. > > create table a (i int, j int); > insert into a select i%10, i from generate_series(1,100)i; > analyze a; > > # explain select i from a group by i; >QUERY PLAN > - > HashAggregate (cost=1300.00..-1585.82 rows=102043 width=4) Good catch! >Group Key: i >Planned Partitions: 4 >-> Seq Scan on a (cost=0.00..14425.00 rows=100 width=4) > (4 rows) > > In function cost_agg, when we add the disk costs of hash aggregation > that spills to disk, nbatches is calculated as 1.18 in this case. It is > greater than 1, so there will be spill. And the depth is calculated as > -1 in this case, with num_partitions being 4. I think this is where > thing goes wrong. The depth is the expected number of iterations of reading the relation. > depth = ceil( log(nbatches - 1) / log(num_partitions) ); I'm not sure what the expression based on, but apparently it is wrong for nbatches <= 2.0. It looks like a thinko of something like this. depth = ceil( log(nbatches) / log(num_partitions + 1) ); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Negative cost is seen for plan node
Hi all, With the following statements on latest master (c81bd3b9), I find negative cost for plan nodes. create table a (i int, j int); insert into a select i%10, i from generate_series(1,100)i; analyze a; # explain select i from a group by i; QUERY PLAN - HashAggregate (cost=1300.00..-1585.82 rows=102043 width=4) Group Key: i Planned Partitions: 4 -> Seq Scan on a (cost=0.00..14425.00 rows=100 width=4) (4 rows) In function cost_agg, when we add the disk costs of hash aggregation that spills to disk, nbatches is calculated as 1.18 in this case. It is greater than 1, so there will be spill. And the depth is calculated as -1 in this case, with num_partitions being 4. I think this is where thing goes wrong. Thanks Richard