Re: [HACKERS] hash agg is slower on wide tables?

Pavel Stehule Sun, 22 Feb 2015 00:42:08 -0800

2015-02-22 9:28 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:

> Hi
>
> I did some benchmarks and I found some strange numbers.
>
> do $$
> begin
>   drop table if exists t1;
>   execute 'create table t1(' ||
>      array_to_string(array(select 'a' || i || ' smallint' from
> generate_series(1,30) g(i)), ',') || ')';
>   -- special column a2, a11
>   insert into t1
>     select 2008, i % 12 + 1, random()*20, random()*20, random()*20,
> random()*20, random()*20, random()*20, random()*20, random()*20,
>            case when random() < 0.9 then 1 else 0 end, random()*20,
> random()*20, random()*20, random()*20, random()*20, random()*20,
> random()*20, random()*20, random()*20,
>            random()*20, random()*20, random()*20, random()*20,
> random()*20, random()*20, random()*20, random()*20, random()*20, random()*20
>       from generate_series(1,7009728) g(i);
>   drop table if exists t2;
>   create table t2 as select a2, a11 from t1;
>   analyze t1; analyze t2;
> end;
> $$;
>
> postgres=# \dt+ t*
>                   List of relations
>  Schema | Name | Type  | Owner |  Size  | Description
> --------+------+-------+-------+--------+-------------
>  public | t1   | table | pavel | 622 MB |
>  public | t2   | table | pavel | 242 MB |
> (2 rows)
>
> postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2
> order by 3,2;
>                                                         QUERY
> PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=202327.03..202327.09 rows=24 width=4) (actual
> time=2609.159..2609.161 rows=24 loops=1)
>    Sort Key: a11, a2
>    Sort Method: quicksort  Memory: 26kB
>    ->  HashAggregate  (cost=202326.24..202326.48 rows=24 width=4) (actual
> time=2609.137..2609.139 rows=24 loops=1) --- grouping 1997 ms
>          Group Key: a11, a2
>          ->  Seq Scan on t1  (cost=0.00..149753.28 rows=7009728 width=4)
> (actual time=0.071..616.222 rows=7009728 loops=1)
>  Planning time: 0.138 ms
>  Execution time: 2609.247 ms
> (8 rows)
>
> postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2
> order by 3,2;
>                                                         QUERY
> PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=153688.03..153688.09 rows=24 width=4) (actual
> time=2100.058..2100.059 rows=24 loops=1)
>    Sort Key: a11, a2
>    Sort Method: quicksort  Memory: 26kB
>    ->  HashAggregate  (cost=153687.24..153687.48 rows=24 width=4) (actual
> time=2100.037..2100.040 rows=24 loops=1) --- grouping 1567 ms -- 25% faster
>          Group Key: a11, a2
>          ->  Seq Scan on t2  (cost=0.00..101114.28 rows=7009728 width=4)
> (actual time=0.043..532.680 rows=7009728 loops=1)
>  Planning time: 0.178 ms
>  Execution time: 2100.158 ms
> (8 rows)
>
> postgres=# \dt+ t*
>                    List of relations
>  Schema | Name | Type  | Owner |  Size   | Description
> --------+------+-------+-------+---------+-------------
>  public | t1   | table | pavel | 6225 MB |
>  public | t2   | table | pavel | 2423 MB |
> (2 rows)
>
> postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2
> order by 3,2;
>                                                            QUERY
> PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual
> time=99453.272..99453.274 rows=24 loops=1)
>    Sort Key: a11, a2
>    Sort Method: quicksort  Memory: 26kB
>    ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4)
> (actual time=99453.244..99453.249 rows=24 loops=1) --- 31891 ms
>          Group Key: a11, a2
>          ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4)
> (actual time=16.935..67562.615 rows=70097280 loops=1)
>  Planning time: 14.526 ms
>  Execution time: 99453.413 ms
> (8 rows)
>
> postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2
> order by 3,2;
>                                                           QUERY
> PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual
> time=20656.397..20656.399 rows=24 loops=1)
>    Sort Key: a11, a2
>    Sort Method: quicksort  Memory: 26kB
>    ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4)
> (actual time=20656.375..20656.378 rows=24 loops=1) --- 15248 ms --100%
> faster
>          Group Key: a11, a2
>          ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4)
> (actual time=0.060..5408.205 rows=70097280 loops=1)
>  Planning time: 0.161 ms
>  Execution time: 20656.475 ms
> (8 rows)
>
> It looks like hah agg is slower when it is based on wide table about
> 25-100%. Is it - or I don't see something?
>


next query?

why we read all columns from t1?

postgres=# explain analyze verbose select count(*), a2, a11 from t1 group
by 3,2 order by 3,2;

QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=202327.03..202327.09 rows=24 width=4) (actual
time=2585.274..2585.275 rows=24 loops=1)
   Output: (count(*)), a2, a11
   Sort Key: t1.a11, t1.a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=202326.24..202326.48 rows=24 width=4) (actual
time=2585.250..2585.256 rows=24 loops=1)
         Output: count(*), a2, a11
         Group Key: t1.a11, t1.a2
         ->  Seq Scan on public.t1  (cost=0.00..149753.28 rows=7009728
width=4) (actual time=0.018..608.238 rows=7009728 loops=1)
               Output: a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12,
a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27,
a28, a29, a30
 Planning time: 0.128 ms
 Execution time: 2585.405 ms
(11 rows)

when I disable hash agg, then only a11 and a2 columns are processed

postgres=# set enable_hashagg to off;
SET
Time: 0.469 ms
postgres=# explain analyze verbose select count(*), a2, a11 from t1 group
by 3,2 order by 3,2;
                                                            QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=946791.84..1016889.36 rows=24 width=4) (actual
time=3136.565..4883.198 rows=24 loops=1)
   Output: count(*), a2, a11
   Group Key: t1.a11, t1.a2
   ->  Sort  (cost=946791.84..964316.16 rows=7009728 width=4) (actual
time=3120.988..3991.546 rows=7009728 loops=1)
         Output: a2, a11
         Sort Key: t1.a11, t1.a2
         Sort Method: quicksort  Memory: 525190kB
         ->  Seq Scan on public.t1  (cost=0.00..149753.28 rows=7009728
width=4) (actual time=0.021..1183.852 rows=7009728 loops=1)
               Output: a2, a11
 Planning time: 0.119 ms
 Execution time: 4932.804 ms
(11 rows)

so it looks so hashagg doesn't eliminate source columns well

Regards

Pavel


>
> Regards
>
> Pavel
>

Re: [HACKERS] hash agg is slower on wide tables?

Reply via email to