Hi all,

I'm new to window functions. Recently I run some simple queries but
surprised to find percent_rank is so slower than rank, could anybody tell me
why?

The table schema:
test=# \d inventory1
         Table "public.inventory1"
        Column        |  Type   | Modifiers
----------------------+---------+-----------
 inv_date_sk          | integer | not null
 inv_item_sk          | integer | not null
 inv_warehouse_sk     | integer | not null
 inv_quantity_on_hand | integer |

test=# \dt+ inventory1
                       List of relations
 Schema |    Name    | Type  |  Owner   |  Size   | Description
--------+------------+-------+----------+---------+-------------
 public | inventory1 | table | workshop | 8880 kB |

The rank query result:
test=# explain analyze select inv_date_sk,inv_item_sk, rank()over(partition
by inv_date_sk order by inv_item_sk) from inventory1;
                                                          QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=19563.99..23343.99 rows=189000 width=8) (actual
time=631.947..1361.158 rows=189000 loops=1)
   ->  Sort  (cost=19563.99..20036.49 rows=189000 width=8) (actual
time=631.924..771.990 rows=189000 loops=1)
         Sort Key: inv_date_sk, inv_item_sk
         Sort Method:  quicksort  Memory: 12218kB
         ->  Seq Scan on inventory1  (cost=0.00..3000.00 rows=189000
width=8) (actual time=0.055..198.948 rows=189000 loops=1)
 Total runtime: 1500.193 ms
(6 rows)

The percent_rank result:
test=# explain analyze select inv_date_sk,inv_item_sk,
percent_rank()over(partition by inv_date_sk order by inv_item_sk) from
inventory1;
                                                          QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=19563.99..23343.99 rows=189000 width=8) (actual
time=766.432..32924.804 rows=189000 loops=1)
   ->  Sort  (cost=19563.99..20036.49 rows=189000 width=8) (actual
time=756.320..905.407 rows=189000 loops=1)
         Sort Key: inv_date_sk, inv_item_sk
         Sort Method:  quicksort  Memory: 12218kB
         ->  Seq Scan on inventory1  (cost=0.00..3000.00 rows=189000
width=8) (actual time=0.102..224.607 rows=189000 loops=1)
 Total runtime: 33152.188 ms
(6 rows)

One special thing is that all the values of the partition key(inv_date_sk)
are the same, that is, there is only one window partition. I find that
percent_rank needs to buffer all the tuples to get the total number of rows.
But why is it so expensive?

I use 8.4.4. And I only increase the work_mem to 100M and leave other
parameters untouched.

Thanks,
Li Jie

Reply via email to