Hi all, I'm new to window functions. Recently I run some simple queries but surprised to find percent_rank is so slower than rank, could anybody tell me why?
The table schema: test=# \d inventory1 Table "public.inventory1" Column | Type | Modifiers ----------------------+---------+----------- inv_date_sk | integer | not null inv_item_sk | integer | not null inv_warehouse_sk | integer | not null inv_quantity_on_hand | integer | test=# \dt+ inventory1 List of relations Schema | Name | Type | Owner | Size | Description --------+------------+-------+----------+---------+------------- public | inventory1 | table | workshop | 8880 kB | The rank query result: test=# explain analyze select inv_date_sk,inv_item_sk, rank()over(partition by inv_date_sk order by inv_item_sk) from inventory1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual time=631.947..1361.158 rows=189000 loops=1) -> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual time=631.924..771.990 rows=189000 loops=1) Sort Key: inv_date_sk, inv_item_sk Sort Method: quicksort Memory: 12218kB -> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.055..198.948 rows=189000 loops=1) Total runtime: 1500.193 ms (6 rows) The percent_rank result: test=# explain analyze select inv_date_sk,inv_item_sk, percent_rank()over(partition by inv_date_sk order by inv_item_sk) from inventory1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual time=766.432..32924.804 rows=189000 loops=1) -> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual time=756.320..905.407 rows=189000 loops=1) Sort Key: inv_date_sk, inv_item_sk Sort Method: quicksort Memory: 12218kB -> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.102..224.607 rows=189000 loops=1) Total runtime: 33152.188 ms (6 rows) One special thing is that all the values of the partition key(inv_date_sk) are the same, that is, there is only one window partition. I find that percent_rank needs to buffer all the tuples to get the total number of rows. But why is it so expensive? I use 8.4.4. And I only increase the work_mem to 100M and leave other parameters untouched. Thanks, Li Jie