Hello,
I wrote article about statistical function - when I tested Joe Celko's
method, I found some problems on not unique dataset:
on distinct dataset is rule so rows here is max(hi), then there is min(lo):
create table x1 (a integer);
insert into x1 select generate_series(1,10);
postgres=# select row_number() over (order by a), row_number() over
(order by a desc) from x1;
row_number | row_number
------------+------------
10 | 1
9 | 2
8 | 3
7 | 4
6 | 5
5 | 6
4 | 7
3 | 8
2 | 9
1 | 10
(10 rows)
but on other set I got
truncate table x1;
insert into x1
values(2),(2),(3),(3),(4),(4),(5),(5),(6),(6),(6),(8),(9),(9),(10),(10);
postgres=# select row_number() over (order by a), row_number() over
(order by a desc) from x1;
row_number | row_number
------------+------------
16 | 1
15 | 2
14 | 3
11 | 4
13 | 5
12 | 6
9 | 7
10 | 8
7 | 9
8 | 10
5 | 11
6 | 12
4 | 13
3 | 14
1 | 15
2 | 16
(16 rows)
I am not sure, is this correct? When this solution is correct, then
Joe Celko's method for median calculation is buggy.
Regards
Pavel Stehule
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers