Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-16 Thread Hitoshi Harada
2008/11/15 David Rowley [EMAIL PROTECTED]:
 Hitoshi Harada wrote:
  david=# explain select date,lag(date,1) over (order by date) from
  meter_Readings order by date;
                               QUERY PLAN
  ----------------------------------------------------------------------
   Sort  (cost=1038.73..1063.74 rows=10001 width=4)
     Sort Key: date
     ->  Window  (cost=0.00..374.27 rows=10001 width=4)
           ->  Index Scan using meter_readings_pkey on meter_readings
               (cost=0.00..299.27 rows=10001 width=4)
  (4 rows)
 
  Is the final sort still required? Is it not already sorted in the
 window?
 

 Oh, I forgot to mention that. This behavior is also fixed, and it now
 works without the sort above the window node. I don't remember why I did
 it that way, and there's no comment explaining it, but the regression
 tests show there is no problem without it.

 Regards,

 --
 Hitoshi Harada

 Fantastic news! That will speed up the few test queries I have quite a bit;
 the sort was spilling to disk, so performance was suffering. This might be a
 good time to say that the hardware I'm testing on is ultra slow: a 600 MHz
 Pentium III with only 28MB of shared buffers.

 I've done some more performance tests with the patch. Realising that I
 really needed to compare the performance with something else, I decided to
 have a query process a large table with and then without a windowing
 function, just to see how much the query slows down. Of course there is
 going to be some overhead from using a tuplestore; I just wanted to see how
 much. My results are not really very interesting, so I'll just include them
 at the bottom of the email for those who want to see them.

Thanks for your test code. I've attached the result of your test with one
more query added. The added row_number() query uses the row-buffer strategy,
which doesn't keep a tuplestore but only scans and returns rows, so the
difference between the row_number() case (44 sec) and the ntile() case (61
sec) roughly shows how much overhead the tuplestore adds. I had expected the
row_number() case to be more efficient than it is, but we can still see that
it works as an optimization.
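
For reference, a sketch of what the added row_number() query presumably
looks like, in the same shape as the lag/ntile tests in David's bigtable
script further down the thread (the exact query isn't shown in this excerpt):

SELECT COUNT(*) FROM (select id, ROW_NUMBER() OVER (order by id) from bigtable order by id) t;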

 Casting my mind back to when the patch always did a sort before a window
 with an ORDER BY, even when a btree index was there, it's really come a
 long way. I've little idea how difficult it would be to implement better
 planning for the following. I suppose if it's difficult then it's probably
 better to wait for the patch to be committed first, or maybe it's something
 for 8.5.

Yeah, the planner around GROUP BY, ORDER BY, DISTINCT, index scans, and
windows is quite complicated. I think I can manage it if there is enough
time left, but first the basic design should be validated by core hackers.
I am waiting for further review and responses from Heikki and others.

 SELECT department,
   SUM(Salary),
   ROW_NUMBER() OVER (ORDER BY department),
   SUM(SUM(salary)) OVER (ORDER BY department DESC)
 FROM employees
 GROUP BY department ORDER BY department;

 This query performs more sorts than are really needed. I suppose the most
 efficient way to process it would be to process the 2nd window first, then
 the 1st, before returning the results in the same order as the 1st window.

 Currently the plan looks like:

                                   QUERY PLAN
 ---------------------------------------------------------------------------------
  Sort  (cost=1.33..1.34 rows=3 width=9)
    Sort Key: department
    ->  Window  (cost=1.25..1.31 rows=3 width=9)
          ->  Sort  (cost=1.25..1.26 rows=3 width=9)
                Sort Key: department
                ->  Window  (cost=1.17..1.23 rows=3 width=9)
                      ->  Sort  (cost=1.17..1.18 rows=3 width=9)
                            Sort Key: department
                            ->  HashAggregate  (cost=1.10..1.15 rows=3 width=9)
                                  ->  Seq Scan on employees  (cost=0.00..1.06 rows=6 width=9)


 Maybe it's possible to order the processing of the windows based on their
 ORDER BY clauses, putting any that match the ORDER BY of the final result
 last. I've not looked into this in much detail. Currently I cannot see any
 scenario where it would be bad.

 What do you think?


The patch doesn't have the infrastructure to relocate window nodes yet.
When relocating them we must re-number wfunc->winref, so it's a bit of
work, though possible. Also, I could imagine fixing only the case above,
but I'm afraid that without a good overall picture of the planner, side
effects would come up...  For instance, what if the window query is a
subquery of a GROUP BY that uses a sort strategy and assumes its subplan
doesn't sort? In that case the upper GROUP BY might not notice the
sort-key change caused by relocating the window node in the subplan.
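
A made-up sketch of that kind of situation (hypothetical table t with
columns grp and val, not one of the test tables): a sort-based GROUP BY
sitting on top of a windowed subquery, where moving the window's sort
around could invalidate the input ordering the upper node expects:

select grp, count(*)
from (select grp, lag(val) over (order by grp) as prev from t) s
group by grp;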

Regards,


-- 
Hitoshi Harada

$ uname -srpio
Linux 2.6.9-55.0.9.ELsmp i686 i386 GNU/Linux

# cpuinfo
Intel(R) Xeon(R) CPU   X3210  @ 2.13GHz



sample=# explain 

Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-14 Thread David Rowley
Hitoshi Harada wrote:
  david=# explain select date,lag(date,1) over (order by date) from
  meter_Readings order by date;
                               QUERY PLAN
  ----------------------------------------------------------------------
   Sort  (cost=1038.73..1063.74 rows=10001 width=4)
     Sort Key: date
     ->  Window  (cost=0.00..374.27 rows=10001 width=4)
           ->  Index Scan using meter_readings_pkey on meter_readings
               (cost=0.00..299.27 rows=10001 width=4)
  (4 rows)
 
  Is the final sort still required? Is it not already sorted in the
 window?
 
 
 Oh, I forgot to mention that. This behavior is also fixed, and it now
 works without the sort above the window node. I don't remember why I did
 it that way, and there's no comment explaining it, but the regression
 tests show there is no problem without it.
 
 Regards,
 
 --
 Hitoshi Harada

Fantastic news! That will speed up the few test queries I have quite a bit;
the sort was spilling to disk, so performance was suffering. This might be a
good time to say that the hardware I'm testing on is ultra slow: a 600 MHz
Pentium III with only 28MB of shared buffers.

I've done some more performance tests with the patch. Realising that I
really needed to compare the performance with something else, I decided to
have a query process a large table with and then without a windowing
function, just to see how much the query slows down. Of course there is
going to be some overhead from using a tuplestore; I just wanted to see how
much. My results are not really very interesting, so I'll just include them
at the bottom of the email for those who want to see them.

Casting my mind back to when the patch always did a sort before a window
with an ORDER BY, even when a btree index was there, it's really come a
long way. I've little idea how difficult it would be to implement better
planning for the following. I suppose if it's difficult then it's probably
better to wait for the patch to be committed first, or maybe it's something
for 8.5.

SELECT department,
   SUM(Salary),
   ROW_NUMBER() OVER (ORDER BY department),
   SUM(SUM(salary)) OVER (ORDER BY department DESC)
FROM employees
GROUP BY department ORDER BY department;

This query performs more sorts than are really needed. I suppose the most
efficient way to process it would be to process the 2nd window first, then
the 1st, before returning the results in the same order as the 1st window.

Currently the plan looks like:

                                  QUERY PLAN
---------------------------------------------------------------------------------
 Sort  (cost=1.33..1.34 rows=3 width=9)
   Sort Key: department
   ->  Window  (cost=1.25..1.31 rows=3 width=9)
         ->  Sort  (cost=1.25..1.26 rows=3 width=9)
               Sort Key: department
               ->  Window  (cost=1.17..1.23 rows=3 width=9)
                     ->  Sort  (cost=1.17..1.18 rows=3 width=9)
                           Sort Key: department
                           ->  HashAggregate  (cost=1.10..1.15 rows=3 width=9)
                                 ->  Seq Scan on employees  (cost=0.00..1.06 rows=6 width=9)


Maybe it's possible to order the processing of the windows based on their
ORDER BY clauses, putting any that match the ORDER BY of the final result
last. I've not looked into this in much detail. Currently I cannot see any
scenario where it would be bad.
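
One way to picture it (a hand-reordered equivalent written with a subquery,
purely to illustrate the idea; it is not something the patch currently
does): evaluate the DESC window in a subquery first and the ASC window on
top, so that the sort for the outer window also satisfies the final ORDER BY
and the outermost sort could be dropped:

SELECT department,
       sum_salary,
       ROW_NUMBER() OVER (ORDER BY department) AS row_number,
       running_sum
FROM (SELECT department,
             SUM(salary) AS sum_salary,
             SUM(SUM(salary)) OVER (ORDER BY department DESC) AS running_sum
      FROM employees
      GROUP BY department) AS s
ORDER BY department;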

What do you think?

David


CREATE TABLE bigtable (
  id SERIAL NOT NULL PRIMARY KEY,
  timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,1000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);

VACUUM ANALYZE bigtable;

-- base line test
david=# SELECT COUNT(*) FROM (select id,timestamp from bigtable order by id)
t;
  count
--
 1000
(1 row)

Time: 72862.238 ms

-- lag test
david=# SELECT COUNT(*) FROM (select id,LAG(timestamp,1) OVER (order by id)
from bigtable order by id) t;
  count
--
 1000
(1 row)

Time: 257713.350 ms

david=# SELECT COUNT(*) FROM (select id,NTILE(10) OVER (order by id) from
bigtable order by id) t;
  count
--
 1000
(1 row)

Time: 183131.425 ms






Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-10 Thread Hitoshi Harada
2008/11/10 David Rowley [EMAIL PROTECTED]:
 Hitoshi Harada wrote:
 I found how to do it, though only for the case you gave. Thinking about
 planner optimization of the Window nodes (and their attached Sort nodes),
 we must consider the execution order of more than one node. The test case
 takes care of only one window, but there may be more window/sort node
 sets, and it is too difficult to choose the best execution order once the
 lower index scans, merge joins in subqueries, and sort-based GROUP BY are
 all involved. So I didn't touch that complicated planner jungle. I rewrote
 the patch so that only the given bottom window's sort can consider an
 index scan. Deeper optimizations are beyond my capability.

 After more playing around with a few queries and testing performance on some
 larger tables, I discovered something strange in the plan for this query.


 david=# explain select date,lag(date,1) over (order by date) from
 meter_Readings order by date;
                              QUERY PLAN
 ----------------------------------------------------------------------
  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
    Sort Key: date
    ->  Window  (cost=0.00..374.27 rows=10001 width=4)
          ->  Index Scan using meter_readings_pkey on meter_readings
              (cost=0.00..299.27 rows=10001 width=4)
 (4 rows)

 Is the final sort still required? Is it not already sorted in the window?


Oh, I forgot to mention that. This behavior is also fixed, and it now
works without the sort above the window node. I don't remember why I did
it that way, and there's no comment explaining it, but the regression
tests show there is no problem without it.

Regards,

-- 
Hitoshi Harada



Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-09 Thread David Rowley
Hitoshi Harada wrote:
 I found how to do it, though only for the case you gave. Thinking about
 planner optimization of the Window nodes (and their attached Sort nodes),
 we must consider the execution order of more than one node. The test case
 takes care of only one window, but there may be more window/sort node
 sets, and it is too difficult to choose the best execution order once the
 lower index scans, merge joins in subqueries, and sort-based GROUP BY are
 all involved. So I didn't touch that complicated planner jungle. I rewrote
 the patch so that only the given bottom window's sort can consider an
 index scan. Deeper optimizations are beyond my capability.

After more playing around with a few queries and testing performance on some
larger tables, I discovered something strange in the plan for this query.


david=# explain select date,lag(date,1) over (order by date) from
meter_Readings order by date;
                             QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=1038.73..1063.74 rows=10001 width=4)
   Sort Key: date
   ->  Window  (cost=0.00..374.27 rows=10001 width=4)
         ->  Index Scan using meter_readings_pkey on meter_readings
             (cost=0.00..299.27 rows=10001 width=4)
(4 rows)

Is the final sort still required? Is it not already sorted in the window?

The table I was testing on spilled the sort to disk; I would probably not have
noticed otherwise.

David.




Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-09 Thread David Rowley
Hitoshi Harada wrote:
 I found how to do it, though only for the case you gave. Thinking about
 planner optimization of the Window nodes (and their attached Sort nodes),
 we must consider the execution order of more than one node. The test case
 takes care of only one window, but there may be more window/sort node
 sets, and it is too difficult to choose the best execution order once the
 lower index scans, merge joins in subqueries, and sort-based GROUP BY are
 all involved. So I didn't touch that complicated planner jungle. I rewrote
 the patch so that only the given bottom window's sort can consider an
 index scan. Deeper optimizations are beyond my capability.

Sorry for the delay on this. I've updated the benchmark results using the
new patch you sent today. I did a dump and re-load of the tables; since some
of the numbers are randomly generated, I wouldn't want to compare them to the
old results for any of the tests. This is a completely new list against CVS
head as of this morning.

Test    Sub Query  Self Join  Vladimir  Windowing  UOM  Window over best alt.
Test 1  504        N/A        N/A       568        TPS  12.7%
Test 2  340.9      425        182       450.38     TPS  6.0%
Test 3  1.304      8.12       1.963     7.52       TPS  -7.4%
Test 4  422        365        195       630        TPS  49.3%
Test 5  8.874      N/A        5825      31203      TPH  435.6%
Test 6  251        N/A        N/A       300        TPS  19.5%

Only tests 3 and 5 made use of the index scan. Performance dropped slightly
on test 3, but there's not much point paying attention to that since we're
probably close to the crossover point between a seqscan and an indexscan,
where the planner's decision is not as critical.

Some of the self-join methods I used don't implement the exact requirements
stated at the top of the test. For example, the self-join version of the
meter readings test requires that no days be missed.

Maybe multi-window optimisation is one for the 8.5 TODO list.

I've attached the test scripts.

David.
Test Queries:


Added Self joins test cases. 
Also added Vladimir's method which requires the following code to be run first:

begin work;

create function var_set (text,text) returns text as $$
  select set_config ('public.' || $2 || pg_backend_pid(), $1, false);
$$ LANGUAGE 'sql';

create function var_get (text) returns text as $$
  select current_setting('public.' || $1 || pg_backend_pid());
$$ LANGUAGE 'sql';

create operator  (procedure = var_set, leftarg = text, rightarg = text);
create operator  (procedure = var_get, rightarg = text);

commit;

The above code exploits custom GUC variables to get the lagged value.
Clever but dirty :-)
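
The operator names themselves were lost in the archive. A rough sketch of the
same trick spelled out directly with set_config()/current_setting() (using a
made-up 'prev' variable name, assuming left-to-right evaluation of the
subquery's target list, and on 8.x presumably needing
custom_variable_classes = 'public'):

select set_config('public.prev' || pg_backend_pid(), '', false);  -- initialise

select date,
       reading::numeric - nullif(prev, '')::numeric as used
  from (select date,
               current_setting('public.prev' || pg_backend_pid()) as prev,
               set_config('public.prev' || pg_backend_pid(), reading::text, false) as reading
          from meter_readings
         order by date) as t;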


-- Test 1 -- 
--
-- Notes: Get the name, department and salary of highest paid department
--member for each department. If the two highest paid employees
--in the department get the same salary show the one with the
--lowest id.


create table employees (
  id INT primary key,
  name varchar(30) not null,
  department varchar(30) not null,
  salary int not null,
  check (salary >= 0)
);


insert into employees values(1,'Jeff','IT',1);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',3);
insert into employees values(4,'Ian','Manager',2);

insert into employees values(5,'John','IT',6);
insert into employees values(6,'Matthew','Director',6);

VACUUM ANALYZE employees;


-- Normal way.

SELECT name,department,salary
FROM employees AS e
WHERE id = (SELECT id FROM employees WHERE department = e.department ORDER BY 
salary DESC,id ASC LIMIT 1);

-- above query = 498 tps


-- With windowing functions.

SELECT name,department,salary
FROM (SELECT name,
 department,
 salary,
 row_number() OVER (PARTITION BY department ORDER BY salary DESC,id 
ASC) AS num
  FROM employees
) AS t WHERE num = 1;
 
-- above query = 578 tps


-- Test 2 -- Split times problem
-- 
-- Notes: Find the interval of time between 1 timestamp and the previous
-- timestamp. The previous timestamp being the one with the next lowest id
-- column value. If there is no previous timestamp show the value NULL.
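-- A windowing solution here would presumably use LAG; a sketch (not
-- necessarily the exact query from the truncated part of this attachment):
--   SELECT id, timestamp - LAG(timestamp) OVER (ORDER BY id) AS gap FROM tstest;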

CREATE TABLE tstest (
  id SERIAL NOT NULL PRIMARY KEY,
  timestamp TIMESTAMP NOT NULL
);

INSERT INTO tstest (timestamp) VALUES(NOW());

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest 

Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-04 Thread David Rowley
Hitoshi Harada wrote:
   Test 3 and 5 did not seem to make use of an index to get a sorted list of
   results. I disabled enable_seqscan but the planner still failed to choose
   index_scan. Is there any reason for this? Perhaps I'm missing something.
   Hitoshi, can you take a look at this?
 
  Ah, good point. Maybe it's because I haven't paid attention to choosing an
  index scan for the upper sort node. I just put the sort node on top of
  whatever the lower node is, so the information that a sort node sits above
  may need to be pushed down to the scan-choice process. Could someone point
  me to how to do it, or which part of the existing code would be a good
  guide?
 
  I know you need to wait for an answer about this, so I'd like to delay any
  further performance tests until that's sorted out as it should affect
  performance of larger tables quite a bit.
 
 
 I found how to do it, though only for the case you gave. Thinking about
 planner optimization of the Window nodes (and their attached Sort nodes),
 we must consider the execution order of more than one node. The test case
 takes care of only one window, but there may be more window/sort node
 sets, and it is too difficult to choose the best execution order once the
 lower index scans, merge joins in subqueries, and sort-based GROUP BY are
 all involved. So I didn't touch that complicated planner jungle. I rewrote
 the patch so that only the given bottom window's sort can consider an
 index scan. Deeper optimizations are beyond my capability.

I've just looked into what some other implementations do. Sybase seems to do
exactly what you've done: it only looks at the first window clause in the
query. Oracle seems to use the index regardless of the position of the window
clause. To me personally, what you've done seems fine for now. Perhaps
something could be done later to improve on this. Maybe someone else has
ideas about how to do it?

It seems quite similar to SELECT MAX(idxcol), MAX(idxcol2), where the planner
often makes use of two indexes when they are available, though that case is
probably far simpler as there is always just one row. Costing would likely be
more complex for the windowing-function version.
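
For reference, that min/max optimisation works roughly by turning each
aggregate into its own index-ordered LIMIT 1 subplan; a sketch with
hypothetical table/column names:

SELECT MAX(idxcol), MAX(idxcol2) FROM some_table;
-- which is planned approximately as if it were written:
SELECT (SELECT idxcol  FROM some_table WHERE idxcol  IS NOT NULL ORDER BY idxcol  DESC LIMIT 1),
       (SELECT idxcol2 FROM some_table WHERE idxcol2 IS NOT NULL ORDER BY idxcol2 DESC LIMIT 1);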

Good work.

I'll continue with more benchmarks soon.

David.





Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-03 Thread Hitoshi Harada
2008/11/2 David Rowley [EMAIL PROTECTED]:
 Hitoshi Harada Wrote:
 2008/11/2 David Rowley [EMAIL PROTECTED]:
  Observations:
 
  Test 3 and 5 did not seem to make use of an index to get a sorted list of
  results. I disabled enable_seqscan but the planner still failed to choose
  index_scan. Is there any reason for this? Perhaps I'm missing something.
  Hitoshi, can you take a look at this?

 Ah, good point. Maybe it's because I haven't paid attention to choosing an
 index scan for the upper sort node. I just put the sort node on top of
 whatever the lower node is, so the information that a sort node sits above
 may need to be pushed down to the scan-choice process. Could someone point
 me to how to do it, or which part of the existing code would be a good
 guide?

 I know you need to wait for an answer about this, so I'd like to delay any
 further performance tests until that's sorted out as it should affect
 performance of larger tables quite a bit.


I found how to do it, though only for the case you gave. Thinking about
planner optimization of the Window nodes (and their attached Sort nodes),
we must consider the execution order of more than one node. The test case
takes care of only one window, but there may be more window/sort node
sets, and it is too difficult to choose the best execution order once the
lower index scans, merge joins in subqueries, and sort-based GROUP BY are
all involved. So I didn't touch that complicated planner jungle. I rewrote
the patch so that only the given bottom window's sort can consider an
index scan. Deeper optimizations are beyond my capability.

Attached is a delta patch against the last one. Also see the git diff:
http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8;hp=48b73ee574779a14a3c36d373d8544d59a5b8b46

Regards,



-- 
Hitoshi Harada


window_functions.delta.patch.20081103
Description: Binary data



Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-02 Thread David Rowley
Hitoshi Harada Wrote:
 Thanks for your test. Though I didn't post it publicly, I've also tested it
 on real problems, and it performed better than I thought. If you can afford
 the time, could you add self-join cases? Something like:

OK, I did self joins for some of them. I don't know if it's possible for all.

Test    Sub query  Self join  Vladimir  Windowing  UOM         Increase %
Test 1  498.00     N/A        N/A       578.00     Trans/Sec   16.06%
Test 2  336.00     424.00     182.78    481.00     Trans/Sec   13.44%
Test 3  1.30       7.59       1.90      8.45       Trans/Sec   11.33%
Test 4  424.00     361.00     182.00    629.00     Trans/Sec   48.35%
Test 5  8.89       N/A        5844.16   31052.69   Trans/Hour  431.35%
Test 6  253.00     N/A        N/A       305.00     Trans/Sec   20.55%

See attached for details.
The increase % column is now:

Window / max ( Sub query, self join, Vladimir ) - 1
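
(For example, Test 1: 578.00 / 498.00 - 1 = 16.06%.)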

Vladimir, I've included your tests too. I understand that people will
probably use this method, as sometimes there is little other choice to get
the performance that is required.

Hitoshi Harada Wrote:
 2008/11/2 David Rowley [EMAIL PROTECTED]:
  Observations:
 
  Test 3 and 5 did not seem to make use of an index to get a sorted list of
  results. I disabled enable_seqscan but the planner still failed to choose
  index_scan. Is there any reason for this? Perhaps I'm missing something.
  Hitoshi, can you take a look at this?

 Ah, good point. Maybe it's because I haven't paid attention to choosing an
 index scan for the upper sort node. I just put the sort node on top of
 whatever the lower node is, so the information that a sort node sits above
 may need to be pushed down to the scan-choice process. Could someone point
 me to how to do it, or which part of the existing code would be a good
 guide?

I know you need to wait for an answer about this, so I'd like to delay any
further performance tests until that's sorted out as it should affect
performance of larger tables quite a bit.

David.

Test Queries:


Added Self joins test cases. 
Also added Vladimir's method which requires the following code to be run first:

begin work;

create function var_set (text,text) returns text as $$
  select set_config ('public.' || $2 || pg_backend_pid(), $1, false);
$$ LANGUAGE 'sql';

create function var_get (text) returns text as $$
  select current_setting('public.' || $1 || pg_backend_pid());
$$ LANGUAGE 'sql';

create operator  (procedure = var_set, leftarg = text, rightarg = text);
create operator  (procedure = var_get, rightarg = text);

commit;

The above code exploits custom GUC variables to get the lagged value.
Clever but dirty :-)


-- Test 1 -- 
--
-- Notes: Get the name, department and salary of highest paid department
--member for each department. If the two highest paid employees
--in the department get the same salary show the one with the
--lowest id.


create table employees (
  id INT primary key,
  name varchar(30) not null,
  department varchar(30) not null,
  salary int not null,
  check (salary >= 0)
);


insert into employees values(1,'Jeff','IT',1);
insert into employees values(2,'Sam','IT',12000);

insert into employees values(3,'Richard','Manager',3);
insert into employees values(4,'Ian','Manager',2);

insert into employees values(5,'John','IT',6);
insert into employees values(6,'Matthew','Director',6);

VACUUM ANALYZE employees;


-- Normal way.

SELECT name,department,salary
FROM employees AS e
WHERE id = (SELECT id FROM employees WHERE department = e.department ORDER BY 
salary DESC,id ASC LIMIT 1);

-- above query = 498 tps


-- With windowing functions.

SELECT name,department,salary
FROM (SELECT name,
 department,
 salary,
 row_number() OVER (PARTITION BY department ORDER BY salary DESC,id 
ASC) AS num
  FROM employees
) AS t WHERE num = 1;
 
-- above query = 578 tps


-- Test 2 -- Split times problem
-- 
-- Notes: Find the interval of time between 1 timestamp and the previous
-- timestamp. The previous timestamp being the one with the next lowest id
-- column value. If there is no previous timestamp show the value NULL.

CREATE TABLE tstest (
  id SERIAL NOT NULL PRIMARY KEY,
  timestamp TIMESTAMP NOT NULL
);

INSERT INTO tstest (timestamp) VALUES(NOW());

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval
FROM tstest;

INSERT INTO tstest (timestamp)
SELECT MAX(timestamp) + 

Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-01 Thread Vladimir Sitnikov
Here is another way to solve the big_marathon problem without window
functions (and many other kinds of windowing queries, especially those that
do not specify ROWS PRECEDING etc.).

It could be considered a very dirty hack; however, it could give you an
insight into the performance of the windowed query with an indexscan instead
of a seqscan.

create function var_set (text,text) returns text
as
'
  select set_config (''public.''||$2||pg_backend_pid(), $1, false);
' LANGUAGE 'sql';

create function var_get (text) returns text
as
'
  select current_setting(''public.''||$1||pg_backend_pid());
' LANGUAGE 'sql';

create operator  (procedure = var_set, leftarg = text, rightarg = text);
create operator  (procedure = var_get, rightarg = text);


-- init values
select '''prev_time', '0''dense_rank';

-- marathon query
select *
  from (
 select (((case when time::text = 'prev_time' then 0 else 1
end)+('dense_rank')::int4)::text'dense_rank')::int4 as position,
runnerid, time
   from big_marathon
  order by time
  ) results
 where position=2

Best regards,
Vladimir Sitnikov


Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-01 Thread Vladimir Sitnikov
Just a small correction: there should be time::text'prev_time' for the
calculations to be correct:


select *
  from (
 select (((case when time::text = 'prev_time' then 0 else 1
end)+('dense_rank')::int4)::text'dense_rank')::int4 as position,
runnerid, time, time::text'prev_time'
   from big_marathon
  order by time
  ) results
 where position=2

-- meter_readings
select ''  'lag';

select date, reading::numeric-(case lag when '' then null else lag
end)::numeric as used
  from (
select date, 'lag' as lag, reading::text 'lag' as reading
  from meter_readings
 order by date
  ) as t
 order by used asc nulls last limit 1


Best regards,
Vladimir Sitnikov


Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.

2008-11-01 Thread Hitoshi Harada
2008/11/2 David Rowley [EMAIL PROTECTED]:
 Observations:

 Test 3 and 5 did not seem to make use of an index to get a sorted list of
 results. I disabled enable_seqscan but the planner still failed to choose
 index_scan. Is there any reason for this? Perhaps I'm missing something.
 Hitoshi, can you take a look at this?

Ah, good point. Maybe it's because I haven't paid attention to choosing an
index scan for the upper sort node. I just put the sort node on top of
whatever the lower node is, so the information that a sort node sits above
may need to be pushed down to the scan-choice process. Could someone point
me to how to do it, or which part of the existing code would be a good
guide?

 Tests:

 Please see attached file. Perhaps there were more efficient ways for certain
 queries, I just couldn't think of them...

 Please let me know if you feel I should be conducting the review in another
 way.

Thanks for your test. Though I didn't post it publicly, I've also tested it
on real problems, and it performed better than I thought. If you can afford
the time, could you add self-join cases? Something like:

-- normal
SELECT t1.id, t1.grp, count(t2) + 1 AS row_number
FROM t t1
INNER JOIN t t2 ON t1.grp = t2.grp AND t1.id > t2.id
GROUP BY t1.id, t1.grp;

-- windowing
SELECT id, grp, row_number() OVER (PARTITION BY grp ORDER BY id)
FROM t;


Regards,

-- 
Hitoshi Harada
