Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
2008/11/15 David Rowley [EMAIL PROTECTED]:
> Hitoshi Harada wrote:
>>> david=# explain select date,lag(date,1) over (order by date)
>>> from meter_Readings order by date;
>>>                                  QUERY PLAN
>>> ------------------------------------------------------------------------
>>>  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
>>>    Sort Key: date
>>>    ->  Window  (cost=0.00..374.27 rows=10001 width=4)
>>>          ->  Index Scan using meter_readings_pkey on meter_readings
>>>                (cost=0.00..299.27 rows=10001 width=4)
>>> (4 rows)
>>>
>>> Is the final sort still required? Is it not already sorted in the window?
>>
>> Oh, I forgot to mention it. This behavior is also fixed and it now works
>> without the sort on the window. I don't remember at all why I did it that
>> way, and there's no comment around that code, but the regression tests
>> showed there is no problem without it.
>
> Fantastic news! That will speed up the few test queries I have quite a bit;
> the sort was spilling to disk, so performance was dropping quite a bit.
>
> This might be a good time to say that the hardware I'm testing on is ultra
> slow: a 600 MHz Pentium III with only 28MB of shared buffers.
>
> I've done some more performance tests with the patch. Realising that I
> really need to be comparing the performance with something else, I decided
> to have a query process a large table with and then without a windowing
> function, just to see how much the query slows. Of course there is going
> to be an overhead in using a tuplestore; I just wanted to see how much. My
> results are not really very interesting, so I'll just include them at the
> bottom of the email for those who want to see.

Thanks for your test code. I attach the result of your test, to which
another query is added. The added row_number() query uses a row-buffer
strategy that doesn't hold a tuplestore but only scans and returns rows. So
the difference between the row_number() case (44 sec) and the ntile() case
(61 sec) roughly shows how much overhead the tuplestore adds. I had
supposed the row_number() case would be more efficient, but still we can
see it works as an optimization.
> Casting my mind back to when the patch was always doing a sort before a
> window with an ORDER BY, even when a btree index was there, it's really
> come a long way. I've little idea how difficult it would be to implement
> better planning for the following. I suppose if it's difficult then it's
> probably better to wait for the patch to be committed first, or maybe it's
> something for 8.5.

Yeah, the planner around GROUP BY, ORDER BY, DISTINCT, indexscans, and
windows is quite complicated. I think I can manage to do it if there is
enough time left, but first the basic design should be qualified by the
core hackers. I am waiting for subsequent review and responses from Heikki
and others.

> SELECT department,
>        SUM(salary),
>        ROW_NUMBER() OVER (ORDER BY department),
>        SUM(SUM(salary)) OVER (ORDER BY department DESC)
> FROM employees
> GROUP BY department
> ORDER BY department;
>
> This query performs more sorts than are really needed. I suppose the most
> efficient way to process it would be to process the 2nd window first, then
> the 1st, before returning the results in the same order as the 1st window.
> Currently the plan looks like:
>
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  Sort  (cost=1.33..1.34 rows=3 width=9)
>    Sort Key: department
>    ->  Window  (cost=1.25..1.31 rows=3 width=9)
>          ->  Sort  (cost=1.25..1.26 rows=3 width=9)
>                Sort Key: department
>                ->  Window  (cost=1.17..1.23 rows=3 width=9)
>                      ->  Sort  (cost=1.17..1.18 rows=3 width=9)
>                            Sort Key: department
>                            ->  HashAggregate  (cost=1.10..1.15 rows=3 width=9)
>                                  ->  Seq Scan on employees  (cost=0.00..1.06 rows=6 width=9)
>
> Maybe it's possible to sort the processing order of the windows based on
> their ORDER BY clauses, putting any that match the ORDER BY of the final
> results last. I've not looked into this in much detail, and currently I
> cannot see any scenario where it would be bad. What do you think?

The patch doesn't have the infrastructure to relocate each window node yet.
When relocating them we must re-number wfunc->winref, so it's a bit of
work, though possible.
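As an illustration of the reordering David describes, the example query can
be rewritten by hand today so that the window matching the final ORDER BY
is evaluated last, letting its sort double as the output ordering. This is
only a sketch against the same employees table; the subquery and the
column aliases (sum_salary, running_desc) are mine, not from the patch:

```sql
-- Hand-reordered version of the example: the DESC window runs in the
-- subquery, and the window whose ordering matches the final ORDER BY
-- (department ASC) runs last, so no extra final sort should be needed.
SELECT department,
       sum_salary,
       ROW_NUMBER() OVER (ORDER BY department) AS rn,
       running_desc
FROM (
    SELECT department,
           SUM(salary) AS sum_salary,
           SUM(SUM(salary)) OVER (ORDER BY department DESC) AS running_desc
    FROM employees
    GROUP BY department
) AS s
ORDER BY department;
```

Whether the planner could safely perform this rewrite itself is exactly the
window-relocation question discussed above.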
Also, I can imagine fixing only the above case, but I'm afraid that without
a good overall sketch of the planner, side-effects would come up... For
instance, what if the window query is a subquery of a GROUP BY with a sort
strategy that assumes its subplan doesn't sort? Maybe in that case the
upper GROUP BY doesn't notice the sort key change caused by relocating the
window node in the subplan.

Regards,

--
Hitoshi Harada

$ uname -srpio
Linux 2.6.9-55.0.9.ELsmp i686 i386 GNU/Linux
# cpuinfo
Intel(R) Xeon(R) CPU X3210 @ 2.13GHz

sample=# explain
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Hitoshi Harada wrote:
>> david=# explain select date,lag(date,1) over (order by date)
>> from meter_Readings order by date;
>>                                  QUERY PLAN
>> ------------------------------------------------------------------------
>>  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
>>    Sort Key: date
>>    ->  Window  (cost=0.00..374.27 rows=10001 width=4)
>>          ->  Index Scan using meter_readings_pkey on meter_readings
>>                (cost=0.00..299.27 rows=10001 width=4)
>> (4 rows)
>>
>> Is the final sort still required? Is it not already sorted in the window?
>
> Oh, I forgot to mention it. This behavior is also fixed and it now works
> without the sort on the window. I don't remember at all why I did it that
> way, and there's no comment around that code, but the regression tests
> showed there is no problem without it.

Fantastic news! That will speed up the few test queries I have quite a bit;
the sort was spilling to disk, so performance was dropping quite a bit.

This might be a good time to say that the hardware I'm testing on is ultra
slow: a 600 MHz Pentium III with only 28MB of shared buffers.

I've done some more performance tests with the patch. Realising that I
really need to be comparing the performance with something else, I decided
to have a query process a large table with and then without a windowing
function, just to see how much the query slows. Of course there is going to
be an overhead in using a tuplestore; I just wanted to see how much. My
results are not really very interesting, so I'll just include them at the
bottom of the email for those who want to see.

Casting my mind back to when the patch was always doing a sort before a
window with an ORDER BY, even when a btree index was there, it's really
come a long way. I've little idea how difficult it would be to implement
better planning for the following. I suppose if it's difficult then it's
probably better to wait for the patch to be committed first, or maybe it's
something for 8.5.
SELECT department,
       SUM(salary),
       ROW_NUMBER() OVER (ORDER BY department),
       SUM(SUM(salary)) OVER (ORDER BY department DESC)
FROM employees
GROUP BY department
ORDER BY department;

This query performs more sorts than are really needed. I suppose the most
efficient way to process it would be to process the 2nd window first, then
the 1st, before returning the results in the same order as the 1st window.
Currently the plan looks like:

                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=1.33..1.34 rows=3 width=9)
   Sort Key: department
   ->  Window  (cost=1.25..1.31 rows=3 width=9)
         ->  Sort  (cost=1.25..1.26 rows=3 width=9)
               Sort Key: department
               ->  Window  (cost=1.17..1.23 rows=3 width=9)
                     ->  Sort  (cost=1.17..1.18 rows=3 width=9)
                           Sort Key: department
                           ->  HashAggregate  (cost=1.10..1.15 rows=3 width=9)
                                 ->  Seq Scan on employees  (cost=0.00..1.06 rows=6 width=9)

Maybe it's possible to sort the processing order of the windows based on
their ORDER BY clauses, putting any that match the ORDER BY of the final
results last. I've not looked into this in much detail, and currently I
cannot see any scenario where it would be bad. What do you think?
David

CREATE TABLE bigtable (
    id SERIAL NOT NULL PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,1000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);
VACUUM ANALYZE bigtable;

-- base line test
david=# SELECT COUNT(*) FROM (select id,timestamp from bigtable order by id) t;
 count
-------
  1000
(1 row)

Time: 72862.238 ms

-- lag test
david=# SELECT COUNT(*) FROM (select id,LAG(timestamp,1) OVER (order by id)
        from bigtable order by id) t;
 count
-------
  1000
(1 row)

Time: 257713.350 ms

david=# SELECT COUNT(*) FROM (select id,NTILE(10) OVER (order by id)
        from bigtable order by id) t;
 count
-------
  1000
(1 row)

Time: 183131.425 ms

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
2008/11/10 David Rowley [EMAIL PROTECTED]:
> Hitoshi Harada wrote:
>> I found how to do it, though only for the case you gave. Thinking about
>> the planner optimization of the Window nodes (and their attached Sort
>> nodes), we must consider the execution order of more than one node. In
>> the test case we take care of only one window, but there may be more
>> window/sort node sets, and then it is too difficult to choose the best
>> execution order, including a lower indexscan, a mergejoin in a subquery,
>> and sort-based GROUP BY. So I didn't touch the complicated planner
>> jungle. I rewrote the patch so that only the bottom window's sort can
>> consider an indexscan. Deeper optimizations are beyond my capability.
>
> After more playing around with a few queries and testing some performance
> on larger tables, I discovered something strange in the plan for this
> query.
>
> david=# explain select date,lag(date,1) over (order by date)
> from meter_Readings order by date;
>                                  QUERY PLAN
> ------------------------------------------------------------------------
>  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
>    Sort Key: date
>    ->  Window  (cost=0.00..374.27 rows=10001 width=4)
>          ->  Index Scan using meter_readings_pkey on meter_readings
>                (cost=0.00..299.27 rows=10001 width=4)
> (4 rows)
>
> Is the final sort still required? Is it not already sorted in the window?

Oh, I forgot to mention it. This behavior is also fixed and it now works
without the sort on the window. I don't remember at all why I did it that
way, and there's no comment around that code, but the regression tests
showed there is no problem without it.

Regards,

--
Hitoshi Harada
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Hitoshi Harada wrote:
> I found how to do it, though only for the case you gave. Thinking about
> the planner optimization of the Window nodes (and their attached Sort
> nodes), we must consider the execution order of more than one node. In
> the test case we take care of only one window, but there may be more
> window/sort node sets, and then it is too difficult to choose the best
> execution order, including a lower indexscan, a mergejoin in a subquery,
> and sort-based GROUP BY. So I didn't touch the complicated planner
> jungle. I rewrote the patch so that only the bottom window's sort can
> consider an indexscan. Deeper optimizations are beyond my capability.

After more playing around with a few queries and testing some performance
on larger tables, I discovered something strange in the plan for this
query.

david=# explain select date,lag(date,1) over (order by date)
from meter_Readings order by date;
                                 QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=1038.73..1063.74 rows=10001 width=4)
   Sort Key: date
   ->  Window  (cost=0.00..374.27 rows=10001 width=4)
         ->  Index Scan using meter_readings_pkey on meter_readings
               (cost=0.00..299.27 rows=10001 width=4)
(4 rows)

Is the final sort still required? Is it not already sorted in the window?
The table I was testing on spilled the sort to disk; I would probably not
have noticed otherwise.

David.
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Hitoshi Harada wrote:
> I found how to do it, though only for the case you gave. Thinking about
> the planner optimization of the Window nodes (and their attached Sort
> nodes), we must consider the execution order of more than one node. In
> the test case we take care of only one window, but there may be more
> window/sort node sets, and then it is too difficult to choose the best
> execution order, including a lower indexscan, a mergejoin in a subquery,
> and sort-based GROUP BY. So I didn't touch the complicated planner
> jungle. I rewrote the patch so that only the bottom window's sort can
> consider an indexscan. Deeper optimizations are beyond my capability.

Sorry for the delay on this. I've updated the benchmark results using the
new patch you sent today. I did a dump and re-load of the tables; since
some of the numbers are randomly generated, I wouldn't want to compare
them to the old results for any of the tests. This is a complete new list
with the CVS head as of this morning.

Test     Sub Query   Self Join   Vladimir   Windowing   UOM   Window over best alt
Test 1     504          N/A         N/A       568       TPS      12.7%
Test 2     340.9        425         182       450.38    TPS       6.0%
Test 3       1.304        8.12        1.963     7.52    TPS      -7.4%
Test 4     422          365         195       630       TPS      49.3%
Test 5       8.874      N/A        5825     31203       TPH     435.6%
Test 6     251          N/A         N/A       300       TPS      19.5%

Only tests 3 and 5 made use of the index scan. Performance dropped
slightly on test 3, but there's not much point in paying much attention to
that, since we're probably close to the crossover point between a seqscan
and an indexscan, where the planner's decision is not as critical.

Certain self join methods I used don't implement the exact requirements
I've stated at the top of the test. For example, the self join for the
meter reading test requires that no days be missed.

Maybe multi-window optimisation is one for 8.5's TODO.

I've attached the test scripts.

David.

Test Queries: Added self join test cases.
Also added Vladimir's method, which requires the following code to be run
first:

begin work;

create function var_set (text,text) returns text as $$
  select set_config ('public.' || $2 || pg_backend_pid(), $1, false);
$$ LANGUAGE 'sql';

create function var_get (text) returns text as $$
  select current_setting('public.' || $1 || pg_backend_pid());
$$ LANGUAGE 'sql';

create operator (procedure = var_set, leftarg = text, rightarg = text);
create operator (procedure = var_get, rightarg = text);

commit;

The above code exploits the use of custom GUC variables to get the lagged
value. Clever but dirty :-)

-- Test 1 --
--
-- Notes: Get the name, department and salary of the highest paid
--        department member for each department. If the two highest paid
--        employees in the department get the same salary, show the one
--        with the lowest id.

create table employees (
  id INT primary key,
  name varchar(30) not null,
  department varchar(30) not null,
  salary int not null,
  check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',1);
insert into employees values(2,'Sam','IT',12000);
insert into employees values(3,'Richard','Manager',3);
insert into employees values(4,'Ian','Manager',2);
insert into employees values(5,'John','IT',6);
insert into employees values(6,'Matthew','Director',6);

VACUUM ANALYZE employees;

-- Normal way.
SELECT name,department,salary
FROM employees AS e
WHERE id = (SELECT id
            FROM employees
            WHERE department = e.department
            ORDER BY salary DESC,id ASC
            LIMIT 1);
-- above query = 498 tps

-- With windowing functions.
SELECT name,department,salary
FROM (SELECT name,
             department,
             salary,
             row_number() OVER (PARTITION BY department
                                ORDER BY salary DESC,id ASC) AS num
      FROM employees) AS t
WHERE num = 1;
-- above query = 578 tps

-- Test 2 -- Split times problem
--
-- Notes: Find the interval of time between one timestamp and the previous
--        timestamp, the previous timestamp being the one with the next
--        lowest id column value.
-- If there is no previous timestamp, show the value NULL.

CREATE TABLE tstest (
  id SERIAL NOT NULL PRIMARY KEY,
  timestamp TIMESTAMP NOT NULL
);

INSERT INTO tstest (timestamp) VALUES(NOW());
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Hitoshi Harada wrote:
>>>> Test 3 and 5 did not seem to make use of an index to get a sorted list
>>>> of results. I disabled enable_seqscan but the planner still failed to
>>>> choose index_scan. Is there any reason for this? Perhaps I'm missing
>>>> something. Hitoshi, can you take a look at this?
>>>
>>> Ah, good point. Maybe it's because I haven't paid attention to choosing
>>> an index_scan for the upper sort node. I just put the sort node in
>>> regardless of what the lower node is, so it might be necessary to sink
>>> the information down to the scan choice process that a sort node is
>>> used above. Could someone point out to me how to do it, or which part
>>> of the existing code would be a good guide?
>>
>> I know you need to wait for an answer about this, so I'd like to delay
>> any further performance tests until that's sorted out, as it should
>> affect performance of larger tables quite a bit.
>
> I found how to do it, though only for the case you gave. Thinking about
> the planner optimization of the Window nodes (and their attached Sort
> nodes), we must consider the execution order of more than one node. In
> the test case we take care of only one window, but there may be more
> window/sort node sets, and then it is too difficult to choose the best
> execution order, including a lower indexscan, a mergejoin in a subquery,
> and sort-based GROUP BY. So I didn't touch the complicated planner
> jungle. I rewrote the patch so that only the bottom window's sort can
> consider an indexscan. Deeper optimizations are beyond my capability.

I've just looked into what some other implementations do. Sybase seems to
do exactly what you've done: it only looks at the first window clause in
the query. Oracle seems to use the index regardless of the position of the
window clause.

To me personally, what you've done seems fine for now. Perhaps something
could be done later to improve on this. Maybe someone else has ideas about
how to do it?
It seems quite similar to SELECT MAX(idxcol),MAX(idxcol2), where the
planner often makes use of two indexes when they are available, yet that
case is probably far simpler, as there is always just one row. Costing
would likely be more complex for the windowing functions version.

Good work. I'll continue with more benchmarks soon.

David.
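For reference, the min/max optimization alluded to above rewrites each
aggregate into an ordered index probe, which is why two different indexes
can serve one query. A sketch with hypothetical table and index names
(idx_demo and its indexes are not from this thread):

```sql
-- Hypothetical table with one index per column.
CREATE TABLE idx_demo (idxcol int, idxcol2 int);
CREATE INDEX idx_demo_c1 ON idx_demo (idxcol);
CREATE INDEX idx_demo_c2 ON idx_demo (idxcol2);

-- The planner can answer this with two index probes, roughly as if it
-- were written:
--   SELECT (SELECT idxcol  FROM idx_demo ORDER BY idxcol  DESC LIMIT 1),
--          (SELECT idxcol2 FROM idx_demo ORDER BY idxcol2 DESC LIMIT 1);
SELECT MAX(idxcol), MAX(idxcol2) FROM idx_demo;
```

The analogy to windows is that each window clause could similarly pick its
own access path, but as noted above the costing is harder because windows
return every row, not just one.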
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
2008/11/2 David Rowley [EMAIL PROTECTED]:
> Hitoshi Harada wrote:
>>> 2008/11/2 David Rowley [EMAIL PROTECTED]:
>>>> Observations: Test 3 and 5 did not seem to make use of an index to get
>>>> a sorted list of results. I disabled enable_seqscan but the planner
>>>> still failed to choose index_scan. Is there any reason for this?
>>>> Perhaps I'm missing something. Hitoshi, can you take a look at this?
>>>
>>> Ah, good point. Maybe it's because I haven't paid attention to choosing
>>> an index_scan for the upper sort node. I just put the sort node in
>>> regardless of what the lower node is, so it might be necessary to sink
>>> the information down to the scan choice process that a sort node is
>>> used above. Could someone point out to me how to do it, or which part
>>> of the existing code would be a good guide?
>
> I know you need to wait for an answer about this, so I'd like to delay
> any further performance tests until that's sorted out, as it should
> affect performance of larger tables quite a bit.

I found how to do it, though only for the case you gave. Thinking about
the planner optimization of the Window nodes (and their attached Sort
nodes), we must consider the execution order of more than one node. In the
test case we take care of only one window, but there may be more
window/sort node sets, and then it is too difficult to choose the best
execution order, including a lower indexscan, a mergejoin in a subquery,
and sort-based GROUP BY. So I didn't touch the complicated planner jungle.
I rewrote the patch so that only the bottom window's sort can consider an
indexscan. Deeper optimizations are beyond my capability.

Attached is a delta patch against the last one.
Also see the git diff:
http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8;hp=48b73ee574779a14a3c36d373d8544d59a5b8b46

Regards,

--
Hitoshi Harada

window_functions.delta.patch.20081103
Description: Binary data
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Hitoshi Harada wrote:
> Thanks for your test. Though I didn't post it publicly, I've also tested
> real problems, and it performed better than I thought. If you can afford
> it, could you add self join cases?

Ok, did self joins with some. I don't know if it's possible with all.

Test     Sub query   Self join   Vladimir   Windowing   UOM          Increase %
Test 1     498.00       N/A         N/A       578.00    Trans/Sec      16.06%
Test 2     336.00     424.00      182.78      481.00    Trans/Sec      13.44%
Test 3       1.30       7.59        1.90        8.45    Trans/Sec      11.33%
Test 4     424.00     361.00      182.00      629.00    Trans/Sec      48.35%
Test 5       8.89       N/A       5844.16   31052.69    Trans/Hour    431.35%
Test 6     253.00       N/A         N/A       305.00    Trans/Sec      20.55%

See attached for details. The Increase % column is now:
Window / max(Sub query, Self join, Vladimir) - 1

Vladimir, I've included your tests too. I understand that people will
probably use this method, as sometimes there is little choice in order to
get the performance that is required.

Hitoshi Harada wrote:
>> 2008/11/2 David Rowley [EMAIL PROTECTED]:
>>> Observations: Test 3 and 5 did not seem to make use of an index to get
>>> a sorted list of results. I disabled enable_seqscan but the planner
>>> still failed to choose index_scan. Is there any reason for this?
>>> Perhaps I'm missing something. Hitoshi, can you take a look at this?
>>
>> Ah, good point. Maybe it's because I haven't paid attention to choosing
>> an index_scan for the upper sort node. I just put the sort node in
>> regardless of what the lower node is, so it might be necessary to sink
>> the information down to the scan choice process that a sort node is
>> used above. Could someone point out to me how to do it, or which part
>> of the existing code would be a good guide?

I know you need to wait for an answer about this, so I'd like to delay any
further performance tests until that's sorted out, as it should affect
performance of larger tables quite a bit.

David.

Test Queries: Added self join test cases. Also added Vladimir's method,
which requires the following code to be run first:

begin work;

create function var_set (text,text) returns text as $$
  select set_config ('public.' || $2 || pg_backend_pid(), $1, false);
$$ LANGUAGE 'sql';

create function var_get (text) returns text as $$
  select current_setting('public.' || $1 || pg_backend_pid());
$$ LANGUAGE 'sql';

create operator (procedure = var_set, leftarg = text, rightarg = text);
create operator (procedure = var_get, rightarg = text);

commit;

The above code exploits the use of custom GUC variables to get the lagged
value. Clever but dirty :-)

-- Test 1 --
--
-- Notes: Get the name, department and salary of the highest paid
--        department member for each department. If the two highest paid
--        employees in the department get the same salary, show the one
--        with the lowest id.

create table employees (
  id INT primary key,
  name varchar(30) not null,
  department varchar(30) not null,
  salary int not null,
  check (salary >= 0)
);

insert into employees values(1,'Jeff','IT',1);
insert into employees values(2,'Sam','IT',12000);
insert into employees values(3,'Richard','Manager',3);
insert into employees values(4,'Ian','Manager',2);
insert into employees values(5,'John','IT',6);
insert into employees values(6,'Matthew','Director',6);

VACUUM ANALYZE employees;

-- Normal way.
SELECT name,department,salary
FROM employees AS e
WHERE id = (SELECT id
            FROM employees
            WHERE department = e.department
            ORDER BY salary DESC,id ASC
            LIMIT 1);
-- above query = 498 tps

-- With windowing functions.
SELECT name,department,salary
FROM (SELECT name,
             department,
             salary,
             row_number() OVER (PARTITION BY department
                                ORDER BY salary DESC,id ASC) AS num
      FROM employees) AS t
WHERE num = 1;
-- above query = 578 tps

-- Test 2 -- Split times problem
--
-- Notes: Find the interval of time between one timestamp and the previous
--        timestamp, the previous timestamp being the one with the next
--        lowest id column value. If there is no previous timestamp, show
--        the value NULL.
CREATE TABLE tstest (
  id SERIAL NOT NULL PRIMARY KEY,
  timestamp TIMESTAMP NOT NULL
);

INSERT INTO tstest (timestamp) VALUES(NOW());
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) + ((random() * 100)::TEXT || ' sec ')::interval FROM tstest;
INSERT INTO tstest (timestamp)
  SELECT MAX(timestamp) +
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Here is another way to solve the big marathon problem without window
functions (and many other kinds of windowing queries, especially those
that do not specify ROWS PRECEDING etc.). It could be considered a very
dirty hack, but it can give you an insight into the performance of the
windowed query with an indexscan instead of a seqscan.

create function var_set (text,text) returns text as '
  select set_config (''public.''||$2||pg_backend_pid(), $1, false);
' LANGUAGE 'sql';

create function var_get (text) returns text as '
  select current_setting(''public.''||$1||pg_backend_pid());
' LANGUAGE 'sql';

create operator (procedure = var_set, leftarg = text, rightarg = text);
create operator (procedure = var_get, rightarg = text);

-- init values
select '''prev_time', '0''dense_rank';

-- marathon query
select *
from (
  select (((case when time::text = 'prev_time' then 0 else 1 end)
           + ('dense_rank')::int4)::text'dense_rank')::int4 as position,
         runnerid, time
  from big_marathon
  order by time
) results
where position = 2

Best regards,
Vladimir Sitnikov
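The custom operator names in the message above appear to have been eaten
by the mail archive, so here is a sketch of the same trick written with
plain set_config/current_setting calls (standard PostgreSQL functions).
The table big_marathon(runnerid, time) is the one assumed by the example;
the per-backend-pid suffix is dropped for brevity:

```sql
-- init values: empty previous time, dense rank counter at zero
select set_config('public.prev_time', '', false),
       set_config('public.dense_rank', '0', false);

-- runners with the second-fastest time: bump the counter whenever the
-- time changes, and remember the current time for the next row. Relies
-- on the subquery's rows being evaluated in sorted order, which is why
-- it is a dirty hack rather than a supported technique.
select runnerid, time
from (
  select set_config('public.dense_rank',
                    ((case when time::text = current_setting('public.prev_time')
                           then 0 else 1 end)
                     + current_setting('public.dense_rank')::int4)::text,
                    false)::int4 as position,
         runnerid,
         time,
         set_config('public.prev_time', time::text, false)
  from big_marathon
  order by time
) results
where position = 2;
```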
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
Just a small correction: there should be time::text'prev_time' in the
select list for the calculations to be correct:

select *
from (
  select (((case when time::text = 'prev_time' then 0 else 1 end)
           + ('dense_rank')::int4)::text'dense_rank')::int4 as position,
         runnerid,
         time,
         time::text'prev_time'
  from big_marathon
  order by time
) results
where position = 2

-- meter_readings
select '' 'lag';

select date,
       reading::numeric - (case lag when '' then null else lag end)::numeric as used
from (
  select date,
         'lag' as lag,
         reading::text 'lag' as reading
  from meter_readings
  order by date
) as t
order by used asc nulls last
limit 1

Best regards,
Vladimir Sitnikov
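For comparison, the meter_readings question above (the reading with the
smallest usage since the previous one) is exactly what the patch's lag()
answers directly. A sketch assuming the meter_readings(date, reading)
table used throughout this thread, with reading as a numeric-castable
column:

```sql
-- Windowed equivalent of the GUC-variable hack: usage per reading is the
-- difference from the previous reading in date order; the first row has
-- no previous reading and yields NULL, hence NULLS LAST.
SELECT date,
       reading::numeric - lag(reading, 1) OVER (ORDER BY date)::numeric AS used
FROM meter_readings
ORDER BY used ASC NULLS LAST
LIMIT 1;
```

This is the form whose plans are discussed elsewhere in the thread, where
the window can feed off the index scan on date.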
Re: [HACKERS] Windowing Function Patch Review - Performance Comparison.
2008/11/2 David Rowley [EMAIL PROTECTED]:
> Observations: Test 3 and 5 did not seem to make use of an index to get a
> sorted list of results. I disabled enable_seqscan but the planner still
> failed to choose index_scan. Is there any reason for this? Perhaps I'm
> missing something. Hitoshi, can you take a look at this?

Ah, good point. Maybe it's because I haven't paid attention to choosing an
index_scan for the upper sort node. I just put the sort node in regardless
of what the lower node is, so it might be necessary to sink the
information down to the scan choice process that a sort node is used
above. Could someone point out to me how to do it, or which part of the
existing code would be a good guide?

> Tests: Please see attached file. Perhaps there were more efficient ways
> for certain queries, I just couldn't think of them... Please let me know
> if you feel I should be conducting the review in another way.

Thanks for your test. Though I didn't post it publicly, I've also tested
real problems, and it performed better than I thought. If you can afford
it, could you add self join cases? It's like:

-- normal
SELECT t1.id, t1.grp, count(t2.id) + 1 AS row_number
FROM t t1
LEFT JOIN t t2 ON t1.grp = t2.grp AND t1.id > t2.id
GROUP BY t1.id, t1.grp;

-- windowing
SELECT id, grp, row_number() OVER (PARTITION BY grp ORDER BY id) FROM t;

Regards,

--
Hitoshi Harada
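The equivalence of the two formulations can be checked on a throwaway
table; a sketch with the comparison operator and GROUP BY (which the mail
archive mangled in the original message) spelled out, and made-up sample
data for t(id, grp):

```sql
-- Sample data to compare the self-join and windowed row numbering.
CREATE TEMP TABLE t (id int, grp int);
INSERT INTO t VALUES (1,1),(2,1),(3,1),(4,2),(5,2);

-- Self-join emulation: count the rows earlier in the same group, plus one.
-- LEFT JOIN is needed so the first row of each group (no earlier rows)
-- still appears with row_number 1.
SELECT t1.id, t1.grp, count(t2.id) + 1 AS row_number
FROM t t1
LEFT JOIN t t2 ON t1.grp = t2.grp AND t1.id > t2.id
GROUP BY t1.id, t1.grp
ORDER BY t1.grp, t1.id;

-- Windowed version: should produce the same numbering per group.
SELECT id, grp, row_number() OVER (PARTITION BY grp ORDER BY id)
FROM t
ORDER BY grp, id;
```

The self join is O(n^2) per group, which is what the Test 5 results in
this thread (Trans/Hour rather than Trans/Sec) make visible.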