Re: [GENERAL] a JOIN on same table, but 'slided over'
Thank you All for this extensive help! BTW: google helps, once you know that the construct is called correlated subquery - there is no way to get an answer before one knows the question :) Thenx again! -R On Thu, 2007-06-28 at 23:23 +0530, Gurjeet Singh wrote: On 6/28/07, Alban Hertroys [EMAIL PROTECTED] wrote: This is called a 'correlated subquery'. Basically the subquery is performed for each record in the top query. Google gave me this: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm I think the sub-section titled Example: Correlated subquery in a WHERE Clause is appropriate to explain our query at hand. Simply put, correlated queries are like nested FOR loops of any high level programming language. 1. FOR( record R in result of outer-query ) 2. execute inner query, using any R.colname1 3. compare R.colname2 with the result of the correlated-subquery 4. produce R in output, iff the above comparison succeeded Line 2 can be treated as another FOR loop, where every record of inner-query is being processed, and comparing the local expressions with a column (or expression) that comes from outer query. The comparison in step 3 can be against any expression, with columns or against a pure constant too! For example, the following query produces the name of all the employees, who manage at least one other employee. select empno, ename from emp e1 where exists (select 1 from emp e2 where e2.mgr = e1.empno); The only thing I would add for our query is that, that the outer SELECT of our query produces a cartesian product (no join-condition between t1 and t2), but only one row from t2 qualifies for the join, since the WHERE condition is on a unique column, and the correlated subquery returns just the required value (lowest of the IDs that are greater than current t1.ID being processed). I know the above one-line-paragraph may sound a bit cryptic for someone new to correlated subqueries, but if you understand the example in the link above, then this would start making some sense. And there's probably more to find. Interestingly enough wikipedia doesn't seem to have an article on the subject. Regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42E - Pune Sent from my BlackLaptop device ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] a JOIN on same table, but 'slided over'
It _is_ the optimised version as you can see from the explain plans posted in the other mail, the planner shows that the cost is drastically less than the 'distinct on' version. For smaller data-sets 'distinct-on' version might seem faster, but for reasonably larger datasets, it's performance deteriorates exponentially... This is because of the Nested-loops involved in the plan... I increased your data-set to 10240 rows by executing the following query 10 times: insert into test select id+(select max(id) from test), thread, info from test; On such data-set (which is not very large by any means), the standard SQL version executes in almost a second, and on the other hand, I had to cancel the EXPLAIN ANALYZE of the 'distinct on' query after letting it run for over three minutes!!! postgres=# explain analyze postgres-# select t1.id as id, t2.id as id+1, postgres-# t1.thread as thread, t2.thread as thread+1, postgres-# t1.info as info, t2.info as info+1 postgres-# from test as t1, test as t2 postgres-# where t2.id = ( select min(id) from test as t3 where t3.id t1.id ) postgres-# order by t1.id asc; QUERY PLAN -- Sort (cost=2971.36..2996.96 rows=10240 width=24) (actual time= 1004.031..1030.116 rows=10239 loops=1) Sort Key: t1.id Sort Method: external sort Disk: 416kB - Merge Join (cost=840.48..2289.28 rows=10240 width=24) (actual time= 834.218..956.595 rows=10239 loops=1) Merge Cond: (t2.id = ((subplan))) - Index Scan using test_id_key on test t2 (cost=0.00..332.85rows=10240 width=12) (actual time= 0.060..24.503 rows=10240 loops=1) - Sort (cost=840.48..866.08 rows=10240 width=12) (actual time= 834.129..854.776 rows=10240 loops=1) Sort Key: ((subplan)) Sort Method: quicksort Memory: 928kB - Seq Scan on test t1 (cost=0.00..158.40 rows=10240 width=12)(actual time=0.196..797.752 rows=10240 loops=1) SubPlan - Result (cost=0.04..0.05 rows=1 width=0) (actual time=0.062..0.064 rows=1 loops=10240) InitPlan - Limit (cost=0.00..0.04 rows=1 width=4) (actual time=0.047..0.050 rows=1 loops=10240) - Index Scan using test_id_key on test t3 (cost=0.00..121.98 rows=3413 width=4) (actual time=0.038..0.038rows=1 loops=10240) Index Cond: (id $0) Filter: (id IS NOT NULL) Total runtime: 1052.802 ms (18 rows) Time: 1056.740 ms postgres=# explain analyze postgres-# select postgres-# distinct on (t1.id) postgres-# t1.*, t2.* postgres-# from postgres-# test t1 postgres-# join test t2 on t2.id t1.id postgres-# order by t1.id asc, t2.id asc; Cancel request sent ERROR: canceling statement due to user request postgres=# On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: OK. Have tried this one looks like close to 6 times slower then the 'non-standard' phrase with 'distinct on'. On the small dataset that I've included in my original post (ten rows of data within TEST), I've run both queries through EXPLAIN ANALYSE, with the following result summary (for clearity, I've cut away the details from EXPLAIN output): ---STANDARD Total runtime: 10.660 ms ---DISTINCT-ON Total runtime: 1.479 ms --- Would there be ways to optimise the standard query to get the performance closer to the none-standard one? -R On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote: Hi Rafal, Just a note that this is not standard SQL... 'distinct on' is an extension to SQL provided by postgres. Following query utilizes the standard SQL to get the same results: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id); HTH -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42 E - Pune Sent from my BlackLaptop device On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Marvelous! Thenx! -R On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote: On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Is there an SQL construct to get it? select distinct on (t1.id) t1.*, t2.* from test t1 join test t2 on t2.id t1.id order by t1.id asc, t2.id asc should do the trick. depesz --
Re: [GENERAL] a JOIN on same table, but 'slided over'
Gurjeet, Focusing on the standars solution, I did some 'exercises' - works fine, just learning. But the ambarasing thing is, that I looks like I really don't get it, meaning - what exactly the internal query does. I've never ever seen or used a subquery with data/params from 'upper level' query used within a subquery - any time I've written a hierarchical query (e.g. with subqueries), the relations were always hierarchical. In other words, I was always able to run an internal subquery outside of the compound query and get consistant results. With this one I cannot do that due to the 'entanglement' of t3 and t1. Postgress query plan from EXPLAIN doesn't help me here - probably I'm unable to interpret it correctly without 'a paradigm mind shift'. So, would you mind commenting a little on how exactly the t1.id influences subquery (with t3), and the result influences back the selection of t1 set? Will greatly apreciate that. -R On Tue, 2007-06-26 at 19:14 +0530, Gurjeet Singh wrote: I missed the ORDER BY clause... Here it goes: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id ) order by t1.id asc; Also note that this query is much cheaper that the 'distinct on' query by more than two orders on magnitude ( 217.86 vs. 98040.67): postgres=# explain postgres-# select postgres-# distinct on (t1.id) postgres-# t1.*, t2.* postgres-# from postgres-# test t1 postgres-# join test t2 on t2.id t1.id postgres-# order by t1.id asc, t2.id asc; QUERY PLAN Unique (cost=95798.00..98040.67 rows=1160 width=80) - Sort (cost=95798.00..96919.33 rows=448533 width=80) Sort Key: t1.id, t2.id - Nested Loop (cost=0.00..13827.29 rows=448533 width=80) - Seq Scan on test t1 (cost=0.00..21.60 rows=1160 width=40) - Index Scan using test_id_key on test t2 (cost=0.00..7.06 rows=387 width=40) Index Cond: (t2.id t1.id) (7 rows) Time: 5.003 ms postgres=# explain postgres-# select t1.id as id, t2.id as id+1, postgres-# t1.thread as thread, t2.thread as thread+1, postgres-# t1.info as info, t2.info as info+1 postgres-# from test as t1, test as t2 postgres-# where t2.id = ( select min(id) from test as t3 where t3.id t1.id ) postgres-# order by t1.id asc; QUERY PLAN -- Sort (cost=214.96..217.86 rows=1160 width=80) Sort Key: t1.id - Hash Join (cost= 36.10..155.92 rows=1160 width=80) Hash Cond: ((subplan) = t2.id) - Seq Scan on test t1 (cost=0.00..21.60 rows=1160 width=40) - Hash (cost=21.60..21.60 rows=1160 width=40) - Seq Scan on test t2 (cost=0.00..21.60 rows=1160 width=40) SubPlan - Result (cost=0.13..0.14 rows=1 width=0) InitPlan - Limit (cost= 0.00..0.13 rows=1 width=4) - Index Scan using test_id_key on test t3 (cost=0.00..51.02 rows=387 width=4) Index Cond: (id $0) Filter: (id IS NOT NULL) (14 rows) Time: 4.125 ms Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42E - Pune Sent from my BlackLaptop device On 6/26/07, Gurjeet Singh [EMAIL PROTECTED] wrote: Hi Rafal, Just a note that this is not standard SQL... 'distinct on' is an extension to SQL provided by postgres. Following query utilizes the standard SQL to get the same results: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id); HTH -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56' 25.42 E - Pune Sent from my BlackLaptop device On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Marvelous! Thenx! -R On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote:
Re: [GENERAL] a JOIN on same table, but 'slided over'
Rafal Pietrak wrote: Gurjeet, Focusing on the standars solution, I did some 'exercises' - works fine, just learning. But the ambarasing thing is, that I looks like I really don't get it, meaning - what exactly the internal query does. I've never ever seen or used a subquery with data/params from 'upper level' query used within a subquery - any time I've written a hierarchical query (e.g. with subqueries), the relations were always hierarchical. In other words, I was always able to run an internal subquery outside of the compound query and get consistant results. With this one I cannot do that due to the 'entanglement' of t3 and t1. This is called a 'correlated subquery'. Basically the subquery is performed for each record in the top query. Google gave me this: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm And there's probably more to find. Interestingly enough wikipedia doesn't seem to have an article on the subject. Postgress query plan from EXPLAIN doesn't help me here - probably I'm unable to interpret it correctly without 'a paradigm mind shift'. So, would you mind commenting a little on how exactly the t1.id influences subquery (with t3), and the result influences back the selection of t1 set? Will greatly apreciate that. -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] a JOIN on same table, but 'slided over'
On 6/28/07, Alban Hertroys [EMAIL PROTECTED] wrote: This is called a 'correlated subquery'. Basically the subquery is performed for each record in the top query. Google gave me this: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm I think the sub-section titled Example: Correlated subquery in a WHERE Clause is appropriate to explain our query at hand. Simply put, correlated queries are like nested FOR loops of any high level programming language. 1. FOR( record R in result of outer-query ) 2. execute inner query, using any R.colname1 3. compare R.colname2 with the result of the correlated-subquery 4. produce R in output, iff the above comparison succeeded Line 2 can be treated as another FOR loop, where every record of inner-query is being processed, and comparing the local expressions with a column (or expression) that comes from outer query. The comparison in step 3 can be against any expression, with columns or against a pure constant too! For example, the following query produces the name of all the employees, who manage at least one other employee. select empno, ename from emp e1 where exists (select 1 from emp e2 where e2.mgr = e1.empno); The only thing I would add for our query is that, that the outer SELECT of our query produces a cartesian product (no join-condition between t1 and t2), but only one row from t2 qualifies for the join, since the WHERE condition is on a unique column, and the correlated subquery returns just the required value (lowest of the IDs that are greater than current t1.IDbeing processed). I know the above one-line-paragraph may sound a bit cryptic for someone new to correlated subqueries, but if you understand the example in the link above, then this would start making some sense. And there's probably more to find. Interestingly enough wikipedia doesn't seem to have an article on the subject. Regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42E - Pune Sent from my BlackLaptop device
[GENERAL] a JOIN on same table, but 'slided over'
Hi, I understand, that this is 'general SQL' question rather then 'general postgres'. But may be someone here could help me with it anyways. I have a *single* table: CREATE TABLE test (id int not null unique, thread int not null, info text); The ID, although unique, is not continues. A sample query: SELECT * from test; id | thread | info ++-- 2 |763 | A 3 |764 | B 6 | 5 | C 8 | 88946 | Cats 9 | 69315 | Eifel 10 | 96379 | G 14 | 23927 | test 1 16 | 16529 | test 2 17 |634 | test 3 20 | 63930 | batman (10 rows) - Now, I'd like to make a JOIN-ed query of that table with itself, so that I'd get rows paiwise: every row containing data from *two* rows of the original TEST table so, that those data come from rows of consequtive ID's - not neceserly (depending on the TEST table contents) continuesly consequtive. Like: SELECT * from view_of_test; id | id+X | thread | thread+X | info | info+X +--++--+---+- 2 |3 |763 | 764 | A | B 3 |6 |764 |5 | B | C 6 |8 | 5 |88946 | C | Cats 8 |9 | 88946 |69315 | Cats | Eifel 9 | 10 | 69315 |96379 | Eifel | G - Is there an SQL construct to get it? I'd apreciate any hints or sugestions. -R ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] a JOIN on same table, but 'slided over'
On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Is there an SQL construct to get it? select distinct on (t1.id) t1.*, t2.* from test t1 join test t2 on t2.id t1.id order by t1.id asc, t2.id asc should do the trick. depesz -- http://www.depesz.com/ - nowy, lepszy depesz
Re: [GENERAL] a JOIN on same table, but 'slided over'
Now, I'd like to make a JOIN-ed query of that table with itself, so that I'd get rows paiwise: every row containing data from *two* rows of the original TEST table so, that those data come from rows of consequtive ID's - not neceserly (depending on the TEST table contents) continuesly consequtive. Like: SELECT * from view_of_test; id | id+X | thread | thread+X | info | info+X +--++--+---+- 2 |3 |763 | 764 | A | B 3 |6 |764 |5 | B | C 6 |8 | 5 |88946 | C | Cats 8 |9 | 88946 |69315 | Cats | Eifel 9 | 10 | 69315 |96379 | Eifel | G - Is there an SQL construct to get it? I would use a plpgsql procedure, select all the rows ORDER BY id, keep the current and last row in a variable, and that's it. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] a JOIN on same table, but 'slided over'
Marvelous! Thenx! -R On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote: On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Is there an SQL construct to get it? select distinct on (t1.id) t1.*, t2.* from test t1 join test t2 on t2.id t1.id order by t1.id asc, t2.id asc should do the trick. depesz -- http://www.depesz.com/ - nowy, lepszy depesz ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] a JOIN on same table, but 'slided over'
Hi Rafal, Just a note that this is not standard SQL... 'distinct on' is an extension to SQL provided by postgres. Following query utilizes the standard SQL to get the same results: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id); HTH -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42E - Pune Sent from my BlackLaptop device On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Marvelous! Thenx! -R On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote: On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Is there an SQL construct to get it? select distinct on (t1.id) t1.*, t2.* from test t1 join test t2 on t2.id t1.id order by t1.id asc, t2.id asc should do the trick. depesz -- http://www.depesz.com/ - nowy, lepszy depesz ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] a JOIN on same table, but 'slided over'
OK. Have tried this one looks like close to 6 times slower then the 'non-standard' phrase with 'distinct on'. On the small dataset that I've included in my original post (ten rows of data within TEST), I've run both queries through EXPLAIN ANALYSE, with the following result summary (for clearity, I've cut away the details from EXPLAIN output): ---STANDARD Total runtime: 10.660 ms ---DISTINCT-ON Total runtime: 1.479 ms --- Would there be ways to optimise the standard query to get the performance closer to the none-standard one? -R On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote: Hi Rafal, Just a note that this is not standard SQL... 'distinct on' is an extension to SQL provided by postgres. Following query utilizes the standard SQL to get the same results: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id); HTH -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42 E - Pune Sent from my BlackLaptop device On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Marvelous! Thenx! -R On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote: On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Is there an SQL construct to get it? select distinct on (t1.id) t1.*, t2.* from test t1 join test t2 on t2.id t1.id order by t1.id asc, t2.id asc should do the trick. depesz -- http://www.depesz.com/ - nowy, lepszy depesz ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] a JOIN on same table, but 'slided over'
I missed the ORDER BY clause... Here it goes: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id ) order by t1.id asc; Also note that this query is much cheaper that the 'distinct on' query by more than two orders on magnitude (217.86 vs. 98040.67): postgres=# explain postgres-# select postgres-# distinct on (t1.id) postgres-# t1.*, t2.* postgres-# from postgres-# test t1 postgres-# join test t2 on t2.id t1.id postgres-# order by t1.id asc, t2.id asc; QUERY PLAN Unique (cost=95798.00..98040.67 rows=1160 width=80) - Sort (cost=95798.00..96919.33 rows=448533 width=80) Sort Key: t1.id, t2.id - Nested Loop (cost=0.00..13827.29 rows=448533 width=80) - Seq Scan on test t1 (cost=0.00..21.60 rows=1160 width=40) - Index Scan using test_id_key on test t2 (cost=0.00..7.06rows=387 width=40) Index Cond: (t2.id t1.id) (7 rows) Time: 5.003 ms postgres=# explain postgres-# select t1.id as id, t2.id as id+1, postgres-# t1.thread as thread, t2.thread as thread+1, postgres-# t1.info as info, t2.info as info+1 postgres-# from test as t1, test as t2 postgres-# where t2.id = ( select min(id) from test as t3 where t3.id t1.id ) postgres-# order by t1.id asc; QUERY PLAN -- Sort (cost=214.96..217.86 rows=1160 width=80) Sort Key: t1.id - Hash Join (cost=36.10..155.92 rows=1160 width=80) Hash Cond: ((subplan) = t2.id) - Seq Scan on test t1 (cost=0.00..21.60 rows=1160 width=40) - Hash (cost=21.60..21.60 rows=1160 width=40) - Seq Scan on test t2 (cost=0.00..21.60 rows=1160 width=40) SubPlan - Result (cost=0.13..0.14 rows=1 width=0) InitPlan - Limit (cost=0.00..0.13 rows=1 width=4) - Index Scan using test_id_key on test t3 (cost= 0.00..51.02 rows=387 width=4) Index Cond: (id $0) Filter: (id IS NOT NULL) (14 rows) Time: 4.125 ms Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42E - Pune Sent from my BlackLaptop device On 6/26/07, Gurjeet Singh [EMAIL PROTECTED] wrote: Hi Rafal, Just a note that this is not standard SQL... 'distinct on' is an extension to SQL provided by postgres. Following query utilizes the standard SQL to get the same results: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id); HTH -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42 E - Pune Sent from my BlackLaptop device On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Marvelous! Thenx! -R On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote: On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: Is there an SQL construct to get it? select distinct on (t1.id) t1.*, t2.* from test t1 join test t2 on t2.id t1.id order by t1.id asc, t2.id asc should do the trick. depesz -- http://www.depesz.com/ - nowy, lepszy depesz ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] a JOIN on same table, but 'slided over'
Gurjeet Singh skrev: I missed the ORDER BY clause... Here it goes: selectt1.id http://t1.id as id, t2.id http://t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info http://t1.info as info, t2.info http://t2.info as info+1 from test as t1, test as t2 where t2.id http://t2.id = ( select min(id) from test as t3 where t3.id http://t3.id t1.id http://t1.id ) order by t1.id http://t1.id asc; Also note that this query is much cheaper that the 'distinct on' query by more than two orders on magnitude ( 217.86 vs. 98040.67): No it isn't. The estimate is much lower, but the actual times are very close: [explain of distinct on] Time: 5.003 ms [explain of correlated subquery] Time: 4.125 ms I tried on a larger table (16384 rows), and in this case the numbers are strongly in favor of the subquery. In fact, I am still waiting for the distinct on version to return ... /Nis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] a JOIN on same table, but 'slided over'
news.gmane.org [EMAIL PROTECTED] writes: Gurjeet Singh skrev: Also note that this query is much cheaper that the 'distinct on' query by more than two orders on magnitude ( 217.86 vs. 98040.67): No it isn't. The estimate is much lower, but the actual times are very close: [explain of distinct on] Time: 5.003 ms [explain of correlated subquery] Time: 4.125 ms You're both confused: the planner estimate certainly should not be taken as gospel, but the actual runtime of an EXPLAIN (not EXPLAIN ANALYZE) only reflects planning effort. EXPLAIN ANALYZE output would be a lot more suitable to settle the question which one is faster. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] a JOIN on same table, but 'slided over'
I see. (Have actually tried it on a larger dataset - to see it for myself ... it is optimised :) Thenx again! -R On Tue, 2007-06-26 at 19:56 +0530, Gurjeet Singh wrote: It _is_ the optimised version as you can see from the explain plans posted in the other mail, the planner shows that the cost is drastically less than the 'distinct on' version. For smaller data-sets 'distinct-on' version might seem faster, but for reasonably larger datasets, it's performance deteriorates exponentially... This is because of the Nested-loops involved in the plan... I increased your data-set to 10240 rows by executing the following query 10 times: insert into test select id+(select max(id) from test), thread, info from test; On such data-set (which is not very large by any means), the standard SQL version executes in almost a second, and on the other hand, I had to cancel the EXPLAIN ANALYZE of the 'distinct on' query after letting it run for over three minutes!!! postgres=# explain analyze postgres-# select t1.id as id, t2.id as id+1, postgres-# t1.thread as thread, t2.thread as thread+1, postgres-# t1.info as info, t2.info as info+1 postgres-# from test as t1, test as t2 postgres-# where t2.id = ( select min(id) from test as t3 where t3.id t1.id ) postgres-# order by t1.id asc; QUERY PLAN -- Sort (cost=2971.36..2996.96 rows=10240 width=24) (actual time=1004.031..1030.116 rows=10239 loops=1) Sort Key: t1.id Sort Method: external sort Disk: 416kB - Merge Join (cost=840.48..2289.28 rows=10240 width=24) (actual time=834.218..956.595 rows=10239 loops=1) Merge Cond: (t2.id = ((subplan))) - Index Scan using test_id_key on test t2 (cost=0.00..332.85 rows=10240 width=12) (actual time=0.060..24.503 rows=10240 loops=1) - Sort (cost=840.48..866.08 rows=10240 width=12) (actual time=834.129..854.776 rows=10240 loops=1) Sort Key: ((subplan)) Sort Method: quicksort Memory: 928kB - Seq Scan on test t1 (cost=0.00..158.40 rows=10240 width=12)(actual time=0.196..797.752 rows=10240 loops=1) SubPlan - Result (cost= 0.04..0.05 rows=1 width=0) (actual time=0.062..0.064 rows=1 loops=10240) InitPlan - Limit (cost=0.00..0.04 rows=1 width=4) (actual time=0.047..0.050 rows=1 loops=10240) - Index Scan using test_id_key on test t3 (cost=0.00..121.98 rows=3413 width=4) (actual time= 0.038..0.038 rows=1 loops=10240) Index Cond: (id $0) Filter: (id IS NOT NULL) Total runtime: 1052.802 ms (18 rows) Time: 1056.740 ms postgres=# explain analyze postgres-# select postgres-# distinct on (t1.id) postgres-# t1.*, t2.* postgres-# from postgres-# test t1 postgres-# join test t2 on t2.id t1.id postgres-# order by t1.id asc, t2.id asc; Cancel request sent ERROR: canceling statement due to user request postgres=# On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: OK. Have tried this one looks like close to 6 times slower then the 'non-standard' phrase with 'distinct on'. On the small dataset that I've included in my original post (ten rows of data within TEST), I've run both queries through EXPLAIN ANALYSE, with the following result summary (for clearity, I've cut away the details from EXPLAIN output): ---STANDARD Total runtime: 10.660 ms ---DISTINCT-ON Total runtime: 1.479 ms --- Would there be ways to optimise the standard query to get the performance closer to the none-standard one? -R On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote: Hi Rafal, Just a note that this is not standard SQL... 'distinct on' is an extension to SQL provided by postgres. Following query utilizes the standard SQL to get the same results: selectt1.id as id, t2.id as id+1, t1.thread as thread, t2.thread as thread+1, t1.info as info, t2.info as info+1 from test as t1, test as t2 where t2.id = ( select min(id) from test as t3 where t3.id t1.id); HTH -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com
Re: [GENERAL] a JOIN on same table, but 'slided over'
OK, check... test= CREATE TABLE test (id INTEGER PRIMARY KEY); test= INSERT INTO test SELECT random()*5 + n*10 FROM generate_series( 1,10 ) AS n; test= SELECT * FROM test LIMIT 10; id - 11 23 31 41 52 63 70 85 94 103 test= ANALYZE test; ANALYZE -- Self Join 1 test= EXPLAIN ANALYZE SELECT t1.id AS current_id, t2.id AS next_id FROM test t1, test t2 WHERE t2.id = ( SELECT min(id) FROM test AS t3 WHERE t3.id t1.id ) ORDER BY t1.id; QUERY PLAN --- Sort (cost=26703.19..26953.19 rows=10 width=8) (actual time=5240.392..5271.529 rows=9 loops=1) Sort Key: t1.id - Hash Join (cost=2691.00..18398.37 rows=10 width=8) (actual time=106.588..5179.737 rows=9 loops=1) Hash Cond: ((subplan) = t2.id) - Seq Scan on test t1 (cost=0.00..1441.00 rows=10 width=4) (actual time=0.013..34.782 rows=10 loops=1) - Hash (cost=1441.00..1441.00 rows=10 width=4) (actual time=106.420..106.420 rows=10 loops=1) - Seq Scan on test t2 (cost=0.00..1441.00 rows=10 width=4) (actual time=0.007..43.077 rows=10 loops=1) SubPlan - Result (cost=0.03..0.04 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=19) InitPlan - Limit (cost=0.00..0.03 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=19) - Index Scan using test_pkey on test t3 (cost=0.00..1029.59 rows=3 width=4) (actual time=0.020..0.020 rows=1 loops=19) Index Cond: (id $0) Filter: (id IS NOT NULL) Total runtime: 5295.677 ms -- Self Join 2 test= set enable_hashjoin TO 0; test= EXPLAIN ANALYZE SELECT t1.id AS current_id, t2.id AS next_id FROM test t1, test t2 WHERE t2.id = ( SELECT min(id) FROM test AS t3 WHERE t3.id t1.id ) ORDER BY t1.id; QUERY PLAN --- Sort (cost=30806.48..31056.48 rows=10 width=8) (actual time=2876.249..2903.011 rows=9 loops=1) Sort Key: t1.id - Merge Join (cost=9745.82..22501.66 rows=10 width=8) (actual time=2547.830..2820.347 rows=9 loops=1) Merge Cond: (t2.id = inner.?column2?) - Index Scan using test_pkey on test t2 (cost=0.00..2828.26 rows=10 width=4) (actual time=0.035..67.747 rows=10 loops=1) - Sort (cost=9745.82..9995.82 rows=10 width=4) (actual time=2547.779..2582.889 rows=10 loops=1) Sort Key: (subplan) - Seq Scan on test t1 (cost=0.00..1441.00 rows=10 width=4) (actual time=0.060..2487.728 rows=10 loops=1) SubPlan - Result (cost=0.03..0.04 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=10) InitPlan - Limit (cost=0.00..0.03 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=10) - Index Scan using test_pkey on test t3 (cost=0.00..1029.59 rows=3 width=4) (actual time=0.020..0.020 rows=1 loops=10) Index Cond: (id $0) Filter: (id IS NOT NULL) Total runtime: 2923.804 ms -- DISTINCT ON test= EXPLAIN SELECT DISTINCT ON (t1.id) t1.id AS current_id, t2.id AS next_id FROM test t1 JOIN test t2 ON t2.id t1.id ORDER BY t1.id, t2.id; QUERY PLAN - Unique (cost=729806679.75..746473346.41 rows=10 width=8) - Sort (cost=729806679.75..738140013.08 rows=33 width=8) Sort Key: t1.id, t2.id - Nested Loop (cost=0.00..100028973.00 rows=33 width=8) - Seq Scan on test t1 (cost=0.00..1441.00 rows=10 width=4) - Index Scan using test_pkey on test t2 (cost=0.00..583.61 rows=3 width=4) Index Cond: (t2.id t1.id) (7 lignes) This one takes much longer (I interrupted it). -- Using a function CREATE TYPE test_type AS ( current_id INTEGER, next_id INTEGER ); CREATE OR REPLACE FUNCTION testfunc( ) RETURNS SETOF test_type LANGUAGE plpgsql AS $$ DECLARE _rowtest_type; BEGIN _row.current_id = NULL; FOR _row.next_id IN SELECT id FROM test ORDER BY id LOOP
Re: [GENERAL] a JOIN on same table, but 'slided over'
On 6/26/07, Tom Lane [EMAIL PROTECTED] wrote: news.gmane.org [EMAIL PROTECTED] writes: Gurjeet Singh skrev: Also note that this query is much cheaper that the 'distinct on' query by more than two orders on magnitude ( 217.86 vs. 98040.67): No it isn't. The estimate is much lower, but the actual times are very close: [explain of distinct on] Time: 5.003 ms [explain of correlated subquery] Time: 4.125 ms You're both confused: ??? the planner estimate certainly should not be taken as gospel, true but the actual runtime of an EXPLAIN (not EXPLAIN ANALYZE) only reflects planning effort. Agree completely EXPLAIN ANALYZE output would be a lot more suitable to settle the question which one is faster. Agree again. I was using the EXPLAIN output just to make a point that optimizer thinks the query utilizing a subquery is much cheaper (and hence maybe faster) than the 'distinct on' query. In a later mail I posted the analyze o/p too... -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com 17°29'34.37N 78°30'59.76E - Hyderabad * 18°32'57.25N 73°56'25.42E - Pune Sent from my BlackLaptop device