Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-29 Thread Rafal Pietrak
Thank you All for this extensive help!

BTW: google helps, once you know that the construct is called
correlated subquery - there is no way to get an answer before one
knows the question :)

Thenx again!

-R

On Thu, 2007-06-28 at 23:23 +0530, Gurjeet Singh wrote:
 On 6/28/07, Alban Hertroys [EMAIL PROTECTED] wrote:
 
 This is called a 'correlated subquery'. Basically the subquery
 is
 performed for each record in the top query.
 
 Google gave me this:
 
 http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm
 
 I think the sub-section titled Example: Correlated subquery in a
 WHERE Clause is appropriate to explain our query at hand.
 
 Simply put, correlated queries are like nested FOR loops of any high
 level programming language.
 
 1. FOR( record R in result of outer-query )
 2.   execute inner query, using any R.colname1
 3.   compare R.colname2 with the result of the correlated-subquery
 4.   produce R in output, iff the above comparison succeeded
 
 Line 2 can be treated as another FOR loop, where every record of
 inner-query is being processed, and comparing the local expressions
 with a column (or expression) that comes from outer query. 
 
 The comparison in step 3 can be against any expression, with columns
 or against a pure constant too!
 
 For example, the following query produces the name of all the
 employees, who manage at least one other employee. 
 
 select empno, ename
 from   emp e1
 where  exists (select 1
from   emp e2
where e2.mgr = e1.empno);
 
 The only thing I would add for our query is that, that the outer
 SELECT of our query produces a cartesian product (no join-condition
 between t1 and t2), but only one row from t2 qualifies for the join,
 since the WHERE condition is on a unique column, and the correlated
 subquery returns just the required value (lowest of the IDs that are
 greater than current t1.ID being processed).
 
 I know the above one-line-paragraph may sound a bit cryptic for
 someone new to correlated subqueries, but if you understand the
 example in the link above, then this would start making some sense.
 
 
 And there's probably more to find. Interestingly enough
 wikipedia
 doesn't seem to have an article on the subject.
  
 
 
 
 
 Regards,
 -- 
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com
 
 17°29'34.37N  78°30'59.76E - Hyderabad *
 18°32'57.25N  73°56'25.42E - Pune
 
 Sent from my BlackLaptop device

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-29 Thread Gurjeet Singh

   It _is_ the optimised version as you can see from the explain plans
posted in the other mail, the planner shows that the cost is drastically
less than the 'distinct on' version.

   For smaller data-sets 'distinct-on' version might seem faster, but for
reasonably larger datasets, it's performance deteriorates exponentially...
This is because of the Nested-loops involved in the plan...

   I increased your data-set to 10240 rows by executing the following query
10 times:

insert into test select id+(select max(id) from test), thread, info from
test;

   On such data-set (which is not very large by any means), the standard
SQL version executes in almost a second, and on the other hand, I had to
cancel the EXPLAIN ANALYZE of the 'distinct on' query after letting it run
for over three minutes!!!

postgres=# explain analyze
postgres-# select   t1.id as id, t2.id as id+1,
postgres-#  t1.thread as thread, t2.thread as thread+1,
postgres-#  t1.info as info, t2.info as info+1
postgres-# from test as t1, test as t2
postgres-# where t2.id = ( select min(id) from test as t3 where t3.id 
t1.id )
postgres-# order by t1.id asc;

QUERY PLAN
--
Sort  (cost=2971.36..2996.96 rows=10240 width=24) (actual time=
1004.031..1030.116 rows=10239 loops=1)
  Sort Key: t1.id
  Sort Method:  external sort  Disk: 416kB
  -  Merge Join  (cost=840.48..2289.28 rows=10240 width=24) (actual time=
834.218..956.595 rows=10239 loops=1)
Merge Cond: (t2.id = ((subplan)))
-  Index Scan using test_id_key on test t2
(cost=0.00..332.85rows=10240 width=12) (actual time=
0.060..24.503 rows=10240 loops=1)
-  Sort  (cost=840.48..866.08 rows=10240 width=12) (actual time=
834.129..854.776 rows=10240 loops=1)
  Sort Key: ((subplan))
  Sort Method:  quicksort  Memory: 928kB
  -  Seq Scan on test t1  (cost=0.00..158.40 rows=10240
width=12)(actual time=0.196..797.752 rows=10240 loops=1)
SubPlan
  -  Result  (cost=0.04..0.05 rows=1 width=0) (actual
time=0.062..0.064 rows=1 loops=10240)
InitPlan
  -  Limit  (cost=0.00..0.04 rows=1 width=4)
(actual time=0.047..0.050 rows=1 loops=10240)
-  Index Scan using test_id_key on
test t3  (cost=0.00..121.98 rows=3413 width=4) (actual
time=0.038..0.038rows=1 loops=10240)
  Index Cond: (id  $0)
  Filter: (id IS NOT NULL)
Total runtime: 1052.802 ms
(18 rows)
Time: 1056.740 ms

postgres=# explain analyze
postgres-# select
postgres-# distinct on (t1.id)
postgres-# t1.*, t2.*
postgres-# from
postgres-# test t1
postgres-# join test t2 on t2.id  t1.id
postgres-# order by t1.id asc, t2.id asc;
Cancel request sent
ERROR:  canceling statement due to user request
postgres=#



On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:


OK. Have tried this one looks like close to 6 times slower then the
'non-standard' phrase with 'distinct on'.

On the small dataset that I've included in my original post (ten rows of
data within TEST), I've run both queries through EXPLAIN ANALYSE, with
the following result summary (for clearity, I've cut away the details
from EXPLAIN output):

---STANDARD
Total runtime: 10.660 ms
---DISTINCT-ON
Total runtime: 1.479 ms
---

Would there be ways to optimise the standard query to get the
performance closer to the none-standard one?


-R


On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote:
 Hi Rafal,

 Just a note that this is not standard SQL... 'distinct on' is an
 extension to SQL provided by postgres.

 Following query utilizes the standard SQL to get the same results:

 selectt1.id as id, t2.id as id+1,
 t1.thread as thread, t2.thread as thread+1,
 t1.info as info, t2.info as info+1
 from test as t1, test as t2
 where t2.id = ( select min(id) from test as t3 where t3.id  t1.id);

 HTH
 --
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

 17°29'34.37N  78°30'59.76E - Hyderabad *
 18°32'57.25N  73°56'25.42 E - Pune

 Sent from my BlackLaptop device

 On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
 Marvelous! Thenx!

 -R

 On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski
 wrote:
  On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
  Is there an SQL construct to get it?
 
  select
  distinct on (t1.id)
  t1.*, t2.*
  from
  test t1
  join test t2 on t2.id  t1.id
  order by t1.id asc, t2.id asc
 
  should do the trick.
 
  depesz
 
  --
  

Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-28 Thread Rafal Pietrak
Gurjeet,

Focusing on the standars solution, I did some 'exercises' - works fine,
just learning. 

But the ambarasing thing is, that I looks like I really don't get it,
meaning - what exactly the internal query does. I've never ever seen or
used a subquery with data/params from 'upper level' query used within a
subquery - any time I've written a hierarchical query (e.g. with
subqueries), the relations were always hierarchical. In other words, I
was always able to run an internal subquery outside of the compound
query and get consistant results. With this one I cannot do that due to
the 'entanglement' of t3 and t1.

Postgress query plan from EXPLAIN doesn't help me here - probably I'm
unable to interpret it correctly without 'a paradigm mind shift'.

So, would you mind commenting a little on how exactly the t1.id
influences subquery (with t3), and the result influences back the
selection of t1 set?

Will greatly apreciate that.

-R

On Tue, 2007-06-26 at 19:14 +0530, Gurjeet Singh wrote:
 I missed the ORDER BY clause... Here it goes:
 
 selectt1.id as id, t2.id as id+1,
 t1.thread as thread, t2.thread as thread+1,
 t1.info as info, t2.info as info+1
 from test as t1, test as t2
 where t2.id = ( select min(id) from test as t3 where t3.id  t1.id )
 order by t1.id asc;
 
 Also note that this query is much cheaper that the 'distinct on' query
 by more than two orders on magnitude ( 217.86 vs. 98040.67):
 
 postgres=# explain
 postgres-# select
 postgres-# distinct on (t1.id)
 postgres-# t1.*, t2.*
 postgres-# from
 postgres-# test t1
 postgres-# join test t2 on t2.id  t1.id
 postgres-# order by t1.id asc, t2.id asc;
QUERY PLAN 
 
  Unique  (cost=95798.00..98040.67 rows=1160 width=80)
-  Sort  (cost=95798.00..96919.33 rows=448533 width=80) 
  Sort Key: t1.id, t2.id
  -  Nested Loop  (cost=0.00..13827.29 rows=448533 width=80)
-  Seq Scan on test t1  (cost=0.00..21.60 rows=1160
 width=40)
-  Index Scan using test_id_key on test t2
 (cost=0.00..7.06 rows=387 width=40)
  Index Cond: (t2.id  t1.id)
 (7 rows)
 Time: 5.003 ms
 postgres=# explain
 postgres-# select   t1.id as id, t2.id as id+1,
 postgres-#  t1.thread as thread, t2.thread as thread+1,
 postgres-#  t1.info as info, t2.info as info+1
 postgres-# from test as t1, test as t2
 postgres-# where t2.id = ( select min(id) from test as t3 where t3.id
  t1.id )
 postgres-# order by t1.id asc;
 QUERY PLAN 
 --
  Sort  (cost=214.96..217.86 rows=1160 width=80)
Sort Key: t1.id
-  Hash Join  (cost= 36.10..155.92 rows=1160 width=80)
  Hash Cond: ((subplan) = t2.id)
  -  Seq Scan on test t1  (cost=0.00..21.60 rows=1160
 width=40)
  -  Hash  (cost=21.60..21.60 rows=1160 width=40)
-  Seq Scan on test t2  (cost=0.00..21.60 rows=1160
 width=40)
  SubPlan
-  Result  (cost=0.13..0.14 rows=1 width=0)
  InitPlan
-  Limit  (cost= 0.00..0.13 rows=1 width=4)
  -  Index Scan using test_id_key on test t3
 (cost=0.00..51.02 rows=387 width=4)
Index Cond: (id  $0)
Filter: (id IS NOT NULL) 
 (14 rows)
 Time: 4.125 ms
 
 
 Best regards,
 -- 
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com
 
 17°29'34.37N  78°30'59.76E - Hyderabad *
 18°32'57.25N  73°56'25.42E - Pune 
 
 Sent from my BlackLaptop device
 
 On 6/26/07, Gurjeet Singh [EMAIL PROTECTED]  wrote:
 Hi Rafal,
 
 Just a note that this is not standard SQL... 'distinct on'
 is an extension to SQL provided by postgres. 
 
 Following query utilizes the standard SQL to get the same
 results:
 
 selectt1.id as id, t2.id as id+1, 
 t1.thread as thread, t2.thread as thread+1,
 t1.info as info, t2.info as info+1
 from test as t1, test as t2
 where t2.id = ( select min(id) from test as t3 where t3.id 
 t1.id);
 
 HTH
 -- 
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com
 
 17°29'34.37N  78°30'59.76E - Hyderabad *
 18°32'57.25N  73°56' 25.42 E - Pune
 
 Sent from my BlackLaptop device
 
 
 On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
 Marvelous! Thenx!
 
 -R
 
 On Tue, 2007-06-26 at 10:06 +0200, hubert depesz
 lubaczewski wrote:

Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-28 Thread Alban Hertroys
Rafal Pietrak wrote:
 Gurjeet,
 
 Focusing on the standars solution, I did some 'exercises' - works fine,
 just learning. 
 
 But the ambarasing thing is, that I looks like I really don't get it,
 meaning - what exactly the internal query does. I've never ever seen or
 used a subquery with data/params from 'upper level' query used within a
 subquery - any time I've written a hierarchical query (e.g. with
 subqueries), the relations were always hierarchical. In other words, I
 was always able to run an internal subquery outside of the compound
 query and get consistant results. With this one I cannot do that due to
 the 'entanglement' of t3 and t1.

This is called a 'correlated subquery'. Basically the subquery is
performed for each record in the top query.

Google gave me this:
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm

And there's probably more to find. Interestingly enough wikipedia
doesn't seem to have an article on the subject.

 Postgress query plan from EXPLAIN doesn't help me here - probably I'm
 unable to interpret it correctly without 'a paradigm mind shift'.
 
 So, would you mind commenting a little on how exactly the t1.id
 influences subquery (with t3), and the result influences back the
 selection of t1 set?
 
 Will greatly apreciate that.


-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-28 Thread Gurjeet Singh

On 6/28/07, Alban Hertroys [EMAIL PROTECTED] wrote:



This is called a 'correlated subquery'. Basically the subquery is
performed for each record in the top query.

Google gave me this:

http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm



I think the sub-section titled Example: Correlated subquery in a WHERE
Clause is appropriate to explain our query at hand.

Simply put, correlated queries are like nested FOR loops of any high level
programming language.

1. FOR( record R in result of outer-query )
2.   execute inner query, using any R.colname1
3.   compare R.colname2 with the result of the correlated-subquery
4.   produce R in output, iff the above comparison succeeded

Line 2 can be treated as another FOR loop, where every record of inner-query
is being processed, and comparing the local expressions with a column (or
expression) that comes from outer query.

The comparison in step 3 can be against any expression, with columns or
against a pure constant too!

For example, the following query produces the name of all the employees, who
manage at least one other employee.

select empno, ename
from   emp e1
where  exists (select 1
  from   emp e2
  where e2.mgr = e1.empno);

The only thing I would add for our query is that, that the outer SELECT of
our query produces a cartesian product (no join-condition between t1 and
t2), but only one row from t2 qualifies for the join, since the WHERE
condition is on a unique column, and the correlated subquery returns just
the required value (lowest of the IDs that are greater than current
t1.IDbeing processed).

I know the above one-line-paragraph may sound a bit cryptic for someone new
to correlated subqueries, but if you understand the example in the link
above, then this would start making some sense.

And there's probably more to find. Interestingly enough wikipedia

doesn't seem to have an article on the subject.






Regards,
--
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

17°29'34.37N  78°30'59.76E - Hyderabad *
18°32'57.25N  73°56'25.42E - Pune

Sent from my BlackLaptop device


[GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Rafal Pietrak
Hi,

I understand, that this is 'general SQL' question rather then 'general
postgres'. But may be someone here could help me with it anyways.

I have a *single* table:

CREATE TABLE test (id int not null unique, thread int not null, info
text);

The ID, although unique, is not continues. A sample query:

SELECT * from test;
 id | thread | info 
++--
  2 |763 | A
  3 |764 | B
  6 |  5 | C
  8 |  88946 | Cats
  9 |  69315 | Eifel
 10 |  96379 | G
 14 |  23927 | test 1
 16 |  16529 | test 2
 17 |634 | test 3
 20 |  63930 | batman
(10 rows)
-

Now, I'd like to make a JOIN-ed query of that table with itself, so that
I'd get rows paiwise: every row containing data from *two* rows of the
original TEST table so, that those data come from rows of consequtive
ID's - not neceserly (depending on the TEST table contents) continuesly
consequtive. Like:

SELECT * from view_of_test;
 id | id+X | thread | thread+X | info  | info+X 
+--++--+---+-
  2 |3 |763 |  764 | A | B
  3 |6 |764 |5 | B | C
  6 |8 |  5 |88946 | C | Cats
  8 |9 |  88946 |69315 | Cats  | Eifel
  9 |   10 |  69315 |96379 | Eifel | G
-
Is there an SQL construct to get it?

I'd apreciate any hints or sugestions.

-R

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread hubert depesz lubaczewski

On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:


Is there an SQL construct to get it?



select
   distinct on (t1.id)
   t1.*, t2.*
from
   test t1
   join test t2 on t2.id  t1.id
order by t1.id asc, t2.id asc

should do the trick.

depesz

--
http://www.depesz.com/ - nowy, lepszy depesz


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread PFC



Now, I'd like to make a JOIN-ed query of that table with itself, so that
I'd get rows paiwise: every row containing data from *two* rows of the
original TEST table so, that those data come from rows of consequtive
ID's - not neceserly (depending on the TEST table contents) continuesly
consequtive. Like:

SELECT * from view_of_test;
 id | id+X | thread | thread+X | info  | info+X
+--++--+---+-
  2 |3 |763 |  764 | A | B
  3 |6 |764 |5 | B | C
  6 |8 |  5 |88946 | C | Cats
  8 |9 |  88946 |69315 | Cats  | Eifel
  9 |   10 |  69315 |96379 | Eifel | G
-
Is there an SQL construct to get it?


	I would use a plpgsql procedure, select all the rows ORDER BY id, keep  
the current and last row in a variable, and that's it.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Rafal Pietrak
Marvelous! Thenx! 

-R

On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote:
 On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
 Is there an SQL construct to get it?
 
 select
 distinct on (t1.id)
 t1.*, t2.*
 from
 test t1
 join test t2 on t2.id  t1.id
 order by t1.id asc, t2.id asc
 
 should do the trick.
 
 depesz
 
 -- 
 http://www.depesz.com/ - nowy, lepszy depesz

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Gurjeet Singh

Hi Rafal,

   Just a note that this is not standard SQL... 'distinct on' is an
extension to SQL provided by postgres.

Following query utilizes the standard SQL to get the same results:

selectt1.id as id, t2.id as id+1,
   t1.thread as thread, t2.thread as thread+1,
   t1.info as info, t2.info as info+1
from test as t1, test as t2
where t2.id = ( select min(id) from test as t3 where t3.id  t1.id);

HTH
--
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

17°29'34.37N  78°30'59.76E - Hyderabad *
18°32'57.25N  73°56'25.42E - Pune

Sent from my BlackLaptop device

On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:


Marvelous! Thenx!

-R

On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote:
 On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
 Is there an SQL construct to get it?

 select
 distinct on (t1.id)
 t1.*, t2.*
 from
 test t1
 join test t2 on t2.id  t1.id
 order by t1.id asc, t2.id asc

 should do the trick.

 depesz

 --
 http://www.depesz.com/ - nowy, lepszy depesz

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/



Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Rafal Pietrak
OK. Have tried this one looks like close to 6 times slower then the
'non-standard' phrase with 'distinct on'. 

On the small dataset that I've included in my original post (ten rows of
data within TEST), I've run both queries through EXPLAIN ANALYSE, with
the following result summary (for clearity, I've cut away the details
from EXPLAIN output):

---STANDARD
 Total runtime: 10.660 ms
---DISTINCT-ON
 Total runtime: 1.479 ms
---

Would there be ways to optimise the standard query to get the
performance closer to the none-standard one?


-R


On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote:
 Hi Rafal,
 
 Just a note that this is not standard SQL... 'distinct on' is an
 extension to SQL provided by postgres.
 
 Following query utilizes the standard SQL to get the same results:
 
 selectt1.id as id, t2.id as id+1,
 t1.thread as thread, t2.thread as thread+1,
 t1.info as info, t2.info as info+1
 from test as t1, test as t2
 where t2.id = ( select min(id) from test as t3 where t3.id  t1.id);
 
 HTH
 -- 
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com
 
 17°29'34.37N  78°30'59.76E - Hyderabad *
 18°32'57.25N  73°56'25.42 E - Pune
 
 Sent from my BlackLaptop device 
 
 On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
 Marvelous! Thenx!
 
 -R
 
 On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski
 wrote:
  On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote: 
  Is there an SQL construct to get it?
 
  select
  distinct on (t1.id)
  t1.*, t2.*
  from
  test t1
  join test t2 on t2.id  t1.id
  order by t1.id asc, t2.id asc
 
  should do the trick.
 
  depesz
 
  --
  http://www.depesz.com/ - nowy, lepszy depesz
 
 ---(end of
 broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org/
 

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Gurjeet Singh

I missed the ORDER BY clause... Here it goes:

selectt1.id as id, t2.id as id+1,
   t1.thread as thread, t2.thread as thread+1,
   t1.info as info, t2.info as info+1
from test as t1, test as t2
where t2.id = ( select min(id) from test as t3 where t3.id  t1.id )
order by t1.id asc;

Also note that this query is much cheaper that the 'distinct on' query by
more than two orders on magnitude (217.86 vs. 98040.67):

postgres=# explain
postgres-# select
postgres-# distinct on (t1.id)
postgres-# t1.*, t2.*
postgres-# from
postgres-# test t1
postgres-# join test t2 on t2.id  t1.id
postgres-# order by t1.id asc, t2.id asc;
  QUERY PLAN

Unique  (cost=95798.00..98040.67 rows=1160 width=80)
  -  Sort  (cost=95798.00..96919.33 rows=448533 width=80)
Sort Key: t1.id, t2.id
-  Nested Loop  (cost=0.00..13827.29 rows=448533 width=80)
  -  Seq Scan on test t1  (cost=0.00..21.60 rows=1160
width=40)
  -  Index Scan using test_id_key on test t2
(cost=0.00..7.06rows=387 width=40)
Index Cond: (t2.id  t1.id)
(7 rows)
Time: 5.003 ms
postgres=# explain
postgres-# select   t1.id as id, t2.id as id+1,
postgres-#  t1.thread as thread, t2.thread as thread+1,
postgres-#  t1.info as info, t2.info as info+1
postgres-# from test as t1, test as t2
postgres-# where t2.id = ( select min(id) from test as t3 where t3.id 
t1.id )
postgres-# order by t1.id asc;
   QUERY PLAN
--
Sort  (cost=214.96..217.86 rows=1160 width=80)
  Sort Key: t1.id
  -  Hash Join  (cost=36.10..155.92 rows=1160 width=80)
Hash Cond: ((subplan) = t2.id)
-  Seq Scan on test t1  (cost=0.00..21.60 rows=1160 width=40)
-  Hash  (cost=21.60..21.60 rows=1160 width=40)
  -  Seq Scan on test t2  (cost=0.00..21.60 rows=1160
width=40)
SubPlan
  -  Result  (cost=0.13..0.14 rows=1 width=0)
InitPlan
  -  Limit  (cost=0.00..0.13 rows=1 width=4)
-  Index Scan using test_id_key on test t3  (cost=
0.00..51.02 rows=387 width=4)
  Index Cond: (id  $0)
  Filter: (id IS NOT NULL)
(14 rows)
Time: 4.125 ms


Best regards,
--
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

17°29'34.37N  78°30'59.76E - Hyderabad *
18°32'57.25N  73°56'25.42E - Pune

Sent from my BlackLaptop device

On 6/26/07, Gurjeet Singh [EMAIL PROTECTED] wrote:


Hi Rafal,

Just a note that this is not standard SQL... 'distinct on' is an
extension to SQL provided by postgres.

Following query utilizes the standard SQL to get the same results:

selectt1.id as id, t2.id as id+1,
t1.thread as thread, t2.thread as thread+1,
t1.info as info, t2.info as info+1
from test as t1, test as t2
where t2.id = ( select min(id) from test as t3 where t3.id  t1.id);

HTH
--
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

17°29'34.37N  78°30'59.76E - Hyderabad *
18°32'57.25N  73°56'25.42 E - Pune

Sent from my BlackLaptop device

On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:

 Marvelous! Thenx!

 -R

 On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski wrote:
  On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
  Is there an SQL construct to get it?
 
  select
  distinct on (t1.id)
  t1.*, t2.*
  from
  test t1
  join test t2 on t2.id  t1.id
  order by t1.id asc, t2.id asc
 
  should do the trick.
 
  depesz
 
  --
  http://www.depesz.com/ - nowy, lepszy depesz

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org/





Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread news.gmane.org
Gurjeet Singh skrev:
 I missed the ORDER BY clause... Here it goes:
 
 selectt1.id http://t1.id as id, t2.id http://t2.id as id+1,
 t1.thread as thread, t2.thread as thread+1,
 t1.info http://t1.info as info, t2.info http://t2.info as
 info+1
 from test as t1, test as t2
 where t2.id http://t2.id = ( select min(id) from test as t3 where
 t3.id http://t3.id  t1.id http://t1.id )
 order by t1.id http://t1.id asc;
 
 Also note that this query is much cheaper that the 'distinct on' query
 by more than two orders on magnitude ( 217.86 vs. 98040.67):

No it isn't. The estimate is much lower, but the actual times are very
close:

[explain of distinct on]

 Time: 5.003 ms

[explain of correlated subquery]

 Time: 4.125 ms

I tried on a larger table (16384 rows), and in this case the numbers are
strongly in favor of  the subquery. In fact, I am still waiting for the
distinct on version to return ...

/Nis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Tom Lane
news.gmane.org [EMAIL PROTECTED] writes:
 Gurjeet Singh skrev:
 Also note that this query is much cheaper that the 'distinct on' query
 by more than two orders on magnitude ( 217.86 vs. 98040.67):

 No it isn't. The estimate is much lower, but the actual times are very
 close:

 [explain of distinct on]
 Time: 5.003 ms

 [explain of correlated subquery]
 Time: 4.125 ms

You're both confused: the planner estimate certainly should not be taken
as gospel, but the actual runtime of an EXPLAIN (not EXPLAIN ANALYZE)
only reflects planning effort.

EXPLAIN ANALYZE output would be a lot more suitable to settle the
question which one is faster.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Rafal Pietrak
I see. (Have actually tried it on a larger dataset - to see it for
myself ... it is optimised :)

Thenx again!

-R


On Tue, 2007-06-26 at 19:56 +0530, Gurjeet Singh wrote:
 It _is_ the optimised version as you can see from the explain
 plans posted in the other mail, the planner shows that the cost is
 drastically less than the 'distinct on' version.
 
 For smaller data-sets 'distinct-on' version might seem faster, but
 for reasonably larger datasets, it's performance deteriorates
 exponentially... This is because of the Nested-loops involved in the
 plan...
 
 I increased your data-set to 10240 rows by executing the following
 query 10 times:
 
 insert into test select id+(select max(id) from test), thread, info
 from test;
 
 On such data-set (which is not very large by any means), the
 standard SQL version executes in almost a second, and on the other
 hand, I had to cancel the EXPLAIN ANALYZE of the 'distinct on' query
 after letting it run for over three minutes!!!
 
 postgres=# explain analyze
 postgres-# select   t1.id as id, t2.id as id+1,
 postgres-#  t1.thread as thread, t2.thread as thread+1,
 postgres-#  t1.info as info, t2.info as info+1
 postgres-# from test as t1, test as t2
 postgres-# where t2.id = ( select min(id) from test as t3 where t3.id
  t1.id )
 postgres-# order by t1.id asc;
 
 QUERY PLAN
 --
  
  Sort  (cost=2971.36..2996.96 rows=10240 width=24) (actual
 time=1004.031..1030.116 rows=10239 loops=1)
Sort Key: t1.id
Sort Method:  external sort  Disk: 416kB
-  Merge Join  (cost=840.48..2289.28 rows=10240 width=24) (actual
 time=834.218..956.595 rows=10239 loops=1) 
  Merge Cond: (t2.id = ((subplan)))
  -  Index Scan using test_id_key on test t2
 (cost=0.00..332.85 rows=10240 width=12) (actual time=0.060..24.503
 rows=10240 loops=1)
  -  Sort  (cost=840.48..866.08 rows=10240 width=12) (actual
 time=834.129..854.776 rows=10240 loops=1)
Sort Key: ((subplan))
Sort Method:  quicksort  Memory: 928kB
-  Seq Scan on test t1  (cost=0.00..158.40 rows=10240
 width=12)(actual time=0.196..797.752 rows=10240 loops=1)
  SubPlan
-  Result  (cost= 0.04..0.05 rows=1 width=0)
 (actual time=0.062..0.064 rows=1 loops=10240)
  InitPlan 
-  Limit  (cost=0.00..0.04 rows=1
 width=4) (actual time=0.047..0.050 rows=1 loops=10240)
  -  Index Scan using test_id_key
 on test t3  (cost=0.00..121.98 rows=3413 width=4) (actual time=
 0.038..0.038 rows=1 loops=10240)
Index Cond: (id  $0)
Filter: (id IS NOT NULL)
  Total runtime: 1052.802 ms
 (18 rows)
 Time: 1056.740 ms
 
 postgres=# explain analyze
 postgres-# select
 postgres-# distinct on (t1.id)
 postgres-# t1.*, t2.*
 postgres-# from
 postgres-# test t1
 postgres-# join test t2 on t2.id  t1.id
 postgres-# order by t1.id asc, t2.id asc;
 Cancel request sent
 ERROR:  canceling statement due to user request 
 postgres=#
 
 
 
 On 6/26/07, Rafal Pietrak [EMAIL PROTECTED] wrote:
 OK. Have tried this one looks like close to 6 times slower
 then the
 'non-standard' phrase with 'distinct on'.
 
 On the small dataset that I've included in my original post
 (ten rows of
 data within TEST), I've run both queries through EXPLAIN
 ANALYSE, with 
 the following result summary (for clearity, I've cut away the
 details
 from EXPLAIN output):
 
 ---STANDARD
 Total runtime: 10.660 ms
 ---DISTINCT-ON
 Total runtime: 1.479 ms
 --- 
 
 Would there be ways to optimise the standard query to get the
 performance closer to the none-standard one?
 
 
 -R
 
 
 On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote:
  Hi Rafal,
  
  Just a note that this is not standard SQL... 'distinct
 on' is an
  extension to SQL provided by postgres.
 
  Following query utilizes the standard SQL to get the same
 results:
  
  selectt1.id as id, t2.id as id+1,
  t1.thread as thread, t2.thread as thread+1,
  t1.info as info, t2.info as info+1
  from test as t1, test as t2
  where t2.id = ( select min(id) from test as t3 where t3.id 
 t1.id);
 
  HTH
  --
  [EMAIL PROTECTED]
  [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com
 
  

Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread PFC


OK, check...

test= CREATE TABLE test (id INTEGER PRIMARY KEY);
test= INSERT INTO test SELECT random()*5 + n*10 FROM  
generate_series( 1,10 ) AS n;

test= SELECT * FROM test LIMIT 10;
  id
-
  11
  23
  31
  41
  52
  63
  70
  85
  94
 103

test= ANALYZE test;
ANALYZE

-- Self Join 1

test= EXPLAIN ANALYZE SELECT t1.id AS current_id, t2.id AS next_id
FROM test t1, test t2
WHERE t2.id = ( SELECT min(id) FROM test AS t3 WHERE t3.id  t1.id )
ORDER BY t1.id;
QUERY  
PLAN

---
 Sort  (cost=26703.19..26953.19 rows=10 width=8) (actual  
time=5240.392..5271.529 rows=9 loops=1)

   Sort Key: t1.id
   -  Hash Join  (cost=2691.00..18398.37 rows=10 width=8) (actual  
time=106.588..5179.737 rows=9 loops=1)

 Hash Cond: ((subplan) = t2.id)
 -  Seq Scan on test t1  (cost=0.00..1441.00 rows=10 width=4)  
(actual time=0.013..34.782 rows=10 loops=1)
 -  Hash  (cost=1441.00..1441.00 rows=10 width=4) (actual  
time=106.420..106.420 rows=10 loops=1)
   -  Seq Scan on test t2  (cost=0.00..1441.00 rows=10  
width=4) (actual time=0.007..43.077 rows=10 loops=1)

 SubPlan
   -  Result  (cost=0.03..0.04 rows=1 width=0) (actual  
time=0.023..0.023 rows=1 loops=19)

 InitPlan
   -  Limit  (cost=0.00..0.03 rows=1 width=4) (actual  
time=0.021..0.022 rows=1 loops=19)
 -  Index Scan using test_pkey on test t3   
(cost=0.00..1029.59 rows=3 width=4) (actual time=0.020..0.020 rows=1  
loops=19)

   Index Cond: (id  $0)
   Filter: (id IS NOT NULL)
 Total runtime: 5295.677 ms

-- Self Join 2

test= set enable_hashjoin TO 0;
test= EXPLAIN ANALYZE SELECT t1.id AS current_id, t2.id AS next_id
FROM test t1, test t2
WHERE t2.id = ( SELECT min(id) FROM test AS t3 WHERE t3.id  t1.id )
ORDER BY t1.id;
  QUERY  
PLAN

---
 Sort  (cost=30806.48..31056.48 rows=10 width=8) (actual  
time=2876.249..2903.011 rows=9 loops=1)

   Sort Key: t1.id
   -  Merge Join  (cost=9745.82..22501.66 rows=10 width=8) (actual  
time=2547.830..2820.347 rows=9 loops=1)

 Merge Cond: (t2.id = inner.?column2?)
 -  Index Scan using test_pkey on test t2  (cost=0.00..2828.26  
rows=10 width=4) (actual time=0.035..67.747 rows=10 loops=1)
 -  Sort  (cost=9745.82..9995.82 rows=10 width=4) (actual  
time=2547.779..2582.889 rows=10 loops=1)

   Sort Key: (subplan)
   -  Seq Scan on test t1  (cost=0.00..1441.00 rows=10  
width=4) (actual time=0.060..2487.728 rows=10 loops=1)

 SubPlan
   -  Result  (cost=0.03..0.04 rows=1 width=0)  
(actual time=0.023..0.023 rows=1 loops=10)

 InitPlan
   -  Limit  (cost=0.00..0.03 rows=1 width=4)  
(actual time=0.021..0.022 rows=1 loops=10)
 -  Index Scan using test_pkey on  
test t3  (cost=0.00..1029.59 rows=3 width=4) (actual time=0.020..0.020  
rows=1 loops=10)

   Index Cond: (id  $0)
   Filter: (id IS NOT NULL)
 Total runtime: 2923.804 ms

-- DISTINCT ON

test= EXPLAIN SELECT DISTINCT ON (t1.id) t1.id AS current_id, t2.id AS  
next_id

FROM test t1 JOIN test t2 ON t2.id  t1.id
ORDER BY t1.id, t2.id;
   QUERY PLAN
-
 Unique  (cost=729806679.75..746473346.41 rows=10 width=8)
   -  Sort  (cost=729806679.75..738140013.08 rows=33 width=8)
 Sort Key: t1.id, t2.id
 -  Nested Loop  (cost=0.00..100028973.00 rows=33 width=8)
   -  Seq Scan on test t1  (cost=0.00..1441.00 rows=10  
width=4)
   -  Index Scan using test_pkey on test t2   
(cost=0.00..583.61 rows=3 width=4)

 Index Cond: (t2.id  t1.id)
(7 lignes)

This one takes much longer (I interrupted it).

-- Using a function

CREATE TYPE test_type AS ( current_id INTEGER, next_id INTEGER );

CREATE OR REPLACE FUNCTION testfunc( )
RETURNS SETOF test_type
LANGUAGE plpgsql
AS
$$
DECLARE
_rowtest_type;
BEGIN
_row.current_id = NULL;

FOR _row.next_id IN SELECT id FROM test ORDER BY id LOOP
 

Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-26 Thread Gurjeet Singh

On 6/26/07, Tom Lane [EMAIL PROTECTED] wrote:


news.gmane.org [EMAIL PROTECTED] writes:
 Gurjeet Singh skrev:
 Also note that this query is much cheaper that the 'distinct on' query
 by more than two orders on magnitude ( 217.86 vs. 98040.67):

 No it isn't. The estimate is much lower, but the actual times are very
 close:

 [explain of distinct on]
 Time: 5.003 ms

 [explain of correlated subquery]
 Time: 4.125 ms

You're both confused:



???

the planner estimate certainly should not be taken

as gospel,



true

but the actual runtime of an EXPLAIN (not EXPLAIN ANALYZE)

only reflects planning effort.



Agree completely

EXPLAIN ANALYZE output would be a lot more suitable to settle the

question which one is faster.



Agree again. I was using the EXPLAIN output just to make a point that
optimizer thinks the query utilizing a subquery is much cheaper (and hence
maybe faster) than the 'distinct on' query.

In a later mail I posted the analyze o/p too...

--
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

17°29'34.37N  78°30'59.76E - Hyderabad *
18°32'57.25N  73°56'25.42E - Pune

Sent from my BlackLaptop device