Re: [PERFORM] max() versus order/limit (WAS: High update activity, PostgreSQL vs BigDBMS)

2007-01-15 Thread Alvaro Herrera
Adam Rich wrote:
 
 Did anybody get a chance to look at this?  Is it expected behavior?
 Everyone seemed so incredulous, I hoped maybe this exposed a bug
 that would be fixed in a near release.

Actually, the planner is only able to do the min()/max() transformation
into order by/limit in the case of a single table being scanned.  Since
you have a join here, the optimization is obviously not used:

 select max(item_id)
 from events e, receipts r, receipt_items ri
 where e.event_id=r.event_id and r.receipt_id=ri.receipt_id

plan/planagg.c says

/*
 * We also restrict the query to reference exactly one table, since join
 * conditions can't be handled reasonably.  (We could perhaps handle a
 * query containing cartesian-product joins, but it hardly seems worth the
 * trouble.)
 */

so you should keep using your hand-written order by/limit query.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] max() versus order/limit (WAS: High update

2007-01-15 Thread Alvaro Herrera
Luke Lonergan wrote:
 Adam,
 
 This optimization would require teaching the planner to use an index for
 MAX/MIN when available.  It seems like an OK thing to do to me.

This optimization already exists, albeit for queries that use a single
table.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] max() versus order/limit (WAS: High update

2007-01-15 Thread Joshua D. Drake
Luke Lonergan wrote:
 Adam,
 
 This optimization would require teaching the planner to use an index for
 MAX/MIN when available.  It seems like an OK thing to do to me.

Uhmmm I thought we did that already in 8.1?

Joshua D. Drake


 
 - Luke
 
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich
 Sent: Sunday, January 14, 2007 8:52 PM
 To: 'Joshua D. Drake'; 'Tom Lane'
 Cc: 'Craig A. James'; 'PostgreSQL Performance'
 Subject: Re: [PERFORM] max() versus order/limit (WAS: High 
 update activity, PostgreSQL vs BigDBMS)


 Did anybody get a chance to look at this?  Is it expected behavior?
 Everyone seemed so incredulous, I hoped maybe this exposed a 
 bug that would be fixed in a near release.


 -Original Message-
 From: Adam Rich [mailto:[EMAIL PROTECTED]
 Sent: Sunday, January 07, 2007 11:53 PM
 To: 'Joshua D. Drake'; 'Tom Lane'
 Cc: 'Craig A. James'; 'PostgreSQL Performance'
 Subject: RE: [PERFORM] High update activity, PostgreSQL vs BigDBMS



 Here's another, more drastic example... Here the order by / limit
 version
 runs in less than 1/7000 the time of the MAX() version.


 select max(item_id)
 from events e, receipts r, receipt_items ri
 where e.event_id=r.event_id and r.receipt_id=ri.receipt_id

 Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual
 time=816.382..816.383 rows=1 loops=1)
   -  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual
 time=155.177..675.870 rows=147383 loops=1)
 Hash Cond: (ri.receipt_id = r.receipt_id)
 -  Seq Scan on receipt_items ri  (cost=0.00..4097.56
 rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
 -  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual
 time=155.146..155.146 rows=24571 loops=1)
   -  Hash Join  (cost=506.84..2010.69 rows=24571 width=4)
 (actual time=34.803..126.452 rows=24571 loops=1)
 Hash Cond: (r.event_id = e.event_id)
 -  Seq Scan on receipts r  (cost=0.00..663.58
 rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
 -  Hash  (cost=469.73..469.73 rows=14843 width=4)
 (actual time=34.780..34.780 rows=14843 loops=1)
   -  Seq Scan on events e  (cost=0.00..469.73
 rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
 Total runtime: 816.645 ms

 select item_id
 from events e, receipts r, receipt_items ri
 where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
 order by item_id desc limit 1


 Limit  (cost=0.00..0.16 rows=1 width=4) (actual 
 time=0.047..0.048 rows=1
 loops=1)
   -  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual
 time=0.044..0.044 rows=1 loops=1)
 -  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8)
 (actual time=0.032..0.032 rows=1 loops=1)
   -  Index Scan Backward using receipt_items_pkey on
 receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual
 time=0.016..0.016 rows=1 loops=1)
   -  Index Scan using receipts_pkey on receipts r
 (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1
 loops=1)
 Index Cond: (r.receipt_id = ri.receipt_id)
 -  Index Scan using events_pkey on events e  (cost=0.00..0.04
 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
   Index Cond: (e.event_id = r.event_id)
 Total runtime: 0.112 ms





 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
 Drake
 Sent: Sunday, January 07, 2007 9:10 PM
 To: Adam Rich
 Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
 Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS


 On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
 I'm using 8.2 and using order by  limit is still faster than MAX()
 even though MAX() now seems to rewrite to an almost identical plan
 internally.

 Gonna need you to back that up :) Can we get an explain analyze?


 Count(*) still seems to use a full table scan rather than an index
 scan.
 There is a TODO out there to help this. Don't know if it will 
 get done.

 Joshua D. Drake

 -- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
  http://www.commandprompt.com/

 Donate to the PostgreSQL Project: 
 http://www.postgresql.org/about/donate




 ---(end of 
 broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org


 ---(end of 
 broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at

 http://www.postgresql.org/about/donate


 
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner

Re: [PERFORM] max() versus order/limit (WAS: High update

2007-01-14 Thread Luke Lonergan
Adam,

This optimization would require teaching the planner to use an index for
MAX/MIN when available.  It seems like an OK thing to do to me.

- Luke

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich
 Sent: Sunday, January 14, 2007 8:52 PM
 To: 'Joshua D. Drake'; 'Tom Lane'
 Cc: 'Craig A. James'; 'PostgreSQL Performance'
 Subject: Re: [PERFORM] max() versus order/limit (WAS: High 
 update activity, PostgreSQL vs BigDBMS)
 
 
 Did anybody get a chance to look at this?  Is it expected behavior?
 Everyone seemed so incredulous, I hoped maybe this exposed a 
 bug that would be fixed in a near release.
 
 
 -Original Message-
 From: Adam Rich [mailto:[EMAIL PROTECTED]
 Sent: Sunday, January 07, 2007 11:53 PM
 To: 'Joshua D. Drake'; 'Tom Lane'
 Cc: 'Craig A. James'; 'PostgreSQL Performance'
 Subject: RE: [PERFORM] High update activity, PostgreSQL vs BigDBMS
 
 
 
 Here's another, more drastic example... Here the order by / limit
 version
 runs in less than 1/7000 the time of the MAX() version.
 
 
 select max(item_id)
 from events e, receipts r, receipt_items ri
 where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
 
 Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual
 time=816.382..816.383 rows=1 loops=1)
   -  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual
 time=155.177..675.870 rows=147383 loops=1)
 Hash Cond: (ri.receipt_id = r.receipt_id)
 -  Seq Scan on receipt_items ri  (cost=0.00..4097.56
 rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
 -  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual
 time=155.146..155.146 rows=24571 loops=1)
   -  Hash Join  (cost=506.84..2010.69 rows=24571 width=4)
 (actual time=34.803..126.452 rows=24571 loops=1)
 Hash Cond: (r.event_id = e.event_id)
 -  Seq Scan on receipts r  (cost=0.00..663.58
 rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
 -  Hash  (cost=469.73..469.73 rows=14843 width=4)
 (actual time=34.780..34.780 rows=14843 loops=1)
   -  Seq Scan on events e  (cost=0.00..469.73
 rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
 Total runtime: 816.645 ms
 
 select item_id
 from events e, receipts r, receipt_items ri
 where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
 order by item_id desc limit 1
 
 
 Limit  (cost=0.00..0.16 rows=1 width=4) (actual 
 time=0.047..0.048 rows=1
 loops=1)
   -  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual
 time=0.044..0.044 rows=1 loops=1)
 -  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8)
 (actual time=0.032..0.032 rows=1 loops=1)
   -  Index Scan Backward using receipt_items_pkey on
 receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual
 time=0.016..0.016 rows=1 loops=1)
   -  Index Scan using receipts_pkey on receipts r
 (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1
 loops=1)
 Index Cond: (r.receipt_id = ri.receipt_id)
 -  Index Scan using events_pkey on events e  (cost=0.00..0.04
 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
   Index Cond: (e.event_id = r.event_id)
 Total runtime: 0.112 ms
 
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
 Drake
 Sent: Sunday, January 07, 2007 9:10 PM
 To: Adam Rich
 Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
 Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
 
 
 On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
  I'm using 8.2 and using order by  limit is still faster than MAX()
  even though MAX() now seems to rewrite to an almost identical plan
  internally.
 
 
 Gonna need you to back that up :) Can we get an explain analyze?
 
 
  Count(*) still seems to use a full table scan rather than an index
 scan.
  
 
 There is a TODO out there to help this. Don't know if it will 
 get done.
 
 Joshua D. Drake
 
 -- 
 
   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
  http://www.commandprompt.com/
 
 Donate to the PostgreSQL Project: 
 http://www.postgresql.org/about/donate
 
 
 
 
 ---(end of 
 broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 
 
 ---(end of 
 broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at
 
 http://www.postgresql.org/about/donate
 
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan