Re: [SQL] Breaking up a query

2006-08-13 Thread Andrew Sullivan
On Thu, Aug 10, 2006 at 06:53:33PM -0400, Saad Anis wrote:
> Hi Guys,
> 
> A fellow at work has written the SQL below to retrieve some data from
> multiple tables. Obviously it is inefficient and unnecessarily complex, and

Why is that obvious?  Why do you think that joining in your
application is going to give better performance than joining in the
database?  This is what RDBMS _do_, for heaven's sake.

(I agree that your plan shows it's taking a long time.  But that's a
different problem.  We don't actually know what you want from your
query.)

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
--J.D. Baldwin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] Using bitmap index scans-more efficient

2006-08-13 Thread Kyle Bateman

How can I use bitmap index scans more effectively? (version 8.1.0)

I have a financial ledger (actually a view, grouping several other tables) 
containing about a million records.  Each record contains an account code and 
a project code.  I can query for all the transactions belonging to any single

project code and it is very fast and efficient (milliseconds/project).

But projects are organized in a hierarchical structure, so I also need to 
query the ledger for transactions belonging to a particular project and/or all

its progeny.  Depending on the method, this is taking several seconds to
several minutes per project.

For testing purposes, I'll present results using a smaller version of the 
ledger with the following query times:



It is most efficient to enumerate the group of projects using "in" (0.144 
seconds)

 select * from ledger where proj in 
(4737,4789,4892,4893,4894,4895,4933,4934,4935);

---
Nested Loop Left Join  (cost=19.73..4164.10 rows=7 width=85)
  ->  Nested Loop  (cost=19.73..4139.08 rows=7 width=81)
->  Nested Loop  (cost=19.73..4100.07 rows=7 width=63)
  ->  Bitmap Heap Scan on apinv_items i  (cost=19.73..1185.71 
rows=487 width=55)
Recheck Cond: ((proj = 4737) OR (proj = 4789) OR (proj = 
4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR (proj = 4933) OR 
(proj = 4934
) OR (proj = 4935))
Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar) 
OR (status = 'pd'::bpchar))
->  BitmapOr  (cost=19.73..19.73 rows=495 width=0)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4737)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4789)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4892)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4893)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4894)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4895)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4933)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4934)
  ->  Bitmap Index Scan on i_apinv_items_proj  
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4935)
  ->  Index Scan using apinv_hdr_pkey on apinv_hdr h  
(cost=0.00..5.97 rows=1 width=21)
Index Cond: (("outer".vendid = h.vendid) AND 
(("outer".invnum)::text = (h.invnum)::text))
->  Index Scan using vend_org_pkey on vend_org v  (cost=0.00..5.56 
rows=1 width=26)
  Index Cond: (v.org_id = "outer".vendid)
  ->  Seq Scan on acct a  (cost=0.00..3.54 rows=1 width=4)
Filter: ((code)::text = 'ap'::text)
---

Problem is, the project list has to be hard-coded into the SQL statement. 
What I really need is transactions belonging to "project 4737 and all its progeny."
So I've tried using a many-to-many table proj_prog that describes which projects 
are progeny of which other projects.  Unfortunately, the query time then goes up 
by a factor of 6 (to 0.85 seconds).


Examples:
 select * from ledger where proj = any (array(select prog_id from proj_prog 
where proj_id = 4737));
 select * from ledger where proj = any 
(array[4737,4789,4892,4893,4894,4895,4933,4934,4935]);"

---
Nested Loop Left Join  (cost=13584.99..17647.39 rows=850 width=85)
  InitPlan
->  Index Scan using proj_prog_pkey on proj_prog  (cost=0.00..38.04 rows=21 
width=4)
  Index Cond: (proj_id = 4737)
  ->  Merge Join  (cost=13543.42..17565.44 rows=850 width=81)
Merge Cond: ("outer".vendid = "inner".org_id)
->  Merge Join  (cost=13543.42..17405.05 rows=850 width=63)
  Merge Cond: (("outer".vendid = "inner".vendid) AND (("outer".invnum)::text = 
"inner"."?column10?"))
  ->  Index Scan using apinv_hdr_pkey on apinv_hdr h  
(cost=0.00..3148.16 rows=51016 width=21)
  ->  Sort  (cost=13543.42..13693.47 rows=60020 width=55)
  

Re: [SQL] Using bitmap index scans-more efficient

2006-08-13 Thread Florian Weimer
* Kyle Bateman:

> Any ideas about whether/how this can be done?

If the project tree is fairly consistent, it's convenient to encode it
using intervals instead of parent/child intervals.  IIRC, Celko's "SQL
for smarties" explains how to do this, and Kristian Koehntopp has
written some PHP code to implement it.

-- 
Florian Weimer<[EMAIL PROTECTED]>
BFK edv-consulting GmbH   http://www.bfk.de/
Durlacher Allee 47tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq