How can I use bitmap index scans more effectively? (version 8.1.0)
I have a financial ledger (actually a view, grouping several other tables)
containing about a million records. Each record contains an account code and
a project code. I can query for all the transactions belonging to any single
project code and it is very fast and efficient (milliseconds/project).
But projects are organized in a hierarchical structure, so I also need to
query the ledger for transactions belonging to a particular project and/or all
its progeny. Depending on the method, this is taking several seconds to
several minutes per project.
For testing purposes, I'll present results using a smaller version of the
ledger with the following query times:
It is most efficient to enumerate the group of projects using "in" (0.144
seconds)
select * from ledger where proj in
(4737,4789,4892,4893,4894,4895,4933,4934,4935);
---
Nested Loop Left Join (cost=19.73..4164.10 rows=7 width=85)
-> Nested Loop (cost=19.73..4139.08 rows=7 width=81)
-> Nested Loop (cost=19.73..4100.07 rows=7 width=63)
-> Bitmap Heap Scan on apinv_items i (cost=19.73..1185.71
rows=487 width=55)
Recheck Cond: ((proj = 4737) OR (proj = 4789) OR (proj =
4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR (proj = 4933) OR
(proj = 4934
) OR (proj = 4935))
Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar)
OR (status = 'pd'::bpchar))
-> BitmapOr (cost=19.73..19.73 rows=495 width=0)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4737)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4789)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4892)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4893)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4894)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4895)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4933)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4934)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4935)
-> Index Scan using apinv_hdr_pkey on apinv_hdr h
(cost=0.00..5.97 rows=1 width=21)
Index Cond: (("outer".vendid = h.vendid) AND
(("outer".invnum)::text = (h.invnum)::text))
-> Index Scan using vend_org_pkey on vend_org v (cost=0.00..5.56
rows=1 width=26)
Index Cond: (v.org_id = "outer".vendid)
-> Seq Scan on acct a (cost=0.00..3.54 rows=1 width=4)
Filter: ((code)::text = 'ap'::text)
---
Problem is, the project list has to be hard-coded into the SQL statement.
What I really need is transactions belonging to "project 4737 and all its progeny."
So I've tried using a many-to-many table proj_prog that describes which projects
are progeny of which other projects. Unfortunately, the query time then goes up
by a factor of 6 (to 0.85 seconds).
Examples:
select * from ledger where proj = any (array(select prog_id from proj_prog
where proj_id = 4737));
select * from ledger where proj = any
(array[4737,4789,4892,4893,4894,4895,4933,4934,4935]);"
---
Nested Loop Left Join (cost=13584.99..17647.39 rows=850 width=85)
InitPlan
-> Index Scan using proj_prog_pkey on proj_prog (cost=0.00..38.04 rows=21
width=4)
Index Cond: (proj_id = 4737)
-> Merge Join (cost=13543.42..17565.44 rows=850 width=81)
Merge Cond: ("outer".vendid = "inner".org_id)
-> Merge Join (cost=13543.42..17405.05 rows=850 width=63)
Merge Cond: (("outer".vendid = "inner".vendid) AND (("outer".invnum)::text =
"inner"."?column10?"))
-> Index Scan using apinv_hdr_pkey on apinv_hdr h
(cost=0.00..3148.16 rows=51016 width=21)
-> Sort (cost=13543.42..13693.47 rows=60020 width=55)