Hi,
My question is regarding ORDER BY / LIMIT query behavior when using
partitioning.
I have a large table (about 100 columns, several million rows) partitioned by a
column called day (which is the date stored as yyyymmdd - say 20100502 for May
2nd 2010 etc.). Say the main table is called FACT_TABLE and each child table
is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502, FACT_TABLE_20100503
etc.) and has an appropriate CHECK constraint created on it to CHECK (day =
yyyymmdd).
Postgres Version: PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu, compiled by
GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit
The query pattern I am looking at is (I have tried to simplify the column names
for readability):
SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507 # selecting for a week
ORDER BY F2 desc
LIMIT 100
This is what is happening:
When I query from the specific day's (child) table, I get what I expect - a
descending Index scan and good performance.
# explain select F1 from FACT_TABLE_20100502 where day = 20100502 order by F2
desc limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
--
Limit (cost=0.00..4.81 rows=100 width=41)
-> Index Scan Backward using F2_20100502 on FACT_TABLE_20100502
(cost=0.00..90355.89 rows=1876985 width=41
)
Filter: (day = 20100502)
BUT:
When I do the same query against the parent table it is much slower - two
things seem to happen - one is that the descending scan of the index is not
done and secondly there seems to be a separate sort/limit at the end - i.e. all
data from all partitions is retrieved and then sorted and limited - This seems
to be much less efficient than doing a descending scan on each partition and
limiting the results and then combining and reapplying the limit at the end.
explain select F1 from FACT_TABLE where day = 20100502 order by F2 desc limit
100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=20000084948.01..20000084948.01 rows=100 width=41)
-> Sort (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
Sort Key: public.FACT_TABLE.F2
-> Result (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
-> Append (cost=10000000000.00..20000084230.64 rows=1876986
width=41)
-> Seq Scan on FACT_TABLE
(cost=10000000000.00..10000000010.02 rows=1 width=186)
Filter: (day = 20100502)
-> Seq Scan on FACT_TABLE_20100502 FACT_TABLE
(cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
Filter: (day = 20100502)
(9 rows)
Could anyone please explain why this is happening and what I can do to get the
query to perform well even when querying from the parent table?
Thanks,
Ranga
_________________________________________________________________
Hotmail is redefining busy with tools for the New Busy. Get more from your
inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_2