Re: [PERFORM] Why is plan (and performance) different on partitioned table?
Title: RE: [PERFORM] Why is plan (and performance) different on partitioned table? If you don't have anything in the parent table br_1min, then deleting the (presumably obsolete) pg_statistic rows for it should fix your immediate problem. Otherwise, consider applying the attached. Tom, thanks alot for your reply. A few follow-up questions, and one potential bug? I've been experimenting with deleting the rows from pg_statistics. FYI, there were statistics for all master tables prior to us partioning the data. We then manually inserted the rows into each inherited partition and, when done - did a truncate of the master table. So, here's what I'm finding. 1) When I delete the rows from pg_statistics, the new plan is, indeed, a hash join. explain analyze SELECT * FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id WHERE ((end_time = '2006-05-01 17:12:18-07' AND end_time '2006-05-01 17:13:18-07')) AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ; Hash Join (cost=763.35..807.35 rows=1 width=109) (actual time=3.631..36.181 rows=45 loops=1) Hash Cond: (outer.modules_id = inner.id) - Append (cost=1.04..40.64 rows=877 width=32) (actual time=0.198..34.872 rows=910 loops=1) - Bitmap Heap Scan on br_1min bfs1 (cost=1.04..8.70 rows=6 width=32) (actual time=0.060..0.060 rows=0 loops=1) Recheck Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Bitmap Index Scan on br_1min_end_idx (cost=0.00..1.04 rows=6 width=0) (actual time=0.054..0.054 rows=0 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Index Scan using br_1min_20557_end_idx on br_1min_20557 bfs1 (cost=0.00..25.91 rows=869 width=32) (actual time=0.136..1.858 rows=910 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Index Scan using br_1min_20570_end_idx on br_1min_20570 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=0.092..0.092 rows=0 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Index Scan using br_1min_20583_end_idx on br_1min_20583 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=32.034..32.034 rows=0 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Hash (cost=761.61..761.61 rows=281 width=77) (actual time=0.487..0.487 rows=45 loops=1) - Bitmap Heap Scan on br_mods mod (cost=20.98..761.61 rows=281 width=77) (actual time=0.264..0.435 rows=45 loops=1) Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722)) - BitmapOr (cost=20.98..20.98 rows=281 width=0) (actual time=0.223..0.223 rows=0 loops=1) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.091..0.091 rows=14 loops=1) Index Cond: (downloads_id = 153226) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.037..0.037 rows=2 loops=1) Index Cond: (downloads_id = 153714) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.010..0.010 rows=2 loops=1) Index Cond: (downloads_id = 153730) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (downloads_id = 153728) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (downloads_id = 153727) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153724) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153713) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153725) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=16 loops=1) Index Cond: (downloads_id = 153739) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (downloads_id = 153722) Total runtime: 36.605 ms (38 rows) Note: there are 2 new partitions that our cron jobs automatically created yesterday that are being scanned, but
[PERFORM] Why is plan (and performance) different on partitioned table?
Title: Why is plan (and performance) different on partitioned table? Hi, I have recently implemented table partitioning in our postgres 8.1 db. Upon analyzing query performance, I have realized that, even when only a single one of the partitions has to be scanned, the plan is drastically different, and performs much worse, when I query against the master table (uses merge join), vs. a direct query against the partition directly (uses a hash join). The majority of our queries only access a single partition. Any insight into why this happens and what can be done to improve performance would be greatly appreciated. br_1min is my partitioned table: explain analyze SELECT * FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id WHERE ((end_time = '2006-05-01 17:12:18-07' AND end_time '2006-05-01 17:13:18-07')) AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ; -- Merge Join (cost=73.99..223.43 rows=1 width=109) (actual time=2925.629..3082.188 rows=45 loops=1) Merge Cond: (outer.id = inner.modules_id) - Index Scan using br_mods_id_pkey on br_mods mod (cost=0.00..40861.18 rows=282 width=77) (actual time=2922.223..3078.335 rows=45 loops=1) Filter: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722)) - Sort (cost=73.99..76.26 rows=906 width=32) (actual time=3.334..3.508 rows=348 loops=1) Sort Key: br1.modules_id - Append (cost=0.00..29.49 rows=906 width=32) (actual time=0.133..2.169 rows=910 loops=1) - Index Scan using br_1min_end_idx on br_1min br1 (cost=0.00..2.02 rows=1 width=32) (actual time=0.029..0.029 rows=0 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Index Scan using br_1min_20557_end_idx on br_1min_20557 br1 (cost=0.00..27.48 rows=905 width=32) (actual time=0.101..1.384 rows=910 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) Total runtime: 3082.450 ms (12 rows) Now, If I query directly against br_1min_20557, my partition, I get: explain analyze SELECT * FROM br_1min_20557 br1 JOIN br_mods mod on br1.modules_id = mod.id WHERE ((end_time = '2006-05-01 17:12:18-07' AND end_time '2006-05-01 17:13:18-07')) AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ; -- Hash Join (cost=764.74..796.94 rows=1 width=109) (actual time=2.488..2.865 rows=45 loops=1) Hash Cond: (outer.modules_id = inner.id) - Index Scan using br_1min_20557_end_idx on br_1min_20557 br1 (cost=0.00..27.62 rows=914 width=32) (actual time=0.084..1.886 rows=910 loops=1) Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time '2006-05-01 17:13:18-07'::timestamp with time zone)) - Hash (cost=764.03..764.03 rows=282 width=77) (actual time=0.284..0.284 rows=45 loops=1) - Bitmap Heap Scan on br_mods mod (cost=20.99..764.03 rows=282 width=77) (actual time=0.154..0.245 rows=45 loops=1) Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722)) - BitmapOr (cost=20.99..20.99 rows=282 width=0) (actual time=0.144..0.144 rows=0 loops=1) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=14 loops=1) Index Cond: (downloads_id = 153226) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.011..0.011 rows=2 loops=1) Index Cond: (downloads_id = 153714) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153730) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153728) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (downloads_id = 153727) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (downloads_id = 153724) - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual
Re: [PERFORM] SQL Function Performance
Title: RE: [PERFORM] SQL Function Performance in my case; both direct query and sql function gererate same execution plan. Also, execution plan belongs to the sql function better than direct sql query plan. But, direct sql result comes less than 1 second. sql function result comes about in 50 seconds. How are you getting at the plan inside your function? If you just do an EXPLAIN on the function call you get a FUNCTION SCAN line in your plan, which tells you nothing. I remember I had to work through some process for catching the output of the Explain plan in a cursor and returning that to actually see the plan. I saw in a previous response he suggested using a PREPARE and EXECUTE against that. I'm not sure that's the same as what's going on in the function (although I could be wrong). Just humor me and try creating the sql query in the fuction in a text variable and then Executing it. Prior to that, however, you might try just recreating the function. The plan may be re-evaluated at that point. - Mark
[PERFORM] Stable function being evaluated more than once in a single query
Hi, I've got a set-returning function, defined as STABLE, that I reference twice within a single query, yet appears to be evaluated via two seperate function scans. I created a simple query that calls the function below and joins the results to itself (Note: in case you wonder why I'd do such a query, it's not my actual query, which is much more complex. I just created this simple query to try to test out the 'stable' behavior). select proname,provolatile from pg_proc where proname = 'get_tran_filesize'; proname| provolatile +- get_tran_filesize | s (1 row) explain analyze select * from get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 15:58:33-08','{228226,228222,228210}'); QUERY PLAN -- Function Scan on get_tran_filesize (cost=0.00..12.50 rows=1000 width=40) (actual time=49.522..49.524 rows=3 loops=1) Total runtime: 49.550 ms (2 rows) explain analyze select * from get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 15:58:33-08','{228226,228222,228210}') gt, get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 15:58:33-08','{228226,228222,228210}') gt2 where gt.tran_id = gt2.tran_id; QUERY PLAN -- Merge Join (cost=124.66..204.66 rows=5000 width=80) (actual time=83.027..83.040 rows=3 loops=1) Merge Cond: (outer.tran_id = inner.tran_id) - Sort (cost=62.33..64.83 rows=1000 width=40) (actual time=40.250..40.251 rows=3 loops=1) Sort Key: gt.tran_id - Function Scan on get_tran_filesize gt (cost=0.00..12.50 rows=1000 width=40) (actual time=40.237..40.237 rows=3 loops=1) - Sort (cost=62.33..64.83 rows=1000 width=40) (actual time=42.765..42.767 rows=3 loops=1) Sort Key: gt2.tran_id - Function Scan on get_tran_filesize gt2 (cost=0.00..12.50 rows=1000 width=40) (actual time=42.748..42.751 rows=3 loops=1) Total runtime: 83.112 ms (9 rows) If I do get this working, then my question is, if I reference this function within a single query, but within seperate subqueries within the query, will it be re-evaluated each time, or just once. Basically, I'm not clear on the definition of surrounding query in the following exerpt from the Postgreql documentation: A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all calls within a single surrounding query. Thanks, Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Help in avoiding a query 'Warm-Up' period/shared buffer cache
On Thursday 05 January 2006 15:12, Qingqing Zhou wrote: Mark Liberman [EMAIL PROTECTED] wrote First run, after a night of inactivity: - Bitmap Index Scan on 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 rows=11082 loops=1) Index Cond: (file_id = 137271) Total runtime: 313.643 ms Second run, after that: - Bitmap Index Scan on 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 loops=1) Index Cond: (file_id = 137271) Total runtime: 2.276 ms It is clear that the first query takes longer time because of the IO time of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid currently there is no easy solution for this situation, unless you could predicate which part of relation/index your query will use, then you can preload or warm-up cache for it. Regards, Qingqing Thanks Qingqing, this actually helped me determine that the compound index, 1min_events_file_id_begin_idx, is not the proper index to use as it is based on file_id and begin_time - the later of which is not involved in the where clause. It is only using that index to filter out the listed file_id. Now, my follow-up question / assumption. I am assuming that the IO time is so long on that index because it has to read the entire index (for that file_id) into memory (because it cannot just scan the rows with a certain date range because we are not using begin_time in the where clause). But, if I replaced that compound index with the proper compound index of file_id / end_time, it would give similar performance results to the scan on 1min_events_end_idx (which was 1 ms). E.g. the latest rows that were updated are more likely to be in the cache - and it is smart enough to only read the index rows that it needs. Alternatively, I could create a single index on file_id (and rely upon the new bitmap scan capabilities in 1.2). But, I fear that, although this will be smaller than the erroneous compound index on file_id / begin_time, it will still display the same behavior in that it will need to read all rows from that index for the appropriate file_id - and since the data goes back every minute for 60 days, that IO might be large. Obviously, I will be testing this - but it might take a few days, as I haven't figure out how to simulate the period of inactivity to get the data flushed out of the cache ... so I have to run this each morning. But, any confirmation / corrections to my assumptions are greatly appreciated. E.g. is the compound index the way to go, or the solo index on file_id? Thanks, Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org