Re: [PERFORM] Why is plan (and performance) different on partitioned table?

2006-05-02 Thread Mark Liberman
Title: RE: [PERFORM] Why is plan (and performance) different on partitioned table? 






If you don't have anything in the parent table br_1min, then deleting
the (presumably obsolete) pg_statistic rows for it should fix your
immediate problem. Otherwise, consider applying the attached.

Tom, thanks alot for your reply. A few follow-up questions, and one potential bug?

I've been experimenting with deleting the rows from pg_statistics. FYI, there were statistics for all master tables prior to us partioning the data. We then manually inserted the rows into each inherited partition and, when done - did a truncate of the master table.

So, here's what I'm finding.

1) When I delete the rows from pg_statistics, the new plan is, indeed, a hash join.

explain analyze
SELECT *
FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id
WHERE ((end_time = '2006-05-01 17:12:18-07' AND end_time  '2006-05-01 17:13:18-07'))
 AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ;

Hash Join (cost=763.35..807.35 rows=1 width=109) (actual time=3.631..36.181 rows=45 loops=1)
 Hash Cond: (outer.modules_id = inner.id)
 - Append (cost=1.04..40.64 rows=877 width=32) (actual time=0.198..34.872 rows=910 loops=1)
 - Bitmap Heap Scan on br_1min bfs1 (cost=1.04..8.70 rows=6 width=32) (actual time=0.060..0.060 rows=0 loops=1)
 Recheck Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Bitmap Index Scan on br_1min_end_idx (cost=0.00..1.04 rows=6 width=0) (actual time=0.054..0.054 rows=0 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Index Scan using br_1min_20557_end_idx on br_1min_20557 bfs1 (cost=0.00..25.91 rows=869 width=32) (actual time=0.136..1.858 rows=910 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Index Scan using br_1min_20570_end_idx on br_1min_20570 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=0.092..0.092 rows=0 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Index Scan using br_1min_20583_end_idx on br_1min_20583 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=32.034..32.034 rows=0 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Hash (cost=761.61..761.61 rows=281 width=77) (actual time=0.487..0.487 rows=45 loops=1)
 - Bitmap Heap Scan on br_mods mod (cost=20.98..761.61 rows=281 width=77) (actual time=0.264..0.435 rows=45 loops=1)
 Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722))
 - BitmapOr (cost=20.98..20.98 rows=281 width=0) (actual time=0.223..0.223 rows=0 loops=1)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.091..0.091 rows=14 loops=1)
 Index Cond: (downloads_id = 153226)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.037..0.037 rows=2 loops=1)
 Index Cond: (downloads_id = 153714)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.010..0.010 rows=2 loops=1)
 Index Cond: (downloads_id = 153730)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
 Index Cond: (downloads_id = 153728)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
 Index Cond: (downloads_id = 153727)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
 Index Cond: (downloads_id = 153724)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
 Index Cond: (downloads_id = 153713)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
 Index Cond: (downloads_id = 153725)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=16 loops=1)
 Index Cond: (downloads_id = 153739)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.009..0.009 rows=1 loops=1)
 Index Cond: (downloads_id = 153722)
Total runtime: 36.605 ms
(38 rows)


Note: there are 2 new partitions that our cron jobs automatically created yesterday that are being scanned, but 

[PERFORM] Why is plan (and performance) different on partitioned table?

2006-05-01 Thread Mark Liberman
Title: Why is plan (and performance) different on partitioned table?






Hi,

I have recently implemented table partitioning in our postgres 8.1 db. Upon analyzing query performance, I have realized that, even when only a single one of the partitions has to be scanned, the plan is drastically different, and performs much worse, when I query against the master table (uses merge join), vs. a direct query against the partition directly (uses a hash join). The majority of our queries only access a single partition.

Any insight into why this happens and what can be done to improve performance would be greatly appreciated.

br_1min is my partitioned table:

explain analyze
SELECT *
FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id
WHERE ((end_time = '2006-05-01 17:12:18-07' AND end_time  '2006-05-01 17:13:18-07'))
 AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ;

--
Merge Join (cost=73.99..223.43 rows=1 width=109) (actual time=2925.629..3082.188 rows=45 loops=1)
 Merge Cond: (outer.id = inner.modules_id)
 - Index Scan using br_mods_id_pkey on br_mods mod (cost=0.00..40861.18 rows=282 width=77) (actual time=2922.223..3078.335 rows=45 loops=1)
 Filter: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722))
 - Sort (cost=73.99..76.26 rows=906 width=32) (actual time=3.334..3.508 rows=348 loops=1)
 Sort Key: br1.modules_id
 - Append (cost=0.00..29.49 rows=906 width=32) (actual time=0.133..2.169 rows=910 loops=1)
 - Index Scan using br_1min_end_idx on br_1min br1 (cost=0.00..2.02 rows=1 width=32) (actual time=0.029..0.029 rows=0 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Index Scan using br_1min_20557_end_idx on br_1min_20557 br1 (cost=0.00..27.48 rows=905 width=32) (actual time=0.101..1.384 rows=910 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
Total runtime: 3082.450 ms
(12 rows)



Now, If I query directly against br_1min_20557, my partition, I get:

explain analyze
SELECT *
FROM br_1min_20557 br1 JOIN br_mods mod on br1.modules_id = mod.id
WHERE ((end_time = '2006-05-01 17:12:18-07' AND end_time  '2006-05-01 17:13:18-07'))
 AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ;


--
Hash Join (cost=764.74..796.94 rows=1 width=109) (actual time=2.488..2.865 rows=45 loops=1)
 Hash Cond: (outer.modules_id = inner.id)
 - Index Scan using br_1min_20557_end_idx on br_1min_20557 br1 (cost=0.00..27.62 rows=914 width=32) (actual time=0.084..1.886 rows=910 loops=1)
 Index Cond: ((end_time = '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time  '2006-05-01 17:13:18-07'::timestamp with time zone))
 - Hash (cost=764.03..764.03 rows=282 width=77) (actual time=0.284..0.284 rows=45 loops=1)
 - Bitmap Heap Scan on br_mods mod (cost=20.99..764.03 rows=282 width=77) (actual time=0.154..0.245 rows=45 loops=1)
 Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722))
 - BitmapOr (cost=20.99..20.99 rows=282 width=0) (actual time=0.144..0.144 rows=0 loops=1)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=14 loops=1)
 Index Cond: (downloads_id = 153226)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.011..0.011 rows=2 loops=1)
 Index Cond: (downloads_id = 153714)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
 Index Cond: (downloads_id = 153730)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
 Index Cond: (downloads_id = 153728)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
 Index Cond: (downloads_id = 153727)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
 Index Cond: (downloads_id = 153724)
 - Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual 

Re: [PERFORM] SQL Function Performance

2006-02-17 Thread Mark Liberman
Title: RE: [PERFORM] SQL Function Performance






 in my case; both direct query and sql function gererate same execution plan. Also, execution plan belongs to the sql function better than direct sql  query plan. But, direct sql result comes less than 1 second. sql function result comes about in 50 seconds.

How are you getting at the plan inside your function? If you just do an EXPLAIN on the function call you get a FUNCTION SCAN line in your plan, which tells you nothing. I remember I had to work through some process for catching the output of the Explain plan in a cursor and returning that to actually see the plan. I saw in a previous response he suggested using a PREPARE and EXECUTE against that. I'm not sure that's the same as what's going on in the function (although I could be wrong).

Just humor me and try creating the sql query in the fuction in a text variable and then Executing it.

Prior to that, however, you might try just recreating the function. The plan may be re-evaluated at that point.

- Mark








[PERFORM] Stable function being evaluated more than once in a single query

2006-01-11 Thread Mark Liberman
Hi,

I've got a set-returning function,  defined as STABLE, that I reference twice 
within a single query, yet appears to be evaluated via two seperate function 
scans.  I created a simple query that calls the function below and joins the 
results to itself (Note:  in case you wonder why I'd do such a query, it's 
not my actual query, which is much more complex.  I just created this simple 
query to try to test out the 'stable' behavior).


select proname,provolatile from pg_proc where proname = 'get_tran_filesize';
 proname| provolatile
+-
 get_tran_filesize  | s
(1 row)


explain analyze
select * from 
 get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 
15:58:33-08','{228226,228222,228210}');

  QUERY PLAN
--
 Function Scan on get_tran_filesize  (cost=0.00..12.50 rows=1000 width=40) 
(actual time=49.522..49.524 rows=3 loops=1)
 Total runtime: 49.550 ms
(2 rows)


explain analyze
select * from 
 get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 
15:58:33-08','{228226,228222,228210}') gt,
 get_tran_filesize('2005-12-11 00:00:00-08','2006-01-11 
15:58:33-08','{228226,228222,228210}') gt2
where gt.tran_id = gt2.tran_id;

  QUERY 
PLAN
--
 Merge Join  (cost=124.66..204.66 rows=5000 width=80) (actual 
time=83.027..83.040 rows=3 loops=1)
   Merge Cond: (outer.tran_id = inner.tran_id)
   -  Sort  (cost=62.33..64.83 rows=1000 width=40) (actual 
time=40.250..40.251 rows=3 loops=1)
 Sort Key: gt.tran_id
 -  Function Scan on get_tran_filesize gt  (cost=0.00..12.50 
rows=1000 width=40) (actual time=40.237..40.237 rows=3 loops=1)
   -  Sort  (cost=62.33..64.83 rows=1000 width=40) (actual 
time=42.765..42.767 rows=3 loops=1)
 Sort Key: gt2.tran_id
 -  Function Scan on get_tran_filesize gt2  (cost=0.00..12.50 
rows=1000 width=40) (actual time=42.748..42.751 rows=3 loops=1)
 Total runtime: 83.112 ms
(9 rows)


If I do get this working, then my question is, if I reference this function 
within a single query, but within seperate subqueries within the query, will 
it be re-evaluated each time, or just once.  Basically, I'm not clear on the 
definition of surrounding query in the following exerpt from the Postgreql 
documentation:

A STABLE function cannot modify the database and is guaranteed to return the 
same results given the same arguments for all calls within a single
surrounding query.

Thanks,

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Help in avoiding a query 'Warm-Up' period/shared buffer cache

2006-01-05 Thread Mark Liberman
On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
 Mark Liberman [EMAIL PROTECTED] wrote

  First run, after a night of inactivity:
 
  -  Bitmap Index Scan on
  1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
  (actual time=313.468..313.468 rows=11082
  loops=1)
Index Cond: (file_id = 137271)
  Total runtime: 313.643 ms
 
  Second run, after that:
 
  -  Bitmap Index Scan on
  1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
  (actual time=2.106..2.106 rows=11082 loops=1)
Index Cond: (file_id = 137271)
  Total runtime: 2.276 ms

 It is clear that the first query takes longer time because of the IO time
 of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid
 currently there is no easy solution for this situation, unless you could
 predicate which part of relation/index your query will use, then you can
 preload or warm-up cache for it.

 Regards,
 Qingqing


Thanks Qingqing, 

this actually helped me determine that the compound index, 
1min_events_file_id_begin_idx, is not the proper index to use as it is based 
on file_id and begin_time - the later of which is not involved in the where 
clause.  It is only using that index to filter out the listed file_id.  

Now, my follow-up question / assumption.  I am assuming that the  IO time is 
so long on that index because it has to read the entire index (for that 
file_id) into memory (because it cannot just scan the rows with a certain 
date range because we are not using begin_time in the where clause). 

But, if I replaced that compound index with the proper compound index of 
file_id / end_time, it would give similar performance results to the scan on 
1min_events_end_idx (which was  1 ms).  E.g. the latest rows that were 
updated are more likely to be in the cache - and it is smart enough to only 
read the index rows that it needs.

Alternatively, I could create a single index on file_id (and rely upon the new 
bitmap scan capabilities in 1.2).  But, I fear that, although this will be 
smaller than the erroneous compound index on file_id / begin_time, it will 
still display the same behavior in that it will need to read all rows from 
that index for the appropriate file_id - and since the data goes back every 
minute for 60 days, that IO might be large.

Obviously, I will be testing this - but it might take a few days, as I haven't 
figure out how to simulate the period of inactivity to get the data flushed 
out of the cache ... so I have to run this each morning.  But, any 
confirmation / corrections to my assumptions are greatly appreciated. E.g. is 
the compound index the way to go, or the solo index on file_id?

Thanks,

Mark


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org