Re: [HACKERS] 27 second plan times

2007-04-21 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> I think there's still a problem here with some kind of n^2 behaviour for
> appends of very wide tables but I haven't quite nailed it yet. In any case is
> there any reason not to make the following small change to move the constraint
> exclusion ahead of the size estimates and index checks and save ourselves
> potentially a lot of work?

Applied along with some other hacking to reduce the costs of the
lower-level functions that this example shows to be inefficient.
They'd still be slow in large queries, whether CE applies or not.

regards, tom lane



Re: [HACKERS] 27 second plan times

2007-04-21 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Applied along with some other hacking to reduce the costs of the
> lower-level functions that this example shows to be inefficient.
> They'd still be slow in large queries, whether CE applies or not.

BIG difference. The case that caused swapping and took almost 15m to plan is
now down to 2.5s. The profile still looks a bit odd but I can't argue with
the results.


[EMAIL PROTECTED]:/var/tmp/db$ gprof /usr/local/pgsql/bin/postgres gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 24.36      2.46     2.46   418517     0.00     0.00  SearchCatCache
  7.33      3.20     0.74  2564235     0.00     0.00  hash_any
  6.34      3.84     0.64  4283964     0.00     0.00  hash_search_with_hash_value
  4.36      4.28     0.44   216316     0.00     0.00  list_nth_cell
  3.96      4.68     0.40  6535943     0.00     0.00  AllocSetAlloc
  3.37      5.02     0.34  4165664     0.00     0.00  _bt_compare
  2.67      5.29     0.27  2266696     0.00     0.00  MemoryContextAllocZeroAligned

...
                0.01    0.03    2000/424529      get_namespace_name [164]
                0.01    0.03    2001/424529      pg_class_aclmask [167]
                0.01    0.03    2001/424529      get_rel_name [163]
                0.01    0.03    2002/424529      has_subclass [165]
                1.21    2.69  204102/424529      get_attavgwidth [37]
                1.21    2.69  204308/424529      TupleDescInitEntry [36]
[632]    0.0    0.00    0.00  418517         SearchSysCache <cycle 9> [632]
                               418517             SearchCatCache <cycle 9> [15]



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] 27 second plan times

2007-04-21 Thread Gregory Stark
Gregory Stark <[EMAIL PROTECTED]> writes:

> Tom Lane <[EMAIL PROTECTED]> writes:
>
>> Applied along with some other hacking to reduce the costs of the
>> lower-level functions that this example shows to be inefficient.
>> They'd still be slow in large queries, whether CE applies or not.
>
> BIG difference. The case that caused swapping and took almost 15m to plan is
> now down to 2.5s. The profile still looks a bit odd but I can't argue with
> the results.
>
> [EMAIL PROTECTED]:/var/tmp/db$ gprof /usr/local/pgsql/bin/postgres gmon.out
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  24.36      2.46     2.46   418517     0.00     0.00  SearchCatCache
>   7.33      3.20     0.74  2564235     0.00     0.00  hash_any
>   6.34      3.84     0.64  4283964     0.00     0.00  hash_search_with_hash_value
>   4.36      4.28     0.44   216316     0.00     0.00  list_nth_cell
>   3.96      4.68     0.40  6535943     0.00     0.00  AllocSetAlloc
>   3.37      5.02     0.34  4165664     0.00     0.00  _bt_compare
>   2.67      5.29     0.27  2266696     0.00     0.00  MemoryContextAllocZeroAligned

For what it's worth, if I defeat the same-column-position optimization,
SearchCatCache and list_nth_cell pop back to the top. I had a version that
kept the col_mappings and translated_vars lists in arrays to fix the n^2
list_nth_cell behaviour, but the SearchCatCache is something else.

 53.77     87.36    87.36  4813018     0.00     0.00  SearchCatCache
 11.29    105.71    18.35  1610810     0.00     0.00  list_nth_cell
  3.09    110.72     5.01 17738640     0.00     0.00  hash_any
  2.50    114.78     4.05 30612499     0.00     0.00  hash_search_with_hash_value
  2.16    118.29     3.51 54588745     0.00     0.00  AllocSetAlloc
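
To make the n^2 pattern concrete, here's a minimal standalone sketch (not
the actual col_mappings/translated_vars code): a list_nth-style accessor
restarts its walk from the head on every call, so indexing a linked list
from inside a loop costs O(n^2), while flattening the list into an array
once makes each subsequent lookup O(1).

#include <stdlib.h>

/* Illustrative only -- not PostgreSQL's List implementation. */
typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

/* O(n) per call: restarts from the head every time, like list_nth_cell. */
static int
nth_value(Cell *head, int n)
{
    while (n-- > 0)
        head = head->next;
    return head->value;
}

/* Indexing from a loop: n calls at O(n) each, O(n^2) overall. */
static long
sum_by_index(Cell *head, int len)
{
    long sum = 0;

    for (int i = 0; i < len; i++)
        sum += nth_value(head, i);
    return sum;
}

/* The array variant mentioned above: one O(n) pass to flatten the list,
 * after which every lookup is O(1). */
static long
sum_via_array(Cell *head, int len)
{
    int  *arr = malloc(len * sizeof(int));
    long  sum = 0;
    int   i = 0;

    for (Cell *c = head; c != NULL; c = c->next)
        arr[i++] = c->value;
    for (i = 0; i < len; i++)
        sum += arr[i];
    free(arr);
    return sum;
}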


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] 27 second plan times

2007-04-21 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Applied along with some other hacking to reduce the costs of the
>> lower-level functions that this example shows to be inefficient.
>> They'd still be slow in large queries, whether CE applies or not.

> BIG difference. The case that caused swapping and took almost 15m to plan is
> now down to 2.5s. The profile still looks a bit odd but I can't argue with
> the results.

I'm still feeling a bit annoyed with the behavior of the stats machinery
(pgstat_initstats and related macros).  Yesterday I fixed it so that
pgstat_initstats doesn't perform redundant searches of the tabstat
arrays, but there's still an issue, which is that any rel that's
heap_opened or index_opened within a transaction is going to get a
tabstat entry, whether any events are subsequently counted or not.
In typical scenarios I don't think this is a big deal, but in examples
such as yours we're going to be sending a whole lot of all-zero tabstat
messages; there'll be one for every heap or index that the planner even
momentarily considered.  That means more UDP traffic and more work for
the stats collector.  gprof won't show the resulting overhead since
it doesn't know anything about kernel-level overhead or activity in the
stats collector.  (Hm, might be able to measure it in oprofile
though...)

We could fix this by not doing pgstat_initstats at heap_open time,
but postponing it until something more interesting happens.  The trouble
is that that'd add at least a small amount of overhead at the places
where something more interesting is counted, since the pgstat macros
would have to check validity of the tabstat pointer.  The added overhead
should be only about one extra comparison, but maybe that's enough to
make someone object?
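
For concreteness, here's a sketch of the shape such a change could take
(hypothetical names throughout -- this is not the real pgstat.h API): the
counting macro tests the tabstat pointer and sets it up on first use, so a
rel that is merely opened and closed never acquires an entry at all, at the
price of that one extra comparison per counted event.

#include <stdlib.h>

/* Hypothetical sketch of lazily created tabstat entries; RelStats,
 * rel_stats_setup, and count_heap_scan are illustrative names only. */
typedef struct RelStats
{
    long numscans;
} RelStats;

typedef struct Rel
{
    RelStats *pgstat_info;      /* NULL until the first counted event */
} Rel;

/* Stand-in for locating or creating the transaction's tabstat entry. */
static RelStats *
rel_stats_setup(void)
{
    return calloc(1, sizeof(RelStats));
}

/* The cost of postponing initialization is the NULL test here, paid once
 * per counted event; the benefit is that a rel the planner only opened
 * momentarily never generates an all-zero tabstat message. */
#define count_heap_scan(rel) \
    do { \
        if ((rel)->pgstat_info == NULL) \
            (rel)->pgstat_info = rel_stats_setup(); \
        (rel)->pgstat_info->numscans++; \
    } while (0)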

regards, tom lane



Re: [HACKERS] 27 second plan times

2007-04-21 Thread Tom Lane
I wrote:
> I'm still feeling a bit annoyed with the behavior of the stats machinery
> (pgstat_initstats and related macros).
> ... That means more UDP traffic and more work for
> the stats collector.  gprof won't show the resulting overhead since
> it doesn't know anything about kernel-level overhead or activity in the
> stats collector.  (Hm, might be able to measure it in oprofile
> though...)

I spent some time with oprofile and couldn't muster any evidence
suggesting that this was accounting for more than 1% or so of total
runtime.  So for the moment I'll leave it alone.  It might eventually
become worth worrying about, though.

The thing I saw as being more interesting than the tabstat overhead
is that the planner does RelationGetNumberOfBlocks (ie, an lseek kernel
call) on every child rel ... and would do it on every index of every
child rel, too, if the example had any.  It would be nice if we could
postpone all of the work of get_relation_info() until after we've
checked for constraint exclusion.  This looks like it'd require some
nontrivial refactoring though --- in particular, I'm not sure how we'd
handle the total_table_pages calculation.
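
(The kernel call in question is essentially a file-length probe. A sketch
of the idea only -- the real path goes through the storage manager in
md.c:)

#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/* Sketch: deriving a relation's size in blocks from the file length,
 * i.e. one lseek() per relation (and per index) examined.  With 2,000
 * child rels that is 2,000 kernel calls per planning cycle. */
static unsigned int
blocks_in_file(int fd)
{
    off_t len = lseek(fd, 0, SEEK_END);

    return (unsigned int) (len / BLCKSZ);
}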

If you're satisfied with the performance as it now stands, let's leave
this for the maybe-do-someday list.

regards, tom lane



[HACKERS] 27 second plan times

2007-04-20 Thread Gregory Stark

Following up on some complaints we've had about extremely large plan times for
large partitioned tables, I've been doing some profiling. I've constructed a
situation where it takes upwards of 30 seconds to plan a simple query like:

postgres=# explain select * from a where i between 999 and 1001;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Result  (cost=0.00..8.63 rows=58 width=108)
   ->  Append  (cost=0.00..8.63 rows=58 width=108)
         ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
         ->  Seq Scan on a997 a  (cost=0.00..1.27 rows=1 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
         ->  Seq Scan on a998 a  (cost=0.00..1.27 rows=1 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
         ->  Seq Scan on a999 a  (cost=0.00..1.27 rows=18 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
         ->  Seq Scan on a1000 a  (cost=0.00..1.27 rows=18 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
         ->  Seq Scan on a1001 a  (cost=0.00..1.27 rows=18 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
         ->  Seq Scan on a1002 a  (cost=0.00..1.27 rows=1 width=108)
               Filter: ((i >= 999) AND (i <= 1001))
(16 rows)

Time: 46324.627 ms

Table a is a parent table with 2,000 partitions, each of which has 102
columns, two of which are covered by constraints of the form WHERE i BETWEEN
90 AND 110.

The gprof output is pretty damning:

Each sample counts as 0.01 seconds.
  %   cumulative   self                self     total
 time   seconds   seconds       calls  s/call   s/call  name
 62.58    203.76   203.76  21,474,937    0.00     0.00  SearchCatCache
  9.01    233.08    29.32   1,923,789    0.00     0.00  list_nth_cell
  5.34    250.46    17.38     2386465    0.00     0.00  pgstat_initstats
  2.80    259.57     9.11   235691652    0.00     0.00  AllocSetAlloc
  1.95    265.91     6.34   219852840    0.00     0.00  nocachegetattr
  1.19    269.78     3.87   256569078    0.00     0.00  FunctionCall2
  0.74    272.18     2.40   107923848    0.00     0.00  MemoryContextAllocZeroAligned


The SearchCatCache here is the one in get_attavgwidth called to estimate the
relation width. There are 200k attributes being measured here, but I'm not
clear why it's causing 21M calls.
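
For reference, the lookup in question has roughly this shape (a paraphrase
of the lsyscache.c function, not the verbatim source): one syscache probe
per (relation, attribute) pair, which with 2,000 rels of ~100 columns
accounts for the ~200k get_attavgwidth calls, though not by itself for the
21M SearchCatCache calls.

#include "postgres.h"
#include "access/htup.h"
#include "catalog/pg_statistic.h"
#include "utils/syscache.h"

/* Paraphrased sketch: each call probes the catalog cache for one
 * attribute's pg_statistic row to get its average stored width. */
int32
get_attavgwidth(Oid relid, AttrNumber attnum)
{
    HeapTuple   tp;

    tp = SearchSysCache(STATRELATT,
                        ObjectIdGetDatum(relid),
                        Int16GetDatum(attnum),
                        0, 0);
    if (HeapTupleIsValid(tp))
    {
        int32       stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;

        ReleaseSysCache(tp);
        if (stawidth > 0)
            return stawidth;
    }
    return 0;
}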

The first thing that comes to mind is that we're doing the
constraint_exclusion code *after* estimating the width of the relations we're
going to exclude. If we push the constraint exclusion up a few lines the
planning time goes down to 1.7s.

I think there's still a problem here with some kind of n^2 behaviour for
appends of very wide tables but I haven't quite nailed it yet. In any case is
there any reason not to make the following small change to move the constraint
exclusion ahead of the size estimates and index checks and save ourselves
potentially a lot of work?

Index: allpaths.c
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/backend/optimizer/path/allpaths.c,v
retrieving revision 1.161
diff -c -r1.161 allpaths.c
*** allpaths.c  22 Feb 2007 22:00:23 -0000      1.161
--- allpaths.c  20 Apr 2007 18:12:40 -0000
***************
*** 196,215 ****
  static void
  set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  {
-   /* Mark rel with estimated output rows, width, etc */
-   set_baserel_size_estimates(root, rel);
- 
-   /* Test any partial indexes of rel for applicability */
-   check_partial_indexes(root, rel);
- 
-   /*
-* Check to see if we can extract any restriction conditions from join
-* quals that are OR-of-AND structures.  If so, add them to the rel's
-* restriction list, and recompute the size estimates.
-*/
-   if (create_or_index_quals(root, rel))
-   set_baserel_size_estimates(root, rel);
- 
  	/*
  	 * If we can prove we don't need to scan the rel via constraint exclusion,
  	 * set up a single dummy path for it.  (Rather than inventing a special
--- 196,201 ----
***************
*** 228,233 ****
--- 214,233 ----
return;
}
  
+   /* Mark rel with estimated output rows, width, etc */
+   set_baserel_size_estimates(root, rel);
+ 
+   /* Test any partial indexes of rel for applicability */
+   check_partial_indexes(root, rel);
+ 
+   /*
+* Check to see if we can extract any restriction conditions from join
+* quals that are OR-of-AND structures.  If so, add them to the rel's
+* restriction list, and recompute the size estimates.
+*/
+   if (create_or_index_quals(root, rel))
+   set_baserel_size_estimates(root, rel);
+ 
/*
 * Generate paths and add them to the rel's pathlist.
 *




-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: [HACKERS] 27 second plan times

2007-04-20 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> The SearchCatCache here is the one in get_attavgwidth called to estimate the
> relation width.

Hmm, that information is supposed to be cached ... could you provide the
test case?

> The first thing that comes to mind is that we're doing the
> constraint_exclusion code *after* estimating the width of the relations we're
> going to exclude. If we push the constraint exclusion up a few lines the
> planning time goes down to 1.7s.

... and probably breaks a few things; are you sure there are no needed
side effects of the skipped code?

regards, tom lane



Re: [HACKERS] 27 second plan times

2007-04-20 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> The SearchCatCache here is the one in get_attavgwidth called to estimate the
>> relation width.
>
> Hmm, that information is supposed to be cached ... could you provide the
> test case?

Cached meaning it's stored in the cache once it's looked up, to avoid future
lookups? In that case it will still mean 200k lookups, since there really are
that many attributes.

Or cached meaning we prime the cache in one batch operation? It may well be
avoiding I/O, but there are suspiciously large numbers of calls to
SearchCatCache itself.
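
Even with a 100% hit rate those probes aren't free: each SearchCatCache
call still hashes its keys and walks a bucket chain, which lines up with
hash_any sitting right behind it in the flat profile. Schematically (an
illustrative toy, not the real catcache.c):

#include <stddef.h>
#include <stdint.h>

/* Toy model of a cache hit's per-call cost: hash the key, then walk a
 * bucket chain.  Illustrative only. */
typedef struct Entry
{
    struct Entry *next;
    uint32_t      key;
    int           payload;
} Entry;

#define NBUCKETS 256

static Entry *buckets[NBUCKETS];

static uint32_t
hash_key(uint32_t key)
{
    /* stand-in for hash_any(): real mixing work on every call */
    key ^= key >> 16;
    key *= 0x45d9f3bU;
    key ^= key >> 16;
    return key;
}

static Entry *
cache_lookup(uint32_t key)
{
    uint32_t    h = hash_key(key) % NBUCKETS;   /* paid on every call */
    Entry      *e;

    for (e = buckets[h]; e != NULL; e = e->next)
        if (e->key == key)                      /* hit: still a chain walk */
            return e;
    return NULL;                                /* miss: would load the tuple */
}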

I'll send the pg_dump to pgsql-patches, but it's much larger than really
necessary to reproduce the effect. The planning time was already through the
roof before I added data and extra constraints. The key seems to be the wide
tables, though; planning 2,000 child tables is fast if they only have a couple
of columns.

>> The first thing that comes to mind is that we're doing the
>> constraint_exclusion code *after* estimating the width of the relations we're
>> going to exclude. If we push the constraint exclusion up a few lines the
>> planning time goes down to 1.7s.
>
> ... and probably breaks a few things; are you sure there are no needed
> side effects of the skipped code?

Well, that's why I asked whether there was a reason we had to do it this way. I
haven't checked yet what those other pieces of code are doing exactly, but the
constraint exclusion doesn't depend on much else; I can't see anything related
to stats or indexes affecting it. The regression tests all pass, but that
doesn't prove anything.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] 27 second plan times

2007-04-20 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> The SearchCatCache here is the one in get_attavgwidth called to estimate the
>> relation width.
>
> Hmm, that information is supposed to be cached ... could you provide the
> test case?

This case caused my machine to thrash and swap terribly, with the backend
requiring over 1.3G of RAM to handle the explain.

What's really strange is that I got warnings about checkpoints happening too
frequently. How could an explain of a select cause checkpoints?! There are no
log entries for autovacuum but that's the only thing I can think of.

I'll dive back into gprof tomorrow, but for now it's time for sleep.


postgres=# copy (select 'create table t (' union all select 'a'||generate_series(1,1599)||' bool,' union all select 'a1600 bool);') to '/tmp/t';
COPY 1601
postgres=# \i /tmp/t
CREATE TABLE
postgres=# copy (select 'create table t'||generate_series(1,1000)||' () inherits (t);') to '/tmp/u';
COPY 1000
Time: 5.002 ms
postgres=# 
postgres=# \i /tmp/u
postgres=# explain select * from t;
                           QUERY PLAN                           
----------------------------------------------------------------
 Result  (cost=0.00..10510.50 rows=50050 width=1600)
   ->  Append  (cost=0.00..10510.50 rows=50050 width=1600)
         ->  Seq Scan on t  (cost=0.00..10.50 rows=50 width=1600)
         ->  Seq Scan on t1 t  (cost=0.00..10.50 rows=50 width=1600)
         ->  Seq Scan on t2 t  (cost=0.00..10.50 rows=50 width=1600)
...
         ->  Seq Scan on t999 t  (cost=0.00..10.50 rows=50 width=1600)
         ->  Seq Scan on t1000 t  (cost=0.00..10.50 rows=50 width=1600)
(1003 rows)

Time: 889727.115 ms


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] 27 second plan times

2007-04-20 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Following up on some complaints we've had about extremely large plan times for
> large partitioned tables, I've been doing some profiling.

>   %   cumulative   self                self     total
>  time   seconds   seconds       calls  s/call   s/call  name
>  62.58    203.76   203.76  21,474,937    0.00     0.00  SearchCatCache
>   9.01    233.08    29.32   1,923,789    0.00     0.00  list_nth_cell
>   5.34    250.46    17.38     2386465    0.00     0.00  pgstat_initstats
>   2.80    259.57     9.11   235691652    0.00     0.00  AllocSetAlloc
>   1.95    265.91     6.34   219852840    0.00     0.00  nocachegetattr
>   1.19    269.78     3.87   256569078    0.00     0.00  FunctionCall2

Is this supposed to be for a single run of the query?  Cause the numbers
I get are a lot different:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 35.24     15.23    15.23   353301     0.00     0.00  SearchCatCache
 30.08     28.23    13.00   422469     0.00     0.00  list_nth_cell
 21.03     37.32     9.09   642488     0.00     0.00  pgstat_initstats
  1.39     37.92     0.60  4281770     0.00     0.00  hash_search_with_hash_value
  1.23     38.45     0.53  5130941     0.00     0.00  AllocSetAlloc
  1.16     38.95     0.50  4154885     0.00     0.00  _bt_compare

(This is CVS HEAD as of today, which might or might not make a difference.)

regards, tom lane
