Re: [HACKERS] pg_upgrade and statistics

2012-03-16 Thread Ants Aasma
On Thu, Mar 15, 2012 at 8:48 PM, Alvaro Herrera alvhe...@commandprompt.com
 What Peter proposed seems to me pretty reasonable, in the sense that it
 should be possible to come up with a function that creates some text
 representation of whatever is in pg_statistic, and another function to
 load that data into the new catalog(s).  There's no need to keep
 pg_statistic binary-compatible, or even continue to have only
 pg_statistic (IIRC Zoltan/Hans-Jurgen patch for cross-column stats adds
 a new catalog, pg_statistic2 or similar).  If the upgrade target release
 has room for more/improved stats, that's fine -- they'll be unused after
 loading the stats from the dump.  And the old stats can be
 reaccommodated, if necessary.

I have been reading up on selectivity estimation research for the last
few days. I must say that I also think that having a text
representation as an intermediate format won't create a huge maintenance
burden. The basic concepts there are pretty solid: conceptually, MCVs
and histograms remain essential even with the more complex approaches.
Trying to maintain binary compatibility is probably a bad idea - as Tom
noted with the array selectivity patch, the encoding of the information
could be better. But given a textual format it won't be too hard to just
massage the data into the new format. Making it possible to dump and
load stats has the additional bonus of enabling more experimentation
with custom stats collectors. One could easily prototype the stats
collection with R, scipy, etc. Of course the proof will be in the
pudding.

Re the patch: the currently posted WIP cross-column patch doesn't create
a new catalog. It repurposes the stats slot mechanism to store multiple
dimensions. But I'll most likely rewrite it to use a separate catalog
because the storage requirements are rather different. I'll post a
proposal in the appropriate thread when I have a decently clear idea of
how this should work. One thing that seems clear is that
multi-dimensional histograms will want this mechanism even more: optimal
histogram construction is NP-hard in the multi-dimensional case, so
people will want to try different algorithms, or make different
tradeoffs on the effort spent constructing the histogram. Or even build
one by hand.

Cheers,
Ants Aasma



Re: [HACKERS] pg_upgrade and statistics

2012-03-16 Thread Bruce Momjian
On Thu, Mar 15, 2012 at 11:46:04AM -0400, Bruce Momjian wrote:
 On Thu, Mar 15, 2012 at 11:15:42AM -0400, Andrew Dunstan wrote:
  You're not the only person who could do that. I don't think this is
  all down to you. It should just be understood that if the stats
  format is changed, adjusting pg_upgrade needs to be part of the
  change. When we modified how enums worked, we adjusted pg_upgrade at
  the same time. That sort of thing seems totally reasonable to me.
  
  I haven't looked at it, but I'm wondering how hard it is going to be
  in practice?
 
 Well, the reason I am not recommending migration is because the work
 will _not_ fall on me, but rather on the larger community of developers,
 who might or might not care about pg_upgrade.  (I am concerned about
 pg_upgrade retarding development progress.)
 
 We are already telling developers not to change the binary storage
 format without considering pg_upgrade --- do we want to do the same for
 optimizer statistics?  Does the optimizer statistics format change more
 frequently than the storage format?  I think the answer is yes.  Does it
 change too frequently to require migration work?  I don't know the
 answer to that, partly because I would not be the one doing the work.
 
 Also, considering we are on the last 9.2 commit-fest, I doubt we can get
 something working for statistics migration for 9.2; I think the
 incremental statistics build approach is our only way to improve this
 for 9.2, and frankly, 9.1 users can also use the script once I post it.
 
 FYI, I have not received a huge number of complaints about the old
 analyze recommendation --- a few, but not a flood.  The incremental
 build approach might be good enough.
 
 My wild guess is that even if we migrated all statistics, the migrated
 statistics will still not have any improvements we have made in
 statistics gathering, meaning there will still be some kind of analyze
 process necessary, hopefully just on the affected tables --- that would
 be shorter, but perhaps not enough shorter to warrant the work in
 migrating the statistics.
 
 I am attaching the updated script and script output.
 
 Please, don't think I am ungrateful for the offers of help in migrating
 statistics.  I just remember how complex the discussion was when we
 modified the enum improvements to allow pg_upgrade, and how complex the
 checksum discussion was related to pg_upgrade.

Applied to git head for PG 9.2.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Peter Eisentraut
On Wed, 2012-03-14 at 17:36 -0400, Bruce Momjian wrote:
 Well, I have not had to make major adjustments to pg_upgrade since 9.0,
 meaning the code is almost completely unchanged and does not require
 additional testing for each major release.  If we go down the road of
 dumping stats, we will need to adjust for stats changes and test this to
 make sure we have made the proper adjustment for every major release. 

I think this could be budgeted under keeping pg_dump backward
compatible.  You have to do that anyway for each catalog change, and so
doing something extra for a pg_statistic change shouldn't be too shocking.




Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Bruce Momjian
On Thu, Mar 15, 2012 at 08:22:24AM +0200, Peter Eisentraut wrote:
 On Wed, 2012-03-14 at 17:36 -0400, Bruce Momjian wrote:
  Well, I have not had to make major adjustments to pg_upgrade since 9.0,
  meaning the code is almost completely unchanged and does not require
  additional testing for each major release.  If we go down the road of
  dumping stats, we will need to adjust for stats changes and test this to
  make sure we have made the proper adjustment for every major release. 
 
 I think this could be budgeted under keeping pg_dump backward
 compatible.  You have to do that anyway for each catalog change, and so
 doing something extra for a pg_statistic change shouldn't be too shocking.

Well, the big question is whether the community wants to buy into that
workload.  It isn't going to be possible for me to adjust the statistics
dump/restore code based on the changes someone makes unless I can fully
understand the changes by looking at the patch.

I think we have two choices --- either migrate the statistics, or adopt
my approach to generating incremental statistics quickly.  Does anyone
see any other options?

In an ideal world, analyze would generate minimal statistics on all
tables/databases, then keep improving them until they reach the default,
but that is unlikely to happen because of the code complexity involved. 
My powers-of-10 approach is probably the best we are going to do in the
short term.

My current idea is to apply the incremental analyze script patch to 9.2,
and blog about the patch and supply downloadable versions of the script
people can use on 9.1 and get feedback on improvements.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Andrew Dunstan



On 03/15/2012 11:03 AM, Bruce Momjian wrote:

On Thu, Mar 15, 2012 at 08:22:24AM +0200, Peter Eisentraut wrote:

On Wed, 2012-03-14 at 17:36 -0400, Bruce Momjian wrote:

Well, I have not had to make major adjustments to pg_upgrade since 9.0,
meaning the code is almost completely unchanged and does not require
additional testing for each major release.  If we go down the road of
dumping stats, we will need to adjust for stats changes and test this to
make sure we have made the proper adjustment for every major release.

I think this could be budgeted under keeping pg_dump backward
compatible.  You have to do that anyway for each catalog change, and so
doing something extra for a pg_statistic change shouldn't be too shocking.

Well, the big question is whether the community wants to buy into that
workload.  It isn't going to be possible for me to adjust the statistics
dump/restore code based on the changes someone makes unless I can fully
understand the changes by looking at the patch.



You're not the only person who could do that. I don't think this is all 
down to you. It should just be understood that if the stats format is 
changed, adjusting pg_upgrade needs to be part of the change. When we 
modified how enums worked, we adjusted pg_upgrade at the same time. That 
sort of thing seems totally reasonable to me.


I haven't looked at it, but I'm wondering how hard it is going to be in 
practice?


cheers

andrew







Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 I think we have two choices --- either migrate the statistics, or
 adopt my approach to generating incremental statistics quickly.
 Does anyone see any other options?
 
Would it make any sense to modify the incremental approach to do a
first pass of any tables with target overrides, using the default
GUC setting, and then proceed through the passes you describe for
all tables *except* those?  I'm thinking that any overrides were
probably set because the columns are particularly important in terms
of accurate statistics, and that running with different GUC settings
will just be a waste of time for those tables -- if they have a high
setting for any column, they will sample more blocks for every run,
right?
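For reference, the overrides in question can be listed per database with
a query along these lines (attstattarget stays -1 unless explicitly set):

    -- List per-column statistics-target overrides in the current database.
    SELECT attrelid::regclass AS table_name, attname, attstattarget
    FROM pg_attribute
    WHERE attstattarget > 0;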
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Bruce Momjian
On Thu, Mar 15, 2012 at 11:15:42AM -0400, Andrew Dunstan wrote:
 You're not the only person who could do that. I don't think this is
 all down to you. It should just be understood that if the stats
 format is changed, adjusting pg_upgrade needs to be part of the
 change. When we modified how enums worked, we adjusted pg_upgrade at
 the same time. That sort of thing seems totally reasonable to me.
 
 I haven't looked at it, but I'm wondering how hard it is going to be
 in practice?

Well, the reason I am not recommending migration is because the work
will _not_ fall on me, but rather on the larger community of developers,
who might or might not care about pg_upgrade.  (I am concerned about
pg_upgrade retarding development progress.)

We are already telling developers not to change the binary storage
format without considering pg_upgrade --- do we want to do the same for
optimizer statistics?  Does the optimizer statistics format change more
frequently than the storage format?  I think the answer is yes.  Does it
change too frequently to require migration work?  I don't know the
answer to that, partly because I would not be the one doing the work.

Also, considering we are on the last 9.2 commit-fest, I doubt we can get
something working for statistics migration for 9.2; I think the
incremental statistics build approach is our only way to improve this
for 9.2, and frankly, 9.1 users can also use the script once I post it.

FYI, I have not received a huge number of complaints about the old
analyze recommendation --- a few, but not a flood.  The incremental
build approach might be good enough.

My wild guess is that even if we migrated all statistics, the migrated
statistics will still not have any improvements we have made in
statistics gathering, meaning there will still be some kind of analyze
process necessary, hopefully just on the affected tables --- that would
be shorter, but perhaps not enough shorter to warrant the work in
migrating the statistics.

I am attaching the updated script and script output.

Please, don't think I am ungrateful for the offers of help in migrating
statistics.  I just remember how complex the discussion was when we
modified the enum improvements to allow pg_upgrade, and how complex the
checksum discussion was related to pg_upgrade.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


analyze_new_cluster.sh
Description: Bourne shell script
This script will generate minimal optimizer statistics rapidly
so your system is usable, and then gather statistics twice more
with increasing accuracy.  When it is done, your system will
have the default level of optimizer statistics.

If you have used ALTER TABLE to modify the statistics target for
any tables, you might want to remove them and restore them after
running this script because they will delay fast statistics generation.

If you would like default statistics as quickly as possible, cancel
this script and run:
vacuumdb --all --analyze-only

Generating minimal optimizer statistics (1 target)
--
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database template1
vacuumdb: vacuuming database test

The server is now available with minimal optimizer statistics.
Query performance will be optimal once this script completes.

Generating medium optimizer statistics (10 targets)
---
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database template1
vacuumdb: vacuuming database test

Generating default (full) optimizer statistics (100 targets?)
-
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database template1
vacuumdb: vacuuming database test

Done




Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Bruce Momjian
On Thu, Mar 15, 2012 at 10:20:02AM -0500, Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
  
  I think we have two choices --- either migrate the statistics, or
  adopt my approach to generating incremental statistics quickly.
  Does anyone see any other options?
  
 Would it make any sense to modify the incremental approach to do a
 first pass of any tables with target overrides, using the default
 GUC setting, and then proceed through the passes you describe for
 all tables *except* those?  I'm thinking that any overrides were
 probably set because the columns are particularly important in terms
 of accurate statistics, and that running with different GUC settings
 will just be a waste of time for those tables -- if they have a high
 setting for any column, they will sample more blocks for every run,
 right?

I just added text telling users they might want to remove and re-add
those per-table statistics targets.  I could try coding up something to store
and reset those values, but it is going to be complex, partly because
they are in different databases.   I would need to create a pg_upgrade
schema in every database, store and reset the targets, and restore them
once complete.
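For illustration, that store-and-reset step might look roughly like this
in each database (the schema name is made up; nothing like this exists
in pg_upgrade today):

    -- Save the per-column overrides, then clear them so the staged
    -- vacuumdb passes control the sample size.
    CREATE SCHEMA pg_upgrade_save;
    CREATE TABLE pg_upgrade_save.stats_targets AS
        SELECT attrelid, attname, attstattarget
        FROM pg_attribute
        WHERE attstattarget > 0;

    DO $$
    DECLARE r record;
    BEGIN
        FOR r IN SELECT attrelid, attname FROM pg_upgrade_save.stats_targets LOOP
            -- SET STATISTICS -1 reverts a column to default_statistics_target
            EXECUTE format('ALTER TABLE %s ALTER COLUMN %I SET STATISTICS -1',
                           r.attrelid::regclass, r.attname);
        END LOOP;
    END $$;
    -- ...run the staged analyzes, then restore each saved target with the
    -- same ALTER TABLE ... ALTER COLUMN ... SET STATISTICS form.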

I don't think it makes sense to do the custom targets first because it
would likely delay all tables from getting any statistics.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 03/15/2012 11:03 AM, Bruce Momjian wrote:
 On Thu, Mar 15, 2012 at 08:22:24AM +0200, Peter Eisentraut wrote:
 I think this could be budgeted under keeping pg_dump backward
 compatible.  You have to do that anyway for each catalog change, and so
 doing something extra for a pg_statistic change shouldn't be too shocking.

 Well, the big question is whether the community wants to buy into that
 workload.  It isn't going to be possible for me to adjust the statistics
 dump/restore code based on the changes someone makes unless I can fully
 understand the changes by looking at the patch.

 You're not the only person who could do that. I don't think this is all 
 down to you. It should just be understood that if the stats format is 
 changed, adjusting pg_upgrade needs to be part of the change. When we 
 modified how enums worked, we adjusted pg_upgrade at the same time. That 
 sort of thing seems totally reasonable to me.

Considering that no pg_dump infrastructure for this exists, much less
has ever been modified to accommodate a cross-version change, it seems
a bit presumptuous to just say "yes, we're all buying into that".

If someone were to create that infrastructure, complete with the
ability to support the already-committed 9.1 to 9.2 changes, then
we would have a basis for discussing such a requirement.  But until
then it's moot.

regards, tom lane



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Greg Stark
On Thu, Mar 15, 2012 at 3:15 PM, Andrew Dunstan and...@dunslane.net wrote:

 You're not the only person who could do that. I don't think this is all down
 to you. It should just be understood that if the stats format is changed,
 adjusting pg_upgrade needs to be part of the change. When we modified how
 enums worked, we adjusted pg_upgrade at the same time. That sort of thing
 seems totally reasonable to me.

 I haven't looked at it, but I'm wondering how hard it is going to be in
 practice?

Historically pg_statistic has been pretty static. But that seems like
a negative, not a positive -- a big part of my fear here is that it
really really needs a lot of work to improve the statistics. I *hope*
they get knocked around dramatically in each of the next few versions
to handle multi-column stats, something to replace correlation that's
more useful, custom stats functions for more data types, stats
specifically for partitioned tables, etc. I wouldn't want to see any
reason to hold back on these changes.


-- 
greg



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Peter Eisentraut
On Thu, 2012-03-15 at 11:15 -0400, Andrew Dunstan wrote:
 I haven't looked at it, but I'm wondering how hard it is going to be
 in practice?

Take a look at the commit log of pg_statistic.h; it's not a lot.




Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Thu, 2012-03-15 at 11:15 -0400, Andrew Dunstan wrote:
 I haven't looked at it, but I'm wondering how hard it is going to be
 in practice?

 Take a look at the commit log of pg_statistic.h; it's not a lot.

That says nothing at all about the cost of dealing with a change.
And as Greg pointed out, there might be a lot more churn in the future
than there has been in the past.  We're getting to the point where stats
are a primary limitation on what we can do, so I wouldn't be surprised
if he's right and there's more activity in this area soon.

regards, tom lane



Re: [HACKERS] pg_upgrade and statistics

2012-03-15 Thread Alvaro Herrera

Excerpts from Greg Stark's message of Thu Mar 15 14:45:16 -0300 2012:
 On Thu, Mar 15, 2012 at 3:15 PM, Andrew Dunstan and...@dunslane.net wrote:
 
  You're not the only person who could do that. I don't think this is all down
  to you. It should just be understood that if the stats format is changed,
  adjusting pg_upgrade needs to be part of the change. When we modified how
  enums worked, we adjusted pg_upgrade at the same time. That sort of thing
  seems totally reasonable to me.
 
  I haven't looked at it, but I'm wondering how hard it is going to be in
  practice?
 
 Historically pg_statistic has been pretty static. But that seems like
 a negative, not a positive -- a big part of my fear here is that it
 really really needs a lot of work to improve the statistics. I *hope*
 they get knocked around dramatically in each of the next few versions
 to handle multi-column stats, something to replace correlation that's
 more useful, custom stats functions for more data types, stats
 specifically for partitioned tables, etc. I wouldn't want to see any
 reason to hold back on these changes.

What Peter proposed seems to me pretty reasonable, in the sense that it
should be possible to come up with a function that creates some text
representation of whatever is in pg_statistic, and another function to
load that data into the new catalog(s).  There's no need to keep
pg_statistic binary-compatible, or even continue to have only
pg_statistic (IIRC Zoltan/Hans-Jurgen patch for cross-column stats adds
a new catalog, pg_statistic2 or similar).  If the upgrade target release
has room for more/improved stats, that's fine -- they'll be unused after
loading the stats from the dump.  And the old stats can be
reaccommodated, if necessary.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_upgrade and statistics

2012-03-14 Thread Peter Eisentraut
On Tue, 2012-03-13 at 20:34 -0400, Bruce Momjian wrote:
 I frankly am worried that if we copy over statistics even in ASCII
 that don't match what the server expects, it might lead to a crash,
 which has me back to wanting to speed up vacuumdb.

Why can't we maintain a conversion routine for statistics from older
versions?  It's not like the statistics layout changes every week.
pg_dump could print out something like

SELECT pg_restore_statistics(catversion, tablename, ... some data ...);
...

and that function would have the knowledge to convert the data and
insert it back into pg_statistic and pg_class.

That can't be that hard considering all the other work we put into
backward compatibility and upgrade capability.





Re: [HACKERS] pg_upgrade and statistics

2012-03-14 Thread Bruce Momjian
On Wed, Mar 14, 2012 at 10:40:41PM +0200, Peter Eisentraut wrote:
 On Tue, 2012-03-13 at 20:34 -0400, Bruce Momjian wrote:
  I frankly am worried that if we copy over statistics even in ASCII
  that don't match what the server expects, it might lead to a crash,
  which has me back to wanting to speed up vacuumdb.
 
 Why can't we maintain a conversion routine for statistics from older
 versions?  It's not like the statistics layout changes every week.
 pg_dump could print out something like
 
 SELECT pg_restore_statistics(catversion, tablename, ... some data ...);
 ...
 
 and that function would have the knowledge to convert the data and
 insert it back into pg_statistic and pg_class.
 
 That can't be that hard considering all the other work we put into
 backward compatibility and upgrade capability.

Well, I have not had to make major adjustments to pg_upgrade since 9.0,
meaning the code is almost completely unchanged and does not require
additional testing for each major release.  If we go down the road of
dumping stats, we will need to adjust for stats changes and test this to
make sure we have made the proper adjustment for every major release.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-14 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 09:15:52PM -0400, Bruce Momjian wrote:
 On Tue, Mar 13, 2012 at 08:22:51PM -0400, Bruce Momjian wrote:
  On Tue, Mar 13, 2012 at 05:33:29PM -0500, Kevin Grittner wrote:
   Bruce Momjian br...@momjian.us wrote:

What is the target=10 duration?  I think 10 is as low as we can
acceptably recommend.  Should we recommend they run vacuumdb
twice, once with default_statistics_target = 4, and another with
the default?

   Here are the results at various settings.

   1   :  172198.892 ms
   2   :  295536.814 ms
   4   :  474319.826 ms
   10  :  750458.312 ms
   100 :  3433794.609 ms
  
  Thanks, good numbers to know.
 
 OK, new crazy idea.  Kevin has shown that his database can get a single
 bucket in 2.8 minutes.  What if we have pg_upgrade create a vacuumdb
 script that generates increasingly accurate statistics, e.g. it runs for
 default_statistics_target values of 1, 10, and default (100).  That
 would give basic statistics quickly (2.8 minutes), and full statistics
 in an hour, for Kevin's database.
 
   PGOPTIONS='-c default_statistics_target=1' vacuumdb --all --analyze-only
   PGOPTIONS='-c default_statistics_target=10' vacuumdb --all --analyze-only
   vacuumdb --all --analyze-only
 
 The only problem I see is that users who use non-default statistics per
 table would not be affected by the increasing default_statistics_target
 values.
 
 The upside is this would work for all releases of Postgres.

OK, I have modified pg_upgrade with the attached patch to do exactly
this.  I have also attached the script pg_upgrade creates that should be
run instead of vacuumdb.

Based on Kevin's numbers above, the first vacuumdb will be done in 2.8
minutes (1 target), the 10 target vacuumdb done after 15 minutes, and
the 100 target vacuumdb done after 72 minutes (times accumulate).  Here
is what the output looks like:

Generating minimal optimizer statistics (1 target)
--
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database template1
vacuumdb: vacuuming database test

The server is now available with minimal optimizer statistics.
Query performance will be optimal once this script completes.

Generating medium optimizer statistics (10 targets)
---
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database template1
vacuumdb: vacuuming database test

Generating default (full) optimizer statistics (100 targets?)
-
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database template1
vacuumdb: vacuuming database test

Done

If we don't want to try migrating the statistics from the old system,
this seems like the best approach.

Does anyone know how bad the queries will be with only one target?

I did see if vacuumdb --analyze-only was somehow being throttled by the
vacuum settings, but saw the drive at 100% utilization analyzing a 36GB
table on a 24GB RAM server, so it seems I/O bound.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index cf43384..4481de0
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** static void check_for_reg_data_type_usag
*** 23,28 ****
--- 23,33 ----
  static void check_for_support_lib(ClusterInfo *cluster);
  static void get_bin_version(ClusterInfo *cluster);
  
+ #ifndef WIN32
+ #define	ECHO_QUOTE	"'"
+ #else
+ #define	ECHO_QUOTE	""
+ #endif
  
  void
  output_check_banner(bool *live_check)
*** issue_warnings(char *sequence_script_fil
*** 193,213 ****
  
  
  void
! output_completion_banner(char *deletion_script_file_name)
  {
  	/* Did we copy the free space files? */
  	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
  		pg_log(PG_REPORT,
! 			   "Optimizer statistics are not transferred by pg_upgrade so\n"
! 			   "consider running:\n"
! 			   "vacuumdb --all --analyze-only\n"
! 			   "on the newly-upgraded cluster.\n\n");
  	else
  		pg_log(PG_REPORT,
  			   "Optimizer statistics and free space information are not transferred\n"
! 			   "by pg_upgrade so consider running:\n"
! 			   "vacuumdb --all --analyze\n"
! 			   "on the newly-upgraded cluster.\n\n");
  
  	pg_log(PG_REPORT,
  		   "Running this script will delete the old cluster's data files:\n"
--- 198,217 ----
  
  
  void
! output_completion_banner(char *analyze_script_file_name,
! 		 char *deletion_script_file_name)
  {
  	/* Did we copy the free space files? */
  	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
  		pg_log(PG_REPORT,
! 			   "Optimizer 

Re: [HACKERS] pg_upgrade and statistics

2012-03-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Does anyone know how bad the queries will be with only one target?

Bad.  That cycle seems largely a waste of time.  About the only
thing it would do for you is ensure that relpages/reltuples are up to
date, which seems like something we could possibly arrange for during
the data import.

 I did see if vacuumdb --analyze-only was somehow being throttled by the
 vacuum settings, but saw the drive at 100% utilization analyzing a 36GB
 table on a 24GB RAM server, so it seems I/O bound.

I think it'd be good to explicitly set vacuum_cost_delay to 0 in the
first pass, in the same way as you are forcing
default_statistics_target, just in case somebody has a nondefault
setting for that.  The second pass could probably be allowed to use some
higher delay setting.
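At the session level the first pass would then amount to roughly the
following (the script sets the equivalent through PGOPTIONS on the
vacuumdb command line):

    -- First pass: no cost-based throttling, minimal statistics target.
    SET vacuum_cost_delay = 0;
    SET default_statistics_target = 1;
    ANALYZE;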

regards, tom lane



Re: [HACKERS] pg_upgrade and statistics

2012-03-14 Thread Bruce Momjian
On Wed, Mar 14, 2012 at 08:26:06PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Does anyone know how bad the queries will be with only one target?
 
 Bad.  That cycle seems like largely a waste of time.  About the only
 thing it would do for you is ensure that relpages/reltuples are up to
 date, which seems like something we could possibly arrange for during
 the data import.

Well, it is also getting us the most common value, which seems useful.

  I did see if vacuumdb --analyze-only was somehow being throttled by the
  vacuum settings, but saw the drive at 100% utilization analyzing a 36GB
  table on a 24GB RAM server, so it seems I/O bound.
 
 I think it'd be good to explicitly set vacuum_cost_delay to 0 in the
 first pass, in the same way as you are forcing
 default_statistics_target, just in case somebody has a nondefault
 setting for that.  The second pass could probably be allowed to use some
 higher delay setting.

OK, I have now set vacuum_cost_delay=0 for the first vacuumdb
(target=1).

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Daniel Farina
On Mon, Mar 12, 2012 at 8:10 PM, Bruce Momjian br...@momjian.us wrote:
 To answer your specific question, I think clearing the last analyzed
 fields should cause autovacuum to run analyze on those tables.  What I
 don't know is whether not clearing the last vacuum datetime will cause
 the table not to be analyzed.

Thank you very much for this reference.  I will look into it.

-- 
fdr



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Daniel Farina
On Mon, Mar 12, 2012 at 9:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Copying the statistics from the old server is on the pg_upgrade TODO
 list.  I have avoided it because it will add an additional requirement
 that will make pg_upgrade more fragile in case of major version changes.

 Does anyone have a sense of how often we change the statistics data
 between major versions?

 I don't think pg_statistic is inherently any more stable than any other
 system catalog.

Agreed, but it would appear that in practice a fair amount of it
carries forward.  If someone ripped up the statistics system and did
them all over in such a way that the old fields had no meaning on
future costing metrics, that'd probably be reasonable cause for a
caveat involving full-blown reanalyze...still, that doesn't seem to
happen every year.

 We've whacked it around significantly just last week,
 which might color my perception a bit, but there are other changes on
 the to-do list.  (For one example, see nearby complaints about
 estimating TOAST-related costs, which we could not fix without adding
 more stats data.)

Is accruing additional statistics likely going to be a big problem?  I
noticed the addition of the new anyarray (presumably for
array-selectivity) features; would planning with an empty assumption
be disastrous vs. the old behavior, which had no concept of those at
all?

I don't think it's necessary to make statistics porting a feature of
pg_upgrade in all circumstances, but it would be nice when possible.
That having been said, perhaps there are other ways for pg_upgrade to
be better invested in... or, best of all and somewhat unrelatedly,
full blown logical replication.

Although this conversation has focused on how we move stats
forward, I am about as interested in how I run statements (like
ANALYZE) more 'nicely'.  The same general problem pervades many
background task issues, including autovacuum and large physical
reorganizations of data.

-- 
fdr



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 12:12:27AM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Copying the statistics from the old server is on the pg_upgrade TODO
  list.  I have avoided it because it will add an additional requirement
  that will make pg_upgrade more fragile in case of major version changes.
 
  Does anyone have a sense of how often we change the statistics data
  between major versions?
 
 I don't think pg_statistic is inherently any more stable than any other
 system catalog.  We've whacked it around significantly just last week,
 which might color my perception a bit, but there are other changes on
 the to-do list.  (For one example, see nearby complaints about
 estimating TOAST-related costs, which we could not fix without adding
 more stats data.)

Yes, that was my reaction too.  pg_upgrade has worked hard to avoid
copying any system tables, relying on pg_dump to handle that.  

I just received a sobering blog comment stating that pg_upgrade took 5
minutes on a 0.5TB database, but analyze took over an hour:

http://momjian.us/main/blogs/pgblog/2012.html#March_12_2012

Is there some type of intermediate format we could use to dump/restore
the statistics?  Is there an analyze light mode we could support that
would run faster?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 12:33:09AM -0700, Daniel Farina wrote:
 On Mon, Mar 12, 2012 at 8:10 PM, Bruce Momjian br...@momjian.us wrote:
  To answer your specific question, I think clearing the last analyzed
  fields should cause autovacuum to run analyze on those tables.  What I
  don't know is whether not clearing the last vacuum datetime will cause
  the table not to be analyzed.
 
 Thank you very much for this reference.  I will look into it.

I assume a missing last_analyze would trigger an auto-analyze, but I am
unclear if we assume a last_vacuum included an analyze;  I think you
need to look at autovacuum.c for the details;  let me know if you need
help.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 I just received a sobering blog comment stating that pg_upgrade
 took 5 minutes on a 0.5TB database, but analyze took over an hour:
 
Yeah, we have had similar experiences.  Even if this can't be done
for every release or for every data type, bringing over statistics
from the old release as a starting point would really help minimize
downtime on large databases.
 
Of course, release docs should indicate which statistics *won't* be
coming across, and should probably recommend a database ANALYZE or
VACUUM ANALYZE be done when possible.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of Tue Mar 13 11:14:43 -0300 2012:
 On Tue, Mar 13, 2012 at 12:33:09AM -0700, Daniel Farina wrote:
  On Mon, Mar 12, 2012 at 8:10 PM, Bruce Momjian br...@momjian.us wrote:
   To answer your specific question, I think clearing the last analyzed
   fields should cause autovacuum to run analyze on those tables.  What I
   don't know is whether not clearing the last vacuum datetime will cause
   the table not to be analyzed.
  
  Thank you very much for this reference.  I will look into it.
 
 I assume a missing last_analyze would trigger an auto-analyze,

You're wrong.  Autovacuum does not consider time, only dead/live tuple
counts.  The formulas it uses are in the autovacuum docs; some details
(such as the fact that it skips tables that do not have stat entries)
might be missing.
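The documented rule is roughly that a table is auto-analyzed once the
tuples changed since the last analyze exceed autovacuum_analyze_threshold
+ autovacuum_analyze_scale_factor * reltuples; as a rough illustration:

    -- Approximate per-table auto-analyze thresholds under the current
    -- GUCs; real autovacuum additionally skips tables with no stats entry.
    SELECT relname,
           current_setting('autovacuum_analyze_threshold')::int
           + current_setting('autovacuum_analyze_scale_factor')::float8
             * reltuples AS changed_tuples_needed
    FROM pg_class
    WHERE relkind = 'r';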

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 11:34:16AM -0300, Alvaro Herrera wrote:
 
 Excerpts from Bruce Momjian's message of Tue Mar 13 11:14:43 -0300 2012:
  On Tue, Mar 13, 2012 at 12:33:09AM -0700, Daniel Farina wrote:
   On Mon, Mar 12, 2012 at 8:10 PM, Bruce Momjian br...@momjian.us wrote:
To answer your specific question, I think clearing the last analyzed
    fields should cause autovacuum to run analyze on those tables.  What I
don't know is whether not clearing the last vacuum datetime will cause
the table not to be analyzed.
   
   Thank you very much for this reference.  I will look into it.
  
  I assume a missing last_analyze would trigger an auto-analyze,
 
 You're wrong.  Autovacuum does not consider time, only dead/live tuple
 counts.  The formulas it uses are in the autovacuum docs; some details
 (such as the fact that it skips tables that do not have stat entries)
 might be missing.

Oh, yes.  Thank you for the correction;  not sure what I was thinking.

How would they trigger an autovacuum then?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of Tue Mar 13 11:49:26 -0300 2012:
 
 On Tue, Mar 13, 2012 at 11:34:16AM -0300, Alvaro Herrera wrote:
  
  Excerpts from Bruce Momjian's message of Tue Mar 13 11:14:43 -0300 2012:
   On Tue, Mar 13, 2012 at 12:33:09AM -0700, Daniel Farina wrote:
On Mon, Mar 12, 2012 at 8:10 PM, Bruce Momjian br...@momjian.us wrote:
 To answer your specific question, I think clearing the last analyzed
 fields should cause autovacuum to run analyze on those tables.  What I
 don't know is whether not clearing the last vacuum datetime will cause
 the table not to be analyzed.

Thank you very much for this reference.  I will look into it.
   
   I assume a missing last_analyze would trigger an auto-analyze,
  
  You're wrong.  Autovacuum does not consider time, only dead/live tuple
  counts.  The formulas it uses are in the autovacuum docs; some details
  (such as the fact that it skips tables that do not have stat entries)
  might be missing.
 
 Oh, yes.  Thank you for the correction;  not sure what I was thinking.
 
 How would they trigger an autovacuum then?

We don't have any mechanism to trigger it currently.  Maybe we could
inject fake messages to the stats collector so that it'd believe the
tables have lots of new tuples and an analyze is necessary.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 09:28:58AM -0500, Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
  
  I just received a sobering blog comment stating that pg_upgrade
  took 5 minutes on a 0.5TB database, but analyze took over an hour:
  
 Yeah, we have had similar experiences.  Even if this can't be done
 for every release or for every data type, bringing over statistics
 from the old release as a starting point would really help minimize
 downtime on large databases.
  
 Of course, release docs should indicate which statistics *won't* be
 coming across, and should probably recommend a database ANALYZE or
 VACUUM ANALYZE be done when possible.

Having a "works sometimes" behavior is really not good;  some users
aren't going to notice until it is too late that they need to run
analyze.  It is fine for hard-core folks like Kevin, but not for the
average user.

At best, pg_upgrade needs to copy over the statistics it can, and adjust
the system statistics to cause autoanalyze to run on those that can't be
migrated.  Frankly, as Tom stated, we have been adjusting the system
statistics collection so often that I have avoided hard-coding that
information into pg_upgrade --- it could potentially make pg_upgrade
less reliable, i.e. vacuumdb --all --analyze always works.

We might decide we want a consistently slow process rather than one that
is sometimes fast and sometimes slow.

As you can see, I am at a loss in how to improve this.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 12:08:41PM -0300, Alvaro Herrera wrote:
   You're wrong.  Autovacuum does not consider time, only dead/live tuple
   counts.  The formulas it uses are in the autovacuum docs; some details
   (such as the fact that it skips tables that do not have stat entries)
   might be missing.
  
  Oh, yes.  Thank you for the correction;  not sure what I was thinking.
  
  How would they trigger an autovacuum then?
 
 We don't have any mechanism to trigger it currently.  Maybe we could
 inject fake messages to the stats collector so that it'd believe the
 tables have lots of new tuples and an analyze is necessary.

Eww!  Yes, I thought some more and realized these are system _views_,
meaning we can't just update them with UPDATE.  It sounds like something
pg_upgrade will have to do with a server-side function, someday.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Greg Stark
On Tue, Mar 13, 2012 at 1:38 AM, Daniel Farina dan...@heroku.com wrote:
 You probably are going to ask: why not just run ANALYZE and be done
 with it?

Uhm yes. If analyze takes a long time then something is broken. It's
only reading a sample which should be pretty much a fixed number of
pages per table. It shouldn't take much longer on your large database
than on your smaller databases.

Perhaps you're running vacuum analyze by mistake?

If Analyze is taking a long time then we're getting the worst of both
worlds. The statistics are very poor for certain metrics (namely
ndistinct). The main reason we don't do better is because we don't
want to do a full scan.


-- 
greg



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Greg Stark st...@mit.edu wrote:
 Daniel Farina dan...@heroku.com wrote:
 You probably are going to ask: why not just run ANALYZE and be
 done with it?
 
 Uhm yes. If analyze takes a long time then something is broken.
 It's only reading a sample which should be pretty much a fixed
 number of pages per table. It shouldn't take much longer on your
 large database than on your smaller databases.
 
On a small database:
 
cc=# analyze CaseHist;
ANALYZE
Time: 255.107 ms
cc=# select relpages, reltuples from pg_class where relname =
'CaseHist';
 relpages | reltuples 
----------+-----------
     1264 |     94426
(1 row)
 
Same table on a much larger database (and much more powerful
hardware):
 
cir=# analyze CaseHist;
ANALYZE
Time: 143450.467 ms
cir=# select relpages, reltuples from pg_class where relname =
'CaseHist';
 relpages |  reltuples  
----------+-------------
  3588659 | 2.12391e+08
(1 row)
 
Either way, there are about 500 tables in the database.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 05:46:06PM +, Greg Stark wrote:
 On Tue, Mar 13, 2012 at 1:38 AM, Daniel Farina dan...@heroku.com wrote:
  You probably are going to ask: why not just run ANALYZE and be done
  with it?
 
 Uhm yes. If analyze takes a long time then something is broken. It's
 only reading a sample which should be pretty much a fixed number of
 pages per table. It shouldn't take much longer on your large database
 than on your smaller databases.
 
 Perhaps you're running vacuum analyze by mistake?

pg_upgrade recommends running this command:

vacuumdb --all --analyze-only

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 01:18:58PM -0500, Kevin Grittner wrote:
 Greg Stark st...@mit.edu wrote:
  Daniel Farina dan...@heroku.com wrote:
  You probably are going to ask: why not just run ANALYZE and be
  done with it?
  
  Uhm yes. If analyze takes a long time then something is broken.
  It's only reading a sample which should be pretty much a fixed
  number of pages per table. It shouldn't take much longer on your
  large database than on your smaller databases.
  
 On a small database:
  
 cc=# analyze CaseHist;
 ANALYZE
 Time: 255.107 ms
 cc=# select relpages, reltuples from pg_class where relname =
 'CaseHist';
  relpages | reltuples 
 ----------+-----------
      1264 |     94426
 (1 row)
  
 Same table on a much larger database (and much more powerful
 hardware):
  
 cir=# analyze CaseHist;
 ANALYZE
 Time: 143450.467 ms
 cir=# select relpages, reltuples from pg_class where relname =
 'CaseHist';
  relpages |  reltuples  
 ----------+-------------
   3588659 | 2.12391e+08
 (1 row)
  
 Either way, there are about 500 tables in the database.

That is 2.5 minutes.  How large is that database?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 On Tue, Mar 13, 2012 at 01:18:58PM -0500, Kevin Grittner wrote:
 
 cir=# analyze CaseHist;
 ANALYZE
 Time: 143450.467 ms
 cir=# select relpages, reltuples from pg_class where relname =
 'CaseHist';
  relpages |  reltuples  
 ----------+-------------
   3588659 | 2.12391e+08
 (1 row)
  
 Either way, there are about 500 tables in the database.
 
 That is 2.5 minutes.  How large is that database?
 
cir=# select pg_size_pretty(pg_database_size('cir'));
 pg_size_pretty 
----------------
 2563 GB
(1 row)
 
In case you meant "How large is that table that took 2.5 minutes to
analyze?":
 
cir=# select pg_size_pretty(pg_total_relation_size('CaseHist'));
 pg_size_pretty 
----------------
 44 GB
(1 row)
 
I've started a database analyze, to see how long that takes.  Even
if each table took 1/4 second (like on the small database) with over
500 user tables, plus the system tables, it'd be 15 minutes.  I'm
guessing it'll run over an hour, but I haven't timed it lately, so
-- we'll see.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Bruce Momjian br...@momjian.us wrote:
 
 That is 2.5 minutes.  How large is that database?
 
I dug around a little and found that we had turned on vacuum cost
limits on the central databases, because otherwise the web team
complained about performance during maintenance windows.  On the
county database we generally don't have users working all night, so
we do maintenance during off hours, and run without cost-based
limits.
 
When the full run completes, I'll try analyze on that table again,
in a session with the limits off.
 
Maybe vacuumdb should have an option to disable the limits, and we
recommend that after pg_upgrade?
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 02:07:14PM -0500, Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
  On Tue, Mar 13, 2012 at 01:18:58PM -0500, Kevin Grittner wrote:
  
  cir=# analyze CaseHist;
  ANALYZE
  Time: 143450.467 ms
  cir=# select relpages, reltuples from pg_class where relname =
  'CaseHist';
   relpages |  reltuples  
  ----------+-------------
    3588659 | 2.12391e+08
  (1 row)
   
  Either way, there are about 500 tables in the database.
  
  That is 2.5 minutes.  How large is that database?
  
 cir=# select pg_size_pretty(pg_database_size('cir'));
  pg_size_pretty 
 ----------------
  2563 GB
 (1 row)
  
 In case you meant "How large is that table that took 2.5 minutes to
 analyze?":
  
 cir=# select pg_size_pretty(pg_total_relation_size('CaseHist'));
  pg_size_pretty 
 ----------------
  44 GB
 (1 row)
  
 I've started a database analyze, to see how long that takes.  Even
 if each table took 1/4 second (like on the small database) with over
 500 user tables, plus the system tables, it'd be 15 minutes.  I'm
 guessing it'll run over an hour, but I haven't timed it lately, so
 -- we'll see.

OK, so a single 44GB table took 2.5 minutes to analyze;  that is not
good.  It would require 11 such tables to reach 500GB (0.5 TB), and
would take 27 minutes.  The report I had was twice as long, but still in
the ballpark of too long.  :-(

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Tue, Mar 13, 2012 at 1:38 AM, Daniel Farina dan...@heroku.com wrote:
 You probably are going to ask: why not just run ANALYZE and be done
 with it?

 Uhm yes. If analyze takes a long time then something is broken. It's
 only reading a sample which should be pretty much a fixed number of
 pages per table. It shouldn't take much longer on your large database
 than on your smaller databases.

The data collection work does scale according to the statistics target,
which is something that's crept up quite a lot since the code was
originally written.

I wonder whether it'd be worth recommending that people do an initial
ANALYZE with a low stats target, just to get some stats in place,
and then go back to analyze at whatever their normal setting is.
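In psql terms the two-pass idea is simply (session-local settings; the
numbers are only examples):

    SET default_statistics_target = 10;
    ANALYZE;    -- quick pass: coarse but usable statistics
    RESET default_statistics_target;
    ANALYZE;    -- full pass at the normal target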

regards, tom lane



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 cir=# analyze CaseHist;
 ANALYZE
 Time: 143450.467 ms
 
  OK, so a single 44GB table took 2.5 minutes to analyze;  that is
 not good.  It would require 11 such tables to reach 500GB (0.5
 TB), and would take 27 minutes.  The report I had was twice as
 long, but still in the ballpark of too long.  :-(
 
We have a sister machine to the one used for that benchmark -- same
hardware and database.  The cost limit didn't seem to make much
difference:
 
cir=# set vacuum_cost_delay = 0;
SET
cir=# \timing on 
Timing is on.
cir=# analyze CaseHist ;
ANALYZE
Time: 146169.728 ms
 
So it really does seem to take that long.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Peter Eisentraut
On Tue, 2012-03-13 at 15:44 -0400, Tom Lane wrote:
 I wonder whether it'd be worth recommending that people do an initial
 ANALYZE with a low stats target, just to get some stats in place,
 and then go back to analyze at whatever their normal setting is.

Perhaps going even further, ANALYZE could have a quick-and-dirty mode,
where it just analyzes say 10 random pages per table and fills in as
much data as it can from that.  And then it does the real analyze.  (Or
it could even insert fake statistics to trigger autoanalyze.)  That way,
you could have a database back in business in less than a minute.




Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 OK, so a single 44GB table took 2.5 minutes to analyze;  that is
 not good.  It would require 11 such tables to reach 500GB (0.5
 TB), and would take 27 minutes.  The report I had was of a run
 twice that long, but still in the same ballpark: too long.  :-(
 
But it's really 600 tables of different sizes, which wound up
actually taking:
 
cir=# analyze;
ANALYZE
Time: 3433794.609 ms
 
Just under one hour.
 
Now, if I remember right, the cluster was down for about three
minutes to run pg_upgrade.  Until there are some statistics for key
tables, though, it's not really usable.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 10:10:02PM +0200, Peter Eisentraut wrote:
 On tis, 2012-03-13 at 15:44 -0400, Tom Lane wrote:
  I wonder whether it'd be worth recommending that people do an initial
  ANALYZE with a low stats target, just to get some stats in place,
  and then go back to analyze at whatever their normal setting is.
 
 Perhaps going even further, ANALYZE could have a quick-and-dirty mode,
 where it just analyzes say 10 random pages per table and fills in as
 much data as it can from that.  And then it does the real analyze.  (Or
 it could even insert fake statistics to trigger autoanalyze.)  That way,
 you could have a database back in business in less than a minute.

OK, good idea.  Kevin, can you test this:

  PGOPTIONS='-c default_statistics_target=10' vacuumdb --all --analyze-only

Is it faster?  Thanks.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 OK, good idea.  Kevin, can you test this:
 
   PGOPTIONS='-c default_statistics_target=10' vacuumdb --all
 --analyze-only
 
 Is it faster?  Thanks.
 
Well, I just did something similar in psql -- I disabled the delays
by:
 
set vacuum_cost_delay = 0;
 
I checked for any statistics overrides, and found none (if I did
this right):
 
cir=# select attrelid::regclass, attname, attstattarget from
pg_attribute where attstattarget > 0;
 attrelid | attname | attstattarget 
----------+---------+---------------
(0 rows)
 
I went even lower than you suggested:
 
set default_statistics_target = 4;
 
And it was much faster, but still more time than the pg_upgrade run
itself:
 
cir=# analyze;
ANALYZE
Time: 474319.826 ms
 
A little under 8 minutes.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Greg Stark
On Tue, Mar 13, 2012 at 7:30 PM, Bruce Momjian br...@momjian.us wrote:
 OK, so a single 44GB table took 2.5 minutes to analyze;  that is not
 good.  It would require 11 such tables to reach 500GB (0.5 TB), and
 would take 27 minutes.  The report I had was of a run twice that long,
 but still in the same ballpark: too long.  :-(

Well my claim was that it shouldn't scale that way.

Assuming you have the same statistics target the sample size is the
same regardless of the size of the table. When the table is small
there might be a significant number of tuples on the same pages so the
number of pages to read might be much smaller than the sample size. As
the table grows it would eventually reach the point where each page
has about 1 sampled tuple on it but from that point on increasing the
size of the table won't increase the amount of i/o Postgres does.

However, on thinking about it further, hard drives don't really work
that way. Each page that's read in causes a seek -- if the head is on
a different track. Reading two pages on the same track is about the
same speed as reading a single page on that track. So even though
Postgres is doing the same amount of i/o, the hard drive is actually
doing more and more seeks, up to the point where each sample is a
separate seek.

analyze.c samples 300 * statistics_target rows -- that means the
default comes to 30,000 rows. So the point where each row is on a
separate page would be around 8kB*30,000 or 240MB. And if each row
causes a seek, and each seek takes 5ms then that should be about 150s
per table. Once your table is large enough to take 150s to analyze it
shouldn't really take any longer when it grows larger. I'm not sure at
what size that would happen but I'm guessing it would be at about 8MB
* 30,000 or about 240GB per table plus or minus an order of magnitude.
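
A quick psql sanity check of that arithmetic, using the assumptions
above (default target of 100, 8kB pages, 5ms per seek):

select 300 * 100 as sample_rows,
       pg_size_pretty((300 * 100 * 8192)::bigint) as one_row_per_page_size,
       300 * 100 * 0.005 as seek_bound_seconds;

That returns 30000 rows, 234 MB, and 150 seconds -- in line with the
rough numbers above.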

So in short, I guess the time to analyze does increase as the table
grows larger. Not because the number of rows being sampled grows but
because the rows are farther apart and that causes more iops.

hmph. One thing that could speed up analyze on raid arrays would be
doing prefetching so more than one spindle can be busy. Sacrificing
statistical accuracy by reading a less random sample on contiguous
blocks of rows would also be faster but less accurate -- we've gone
around on that topic at least once in the past.

+1 to the idea that pg_upgrade could run analyze with a
statistics_target of 10 and then let autovacuum analyze it again later
with the normal targets at its leisure.

-- 
greg



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 09:28:33PM +, Greg Stark wrote:
 hmph. One thing that could speed up analyze on raid arrays would be
 doing prefetching so more than one spindle can be busy. Sacrificing
 statistical accuracy by reading a less random sample on contiguous
 blocks of rows would also be faster but less accurate -- we've gone
 around on that topic at least once in the past.
 
 +1 to the idea that pg_upgrade could run analyze with a
 statistics_target of 10 and then let autovacuum analyze it again later
 with the normal targets at its leisure.

Well, that's going to take quite a bit of infrastructure.  We can tell
users to use the PGOPTIONS setting when they run vacuumdb, but we are
going to need some way to cause autovacuum to run later with a higher
statistics target  ---  right now I don't think autovacuum cares how
large a statistics target the previous analyze run used.  It might be a
simple case of causing autovacuum to analyze if it has statistics that
don't match the current statistics target, but I don't think autovacuum
has easy access to those values.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 03:29:22PM -0500, Kevin Grittner wrote:
 I went even lower than you suggested:
  
 set default_statistics_target = 4;
  
 And it was much faster, but still more time than the pg_upgrade run
 itself:
  
 cir=# analyze;
 ANALYZE
 Time: 474319.826 ms
  
 A little under 8 minutes.

What is the target=10 duration?  I think 10 is as low as we can
acceptably recommend.  Should we recommend they run vacuumdb twice, once
with default_statistics_target = 4, and another with the default?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 5:42 PM, Bruce Momjian br...@momjian.us wrote:
 What is the target=10 duration?  I think 10 is as low as we can
 acceptably recommend.  Should we recommend they run vacuumdb twice, once
 with default_statistics_target = 4, and another with the default?

I'm not sure why we're so glibly rejecting Dan's original proposal.
Sure, adjusting pg_upgrade when we whack around pg_statistic is work,
but who ever said that a workable in-place upgrade facility would be
maintenance-free?  We're operating under a number of restrictions
imposed by the need to be pg_upgrade-compatible, and this doesn't
strike me as a particularly severe one by comparison -- we can always
arrange to NOT migrate statistics between incompatible versions; that
doesn't mean that we shouldn't migrate them when they ARE compatible.
Also, unlike the alternatives thus far proposed, Dan's idea actually
fixes the problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 
 What is the target=10 duration?  I think 10 is as low as we can
 acceptably recommend.  Should we recommend they run vacuumdb
 twice, once with default_statistics_target = 4, and another with
 the default?
 
Here are the ANALYZE times at various default_statistics_target
settings:
 
1   :  172198.892 ms
2   :  295536.814 ms
4   :  474319.826 ms
10  :  750458.312 ms
100 :  3433794.609 ms
 
I'm not sure what's best for a general approach to the problem.  For
my own part, I'd be inclined to cherry-pick tables if I were in a
hurry.
 
I hope we at least bring over relpages and reltuples, to give the
optimizer *some* clue what it's looking at.  I wouldn't think those
would be changing semantics or format very often.
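
If nothing else one could imagine seeding those two fields by hand on
the new cluster.  A rough, unsupported sketch (superuser-only catalog
hackery; the table name and values here are hypothetical):

update pg_class
   set relpages  = 5767168,    -- hypothetical: 44 GB at 8 kB per page
       reltuples = 450000000   -- hypothetical row count
 where oid = 'casehist'::regclass;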
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I'm not sure why we're so glibly rejecting Dan's original
 proposal.  Sure, adjusting pg_upgrade when we whack around
 pg_statistic is work, but who ever said that a workable in-place
 upgrade facility would be maintenance-free?  We're operating under
 a number of restrictions imposed by the need to be pg_upgrade-
 compatible, and this doesn't strike me as a particularly severe
 one by comparison -- we can always arrange to NOT migrate
 statistics between incompatible versions; that doesn't mean that
 we shouldn't migrate them when they ARE compatible.  Also, unlike
 the alternatives thus far proposed, Dan's idea actually fixes the
 problem.
 
In case it got lost with my various timings, I agree with Robert on
all of the above.  The three-minute downtime for pg_upgrade to
upgrade our multi-TB databases is *very* impressive; but I think we
lose bragging rights if we follow that up with -- oh, but the
database isn't really fully *usable* until you run a one-hour
analyze afterward.
 
-Kevin



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Daniel Farina
On Tue, Mar 13, 2012 at 3:30 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 5:42 PM, Bruce Momjian br...@momjian.us wrote:
 What is the target=10 duration?  I think 10 is as low as we can
 acceptably recommend.  Should we recommend they run vacuumdb twice, once
 with default_statistics_target = 4, and another with the default?

 I'm not sure why we're so glibly rejecting Dan's original proposal.

 Dan's idea actually fixes the problem.

I appreciate your support, but I don't feel dismissed; I see the
obvious appeal of not having to port the catalog if a fast/good enough
regeneration technique can be found, so I'm glad people are trying
those out and measuring things.  I think the main problem that is hard
to work around lies in our inability to trigger autoanalyze one-shot.

I can't really speak on behalf of a smaller operation (where
pg_upgrade taking on the task of hacking catalogs is clearly very
desirable -- worth serious consideration), but for the scale of our
operation having to do our own catalog hacking at the cost of
possible-terribleness that can result from a botched pg_statistic is
not hugely concerning.  Rather, the more insurmountable and general
problem we keep encountering is how we can trigger throttled
maintenance on a special basis.  It is definitely in my interest to --
some day -- be able to run VACUUM FULL and REINDEX (provided
incremental-self-defragmenting indexes don't get written first)
without any disastrous impact on the user at all, including when they
attempt to drop the table (in which case we should yield the lock and
let it happen) or alter its column definition.

-- 
fdr



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Andrew Dunstan
On 03/13/2012 06:30 PM, Robert Haas wrote:

 On Tue, Mar 13, 2012 at 5:42 PM, Bruce Momjian br...@momjian.us wrote:

  What is the target=10 duration?  I think 10 is as low as we can
  acceptably recommend.  Should we recommend they run vacuumdb twice, once
  with default_statistics_target = 4, and another with the default?

 I'm not sure why we're so glibly rejecting Dan's original proposal.
 Sure, adjusting pg_upgrade when we whack around pg_statistic is work,
 but who ever said that a workable in-place upgrade facility would be
 maintenance-free?  We're operating under a number of restrictions
 imposed by the need to be pg_upgrade-compatible, and this doesn't
 strike me as a particularly severe one by comparison -- we can always
 arrange to NOT migrate statistics between incompatible versions; that
 doesn't mean that we shouldn't migrate them when they ARE compatible.
 Also, unlike the alternatives thus far proposed, Dan's idea actually
 fixes the problem.


I agree.

cheers

andrew



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 05:33:29PM -0500, Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
  
  What is the target=10 duration?  I think 10 is as low as we can
  acceptably recommend.  Should we recommend they run vacuumdb
  twice, once with default_statistics_target = 4, and another with
  the default?
  
 Here are the results at various settings.
  
 1   :  172198.892 ms
 2   :  295536.814 ms
 4   :  474319.826 ms
 10  :  750458.312 ms
 100 :  3433794.609 ms

Thanks, good numbers to know.

 I'm not sure what's best for a general approach to the problem.  For
 my own part, I'd be inclined to cherry-pick tables if I were in a
 hurry.
  
 I hope we at least bring over relpages and reltuples, to give the
 optimizer *some* clue what it's looking at.  I wouldn't think those
 would be changing semantics or format very often.

True, but we don't migrate them either.

This is the exact same problem you would have restoring a pg_dump
backup.  The improvement needs to go into pg_dump, and then pg_upgrade
can make use of it.

Another idea is to just copy over pg_statistic like we copy
pg_largeobject now, and force autovacuum to run.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Another idea is to just copy over pg_statistic like we copy
 pg_largeobject now, and force autovacuum to run.

That would be an automatic crash in a 9.1 to 9.2 migration, as well as
any other release where we changed the column layout of pg_statistic.

regards, tom lane



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 08:30:17PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Another idea is to just copy over pg_statistic like we copy
  pg_largeobject now, and force autovacuum to run.
 
 That would be an automatic crash in a 9.1 to 9.2 migration, as well as
 any other release where we changed the column layout of pg_statistic.

It might be a solution for cases where we don't modify it.  I frankly
am worried that if we copy over statistics, even in ASCII, that don't
match what the server expects, it might lead to a crash, which has me
back to wanting to speed up vacuumdb.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Bruce Momjian
On Tue, Mar 13, 2012 at 08:22:51PM -0400, Bruce Momjian wrote:
 On Tue, Mar 13, 2012 at 05:33:29PM -0500, Kevin Grittner wrote:
  Bruce Momjian br...@momjian.us wrote:
   
   What is the target=10 duration?  I think 10 is as low as we can
   acceptably recommend.  Should we recommend they run vacuumdb
   twice, once with default_statistics_target = 4, and another with
   the default?
   
  Here are the results at various settings.
   
  1   :  172198.892 ms
  2   :  295536.814 ms
  4   :  474319.826 ms
  10  :  750458.312 ms
  100 :  3433794.609 ms
 
 Thanks, good numbers to know.

OK, new crazy idea.  Kevin has shown that his database can get
single-bucket statistics (target=1) in 2.8 minutes.  What if we have
pg_upgrade create a vacuumdb
script that generates increasingly accurate statistics, e.g. it runs for
default_statistics_target values of 1, 10, and default (100).  That
would give basic statistics quickly (2.8 minutes), and full statistics
in an hour, for Kevin's database.

PGOPTIONS='-c default_statistics_target=1' vacuumdb --all --analyze-only
PGOPTIONS='-c default_statistics_target=10' vacuumdb --all --analyze-only
vacuumdb --all --analyze-only

The only problem I see is that users who use non-default statistics per
table would not be affected by the increasing default_statistics_target
values.
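
Those overrides are the per-column settings made with ALTER TABLE, for
example (table and column names hypothetical):

alter table casehist alter column case_status set statistics 400;

Such columns keep their own target no matter what
default_statistics_target is set to, so all three passes would spend
the same effort on them.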

The upside is this would work for all releases of Postgres.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-13 Thread Euler Taveira
On 13-03-2012 21:34, Bruce Momjian wrote:
 It might be a solution for cases where we don't modify it.  I frankly am
 worried that if we copy over statistics even in ASCII that don't match
 what the server expects, it might lead to a crash, which has me back to
 wanting to speed up vacuumdb.
 
That was discussed in another thread some time ago [1]. Adopting a hack
solution is not the way to go. It could lead to bad consequences in the
near future.

For 9.2, we could advise users to divide the ANALYZE step into
ANALYZE-per-table steps and run them all in parallel. This ANALYZE-per-table
ranking could be accomplished using a simple approach like '... row_number()
OVER (ORDER BY pg_relation_size(oid) ... WHERE row_number % n = x' (tip stolen
from Simon's book).
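
A fleshed-out sketch of that query, generating the ANALYZE commands for
one of n = 4 parallel sessions (the bucket count and schema filter are
only illustrative):

select 'ANALYZE ' || t.oid::regclass || ';'
  from (select c.oid,
               row_number() over (order by pg_relation_size(c.oid) desc) as rn
          from pg_class c
          join pg_namespace n on n.oid = c.relnamespace
         where c.relkind = 'r'
           and n.nspname not in ('pg_catalog', 'information_schema')) t
 where t.rn % 4 = 0;  -- run the % 4 = 1, 2, and 3 lists in other sessions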


[1] http://archives.postgresql.org/message-id/4f10a728.7090...@agliodbs.com


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] pg_upgrade and statistics

2012-03-12 Thread Bruce Momjian
On Mon, Mar 12, 2012 at 06:38:30PM -0700, Daniel Farina wrote:
 You probably are going to ask: why not just run ANALYZE and be done
 with it?  The reasons are:
 
   * ANALYZE can take a sufficiently long time on large databases that
 the downtime of switching versions is not attractive
 
   * If we don't run ANALYZE and have no old statistics, then the plans
 can be disastrously bad for the user
 
   * If we do run the ANALYZE statement on a user's behalf as part of
 the upgrade, any compatibility fixups that require an exclusive lock
 (such as some ALTER TABLE statements) would have to block on this
 relatively long ANALYZE.  autoanalyze/autovacuum, by comparison, backs
 off frequently, so disaster is averted.
 
 If anyone has any insightful comments as to how to meet these
 requirements, I'd appreciate them, otherwise I can consider it an
 interesting area for improvement and will eat the ANALYZE and salt the
 documentation with caveats.

Copying the statistics from the old server is on the pg_upgrade TODO
list.  I have avoided it because it will add an additional requirement
that will make pg_upgrade more fragile in case of major version changes.

Does anyone have a sense of how often we change the statistics data
between major versions?  Ideally, pg_dump/pg_dumpall would add the
ability to dump statistics, and pg_upgrade could use that.

To answer your specific question, I think clearing the last analyzed
fields should cause autovacuum to run ANALYZE on those tables.  What I
don't know is whether not clearing the last vacuum datetime will cause
the table not to be analyzed.
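
For reference, those last-analyzed fields are visible per table; for
example, to list the tables that have never been analyzed:

select relname, last_analyze, last_autoanalyze
  from pg_stat_user_tables
 where last_analyze is null and last_autoanalyze is null;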

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade and statistics

2012-03-12 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Copying the statistics from the old server is on the pg_upgrade TODO
 list.  I have avoided it because it will add an additional requirement
 that will make pg_upgrade more fragile in case of major version changes.

 Does anyone have a sense of how often we change the statistics data
 between major versions?

I don't think pg_statistic is inherently any more stable than any other
system catalog.  We've whacked it around significantly just last week,
which might color my perception a bit, but there are other changes on
the to-do list.  (For one example, see nearby complaints about
estimating TOAST-related costs, which we could not fix without adding
more stats data.)

regards, tom lane
