Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-12-03 Thread Bruce Momjian

Applied.

---

On Fri, Nov 30, 2012 at 10:43:29PM -0500, Bruce Momjian wrote:
 On Mon, Nov 26, 2012 at 02:43:19PM -0500, Bruce Momjian wrote:
In any event, I think the documentation should caution that the
upgrade should not be deemed to be a success until after a system-wide
sync has been done.  Even if we use the link rather than copy method,
are we sure that that is safe if the directories recording those links
have not been fsynced?
   
OK, the above is something I have been thinking about, and obviously you
have too.  If you change fsync from off to on in a cluster, and restart
it, there is no guarantee that the dirty pages you read from the kernel
are actually on disk, because Postgres doesn't know they are dirty.
They probably will be pushed to disk by the kernel in less than one
minute, but still, it doesn't seem reliable. Should this be documented
in the fsync section?
   
Again, another reason not to use fsync=off, though your example of the
file copy is a good one.  As you stated, this is a problem with the file
copy/link, independent of how Postgres handles the files.  We can tell
people to use 'sync' as root on Unix, but what about Windows?
   
   I'm pretty sure someone mentioned the way to do that on Windows in
   this list in the last few months, but I can't seem to find it.  I
   thought it was the initdb fsync thread.
  
  Yep, the code is already in initdb to fsync a directory --- we just need
  a way for pg_upgrade to access it.
 
 I have developed the attached patch that does this.  It basically adds
 a --sync-only option to initdb, then turns off all durability in
 pg_upgrade and has pg_upgrade run initdb --sync-only;  this gives us
 another nice speedup!
 
                 ------ SSD ------     ---- magnetic ----
                  git      patch        git      patch
      1          11.11     11.11       11.10     11.13
   1000          20.57     19.89       20.72     19.30
   2000          28.02     25.81       28.50     27.53
   4000          42.00     43.59       46.71     46.84
   8000          89.66     74.16       89.10     73.67
  16000         157.66    135.98      159.97    153.48
  32000         316.24    296.90      334.74    308.59
  64000         814.97    715.53      797.34    727.94
 
 (I am very happy with these times.  Thanks to Jeff Janes for his
 suggestions.)
 
 I have also added documentation to the 'fsync' configuration variable
 warning about dirty kernel buffers and recommending flushing them to
 disk before the cluster can be considered crash-safe.
 
 I consider this patch ready for 9.3 application (meaning it is not a
 prototype).
 
 -- 
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +

 diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
 new file mode 100644
 index c12f15b..63df529
 *** a/contrib/pg_upgrade/pg_upgrade.c
 --- b/contrib/pg_upgrade/pg_upgrade.c
 *** main(int argc, char **argv)
 *** 150,155 
 --- 150,161 
 new_cluster.pgdata);
   check_ok();
   
 + prep_status("Sync data directory to disk");
 + exec_prog(UTILITY_LOG_FILE, NULL, true,
 +   "\"%s/initdb\" --sync-only \"%s\"",
 new_cluster.bindir,
 +   new_cluster.pgdata);
 + check_ok();
 + 
   create_script_for_cluster_analyze(analyze_script_file_name);
   create_script_for_old_cluster_deletion(deletion_script_file_name);
   
 diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
 new file mode 100644
 index 49d4c8f..05d8cc0
 *** a/contrib/pg_upgrade/server.c
 --- b/contrib/pg_upgrade/server.c
 *** start_postmaster(ClusterInfo *cluster)
 *** 209,217 
* a gap of 2000000000 from the current xid counter, so autovacuum will
* not touch them.
*
 !  *  synchronous_commit=off improves object creation speed, and we only
 !  *  modify the new cluster, so only use it there.  If there is a crash,
 !  *  the new cluster has to be recreated anyway.
*/
   snprintf(cmd, sizeof(cmd),
 		 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
 --- 209,217 
* a gap of 2000000000 from the current xid counter, so autovacuum will
* not touch them.
*
 !  * Turn off durability requirements to improve object creation speed, and
 !  * we only modify the new cluster, so only use it there.  If there is a
 !  * crash, the new cluster has to be recreated anyway.
*/
   snprintf(cmd, sizeof(cmd),
 		 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
 *** start_postmaster(ClusterInfo *cluster)
 *** 219,225 
 (cluster->controldata.cat_ver >=
 

Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-30 Thread Bruce Momjian
On Mon, Nov 26, 2012 at 02:43:19PM -0500, Bruce Momjian wrote:
   In any event, I think the documentation should caution that the
   upgrade should not be deemed to be a success until after a system-wide
   sync has been done.  Even if we use the link rather than copy method,
   are we sure that that is safe if the directories recording those links
   have not been fsynced?
  
   OK, the above is something I have been thinking about, and obviously you
   have too.  If you change fsync from off to on in a cluster, and restart
   it, there is no guarantee that the dirty pages you read from the kernel
   are actually on disk, because Postgres doesn't know they are dirty.
   They probably will be pushed to disk by the kernel in less than one
   minute, but still, it doesn't seem reliable. Should this be documented
   in the fsync section?
  
   Again, another reason not to use fsync=off, though your example of the
   file copy is a good one.  As you stated, this is a problem with the file
   copy/link, independent of how Postgres handles the files.  We can tell
   people to use 'sync' as root on Unix, but what about Windows?
  
  I'm pretty sure someone mentioned the way to do that on Windows in
  this list in the last few months, but I can't seem to find it.  I
  thought it was the initdb fsync thread.
 
 Yep, the code is already in initdb to fsync a directory --- we just need
 a way for pg_upgrade to access it.

I have developed the attached patch that does this.  It basically adds
a --sync-only option to initdb, then turns off all durability in
pg_upgrade and has pg_upgrade run initdb --sync-only;  this gives us
another nice speedup!

                 ------ SSD ------     ---- magnetic ----
                  git      patch        git      patch
      1          11.11     11.11       11.10     11.13
   1000          20.57     19.89       20.72     19.30
   2000          28.02     25.81       28.50     27.53
   4000          42.00     43.59       46.71     46.84
   8000          89.66     74.16       89.10     73.67
  16000         157.66    135.98      159.97    153.48
  32000         316.24    296.90      334.74    308.59
  64000         814.97    715.53      797.34    727.94

(I am very happy with these times.  Thanks to Jeff Janes for his
suggestions.)

I have also added documentation to the 'fsync' configuration variable
warning about dirty kernel buffers and recommending flushing them to
disk before the cluster can be considered crash-safe.
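
(With this patch, that flush becomes a single command --- a hypothetical
invocation, where $PGDATA stands for the new cluster's data directory:)

    initdb --sync-only "$PGDATA"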

I consider this patch ready for 9.3 application (meaning it is not a
prototype).

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c12f15b..63df529
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 150,155 
--- 150,161 
  			  new_cluster.pgdata);
  	check_ok();
  
+ 	prep_status("Sync data directory to disk");
+ 	exec_prog(UTILITY_LOG_FILE, NULL, true,
+ 			  "\"%s/initdb\" --sync-only \"%s\"", new_cluster.bindir,
+ 			  new_cluster.pgdata);
+ 	check_ok();
+ 
  	create_script_for_cluster_analyze(analyze_script_file_name);
  	create_script_for_old_cluster_deletion(deletion_script_file_name);
  
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 49d4c8f..05d8cc0
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 209,217 
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 *
! 	 *	synchronous_commit=off improves object creation speed, and we only
! 	 *	modify the new cluster, so only use it there.  If there is a crash,
! 	 *	the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
  			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
--- 209,217 
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 *
! 	 * Turn off durability requirements to improve object creation speed, and
! 	 * we only modify the new cluster, so only use it there.  If there is a
! 	 * crash, the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
  			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
*** start_postmaster(ClusterInfo *cluster)
*** 219,225 
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			  " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ? " -c synchronous_commit=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 219,226 
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			  " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ?
!  -c 

Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-26 Thread Bruce Momjian
On Sat, Nov 24, 2012 at 09:42:08PM -0800, Jeff Janes wrote:
 On Fri, Nov 23, 2012 at 7:22 PM, Bruce Momjian br...@momjian.us wrote:
   On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:
 
 
  Yes, it is with synchronous_commit=off. (or if it wasn't originally,
  it is now, with the same result)
 
  Applying your fsync patch does solve the problem for me on ext4.
  Having the new cluster be on ext3 rather than ext4 also solves the
  problem, without the need for a patch; but it would be nice to be more
  friendly to ext4, which is popular even though not recommended.
 
  Do you have numbers with synchronous-commit=off, fsync=off, and both, on
  ext4?
 
 for 5,000 tables like "create table fooN (x serial)", upgrading
 from 9.3dev to 9.3dev:
 
 Timings are in seconds, done twice.  I had to hack pg_upgrade so that
 the pg_ctl stop command did "-w -t 3600", otherwise
 I'd get a "database did not shut down" error for the first two.
 
 both on                  648.29   608.42
 synchronous_commit off   250.24   366.50
 fsync off                 46.91    43.96
 both off                  41.44    44.81
 
 Also, I did a manual sync as soon as "Removing support functions from
 new cluster OK" appears, with synchronous_commit off but fsync on:
 45.96   46.46

OK, these are very convincing numbers.  I am going to modify initdb to have
an --fsync-only option, and have pg_upgrade use that.  This is 9.3
material.

 
  In any event, I think the documentation should caution that the
  upgrade should not be deemed to be a success until after a system-wide
  sync has been done.  Even if we use the link rather than copy method,
  are we sure that that is safe if the directories recording those links
  have not been fsynced?
 
  OK, the above is something I have been thinking about, and obviously you
  have too.  If you change fsync from off to on in a cluster, and restart
  it, there is no guarantee that the dirty pages you read from the kernel
  are actually on disk, because Postgres doesn't know they are dirty.
  They probably will be pushed to disk by the kernel in less than one
  minute, but still, it doesn't seem reliable. Should this be documented
  in the fsync section?
 
  Again, another reason not to use fsync=off, though your example of the
  file copy is a good one.  As you stated, this is a problem with the file
  copy/link, independent of how Postgres handles the files.  We can tell
  people to use 'sync' as root on Unix, but what about Windows?
 
 I'm pretty sure someone mentioned the way to do that on Windows in
 this list in the last few months, but I can't seem to find it.  I
 thought it was the initdb fsync thread.

Yep, the code is already in initdb to fsync a directory --- we just need
a way for pg_upgrade to access it.
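
(For reference, a minimal POSIX sketch of what "fsync a directory" means
--- not the actual initdb code, whose details differ:)

    #include <fcntl.h>
    #include <unistd.h>

    /* Flush a directory's entries (e.g., new hard links) to disk. */
    static int
    fsync_dir(const char *path)
    {
        int     ret;
        int     fd = open(path, O_RDONLY);  /* directories open read-only */

        if (fd < 0)
            return -1;
        ret = fsync(fd);        /* forces the directory metadata out */
        close(fd);
        return ret;
    }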

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-24 Thread Jeff Janes
On Fri, Nov 23, 2012 at 7:22 PM, Bruce Momjian br...@momjian.us wrote:
  On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:


 Yes, it is with synchronous_commit=off. (or if it wasn't originally,
 it is now, with the same result)

 Applying your fsync patch does solve the problem for me on ext4.
 Having the new cluster be on ext3 rather than ext4 also solves the
 problem, without the need for a patch; but it would be nice to be more
 friendly to ext4, which is popular even though not recommended.

 Do you have numbers with synchronous-commit=off, fsync=off, and both, on
 ext4?

for 5,000 tables like "create table fooN (x serial)", upgrading
from 9.3dev to 9.3dev:

Timings are in seconds, done twice.  I had to hack pg_upgrade so that
the pg_ctl stop command did "-w -t 3600", otherwise
I'd get a "database did not shut down" error for the first two.

both on                  648.29   608.42
synchronous_commit off   250.24   366.50
fsync off                 46.91    43.96
both off                  41.44    44.81

Also, I did a manual sync as soon as "Removing support functions from
new cluster OK" appears, with synchronous_commit off but fsync on:
45.96   46.46



 In any event, I think the documentation should caution that the
 upgrade should not be deemed to be a success until after a system-wide
 sync has been done.  Even if we use the link rather than copy method,
 are we sure that that is safe if the directories recording those links
 have not been fsynced?

 OK, the above is something I have been thinking about, and obviously you
 have too.  If you change fsync from off to on in a cluster, and restart
 it, there is no guarantee that the dirty pages you read from the kernel
 are actually on disk, because Postgres doesn't know they are dirty.
 They probably will be pushed to disk by the kernel in less than one
 minute, but still, it doesn't seem reliable. Should this be documented
 in the fsync section?

 Again, another reason not to use fsync=off, though your example of the
 file copy is a good one.  As you stated, this is a problem with the file
 copy/link, independent of how Postgres handles the files.  We can tell
 people to use 'sync' as root on Unix, but what about Windows?

I'm pretty sure someone mentioned the way to do that on Windows in
this list in the last few months, but I can't seem to find it.  I
thought it was the initdb fsync thread.

Cheers,

Jeff




Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-23 Thread Bruce Momjian
 On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

[ Sorry for the delay in replying.]

 On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian br...@momjian.us wrote:
  On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
 
  Is turning off synchronous_commit enough?  What about turning off fsync?
 
  I did some testing with the attached patch on a magnetic disk with no
  BBU that turns off fsync;
 
  With which file system?  I wouldn't expect you to see a benefit with
  ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
  group fsync of new file creations and instead does each one serially.
  Whether it is worth applying a fix that is only needed for that one
  file system, I don't know.  The trade-offs are not all that clear to
  me yet.

That only ext4 shows the difference seems possible.

   I got these results
 
           sync_com=off   fsync=off
       1       15.90        13.51
    1000       26.09        24.56
    2000       33.41        31.20
    4000       57.39        57.74
    8000      102.84       116.28
   16000      189.43       207.84
 
 It shows fsync faster for < 4k, and slower for > 4k.  Not sure why this
 is the case, but perhaps the buffering of the fsync is actually faster
 than doing a no-op fsync.
 
 synchronous-commit=off turns off not only the fsync at each commit,
 but also the write-to-kernel at each commit; so it is not surprising
 that it is faster at large scale.  I would specify both
 synchronous-commit=off and fsync=off.

I would like to see actual numbers showing synchronous-commit=off is
also useful if we use fsync=off.

  When I'm doing a pg_upgrade with thousands of tables, the shutdown
  checkpoint after restoring the dump to the new cluster takes a very
  long time, as the writer drains its operation table by opening and
  individually fsync-ing thousands of files.  This takes about 40 ms per
  file, which I assume is a combination of a slow laptop disk drive and
  a quirk of ext4 which makes fsyncing a recently created file very
  slow.  But even with a faster hdd, this would still be a problem if it
  works the same way, with every file needing 4 rotations to be fsynced,
  and this happening serially.
 
  Is this with the current code that does synchronous_commit=off?  If not,
  can you test to see if this is still a problem?
 
 Yes, it is with synchronous_commit=off. (or if it wasn't originally,
 it is now, with the same result)
 
 Applying your fsync patch does solve the problem for me on ext4.
 Having the new cluster be on ext3 rather than ext4 also solves the
 problem, without the need for a patch; but it would be nice to more
 friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous-commit=off, fsync=off, and both, on
ext4?

  Anyway, the reason I think turning fsync off might be reasonable is
  that as soon as the new cluster is shut down, pg_upgrade starts
  overwriting most of those just-fsynced file with other files from the
  old cluster, and AFAICT makes no effort to fsync them.  So until there
  is a system-wide sync after the pg_upgrade finishes, your new cluster
  is already in mortal danger anyway.
 
  pg_upgrade does a cluster shutdown before overwriting those files.
 
 Right.  So as far as the cluster is concerned, those files have been
  fsynced.  But then the next step is to go behind the cluster's back and
  replace those fsynced files with different files, which may or may not
  have been fsynced.  This is what makes me think the new cluster is in
 mortal danger.  Not only have the new files perhaps not been fsynced,
 but the cluster is not even aware of this fact, so you can start it
 up, and then shut it down, and it still won't bother to fsync them,
 because as far as it is concerned they already have been.
 
 Given that, how much extra danger would be added by having the new
 cluster schema restore run with fsync=off?
 
 In any event, I think the documentation should caution that the
 upgrade should not be deemed to be a success until after a system-wide
 sync has been done.  Even if we use the link rather than copy method,
 are we sure that that is safe if the directories recording those links
 have not been fsynced?

OK, the above is something I have been thinking about, and obviously you
have too.  If you change fsync from off to on in a cluster, and restart
it, there is no guarantee that the dirty pages you read from the kernel
are actually on disk, because Postgres doesn't know they are dirty. 
They probably will be pushed to disk by the kernel in less than one
minute, but still, it doesn't seem reliable. Should this be documented
in the fsync section?

Again, another reason not to use fsync=off, though your example of the
file copy is a good one.  As you stated, this is a problem with the file
copy/link, independent of how Postgres handles the files.  We can tell
people to use 'sync' as root on Unix, but what about Windows?


Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-19 Thread Jeff Janes
On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:

 Is turning off synchronous_commit enough?  What about turning off fsync?

 I did some testing with the attached patch on a magnetic disk with no
 BBU that turns off fsync;

With which file system?  I wouldn't expect you to see a benefit with
ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
group fsync of new file creations and instead does each one serially.
Whether it is worth applying a fix that is only needed for that one
file system, I don't know.  The trade-offs are not all that clear to
me yet.

  I got these results

          sync_com=off   fsync=off
      1       15.90        13.51
   1000       26.09        24.56
   2000       33.41        31.20
   4000       57.39        57.74
   8000      102.84       116.28
  16000      189.43       207.84

It shows fsync faster for < 4k, and slower for > 4k.  Not sure why this
is the case, but perhaps the buffering of the fsync is actually faster
than doing a no-op fsync.

synchronous-commit=off turns off not only the fsync at each commit,
but also the write-to-kernel at each commit; so it is not surprising
that it is faster at large scale.  I would specify both
synchronous-commit=off and fsync=off.


 When I'm doing a pg_upgrade with thousands of tables, the shutdown
 checkpoint after restoring the dump to the new cluster takes a very
 long time, as the writer drains its operation table by opening and
 individually fsync-ing thousands of files.  This takes about 40 ms per
 file, which I assume is a combination of a slow laptop disk drive and
 a quirk of ext4 which makes fsyncing a recently created file very
 slow.  But even with a faster hdd, this would still be a problem if it
 works the same way, with every file needing 4 rotations to be fsynced,
 and this happening serially.

 Is this with the current code that does synchronous_commit=off?  If not,
 can you test to see if this is still a problem?

Yes, it is with synchronous_commit=off. (or if it wasn't originally,
it is now, with the same result)

Applying your fsync patch does solve the problem for me on ext4.
Having the new cluster be on ext3 rather than ext4 also solves the
problem, without the need for a patch; but it would be nice to be more
friendly to ext4, which is popular even though not recommended.


 Anyway, the reason I think turning fsync off might be reasonable is
 that as soon as the new cluster is shut down, pg_upgrade starts
 overwriting most of those just-fsynced file with other files from the
 old cluster, and AFAICT makes no effort to fsync them.  So until there
 is a system-wide sync after the pg_upgrade finishes, your new cluster
 is already in mortal danger anyway.

 pg_upgrade does a cluster shutdown before overwriting those files.

Right.  So as far as the cluster is concerned, those files have been
fsynced.  But then the next step is to go behind the cluster's back and
replace those fsynced files with different files, which may or may not
have been fsynced.  This is what makes me think the new cluster is in
mortal danger.  Not only have the new files perhaps not been fsynced,
but the cluster is not even aware of this fact, so you can start it
up, and then shut it down, and it still won't bother to fsync them,
because as far as it is concerned they already have been.

Given that, how much extra danger would be added by having the new
cluster schema restore run with fsync=off?

In any event, I think the documentation should caution that the
upgrade should not be deemed to be a success until after a system-wide
sync has been done.  Even if we use the link rather than copy method,
are we sure that that is safe if the directories recording those links
have not been fsynced?

Cheers,

Jeff




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-14 Thread Bruce Momjian
On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
 On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian br...@momjian.us wrote:
  Magnus reported that a customer with a million tables was finding
  pg_upgrade slow.  I had never considered many tables to be a problem, but
  decided to test it.  I created a database with 2k tables like this:
 
  CREATE TABLE test1990 (x SERIAL);
 
  Running the git version of pg_upgrade on that took 203 seconds.  Using
  synchronous_commit=off dropped the time to 78 seconds.  This was tested
  on magnetic disks with a write-through cache.  (No change on an SSD with
  a super-capacitor.)
 
  I don't see anything unsafe about having pg_upgrade use
  synchronous_commit=off.  I could set it just for the pg_dump reload, but
  it seems safe to just use it always.  We don't write to the old cluster,
  and if pg_upgrade fails, you have to re-initdb the new cluster anyway.
 
  Patch attached.  I think it should be applied to 9.2 as well.
 
 Is turning off synchronous_commit enough?  What about turning off fsync?

I did some testing with the attached patch on a magnetic disk with no
BBU that turns off fsync;   I got these results:

         sync_com=off   fsync=off
     1       15.90        13.51
  1000       26.09        24.56
  2000       33.41        31.20
  4000       57.39        57.74
  8000      102.84       116.28
 16000      189.43       207.84

It shows fsync faster for < 4k, and slower for > 4k.  Not sure why this
is the case, but perhaps the buffering of the fsync is actually faster
than doing a no-op fsync.

I don't think fsync=off makes sense, except for testing;  let me know if
I should test something else.

 When I'm doing a pg_upgrade with thousands of tables, the shutdown
 checkpoint after restoring the dump to the new cluster takes a very
 long time, as the writer drains its operation table by opening and
 individually fsync-ing thousands of files.  This takes about 40 ms per
 file, which I assume is a combination of a slow laptop disk drive and
 a quirk of ext4 which makes fsyncing a recently created file very
 slow.  But even with a faster hdd, this would still be a problem if it
 works the same way, with every file needing 4 rotations to be fsynced,
 and this happening serially.

Is this with the current code that does synchronous_commit=off?  If not,
can you test to see if this is still a problem?

 Worse, the shutdown only waits for the default of 60 seconds for the
 shutdown to take place before it throws an error and the entire
 pg_upgrade gives up.  It seems to me that either the -t setting should
 be increased, or it should be an option to pg_upgrade.
 
 My work around was to invoke a system-wide sync a couple seconds
 after the 'pg_ctl stop' is initiated.  Flushing the files wholesale
 seems to work to make the checkpoint writer rapidly find it has
 nothing to do when it tries to flush them retail.
 
 Anyway, the reason I think turning fsync off might be reasonable is
 that as soon as the new cluster is shut down, pg_upgrade starts
 overwriting most of those just-fsynced file with other files from the
 old cluster, and AFAICT makes no effort to fsync them.  So until there
 is a system-wide sync after the pg_upgrade finishes, your new cluster
 is already in mortal danger anyway.

pg_upgrade does a cluster shutdown before overwriting those files.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 49d4c8f..01e0dd3
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 219,225 
   			 (cluster->controldata.cat_ver >=
   			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
   			  " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
 ! 			 (cluster == new_cluster) ? " -c synchronous_commit=off" : "",
   			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 219,225 
   			 (cluster->controldata.cat_ver >=
   			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
   			  " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
 ! 			 (cluster == new_cluster) ? " -c fsync=off" : "",
   			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*



Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-12 Thread Jeff Janes
On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian br...@momjian.us wrote:
 Magnus reported that a customer with a million tables was finding
 pg_upgrade slow.  I had never considered many tables to be a problem, but
 decided to test it.  I created a database with 2k tables like this:

 CREATE TABLE test1990 (x SERIAL);

 Running the git version of pg_upgrade on that took 203 seconds.  Using
 synchronous_commit=off dropped the time to 78 seconds.  This was tested
 on magnetic disks with a write-through cache.  (No change on an SSD with
 a super-capacitor.)

 I don't see anything unsafe about having pg_upgrade use
 synchronous_commit=off.  I could set it just for the pg_dump reload, but
 it seems safe to just use it always.  We don't write to the old cluster,
 and if pg_upgrade fails, you have to re-initdb the new cluster anyway.

 Patch attached.  I think it should be applied to 9.2 as well.

Is turning off synchronous_commit enough?  What about turning off fsync?

When I'm doing a pg_upgrade with thousands of tables, the shutdown
checkpoint after restoring the dump to the new cluster takes a very
long time, as the writer drains its operation table by opening and
individually fsync-ing thousands of files.  This takes about 40 ms per
file, which I assume is a combination of a slow laptop disk drive and
a quirk of ext4 which makes fsyncing a recently created file very
slow.  But even with a faster hdd, this would still be a problem if it
works the same way, with every file needing 4 rotations to be fsynced,
and this happening serially.
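
(For scale: a 5400 RPM laptop drive --- the RPM figure is an assumption
--- takes ~11 ms per rotation, so 4 rotations is ~44 ms, consistent with
the ~40 ms per file observed above.)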

Worse, the shutdown only waits for the default of 60 seconds for the
shutdown to take place before it throws an error and the entire
pg_upgrade gives up.  It seems to me that either the -t setting should
be increased, or it should be an option to pg_upgrade.

My work around was to invoke a system-wide sync a couple seconds
after the 'pg_ctl stop' is initiated.  Flushing the files wholesale
seems to work to make the checkpoint writer rapidly find it has
nothing to do when it tries to flush them retail.
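
(A hypothetical shell rendering of that workaround --- the data-directory
variable and the timeout are assumptions:)

    pg_ctl stop -D "$NEW_PGDATA" -w -t 3600 &  # begin the slow shutdown
    sleep 2        # let the shutdown checkpoint start
    sync           # system-wide flush; the per-file fsyncs then find little to do
    wait           # reap the backgrounded pg_ctl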

Anyway, the reason I think turning fsync off might be reasonable is
that as soon as the new cluster is shut down, pg_upgrade starts
overwriting most of those just-fsynced file with other files from the
old cluster, and AFAICT makes no effort to fsync them.  So until there
is a system-wide sync after the pg_upgrade finishes, your new cluster
is already in mortal danger anyway.

Cheers,

Jeff




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-12 Thread Jeff Davis
On Mon, 2012-11-12 at 10:29 -0800, Jeff Janes wrote:
 When I'm doing a pg_upgrade with thousands of tables, the shutdown
 checkpoint after restoring the dump to the new cluster takes a very
 long time, as the writer drains its operation table by opening and
 individually fsync-ing thousands of files.

This reminds me of the fix I did for initdb to sync the files. I think
we do need to make sure they are sync'd, because ext4 can keep buffers
around for quite a long time without cleaning them.

I ended up using sync_file_range(..., SYNC_FILE_RANGE_WRITE) on linux,
and posix_fadvise(..., POSIX_FADV_DONTNEED) on everything else, and that
made subsequent fsyncs more efficient.
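
(A rough sketch of that two-pass pattern --- assuming Linux for
sync_file_range(); the real initdb code is structured differently:)

    #define _GNU_SOURCE             /* for sync_file_range() on Linux */
    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Pass 1: start kernel writeback of a file's dirty pages without
     * waiting, so that the fsync() in pass 2 has little left to do.
     */
    static void
    pre_sync_fname(const char *fname)
    {
        int     fd = open(fname, O_RDONLY);

        if (fd < 0)
            return;                 /* best effort: skip unopenable files */
    #if defined(__linux__)
        sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
    #else
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    #endif
        close(fd);
    }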

Regards,
Jeff Davis






Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-06 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 03:08:17PM -0500, Bruce Momjian wrote:
 Magnus reported that a customer with a million tables was finding
 pg_upgrade slow.  I had never considered many tables to be a problem, but
 decided to test it.  I created a database with 2k tables like this:
 
   CREATE TABLE test1990 (x SERIAL);
 
 Running the git version of pg_upgrade on that took 203 seconds.  Using
 synchronous_commit=off dropped the time to 78 seconds.  This was tested
 on magnetic disks with a write-through cache.  (No change on an SSD with
 a super-capacitor.)
 
 I don't see anything unsafe about having pg_upgrade use
 synchronous_commit=off.  I could set it just for the pg_dump reload, but
 it seems safe to just use it always.  We don't write to the old cluster,
 and if pg_upgrade fails, you have to re-initdb the new cluster anyway.
 
 Patch attached.  I think it should be applied to 9.2 as well.

Modified patch attached and applied to head and 9.2.  I decided to use
synchronous_commit=off only on the new cluster, just in case we ever do
make a modification of the old cluster.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index a9f9d85..49d4c8f
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 208,220 
  	 * maximum.  We assume all datfrozenxid and relfrozen values are less than
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s%s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
! 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
! 			 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 208,225 
  	 * maximum.  We assume all datfrozenxid and relfrozen values are less than
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
+ 	 *
+ 	 *	synchronous_commit=off improves object creation speed, and we only
+ 	 *	modify the new cluster, so only use it there.  If there is a crash,
+ 	 *	the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
! 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
! 			  " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ? " -c synchronous_commit=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*



[HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
Magnus reported that a customer with a million tables was finding
pg_upgrade slow.  I had never considered many tables to be a problem, but
decided to test it.  I created a database with 2k tables like this:

CREATE TABLE test1990 (x SERIAL);

Running the git version of pg_upgrade on that took 203 seconds.  Using
synchronous_commit=off dropped the time to 78 seconds.  This was tested
on magnetic disks with a write-through cache.  (No change on an SSD with
a super-capacitor.)

I don't see anything unsafe about having pg_upgrade use
synchronous_commit=off.  I could set it just for the pg_dump reload, but
it seems safe to just use it always.  We don't write to the old cluster,
and if pg_upgrade fails, you have to re-initdb the new cluster anyway.

Patch attached.  I think it should be applied to 9.2 as well.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index a9f9d85..e64d0c4
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 207,216 
  	 * vacuums can still happen, so we set autovacuum_freeze_max_age to its
  	 * maximum.  We assume all datfrozenxid and relfrozen values are less than
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
! 	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s%s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
--- 207,217 
  	 * vacuums can still happen, so we set autovacuum_freeze_max_age to its
  	 * maximum.  We assume all datfrozenxid and relfrozen values are less than
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
! 	 * not touch them.  synchronous_commit=off improves object creation speed.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d "
! 			 "-c synchronous_commit=off %s %s%s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :



Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Magnus reported that a customer with a million tables was finding
 pg_upgrade slow.

You sure there's not an O(N^2) issue in there somewhere?

 I don't see anything unsafe about having pg_upgrade use
 synchronous_commit=off.

No objection, but this seems unlikely to be better than linear speedup,
with a not-terribly-large constant factor.

BTW, does pg_upgrade run pg_restore in --single-transaction mode?
That would probably make synchronous_commit moot, at least for that
step.

regards, tom lane




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Magnus Hagander
On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Bruce Momjian br...@momjian.us writes:
  Magnus reported that a customer with a million tables was finding
  pg_upgrade slow.

 You sure there's not an O(N^2) issue in there somewhere?


  I don't see anything unsafe about having pg_upgrade use
  synchronous_commit=off.

 No objection, but this seems unlikely to be better than linear speedup,
 with a not-terribly-large constant factor.

 BTW, does pg_upgrade run pg_restore in --single-transaction mode?
 That would probably make synchronous_commit moot, at least for that
 step.



It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which
you can't reload with pg_restore.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, does pg_upgrade run pg_restore in --single-transaction mode?
 That would probably make synchronous_commit moot, at least for that
 step.

 It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which
 you can't reload with pg_restore.

Sorry, I should've said psql --single-transaction.  Although that isn't
going to work either given the presence of \connect commands in the
script.  I wonder whether pg_dumpall ought to have some sort of "one
transaction per database please" option.

regards, tom lane




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 03:30:32PM -0500, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  BTW, does pg_upgrade run pg_restore in --single-transaction mode?
  That would probably make synchronous_commit moot, at least for that
  step.
 
  It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which
  you can't reload with pg_restore.
 
 Sorry, I should've said psql --single-transaction.  Although that isn't
 going to work either given the presence of \connect commands in the
  script.  I wonder whether pg_dumpall ought to have some sort of "one
  transaction per database please" option.

pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade
splits the output file into db/user creation and object creation, so I
am hesitant to add anything more in there.

I was surprised by the scale of the performance improvement, but a
simple table creation test confirmed that improvement, regardless of
pg_upgrade.  Perhaps we should suggest synchronous_commit=off for
pg_dumpall restores, particularly when using --schema-only.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Magnus Hagander
On Mon, Nov 5, 2012 at 9:49 PM, Bruce Momjian br...@momjian.us wrote:

 On Mon, Nov  5, 2012 at 03:30:32PM -0500, Tom Lane wrote:
  Magnus Hagander mag...@hagander.net writes:
   On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   BTW, does pg_upgrade run pg_restore in --single-transaction mode?
   That would probably make synchronous_commit moot, at least for that
   step.
 
   It doesn't use pg_restore at all - it uses the dump from pg_dumpall,
 which
   you can't reload with pg_restore.
 
  Sorry, I should've said psql --single-transaction.  Although that isn't
  going to work either given the presence of \connect commands in the
  script.  I wonder whether pg_dumpall ought to have some sort of "one
  transaction per database please" option.

 pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade
 splits the output file into db/user creation and object creation, so I
 am hesitant to add anything more in there.


What about running pg_dump in a loop instead of pg_dumpall?



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 10:01:22PM +0100, Magnus Hagander wrote:
 On Mon, Nov 5, 2012 at 9:49 PM, Bruce Momjian br...@momjian.us wrote:
 
 On Mon, Nov  5, 2012 at 03:30:32PM -0500, Tom Lane wrote:
  Magnus Hagander mag...@hagander.net writes:
   On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   BTW, does pg_upgrade run pg_restore in --single-transaction mode?
   That would probably make synchronous_commit moot, at least for that
   step.
 
   It doesn't use pg_restore at all - it uses the dump from pg_dumpall,
 which
   you can't reload with pg_restore.
 
  Sorry, I should've said psql --single-transaction.  Although that isn't
  going to work either given the presence of \connect commands in the
  script.  I wonder whether pg_dumpall ought to have some sort of "one
  transaction per database please" option.
 
 pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade
 splits the output file into db/user creation and object creation, so I
 am hesitant to add anything more in there.
 
 
 What about running pg_dump in a loop instead of pg_dumpall?

Well, I could certainly do pg_dumpall --globals-only, and then I have to
create a pg_dump file for every database, and then add the \connect in
there;  it just seemed easier to use pg_dumpall, though the file split
thing is certainly something I would like to get rid of.

I think I used pg_dumpall because it was an existing tool that I assumed
would be maintained to dump a full cluster.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Jeff Janes
On Mon, Nov 5, 2012 at 12:49 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Nov  5, 2012 at 03:30:32PM -0500, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  BTW, does pg_upgrade run pg_restore in --single-transaction mode?
  That would probably make synchronous_commit moot, at least for that
  step.

  It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which
  you can't reload with pg_restore.

 Sorry, I should've said psql --single-transaction.  Although that isn't
 going to work either given the presence of \connect commands in the
 script.  I wonder whether pg_dumpall ought to have some sort of "one
 transaction per database please" option.

 pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade
 splits the output file into db/user creation and object creation, so I
 am hesitant to add anything more in there.

 I was surprised by the scale of the performance improvement, but a
 simple table creation test confirmed that improvement, regardless of
 pg_upgrade.  Perhaps we should suggest synchronous_commit=off for
 pg_dumpall restores, particularly when using --schema-only.

Or have options for pg_dump and pg_restore to insert set
synchronous_commit=off into the SQL stream?
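
(That is, the emitted script would simply begin with something like the
following before any objects are restored:)

    SET synchronous_commit = off;
    -- ... the dump's CREATE statements then commit asynchronously ...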

Cheers,

Jeff




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Robert Haas
On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Or have options for pg_dump and pg_restore to insert set
 synchronous_commit=off into the SQL stream?

It would be kind of neat if we had a command that would force all
previously-asynchronous commits to complete.  It seems likely that
very, very few people would care about intermediate pg_dump states, so
we could do the whole dump asynchronously and then do FORCE ALL
COMMITS; or whatever at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 04:14:47PM -0500, Robert Haas wrote:
 On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  Or have options for pg_dump and pg_restore to insert set
  synchronous_commit=off into the SQL stream?
 
 It would be kind of neat if we had a command that would force all
 previously-asynchronous commits to complete.  It seems likely that
 very, very few people would care about intermediate pg_dump states, so
 we could do the whole dump asynchronously and then do FORCE ALL
 COMMITS; or whatever at the end.

Actually, I had assumed that a session disconnection forced a WAL fsync
flush, but now I doubt that.  Seems only server shutdown does that, or a
checkpoint.  Would this work?

SET synchronous_commit=on;
CREATE TABLE dummy(x int);
DROP TABLE dummy;

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Jeff Janes
On Mon, Nov 5, 2012 at 1:14 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Or have options for pg_dump and pg_restore to insert set
 synchronous_commit=off into the SQL stream?

 It would be kind of neat if we had a command that would force all
 previously-asynchronous commits to complete.  It seems likely that
 very, very few people would care about intermediate pg_dump states, so
 we could do the whole dump asynchronously and then do FORCE ALL
 COMMITS; or whatever at the end.

Yeah, I was wondering what a fool-proof way of doing that would be,
without implementing a new feature.  Turning synchronous_commit back
on and then running and committing a transaction guaranteed to generate
WAL would do it.

Would a simple 'select pg_switch_xlog();' always accomplish the desired flush?

Cheers,

Jeff




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 01:23:58PM -0800, Jeff Janes wrote:
 On Mon, Nov 5, 2012 at 1:14 PM, Robert Haas robertmh...@gmail.com wrote:
  On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  Or have options for pg_dump and pg_restore to insert set
  synchronous_commit=off into the SQL stream?
 
  It would be kind of neat if we had a command that would force all
  previously-asynchronous commits to complete.  It seems likely that
  very, very few people would care about intermediate pg_dump states, so
  we could do the whole dump asynchronously and then do FORCE ALL
  COMMITS; or whatever at the end.
 
 Yeah, I was wondering what a fool-proof way of doing that would be,
  without implementing a new feature.  Turning synchronous_commit back
  on and then running and committing a transaction guaranteed to generate
 WAL would do it.
 
 Would a simple 'select pg_switch_xlog();' always accomplish the desired flush?

That could generate a lot of WAL files if used regularly.  :-(  Does
SELECT txid_current() generate WAL?  I think it does.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Mon, Nov  5, 2012 at 04:14:47PM -0500, Robert Haas wrote:
  On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
   Or have options for pg_dump and pg_restore to insert set
   synchronous_commit=off into the SQL stream?
  
  It would be kind of neat if we had a command that would force all
  previously-asynchronous commits to complete.  It seems likely that
  very, very few people would care about intermediate pg_dump states, so
  we could do the whole dump asynchronously and then do FORCE ALL
  COMMITS; or whatever at the end.
 
 Actually, I had assumed that a session disconnection forced a WAL fsync
 flush, but now I doubt that.  Seems only server shutdown does that, or a
 checkpoint.  Would this work?
 
   SET synchronous_commit=on;
   CREATE TABLE dummy(x int);
   DROP TABLE dummy;

AFAIR any transaction that modifies catalogs gets sync commit forcibly,
regardless of the setting.  And sync commit means you get to wait for
all previous transactions to be flushed as well.  So simply creating a
temp table ought to do the trick ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 06:33:16PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Mon, Nov  5, 2012 at 04:14:47PM -0500, Robert Haas wrote:
   On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Or have options for pg_dump and pg_restore to insert set
synchronous_commit=off into the SQL stream?
   
   It would be kind of neat if we had a command that would force all
   previously-asynchronous commits to complete.  It seems likely that
   very, very few people would care about intermediate pg_dump states, so
   we could do the whole dump asynchronously and then do FORCE ALL
   COMMITS; or whatever at the end.
  
  Actually, I had assumed that a session disconnection forced a WAL fsync
  flush, but now I doubt that.  Seems only server shutdown does that, or a
  checkpoint.  Would this work?
  
  SET synchronous_commit=on;
  CREATE TABLE dummy(x int);
  DROP TABLE dummy;
 
 AFAIR any transaction that modifies catalogs gets sync commit forcibly,
 regardless of the setting.  And sync commit means you get to wait for

Uh, I am not seeing that in my testing, because I was only doing CREATE
TABLE and it was affected by the synchronous_commit value.

 all previous transactions to be flushed as well.  So simply creating a
 temp table ought to do the trick ...

I don't think TEMP tables write to WAL, for performance reasons.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Robert Haas
On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 AFAIR any transaction that modifies catalogs gets sync commit forcibly,
 regardless of the setting.  And sync commit means you get to wait for
 all previous transactions to be flushed as well.  So simply creating a
 temp table ought to do the trick ...

I don't think there's a carve-out for system tables ... but creating a
temp table with synchronous_commit=on will certainly do the trick.
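
(Concretely, the trick under discussion would look something like this
sketch:)

    SET synchronous_commit = on;
    BEGIN;
    CREATE TEMP TABLE flush_dummy (x int);  -- catalog changes generate WAL
    COMMIT;  -- this synchronous commit waits for all prior WAL to reach disk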

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 04:39:27PM -0500, Robert Haas wrote:
 On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  AFAIR any transaction that modifies catalogs gets sync commit forcibly,
  regardless of the setting.  And sync commit means you get to wait for
  all previous transactions to be flushed as well.  So simply creating a
  temp table ought to do the trick ...
 
 I don't think there's a carve-out for system tables ... but creating a
 temp table with synchronous_commit=on will certainly do the trick.

What is a temp table writing to WAL?  The pg_class/pg_attribute changes?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Robert Haas
On Mon, Nov 5, 2012 at 4:42 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Nov  5, 2012 at 04:39:27PM -0500, Robert Haas wrote:
 On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  AFAIR any transaction that modifies catalogs gets sync commit forcibly,
  regardless of the setting.  And sync commit means you get to wait for
  all previous transactions to be flushed as well.  So simply creating a
  temp table ought to do the trick ...

 I don't think there's a carve-out for system tables ... but creating a
 temp table with synchronous_commit=on will certainly do the trick.

 What is a temp table writing to WAL?  The pg_class/pg_attribute changes?

Yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Andres Freund
On Mon, Nov 05, 2012 at 04:42:56PM -0500, Bruce Momjian wrote:
 On Mon, Nov  5, 2012 at 04:39:27PM -0500, Robert Haas wrote:
  On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
  wrote:
   AFAIR any transaction that modifies catalogs gets sync commit forcibly,
   regardless of the setting.  And sync commit means you get to wait for
   all previous transactions to be flushed as well.  So simply creating a
   temp table ought to do the trick ...

SET synchronous_commit = on;
SELECT txid_current();

Should be enough.

  I don't think there's a carve-out for system tables ... but creating a
  temp table with synchronous_commit=on will certainly do the trick.

 What is a temp table writing to WAL?  The pg_class/pg_attribute changes?

Yes.

Andres




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Jeff Janes
On Mon, Nov 5, 2012 at 1:39 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
 AFAIR any transaction that modifies catalogs gets sync commit forcibly,
 regardless of the setting.  And sync commit means you get to wait for
 all previous transactions to be flushed as well.  So simply creating a
 temp table ought to do the trick ...

 I don't think there's a carve-out for system tables ... but creating a
 temp table with synchronous_commit=on will certainly do the trick.

But that seems like something that might be optimized away in the
future (for example, so that temp tables can be used on hot standbys)
resulting in action-at-a-distance breakage.

Is txid_current() more fundamental, i.e. less likely to change?

Cheers,

Jeff




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Jeff Janes
On Mon, Nov 5, 2012 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Magnus reported that a customer with a million tables was finding
 pg_upgrade slow.

 You sure there's not an O(N^2) issue in there somewhere?

There certainly will be before he gets to a million, but it probably
doesn't show up yet testing at 2000.

He will probably have to hack pg_dump, as discussed here:

http://archives.postgresql.org/pgsql-performance/2012-09/msg3.php



 I don't see anything unsafe about having pg_upgrade use
 synchronous_commit=off.

 No objection, but this seems unlikely to be better than linear speedup,
 with a not-terribly-large constant factor.

 BTW, does pg_upgrade run pg_restore in --single-transaction mode?
 That would probably make synchronous_commit moot, at least for that
 step.

Doing that might make the sync problem better, but would make the N^2
problem worse if upgrading to >= 9.2.

Cheers,

Jeff




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 That could generate a lot of WAL files if used regularly.  :-(  Does
 SELECT txid_current() generate WAL?  I think it does.

Well, it assigns a XID.  I'm not sure it'd be a good idea to assume that
the mere act of doing that, without actually writing anything to tables,
would result in a synchronous commit.  (For example, if the transaction
were to abort not commit, I'm pretty sure we'd not bother to fsync its
abort record.  There might be, today or in the future, a similar
optimization for successful xacts that created no WAL records.)

I thought the idea of creating a temp table was the most robust one.
A regular table would be even more certain to generate an fsync, but
it has the disadvantages that you can't easily guarantee no name
collision against a user table, nor guarantee that the table wouldn't
be left behind after a crash at the wrong instant.

regards, tom lane




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Josh Berkus

 Sorry, I should've said psql --single-transaction.  Although that isn't
 going to work either given the presence of \connect commands in the
  script.  I wonder whether pg_dumpall ought to have some sort of "one
  transaction per database please" option.

pg_dumpall ought to support -Fc output ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Bruce Momjian
On Mon, Nov  5, 2012 at 05:39:40PM -0800, Josh Berkus wrote:
 
  Sorry, I should've said psql --single-transaction.  Although that isn't
  going to work either given the presence of \connect commands in the
   script.  I wonder whether pg_dumpall ought to have some sort of "one
   transaction per database please" option.
 
 pg_dumpall ought to support -Fc output ...

That is already a TODO:

Add pg_dumpall custom format dumps?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-05 Thread Andrew Dunstan


On 11/05/2012 08:52 PM, Bruce Momjian wrote:

On Mon, Nov  5, 2012 at 05:39:40PM -0800, Josh Berkus wrote:

Sorry, I should've said psql --single-transaction.  Although that isn't
going to work either given the presence of \connect commands in the
script.  I wonder whether pg_dumpall ought to have some sort of "one
transaction per database please" option.

pg_dumpall ought to support -Fc output ...

That is already a TODO:

Add pg_dumpall custom format dumps?



That '?' isn't an accident. Custom format is currently inherently 
single-database. Unless you're going to make pg_dumpall produce multiple 
custom format archives, that would involve a major change that nobody 
has designed AFAIK.



cheers

andrew

