Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables
Applied.

---

On Fri, Nov 30, 2012 at 10:43:29PM -0500, Bruce Momjian wrote:

On Mon, Nov 26, 2012 at 02:43:19PM -0500, Bruce Momjian wrote:

In any event, I think the documentation should caution that the upgrade should not be deemed to be a success until after a system-wide sync has been done. Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced?

OK, the above is something I have been thinking about, and obviously you have too. If you change fsync from off to on in a cluster and restart it, there is no guarantee that the dirty pages you read from the kernel are actually on disk, because Postgres doesn't know they are dirty. They will probably be pushed to disk by the kernel in less than one minute, but still, it doesn't seem reliable. Should this be documented in the fsync section? Again, another reason not to use fsync=off, though your example of the file copy is a good one.

As you stated, this is a problem with the file copy/link, independent of how Postgres handles the files. We can tell people to use 'sync' as root on Unix, but what about Windows?

I'm pretty sure someone mentioned the way to do that on Windows on this list in the last few months, but I can't seem to find it. I thought it was the initdb fsync thread.

Yep, the code is already in initdb to fsync a directory --- we just need a way for pg_upgrade to access it.

I have developed the attached patch that does this. It basically adds an --sync-only option to initdb, then turns off all durability in pg_upgrade and has pg_upgrade run initdb --sync-only; this gives us another nice speedup!

                ------ SSD ------     ---- magnetic ----
                git       patch       git        patch
     1          11.11     11.11       11.10      11.13
  1000          20.57     19.89       20.72      19.30
  2000          28.02     25.81       28.50      27.53
  4000          42.00     43.59       46.71      46.84
  8000          89.66     74.16       89.10      73.67
 16000         157.66    135.98      159.97     153.48
 32000         316.24    296.90      334.74     308.59
 64000         814.97    715.53      797.34     727.94

(I am very happy with these times. Thanks to Jeff Janes for his suggestions.)

I have also added documentation to the 'fsync' configuration variable warning about dirty buffers and recommending flushing them to disk before the cluster is considered crash-recovery safe. I consider this patch ready for 9.3 application (meaning it is not a prototype).

-- 
  Bruce Momjian  <br...@momjian.us>       http://momjian.us
  EnterpriseDB                            http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c12f15b..63df529
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 150,155 ****
--- 150,161 ----
  			 new_cluster.pgdata);
  	check_ok();
  
+ 	prep_status("Sync data directory to disk");
+ 	exec_prog(UTILITY_LOG_FILE, NULL, true,
+ 			  "\"%s/initdb\" --sync-only \"%s\"", new_cluster.bindir,
+ 			  new_cluster.pgdata);
+ 	check_ok();
+ 
  	create_script_for_cluster_analyze(&analyze_script_file_name);
  	create_script_for_old_cluster_deletion(&deletion_script_file_name);

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 49d4c8f..05d8cc0
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 209,217 ****
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 *
! 	 * synchronous_commit=off improves object creation speed, and we only
! 	 * modify the new cluster, so only use it there.  If there is a crash,
! 	 * the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
  			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
--- 209,217 ----
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 *
! 	 * Turn off durability requirements to improve object creation speed, and
! 	 * we only modify the new cluster, so only use it there.  If there is a
! 	 * crash, the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
  			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",

*** start_postmaster(ClusterInfo *cluster)
*** 219,225 ****
  			 (cluster->controldata.cat_ver >=
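The link/copy durability worry in this thread comes down to directory metadata: a newly created hard link is only durable once the directory that records it has been fsynced, which is why initdb's directory-sync code is useful to pg_upgrade. A minimal sketch of fsyncing a directory on POSIX systems (not the pg_upgrade code itself):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a directory read-only and fsync it, so that metadata changes
 * recorded in it -- such as newly created hard links -- reach stable
 * storage.  Returns 0 on success, -1 on error.
 */
int
fsync_dir(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fsync(fd) < 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Without this step, a crash after `link()` can leave the link missing even though the file data itself was flushed.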
Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 26, 2012 at 02:43:19PM -0500, Bruce Momjian wrote:

In any event, I think the documentation should caution that the upgrade should not be deemed to be a success until after a system-wide sync has been done. Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced?

OK, the above is something I have been thinking about, and obviously you have too. If you change fsync from off to on in a cluster and restart it, there is no guarantee that the dirty pages you read from the kernel are actually on disk, because Postgres doesn't know they are dirty. They will probably be pushed to disk by the kernel in less than one minute, but still, it doesn't seem reliable. Should this be documented in the fsync section? Again, another reason not to use fsync=off, though your example of the file copy is a good one.

As you stated, this is a problem with the file copy/link, independent of how Postgres handles the files. We can tell people to use 'sync' as root on Unix, but what about Windows?

I'm pretty sure someone mentioned the way to do that on Windows on this list in the last few months, but I can't seem to find it. I thought it was the initdb fsync thread.

Yep, the code is already in initdb to fsync a directory --- we just need a way for pg_upgrade to access it.

I have developed the attached patch that does this. It basically adds an --sync-only option to initdb, then turns off all durability in pg_upgrade and has pg_upgrade run initdb --sync-only; this gives us another nice speedup!

                ------ SSD ------     ---- magnetic ----
                git       patch       git        patch
     1          11.11     11.11       11.10      11.13
  1000          20.57     19.89       20.72      19.30
  2000          28.02     25.81       28.50      27.53
  4000          42.00     43.59       46.71      46.84
  8000          89.66     74.16       89.10      73.67
 16000         157.66    135.98      159.97     153.48
 32000         316.24    296.90      334.74     308.59
 64000         814.97    715.53      797.34     727.94

(I am very happy with these times. Thanks to Jeff Janes for his suggestions.)

I have also added documentation to the 'fsync' configuration variable warning about dirty buffers and recommending flushing them to disk before the cluster is considered crash-recovery safe. I consider this patch ready for 9.3 application (meaning it is not a prototype).

-- 
  Bruce Momjian  <br...@momjian.us>       http://momjian.us
  EnterpriseDB                            http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c12f15b..63df529
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 150,155 ****
--- 150,161 ----
  			 new_cluster.pgdata);
  	check_ok();
  
+ 	prep_status("Sync data directory to disk");
+ 	exec_prog(UTILITY_LOG_FILE, NULL, true,
+ 			  "\"%s/initdb\" --sync-only \"%s\"", new_cluster.bindir,
+ 			  new_cluster.pgdata);
+ 	check_ok();
+ 
  	create_script_for_cluster_analyze(&analyze_script_file_name);
  	create_script_for_old_cluster_deletion(&deletion_script_file_name);

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 49d4c8f..05d8cc0
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 209,217 ****
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 *
! 	 * synchronous_commit=off improves object creation speed, and we only
! 	 * modify the new cluster, so only use it there.  If there is a crash,
! 	 * the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
  			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
--- 209,217 ----
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 *
! 	 * Turn off durability requirements to improve object creation speed, and
! 	 * we only modify the new cluster, so only use it there.  If there is a
! 	 * crash, the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
  			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",

*** start_postmaster(ClusterInfo *cluster)
*** 219,225 ****
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ? " -c synchronous_commit=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 219,226 ----
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ?
! 			 " -c
Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables
On Sat, Nov 24, 2012 at 09:42:08PM -0800, Jeff Janes wrote:

On Fri, Nov 23, 2012 at 7:22 PM, Bruce Momjian <br...@momjian.us> wrote:

On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

Yes, it is with synchronous_commit=off. (Or if it wasn't originally, it is now, with the same result.)

Applying your fsync patch does solve the problem for me on ext4. Having the new cluster be on ext3 rather than ext4 also solves the problem, without the need for a patch; but it would be nice to be more friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous_commit=off, fsync=off, and both, on ext4?

For 5,000 tables like "create table fooN (x serial)", upgrading from 9.3dev to 9.3dev. Timings are in seconds, done twice. I had to hack pg_upgrade so that the pg_ctl stop command did -w -t 3600, otherwise I'd get a "database did not shut down" error for the first two.

    both on                 648.29   608.42
    synchronous_commit off  250.24   366.50
    fsync off                46.91    43.96
    both off                 41.44    44.81

Also, I did a manual sync as soon as "Removing support functions from new cluster OK" appears, with synchronous_commit off but fsync on:

    45.96    46.46

OK, these are very convincing numbers. I am going to modify initdb to have an --fsync-only option, and have pg_upgrade use that. This is 9.3 material.

In any event, I think the documentation should caution that the upgrade should not be deemed to be a success until after a system-wide sync has been done. Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced?

OK, the above is something I have been thinking about, and obviously you have too. If you change fsync from off to on in a cluster and restart it, there is no guarantee that the dirty pages you read from the kernel are actually on disk, because Postgres doesn't know they are dirty. They will probably be pushed to disk by the kernel in less than one minute, but still, it doesn't seem reliable.

Should this be documented in the fsync section? Again, another reason not to use fsync=off, though your example of the file copy is a good one.

As you stated, this is a problem with the file copy/link, independent of how Postgres handles the files. We can tell people to use 'sync' as root on Unix, but what about Windows?

I'm pretty sure someone mentioned the way to do that on Windows on this list in the last few months, but I can't seem to find it. I thought it was the initdb fsync thread.

Yep, the code is already in initdb to fsync a directory --- we just need a way for pg_upgrade to access it.

-- 
  Bruce Momjian  <br...@momjian.us>       http://momjian.us
  EnterpriseDB                            http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
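The "fsync a whole directory tree" pass that initdb performs can be illustrated roughly as follows. This is a simplified stand-in, not the actual initdb code; `sync_dir_recursive` and `sync_entry` are made-up names:

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

/* Callback for nftw(): fsync each regular file and directory we can open. */
static int
sync_entry(const char *path, const struct stat *sb,
		   int typeflag, struct FTW *ftwbuf)
{
	int			fd;

	(void) sb;
	(void) ftwbuf;
	/* FTW_DP = directory visited post-order (because of FTW_DEPTH) */
	if (typeflag != FTW_F && typeflag != FTW_DP)
		return 0;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;				/* skip entries we cannot open */
	(void) fsync(fd);
	close(fd);
	return 0;
}

/* Walk a directory tree and fsync everything in it; 0 on success. */
int
sync_dir_recursive(const char *dir)
{
	/*
	 * FTW_DEPTH visits directories after their contents, so each directory
	 * is synced after the files it records; FTW_PHYS avoids following
	 * symlinks out of the tree.
	 */
	return nftw(dir, sync_entry, 64, FTW_DEPTH | FTW_PHYS);
}
```

This is the expensive operation the thread wants to run once, at the end, instead of paying per-file fsync costs during object creation.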
Re: Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables
On Fri, Nov 23, 2012 at 7:22 PM, Bruce Momjian <br...@momjian.us> wrote:

On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

Yes, it is with synchronous_commit=off. (Or if it wasn't originally, it is now, with the same result.)

Applying your fsync patch does solve the problem for me on ext4. Having the new cluster be on ext3 rather than ext4 also solves the problem, without the need for a patch; but it would be nice to be more friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous_commit=off, fsync=off, and both, on ext4?

For 5,000 tables like "create table fooN (x serial)", upgrading from 9.3dev to 9.3dev. Timings are in seconds, done twice. I had to hack pg_upgrade so that the pg_ctl stop command did -w -t 3600, otherwise I'd get a "database did not shut down" error for the first two.

    both on                 648.29   608.42
    synchronous_commit off  250.24   366.50
    fsync off                46.91    43.96
    both off                 41.44    44.81

Also, I did a manual sync as soon as "Removing support functions from new cluster OK" appears, with synchronous_commit off but fsync on:

    45.96    46.46

In any event, I think the documentation should caution that the upgrade should not be deemed to be a success until after a system-wide sync has been done. Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced?

OK, the above is something I have been thinking about, and obviously you have too. If you change fsync from off to on in a cluster and restart it, there is no guarantee that the dirty pages you read from the kernel are actually on disk, because Postgres doesn't know they are dirty. They will probably be pushed to disk by the kernel in less than one minute, but still, it doesn't seem reliable.

Should this be documented in the fsync section? Again, another reason not to use fsync=off, though your example of the file copy is a good one.

As you stated, this is a problem with the file copy/link, independent of how Postgres handles the files. We can tell people to use 'sync' as root on Unix, but what about Windows?

I'm pretty sure someone mentioned the way to do that on Windows on this list in the last few months, but I can't seem to find it. I thought it was the initdb fsync thread.

Cheers,

Jeff
Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

[ Sorry for the delay in replying. ]

On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian <br...@momjian.us> wrote:

On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:

Is turning off synchronous_commit enough? What about turning off fsync?

I did some testing with the attached patch, which turns off fsync, on a magnetic disk with no BBU;

With which file system? I wouldn't expect you to see a benefit with ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits group fsync of new file creations and instead does each one serially. Whether it is worth applying a fix that is only needed for that one file system, I don't know. The trade-offs are not all that clear to me yet.

That only ext4 shows the difference seems possible. I got these results:

            sync_com=off   fsync=off
        1       15.90        13.51
     1000       26.09        24.56
     2000       33.41        31.20
     4000       57.39        57.74
     8000      102.84       116.28
    16000      189.43       207.84

It shows fsync faster for <4k, and slower for >4k. Not sure why, but perhaps the buffering of the fsync is actually faster than doing a no-op fsync.

synchronous_commit=off turns off not only the fsync at each commit, but also the write-to-kernel at each commit; so it is not surprising that it is faster at large scale. I would specify both synchronous_commit=off and fsync=off.

I would like to see actual numbers showing synchronous_commit=off is also useful if we use fsync=off.

When I'm doing a pg_upgrade with thousands of tables, the shutdown checkpoint after restoring the dump to the new cluster takes a very long time, as the writer drains its operation table by opening and individually fsync-ing thousands of files. This takes about 40 ms per file, which I assume is a combination of a slow laptop disk drive and a strange deal with ext4 which makes fsyncing a recently created file very slow.

But even with a faster hdd, this would still be a problem if it works the same way, with every file needing 4 rotations to be fsynced, and this happening in serial.

Is this with the current code that does synchronous_commit=off? If not, can you test to see if this is still a problem?

Yes, it is with synchronous_commit=off. (Or if it wasn't originally, it is now, with the same result.)

Applying your fsync patch does solve the problem for me on ext4. Having the new cluster be on ext3 rather than ext4 also solves the problem, without the need for a patch; but it would be nice to be more friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous_commit=off, fsync=off, and both, on ext4?

Anyway, the reason I think turning fsync off might be reasonable is that as soon as the new cluster is shut down, pg_upgrade starts overwriting most of those just-fsynced files with other files from the old cluster, and AFAICT makes no effort to fsync them. So until there is a system-wide sync after the pg_upgrade finishes, your new cluster is already in mortal danger anyway.

pg_upgrade does a cluster shutdown before overwriting those files.

Right. So as far as the cluster is concerned, those files have been fsynced. But then the next step is to go behind the cluster's back and replace those fsynced files with different files, which may or may not have been fsynced. This is what makes me think the new cluster is in mortal danger. Not only have the new files perhaps not been fsynced, but the cluster is not even aware of this fact, so you can start it up, and then shut it down, and it still won't bother to fsync them, because as far as it is concerned they already have been.

Given that, how much extra danger would be added by having the new cluster schema restore run with fsync=off?

In any event, I think the documentation should caution that the upgrade should not be deemed to be a success until after a system-wide sync has been done.

Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced?

OK, the above is something I have been thinking about, and obviously you have too. If you change fsync from off to on in a cluster and restart it, there is no guarantee that the dirty pages you read from the kernel are actually on disk, because Postgres doesn't know they are dirty. They will probably be pushed to disk by the kernel in less than one minute, but still, it doesn't seem reliable.

Should this be documented in the fsync section? Again, another reason not to use fsync=off, though your example of the file copy is a good one.

As you stated, this is a problem with the file copy/link, independent of how Postgres handles the files. We can tell people to use 'sync' as root on Unix, but what about Windows?

--
Re: [HACKERS] Pg_upgrade speed for many tables
On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian <br...@momjian.us> wrote:

On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:

Is turning off synchronous_commit enough? What about turning off fsync?

I did some testing with the attached patch, which turns off fsync, on a magnetic disk with no BBU;

With which file system? I wouldn't expect you to see a benefit with ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits group fsync of new file creations and instead does each one serially. Whether it is worth applying a fix that is only needed for that one file system, I don't know. The trade-offs are not all that clear to me yet.

I got these results:

            sync_com=off   fsync=off
        1       15.90        13.51
     1000       26.09        24.56
     2000       33.41        31.20
     4000       57.39        57.74
     8000      102.84       116.28
    16000      189.43       207.84

It shows fsync faster for <4k, and slower for >4k. Not sure why, but perhaps the buffering of the fsync is actually faster than doing a no-op fsync.

synchronous_commit=off turns off not only the fsync at each commit, but also the write-to-kernel at each commit; so it is not surprising that it is faster at large scale. I would specify both synchronous_commit=off and fsync=off.

When I'm doing a pg_upgrade with thousands of tables, the shutdown checkpoint after restoring the dump to the new cluster takes a very long time, as the writer drains its operation table by opening and individually fsync-ing thousands of files. This takes about 40 ms per file, which I assume is a combination of a slow laptop disk drive and a strange deal with ext4 which makes fsyncing a recently created file very slow.

But even with a faster hdd, this would still be a problem if it works the same way, with every file needing 4 rotations to be fsynced, and this happening in serial.

Is this with the current code that does synchronous_commit=off? If not, can you test to see if this is still a problem?

Yes, it is with synchronous_commit=off. (Or if it wasn't originally, it is now, with the same result.)

Applying your fsync patch does solve the problem for me on ext4. Having the new cluster be on ext3 rather than ext4 also solves the problem, without the need for a patch; but it would be nice to be more friendly to ext4, which is popular even though not recommended.

Anyway, the reason I think turning fsync off might be reasonable is that as soon as the new cluster is shut down, pg_upgrade starts overwriting most of those just-fsynced files with other files from the old cluster, and AFAICT makes no effort to fsync them. So until there is a system-wide sync after the pg_upgrade finishes, your new cluster is already in mortal danger anyway.

pg_upgrade does a cluster shutdown before overwriting those files.

Right. So as far as the cluster is concerned, those files have been fsynced. But then the next step is to go behind the cluster's back and replace those fsynced files with different files, which may or may not have been fsynced. This is what makes me think the new cluster is in mortal danger. Not only have the new files perhaps not been fsynced, but the cluster is not even aware of this fact, so you can start it up, and then shut it down, and it still won't bother to fsync them, because as far as it is concerned they already have been.

Given that, how much extra danger would be added by having the new cluster schema restore run with fsync=off?

In any event, I think the documentation should caution that the upgrade should not be deemed to be a success until after a system-wide sync has been done. Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced?

Cheers,

Jeff
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:

On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian <br...@momjian.us> wrote:

Magnus reported that a customer with a million tables was finding pg_upgrade slow. I had never considered many tables to be a problem, but decided to test it. I created a database with 2k tables like this:

    CREATE TABLE test1990 (x SERIAL);

Running the git version of pg_upgrade on that took 203 seconds. Using synchronous_commit=off dropped the time to 78 seconds. This was tested on magnetic disks with a write-through cache. (No change on an SSD with a super-capacitor.)

I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. I could set it just for the pg_dump reload, but it seems safe to just use it always. We don't write to the old cluster, and if pg_upgrade fails, you have to re-initdb the new cluster anyway.

Patch attached. I think it should be applied to 9.2 as well.

Is turning off synchronous_commit enough? What about turning off fsync?

I did some testing with the attached patch, which turns off fsync, on a magnetic disk with no BBU; I got these results:

            sync_com=off   fsync=off
        1       15.90        13.51
     1000       26.09        24.56
     2000       33.41        31.20
     4000       57.39        57.74
     8000      102.84       116.28
    16000      189.43       207.84

It shows fsync faster for <4k, and slower for >4k. Not sure why, but perhaps the buffering of the fsync is actually faster than doing a no-op fsync. I don't think fsync=off makes sense, except for testing; let me know if I should test something else.

When I'm doing a pg_upgrade with thousands of tables, the shutdown checkpoint after restoring the dump to the new cluster takes a very long time, as the writer drains its operation table by opening and individually fsync-ing thousands of files. This takes about 40 ms per file, which I assume is a combination of a slow laptop disk drive and a strange deal with ext4 which makes fsyncing a recently created file very slow.

But even with a faster hdd, this would still be a problem if it works the same way, with every file needing 4 rotations to be fsynced, and this happening in serial.

Is this with the current code that does synchronous_commit=off? If not, can you test to see if this is still a problem?

Worse, the shutdown only waits for the default of 60 seconds for the shutdown to take place before it throws an error and the entire pg_upgrade gives up. It seems to me that either the -t setting should be increased, or it should be an option to pg_upgrade. My work-around was to invoke a system-wide sync a couple of seconds after the 'pg_ctl stop' is initiated. Flushing the files wholesale seems to make the checkpoint writer rapidly find it has nothing to do when it tries to flush them retail.

Anyway, the reason I think turning fsync off might be reasonable is that as soon as the new cluster is shut down, pg_upgrade starts overwriting most of those just-fsynced files with other files from the old cluster, and AFAICT makes no effort to fsync them. So until there is a system-wide sync after the pg_upgrade finishes, your new cluster is already in mortal danger anyway.

pg_upgrade does a cluster shutdown before overwriting those files.

-- 
  Bruce Momjian  <br...@momjian.us>       http://momjian.us
  EnterpriseDB                            http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 49d4c8f..01e0dd3
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 219,225 ****
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ? " -c synchronous_commit=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 219,225 ----
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
  			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ? " -c fsync=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian <br...@momjian.us> wrote:

Magnus reported that a customer with a million tables was finding pg_upgrade slow. I had never considered many tables to be a problem, but decided to test it. I created a database with 2k tables like this:

    CREATE TABLE test1990 (x SERIAL);

Running the git version of pg_upgrade on that took 203 seconds. Using synchronous_commit=off dropped the time to 78 seconds. This was tested on magnetic disks with a write-through cache. (No change on an SSD with a super-capacitor.)

I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. I could set it just for the pg_dump reload, but it seems safe to just use it always. We don't write to the old cluster, and if pg_upgrade fails, you have to re-initdb the new cluster anyway.

Patch attached. I think it should be applied to 9.2 as well.

Is turning off synchronous_commit enough? What about turning off fsync?

When I'm doing a pg_upgrade with thousands of tables, the shutdown checkpoint after restoring the dump to the new cluster takes a very long time, as the writer drains its operation table by opening and individually fsync-ing thousands of files. This takes about 40 ms per file, which I assume is a combination of a slow laptop disk drive and a strange deal with ext4 which makes fsyncing a recently created file very slow.

But even with a faster hdd, this would still be a problem if it works the same way, with every file needing 4 rotations to be fsynced, and this happening in serial.

Worse, the shutdown only waits for the default of 60 seconds for the shutdown to take place before it throws an error and the entire pg_upgrade gives up. It seems to me that either the -t setting should be increased, or it should be an option to pg_upgrade. My work-around was to invoke a system-wide sync a couple of seconds after the 'pg_ctl stop' is initiated. Flushing the files wholesale seems to make the checkpoint writer rapidly find it has nothing to do when it tries to flush them retail.

Anyway, the reason I think turning fsync off might be reasonable is that as soon as the new cluster is shut down, pg_upgrade starts overwriting most of those just-fsynced files with other files from the old cluster, and AFAICT makes no effort to fsync them. So until there is a system-wide sync after the pg_upgrade finishes, your new cluster is already in mortal danger anyway.

Cheers,

Jeff
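The manual-sync workaround described above could look roughly like this in a shell session. The `pg_ctl` invocation and `$PGDATA` path are illustrative and shown commented out; only the `sync` step is actually run here:

```shell
# Begin the (slow) shutdown, allowing plenty of time for the
# shutdown checkpoint, then flush dirty kernel buffers wholesale so
# the checkpointer's per-file fsyncs find nothing left to write:
#
#   pg_ctl -D "$PGDATA" -w -t 3600 stop &
#   sleep 2
sync
echo "system-wide sync complete"
#   wait
```

This trades one wholesale flush for thousands of retail fsyncs, which is why it makes the shutdown checkpoint finish quickly.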
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, 2012-11-12 at 10:29 -0800, Jeff Janes wrote:

When I'm doing a pg_upgrade with thousands of tables, the shutdown checkpoint after restoring the dump to the new cluster takes a very long time, as the writer drains its operation table by opening and individually fsync-ing thousands of files.

This reminds me of the fix I did for initdb to sync the files. I think we do need to make sure they are sync'd, because ext4 can keep buffers around for quite a long time without cleaning them.

I ended up using sync_file_range(..., SYNC_FILE_RANGE_WRITE) on Linux, and posix_fadvise(..., POSIX_FADV_DONTNEED) on everything else, and that made subsequent fsyncs more efficient.

Regards,

Jeff Davis
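The two-pass idea described above can be sketched as follows. `pre_flush` is a made-up name; `sync_file_range` and `posix_fadvise` are the real calls mentioned in the message. The point is to start kernel writeback early so a later fsync pass has little left to wait for:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to begin writing back a file's dirty pages without
 * waiting for completion.  On Linux, sync_file_range() with
 * SYNC_FILE_RANGE_WRITE initiates writeback of the whole file
 * (nbytes = 0 means "to end of file"); elsewhere,
 * posix_fadvise(POSIX_FADV_DONTNEED) is used as a portable fallback
 * that also encourages the kernel to flush the pages.
 * Returns 0 on success.
 */
int
pre_flush(int fd)
{
#if defined(__linux__)
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#else
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
}
```

A sync pass would call `pre_flush` on every file first, then make a second pass calling `fsync` on each; the fsyncs then mostly just confirm writeback that is already underway.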
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 03:08:17PM -0500, Bruce Momjian wrote:

Magnus reported that a customer with a million tables was finding pg_upgrade slow. I had never considered many tables to be a problem, but decided to test it. I created a database with 2k tables like this:

    CREATE TABLE test1990 (x SERIAL);

Running the git version of pg_upgrade on that took 203 seconds. Using synchronous_commit=off dropped the time to 78 seconds. This was tested on magnetic disks with a write-through cache. (No change on an SSD with a super-capacitor.)

I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. I could set it just for the pg_dump reload, but it seems safe to just use it always. We don't write to the old cluster, and if pg_upgrade fails, you have to re-initdb the new cluster anyway.

Patch attached. I think it should be applied to 9.2 as well.

Modified patch attached and applied to head and 9.2. I decided to use synchronous_commit=off only on the new cluster, just in case we ever do make a modification of the old cluster.

-- 
  Bruce Momjian  <br...@momjian.us>       http://momjian.us
  EnterpriseDB                            http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index a9f9d85..49d4c8f
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 208,220 ****
  	 * maximum.  We assume all datfrozenxid and relfrozen values are less than
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s%s\" start",
  			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
! 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
! 			 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
--- 208,225 ----
  	 * maximum.  We assume all datfrozenxid and relfrozen values are less than
  	 * a gap of 2000000000 from the current xid counter, so autovacuum will
  	 * not touch them.
+ 	 *
+ 	 * synchronous_commit=off improves object creation speed, and we only
+ 	 * modify the new cluster, so only use it there.  If there is a crash,
+ 	 * the new cluster has to be recreated anyway.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
  			 cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
! 			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
! 			 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 (cluster == new_cluster) ? " -c synchronous_commit=off" : "",
  			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
[HACKERS] Pg_upgrade speed for many tables
Magnus reported that a customer with a million tables was finding pg_upgrade slow. I had never considered many tables to be a problem, but decided to test it. I created a database with 2k tables like this: CREATE TABLE test1990 (x SERIAL); Running the git version of pg_upgrade on that took 203 seconds. Using synchronous_commit=off dropped the time to 78 seconds. This was tested on magnetic disks with a write-through cache. (No change on an SSD with a super-capacitor.) I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. I could set it just for the pg_dump reload, but it seems safe to just use it always. We don't write to the old cluster, and if pg_upgrade fails, you have to re-initdb the new cluster anyway. Patch attached. I think it should be applied to 9.2 as well.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
index a9f9d85..e64d0c4
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** start_postmaster(ClusterInfo *cluster)
*** 207,216 ****
      * vacuums can still happen, so we set autovacuum_freeze_max_age to its
      * maximum.  We assume all datfrozenxid and relfrozenxid values are less than
      * a gap of 2000000000 from the current xid counter, so autovacuum will
!     * not touch them.
      */
     snprintf(cmd, sizeof(cmd),
!              "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s%s\" start",
              cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
              (cluster->controldata.cat_ver >= BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
--- 207,217 ----
      * vacuums can still happen, so we set autovacuum_freeze_max_age to its
      * maximum.  We assume all datfrozenxid and relfrozenxid values are less than
      * a gap of 2000000000 from the current xid counter, so autovacuum will
!     * not touch them.  synchronous_commit=off improves object creation speed.
     */
     snprintf(cmd, sizeof(cmd),
!              "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d "
!              "-c synchronous_commit=off %s %s%s\" start",
              cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
              (cluster->controldata.cat_ver >= BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
Re: [HACKERS] Pg_upgrade speed for many tables
Bruce Momjian br...@momjian.us writes: Magnus reported that a customer with a million tables was finding pg_upgrade slow. You sure there's not an O(N^2) issue in there somewhere? I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. No objection, but this seems unlikely to be better than linear speedup, with a not-terribly-large constant factor. BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: Magnus reported that a customer with a million tables was finding pg_upgrade slow. You sure there's not an O(N^2) issue in there somewhere? I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. No objection, but this seems unlikely to be better than linear speedup, with a not-terribly-large constant factor. BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which you can't reload with pg_restore. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Pg_upgrade speed for many tables
Magnus Hagander mag...@hagander.net writes: On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which you can't reload with pg_restore. Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 03:30:32PM -0500, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which you can't reload with pg_restore. Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade splits the output file into db/user creation and object creation, so I am hesitant to add anything more in there. I was surprised by the scale of the performance improvement, but a simple table creation test confirmed that improvement, regardless of pg_upgrade. Perhaps we should suggest synchronous_commit=off for pg_dumpall restores, particularly when using --schema-only. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 9:49 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Nov 5, 2012 at 03:30:32PM -0500, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which you can't reload with pg_restore. Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade splits the output file into db/user creation and object creation, so I am hesitant to add anything more in there. What about running pg_dump in a loop instead of pg_dumpall? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 10:01:22PM +0100, Magnus Hagander wrote: On Mon, Nov 5, 2012 at 9:49 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Nov 5, 2012 at 03:30:32PM -0500, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which you can't reload with pg_restore. Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade splits the output file into db/user creation and object creation, so I am hesitant to add anything more in there. What about running pg_dump in a loop instead of pg_dumpall? Well, I could certainly do pg_dumpall --globals-only, and then I have to create a pg_dump file for every database, and then add the \connect in there; it just seemed easier to use pg_dumpall, though the file split thing is certainly something I would like to get rid of. I think I used pg_dumpall because it was an existing tool that I assumed would be maintained to dump a full cluster. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 12:49 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Nov 5, 2012 at 03:30:32PM -0500, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Mon, Nov 5, 2012 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. It doesn't use pg_restore at all - it uses the dump from pg_dumpall, which you can't reload with pg_restore. Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall is already doing lots of gymnastics with SQL, and pg_upgrade splits the output file into db/user creation and object creation, so I am hesitant to add anything more in there. I was surprised by the scale of the performance improvement, but a simple table creation test confirmed that improvement, regardless of pg_upgrade. Perhaps we should suggest synchronous_commit=off for pg_dumpall restores, particularly when using --schema-only. Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? Cheers, Jeff
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? It would be kind of neat if we had a command that would force all previously-asynchronous commits to complete. It seems likely that very, very few people would care about intermediate pg_dump states, so we could do the whole dump asynchronously and then do FORCE ALL COMMITS; or whatever at the end. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 04:14:47PM -0500, Robert Haas wrote: On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? It would be kind of neat if we had a command that would force all previously-asynchronous commits to complete. It seems likely that very, very few people would care about intermediate pg_dump states, so we could do the whole dump asynchronously and then do FORCE ALL COMMITS; or whatever at the end. Actually, I had assumed that a session disconnection forced a WAL fsync flush, but now I doubt that. Seems only server shutdown does that, or a checkpoint. Would this work? SET synchronous_commit=on; CREATE TABLE dummy(x int); DROP TABLE dummy; -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 1:14 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? It would be kind of neat if we had a command that would force all previously-asynchronous commits to complete. It seems likely that very, very few people would care about intermediate pg_dump states, so we could do the whole dump asynchronously and then do FORCE ALL COMMITS; or whatever at the end. Yeah, I was wondering what a fool-proof way of doing that would be, without implementing a new feature. Turning synchronous_commit back on and then doing and committing a transaction guaranteed to generate WAL would do it. Would a simple 'select pg_switch_xlog();' always accomplish the desired flush? Cheers, Jeff
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 01:23:58PM -0800, Jeff Janes wrote: On Mon, Nov 5, 2012 at 1:14 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? It would be kind of neat if we had a command that would force all previously-asynchronous commits to complete. It seems likely that very, very few people would care about intermediate pg_dump states, so we could do the whole dump asynchronously and then do FORCE ALL COMMITS; or whatever at the end. Yeah, I was wondering what a fool-proof way of doing that would be, without implementing a new feature. Turning synchronous_commits back on and then doing and committing a transaction guaranteed to generate WAL would do it. Would a simple 'select pg_switch_xlog();' always accomplish the desired flush? That could generate a lot of WAL files if used regularly. :-( Does SELECT txid_current() generate WAL? I think it does. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
Bruce Momjian escribió: On Mon, Nov 5, 2012 at 04:14:47PM -0500, Robert Haas wrote: On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? It would be kind of neat if we had a command that would force all previously-asynchronous commits to complete. It seems likely that very, very few people would care about intermediate pg_dump states, so we could do the whole dump asynchronously and then do FORCE ALL COMMITS; or whatever at the end. Actually, I had assumed that a session disconnection forced a WAL fsync flush, but now I doubt that. Seems only server shutdown does that, or a checkpoint. Would this work? SET synchronous_commit=on; CREATE TABLE dummy(x int); DROP TABLE dummy; AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 06:33:16PM -0300, Alvaro Herrera wrote: Bruce Momjian escribió: On Mon, Nov 5, 2012 at 04:14:47PM -0500, Robert Haas wrote: On Mon, Nov 5, 2012 at 4:07 PM, Jeff Janes jeff.ja...@gmail.com wrote: Or have options for pg_dump and pg_restore to insert set synchronous_commit=off into the SQL stream? It would be kind of neat if we had a command that would force all previously-asynchronous commits to complete. It seems likely that very, very few people would care about intermediate pg_dump states, so we could do the whole dump asynchronously and then do FORCE ALL COMMITS; or whatever at the end. Actually, I had assumed that a session disconnection forced a WAL fsync flush, but now I doubt that. Seems only server shutdown does that, or a checkpoint. Would this work? SET synchronous_commit=on; CREATE TABLE dummy(x int); DROP TABLE dummy; AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for Uh, I am not seeing that in my testing, because I was only doing CREATE TABLE and it was affected by the synchronous_commit value. all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... I don't think TEMP tables write to WAL, for performance reasons. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... I don't think there's a carve-out for system tables ... but creating a temp table with synchronous_commit=on will certainly do the trick. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 04:39:27PM -0500, Robert Haas wrote: On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... I don't think there's a carve-out for system tables ... but creating a temp table with synchronous_commit=on will certainly do the trick. What is a temp table writing to WAL? The pg_class/pg_attribute changes? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 4:42 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Nov 5, 2012 at 04:39:27PM -0500, Robert Haas wrote: On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... I don't think there's a carve-out for system tables ... but creating a temp table with synchronous_commit=on will certainly do the trick. What is a temp table writing to WAL? The pg_class/pg_attribute changes? Yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 05, 2012 at 04:42:56PM -0500, Bruce Momjian wrote: On Mon, Nov 5, 2012 at 04:39:27PM -0500, Robert Haas wrote: On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... SET synchronous_commit = on; SELECT txid_current(); Should be enough. I don't think there's a carve-out for system tables ... but creating a temp table with synchronous_commit=on will certainly do the trick. What is a temp table writing to WAL? The pg_class/pg_attribute changes? Yes. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 1:39 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 5, 2012 at 4:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: AFAIR any transaction that modifies catalogs gets sync commit forcibly, regardless of the setting. And sync commit means you get to wait for all previous transactions to be flushed as well. So simply creating a temp table ought to do the trick ... I don't think there's a carve-out for system tables ... but creating a temp table with synchronous_commit=on will certainly do the trick. But that seems like something that might be optimized away in the future (for example, so that temp tables can be used on hot standbys) resulting in action-at-a-distance breakage. Is txid_current() more fundamental, i.e. less likely to change? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: Magnus reported that a customer with a million tables was finding pg_upgrade slow. You sure there's not an O(N^2) issue in there somewhere? There certainly will be before he gets to a million, but it probably doesn't show up yet testing at 2000. He will probably have to hack pg_dump, as discussed here: http://archives.postgresql.org/pgsql-performance/2012-09/msg3.php I don't see anything unsafe about having pg_upgrade use synchronous_commit=off. No objection, but this seems unlikely to be better than linear speedup, with a not-terribly-large constant factor. BTW, does pg_upgrade run pg_restore in --single-transaction mode? That would probably make synchronous_commit moot, at least for that step. Doing that might make the sync problem better, but would make the N^2 problem worse if upgrading to = 9.2 . Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
Bruce Momjian br...@momjian.us writes: That could generate a lot of WAL files if used regularly. :-( Does SELECT txid_current() generate WAL? I think it does. Well, it assigns a XID. I'm not sure it'd be a good idea to assume that the mere act of doing that, without actually writing anything to tables, would result in a synchronous commit. (For example, if the transaction were to abort not commit, I'm pretty sure we'd not bother to fsync its abort record. There might be, today or in the future, a similar optimization for successful xacts that created no WAL records.) I thought the idea of creating a temp table was the most robust one. A regular table would be even more certain to generate an fsync, but it has the disadvantages that you can't easily guarantee no name collision against a user table, nor guarantee that the table wouldn't be left behind after a crash at the wrong instant. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
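The temp-table trick Tom describes could be sketched as a session like the following (a hypothetical illustration, not code from the thread; the table name is invented). It relies on the property discussed above: a synchronous commit cannot complete until all earlier asynchronous commits have been flushed too.

```sql
SET synchronous_commit = off;
-- ... bulk restore work here; each commit returns before its WAL is flushed ...

SET synchronous_commit = on;
CREATE TEMP TABLE flush_marker (x int);  -- catalog change, so WAL is written
DROP TABLE flush_marker;
-- These synchronous commits wait for all prior asynchronous commits
-- to reach disk, so everything restored above is now durable.
```

Being a temp table, it avoids name collisions with user tables and leaves nothing permanent behind, which is the robustness argument made above.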
Re: [HACKERS] Pg_upgrade speed for many tables
Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall ought to support -Fc output ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Mon, Nov 5, 2012 at 05:39:40PM -0800, Josh Berkus wrote: Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall ought to support -Fc output ... That is already a TODO: Add pg_dumpall custom format dumps? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On 11/05/2012 08:52 PM, Bruce Momjian wrote: On Mon, Nov 5, 2012 at 05:39:40PM -0800, Josh Berkus wrote: Sorry, I should've said psql --single-transaction. Although that isn't going to work either given the presence of \connect commands in the script. I wonder whether pg_dumpall ought to have some sort of one transaction per database please option. pg_dumpall ought to support -Fc output ... That is already a TODO: Add pg_dumpall custom format dumps? That '?' isn't an accident. Custom format is currently inherently single-database. Unless you're going to make pg_dumpall produce multiple custom format archives, that would involve a major change that nobody has designed AFAIK. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers