Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Gregory Stark wrote: "Kris Jurka" <[EMAIL PROTECTED]> writes: On Mon, 28 Jan 2008, Jeff Davis wrote: I think that pg_dump is a reasonable use case for synchoronized scans when the table has not been clustered. It could potentially make pg_dump have much less of a performance impact when run against an active system. One of the advantages I see with maintaining table dump order is that rsyncing backups to remote locations will work better. I can't see what scenario you're talking about here. pg_dump your live database, restore it elsewhere, then shut down the production database and run rsync from the live database to the restored one? Why not just run rsync for the initial transfer? take a dump (maybe in plaintext format) save it to disk and use rsync to copy it elsewhere. the more "similiar" the dumps the more efficient rsync can copy the data over. Stefan ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
"Kris Jurka" <[EMAIL PROTECTED]> writes: > On Mon, 28 Jan 2008, Jeff Davis wrote: > >> I think that pg_dump is a reasonable use case for synchoronized scans >> when the table has not been clustered. It could potentially make pg_dump >> have much less of a performance impact when run against an active >> system. > > One of the advantages I see with maintaining table dump order is that rsyncing > backups to remote locations will work better. I can't see what scenario you're talking about here. pg_dump your live database, restore it elsewhere, then shut down the production database and run rsync from the live database to the restored one? Why not just run rsync for the initial transfer? I can't see that working well for a real database and a database loaded from a pg_dump anyways. Every dead record will introduce skew, plus page headers, and every record will have a different system data such as xmin for one. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Mon, 28 Jan 2008, Jeff Davis wrote: I think that pg_dump is a reasonable use case for synchoronized scans when the table has not been clustered. It could potentially make pg_dump have much less of a performance impact when run against an active system. One of the advantages I see with maintaining table dump order is that rsyncing backups to remote locations will work better. Kris Jurka ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Sun, 2008-01-27 at 15:07 -0500, Tom Lane wrote: > Per today's -hackers discussion, add a GUC variable to allow clients to > disable the new synchronized-scanning behavior, and make pg_dump disable > sync scans so that it will reliably preserve row ordering. This is a > pretty trivial patch, but seeing how late we are in the 8.3 release > cycle, I thought I'd better post it for comment anyway. I apologize for the late reply, but I have one comment I'd like to add. > + if (g_fout->remoteVersion >= 80300) > + do_sql_command(g_conn, "SET synchronized_scanning TO off"); > + > + /* >* Start serializable transaction to dump consistent data. >*/ I think that pg_dump is a reasonable use case for synchoronized scans when the table has not been clustered. It could potentially make pg_dump have much less of a performance impact when run against an active system. I think it's worth considering enabling sync scans for non-clustered tables if it would not interfere with the release. Of course, a painless 8.3 release is the top priority. Regards, Jeff Davis ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Guillaume Smet wrote: Hi Russell, On Jan 28, 2008 7:27 AM, Russell Smith <[EMAIL PROTECTED]> wrote: Can somebody explain why it's important to load with synchronized_scanning off? do_sql_command(g_conn, "SET synchronized_scanning TO off"); It's the start point of this patch. See this thread [ http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php ] for more information Sorry, total brain fade in interpreting the patch. g_conn is our connection to the database, not the command we are dumping. Sorry Russell ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Hi Russell, On Jan 28, 2008 7:27 AM, Russell Smith <[EMAIL PROTECTED]> wrote: > Can somebody explain why it's important to load with > synchronized_scanning off? > > do_sql_command(g_conn, "SET synchronized_scanning TO off"); It's the start point of this patch. See this thread [ http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php ] for more information. -- Guillaume ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Mon, 2008-01-28 at 17:27 +1100, Russell Smith wrote: > Can somebody explain why it's important to load with > synchronized_scanning off? *Loading* with synchronized scanning off is not important (and is not implemented by the patch). *Dumping* with synchronized scanning off is necessary to ensure that the order of the rows in the pg_dump matches the on-disk order of the rows in the table, which is important if you want to preserve the clustering of the table data on restore. See the -hackers thread: http://markmail.org/message/qbytsco6oj2qkxsa -Neil ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Tom Lane wrote: Per today's -hackers discussion, add a GUC variable to allow clients to disable the new synchronized-scanning behavior, and make pg_dump disable sync scans so that it will reliably preserve row ordering. This is a pretty trivial patch, but seeing how late we are in the 8.3 release cycle, I thought I'd better post it for comment anyway. regards, tom lane Index: src/bin/pg_dump/pg_dump.c === RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v retrieving revision 1.481 diff -c -r1.481 pg_dump.c *** src/bin/pg_dump/pg_dump.c 1 Jan 2008 19:45:55 - 1.481 --- src/bin/pg_dump/pg_dump.c 27 Jan 2008 20:00:18 - *** *** 553,558 --- 553,572 do_sql_command(g_conn, "SET DATESTYLE = ISO"); /* +* If supported, set extra_float_digits so that we can dump float data +* exactly (given correctly implemented float I/O code, anyway) +*/ + if (g_fout->remoteVersion >= 70400) + do_sql_command(g_conn, "SET extra_float_digits TO 2"); + + /* +* If synchronized scanning is supported, disable it, to prevent +* unpredictable changes in row ordering across a dump and reload. +*/ + if (g_fout->remoteVersion >= 80300) + do_sql_command(g_conn, "SET synchronized_scanning TO off"); + + /* * Start serializable transaction to dump consistent data. */ do_sql_command(g_conn, "BEGIN"); Hi, Can somebody explain why it's important to load with synchronized_scanning off? do_sql_command(g_conn, "SET synchronized_scanning TO off"); Thanks Russell Smith ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Sun, 2008-01-27 at 21:54 +, Gregory Stark wrote: > I liked the "synchronized_sequential_scans" idea myself. I think that's a bit too long. How about "synchronized_scans", or "synchronized_seqscans"? -Neil ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Gregory Stark <[EMAIL PROTECTED]> writes: > I liked the "synchronized_sequential_scans" idea myself. The name is still open for discussion --- it's an easy search-and-replace in the patch ... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
> >> Per today's -hackers discussion, add a GUC variable to > allow clients > >> to disable the new synchronized-scanning behavior, and > make pg_dump > >> disable sync scans so that it will reliably preserve row > ordering. > >> This is a pretty trivial patch, but seeing how late we are > in the 8.3 > >> release cycle, I thought I'd better post it for comment anyway. > > > > +1 > > I liked the "synchronized_sequential_scans" idea myself. > > But otherwise, sure. > this will save some hackwork I have to do for a import/export project I am doing. +1, please Regards, Gevik Babakhani PostgreSQL NL http://www.postgresql.nl TrueSoftware BV http://www.truesoftware.nl ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > On Jan 27, 2008 3:07 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> Per today's -hackers discussion, add a GUC variable to allow clients to >> disable the new synchronized-scanning behavior, and make pg_dump disable >> sync scans so that it will reliably preserve row ordering. This is a >> pretty trivial patch, but seeing how late we are in the 8.3 release >> cycle, I thought I'd better post it for comment anyway. > > +1 I liked the "synchronized_sequential_scans" idea myself. But otherwise, sure. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Jan 27, 2008 3:07 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > Per today's -hackers discussion, add a GUC variable to allow clients to > disable the new synchronized-scanning behavior, and make pg_dump disable > sync scans so that it will reliably preserve row ordering. This is a > pretty trivial patch, but seeing how late we are in the 8.3 release > cycle, I thought I'd better post it for comment anyway. +1 -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PATCHES] Proposed patch: synchronized_scanning GUC variable
Per today's -hackers discussion, add a GUC variable to allow clients to disable the new synchronized-scanning behavior, and make pg_dump disable sync scans so that it will reliably preserve row ordering. This is a pretty trivial patch, but seeing how late we are in the 8.3 release cycle, I thought I'd better post it for comment anyway. regards, tom lane Index: doc/src/sgml/config.sgml === RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v retrieving revision 1.162 diff -c -r1.162 config.sgml *** doc/src/sgml/config.sgml27 Jan 2008 19:12:28 - 1.162 --- doc/src/sgml/config.sgml27 Jan 2008 20:00:16 - *** *** 4611,4616 --- 4611,4638 + + synchronized_scanning (boolean) + +synchronized_scanning configuration parameter + + + + This allows sequential scans of large tables to synchronize with each + other, so that concurrent scans read the same block at about the + same time and hence share the I/O workload. When this is enabled, + a scan might start in the middle of the table and then wrap + around the end to cover all rows, so as to synchronize with the + activity of scans already in progress. This can result in + unpredictable changes in the row ordering returned by queries that + have no ORDER BY clause. Setting this parameter to + off ensures the pre-8.3 behavior in which a sequential + scan always starts from the beginning of the table. The default + is on. + + + + Index: src/backend/access/heap/heapam.c === RCS file: /cvsroot/pgsql/src/backend/access/heap/heapam.c,v retrieving revision 1.248 diff -c -r1.248 heapam.c *** src/backend/access/heap/heapam.c14 Jan 2008 01:39:09 - 1.248 --- src/backend/access/heap/heapam.c27 Jan 2008 20:00:18 - *** *** 59,64 --- 59,68 #include "utils/syscache.h" + /* GUC variable */ + bool synchronized_scanning = true; + + static HeapScanDesc heap_beginscan_internal(Relation relation, Snapshot snapshot, int nkeys, ScanKey key, *** *** 104,110 * the thresholds for these features could be different, we make them the * same so that there are only two behaviors to tune rather than four. * (However, some callers need to be able to disable one or both of !* these behaviors, independently of the size of the table.) * * During a rescan, don't make a new strategy object if we don't have to. */ --- 108,115 * the thresholds for these features could be different, we make them the * same so that there are only two behaviors to tune rather than four. * (However, some callers need to be able to disable one or both of !* these behaviors, independently of the size of the table; also there !* is a GUC variable that can disable synchronized scanning.) * * During a rescan, don't make a new strategy object if we don't have to. */ *** *** 129,135 scan->rs_strategy = NULL; } ! if (allow_sync) { scan->rs_syncscan = true; scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks); --- 134,140 scan->rs_strategy = NULL; } ! if (allow_sync && synchronized_scanning) { scan->rs_syncscan = true; scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks); Index: src/backend/utils/misc/guc.c === RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v retrieving revision 1.431 diff -c -r1.431 guc.c *** src/backend/utils/misc/guc.c27 Jan 2008 19:12:28 - 1.431 --- src/backend/utils/misc/guc.c27 Jan 2008 20:00:18 - *** *** 110,115 --- 110,116 extern intCommitSiblings; extern char *default_tablespace; extern char *temp_tablespaces; + extern bool synchronized_scanning; extern bool fullPageWrites; #ifdef TRACE_SORT *** *** 1053,1058 --- 1054,1068 }, { + {"synchronized_scanning", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS, + gettext_noop("Enable synchronized scans."), + NULL + }, + &synchronized_scanning, + true, NULL, NULL + }, + + { {"archive_mode", PGC_POSTMASTER, WAL_SETTINGS, gettext_noop("Allows archiving of WAL files