Re: [HACKERS] Proposal for changes to recovery.conf API
On Thu, Sep 1, 2016 at 4:01 AM, Abhijit Menon-Sen wrote: > At 2016-08-31 17:15:59 +0100, si...@2ndquadrant.com wrote: >> >> * Recovery parameters would now be part of the main postgresql.conf >> infrastructure >> Any parameters set in $DATADIR/recovery.conf will be read after the >> main parameter file, similar to the way that postgresql.conf.auto is >> read. >> (Abhijit) >> >> * Parameters >> All of the parameters formerly set in recovery.conf can be set in >> postgresql.conf using RELOAD >> These parameters will have no defaults in postgresql.conf.sample >> Setting them has no effect during normal running, or once recovery ends. >> https://www.postgresql.org/docs/devel/static/archive-recovery-settings.html >> https://www.postgresql.org/docs/devel/static/recovery-target-settings.html >> https://www.postgresql.org/docs/devel/static/standby-settings.html >> (Abhijit) > > I've attached a WIP patch for the above (recovery_guc_v20160831.patch). > This was based on the unite_recoveryconf_postgresqlconf_v3.patch posted > by Fujii Masao. > > Unfortunately, some parts conflict with the patch that Simon just posted > (e.g., his patch removes trigger_file altogether, whereas mine converts > it into a GUC along the lines of the original patch). Rather than trying > to untangle that right now, I'm posting what I have as-is, and I'll post > an updated version tomorrow. - else if (recoveryTarget == RECOVERY_TARGET_XID) - ereport(LOG, - (errmsg("starting point-in-time recovery to XID %u", - recoveryTargetXid))); User loses information if those logs are removed. + {"recovery_min_apply_delay", PGC_SIGHUP, WAL_RECOVERY_TARGET, + gettext_noop("Sets the minimum delay to apply changes during recovery."), + NULL, + GUC_UNIT_MS + }, What's the point of having them all as SIGHUP? The startup process does not reload GUC parameters with ProcessConfigFile(), it works on recoveryWakeupLatch though. I'd rather let them as PGC_POSTMASTER to begin with as that's the safest approach, and then get a second patch that processes ProcessConfigFile in the startup process and switch some of them to PGC_SIGHUP. Recovery targets should not be SIGHUP, but recovery_min_apply_delay applies definitely applies to that, as well as archive_cleanup_command or restore_command. +static void +assign_recovery_target_name(const char *newval, void *extra) +{ + if (recovery_target_xid != InvalidTransactionId) + recovery_target = RECOVERY_TARGET_XID; + else if (newval[0]) + recovery_target = RECOVERY_TARGET_NAME; + else if (recovery_target_time != 0) + recovery_target = RECOVERY_TARGET_TIME; + else + recovery_target = RECOVERY_TARGET_UNSET; +} That's brittle to put that in the GUC machinery... The recovery target type should be determined only once, if archive recovery is wanted at the beginning of the startup process once and for all, and just be set up within the startup process. If multiple recovery_target_* parameters are set, we should just define the target type in order of priority, instead of the-last-one-wins that is currently present. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
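As an illustration of the "determine the target type only once, in order of priority" suggestion, a minimal standalone sketch (the enum names and the particular priority order are assumptions for illustration, not the patch's code):

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
	RECOVERY_TARGET_UNSET,
	RECOVERY_TARGET_XID,
	RECOVERY_TARGET_TIME,
	RECOVERY_TARGET_NAME
} RecoveryTargetType;

/*
 * Decide the target type once, at the start of the startup process,
 * instead of letting GUC assign hooks overwrite each other
 * (last-one-wins).
 */
static RecoveryTargetType
resolve_recovery_target(bool xid_set, bool time_set, bool name_set)
{
	if (xid_set)
		return RECOVERY_TARGET_XID;
	if (name_set)
		return RECOVERY_TARGET_NAME;
	if (time_set)
		return RECOVERY_TARGET_TIME;
	return RECOVERY_TARGET_UNSET;
}

int
main(void)
{
	/* both recovery_target_xid and recovery_target_time set: xid wins here */
	printf("%d\n", resolve_recovery_target(true, true, false));
	return 0;
}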
[HACKERS] Add support for restrictive RLS policies
Greetings, As outlined in the commit message, this adds support for restrictive RLS policies. We've had this in the backend since 9.5, but they were only available via hooks and therefore extensions. This adds support for them to be configured through regular DDL commands. These policies are, essentially "AND"d instead of "OR"d. Includes updates to the catalog, grammer, psql, pg_dump, and regression tests. Documentation will be added soon, but until then, would be great to get feedback on the grammer, catalog and code changes. Thanks! Stephen From f4195e9c109d8323266419e487eed2b4cbaafdef Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Thu, 1 Sep 2016 02:11:30 -0400 Subject: [PATCH] Add support for restrictive RLS policies We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammer, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. --- src/backend/commands/policy.c | 9 ++ src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/parser/gram.y | 15 +++ src/backend/rewrite/rowsecurity.c | 7 +- src/bin/pg_dump/pg_dump.c | 39 -- src/bin/pg_dump/pg_dump.h | 1 + src/bin/psql/describe.c | 109 src/bin/psql/tab-complete.c | 1 + src/include/catalog/pg_policy.h | 16 ++- src/include/nodes/parsenodes.h| 1 + src/include/rewrite/rowsecurity.h | 1 + src/test/regress/expected/rowsecurity.out | 207 ++ src/test/regress/sql/rowsecurity.sql | 22 +++- 14 files changed, 332 insertions(+), 98 deletions(-) diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index d694cf8..70e22c1 100644 --- a/src/backend/commands/policy.c +++ b/src/backend/commands/policy.c @@ -235,6 +235,7 @@ RelationBuildRowSecurity(Relation relation) { Datum value_datum; char cmd_value; + bool permissive_value; Datum roles_datum; char *qual_value; Expr *qual_expr; @@ -257,6 +258,12 @@ RelationBuildRowSecurity(Relation relation) Assert(!isnull); cmd_value = DatumGetChar(value_datum); + /* Get policy permissive or restrictive */ + value_datum = heap_getattr(tuple, Anum_pg_policy_polpermissive, + RelationGetDescr(catalog), &isnull); + Assert(!isnull); + permissive_value = DatumGetBool(value_datum); + /* Get policy name */ value_datum = heap_getattr(tuple, Anum_pg_policy_polname, RelationGetDescr(catalog), &isnull); @@ -298,6 +305,7 @@ RelationBuildRowSecurity(Relation relation) policy = palloc0(sizeof(RowSecurityPolicy)); policy->policy_name = pstrdup(policy_name_value); policy->polcmd = cmd_value; + policy->permissive = permissive_value; policy->roles = DatumGetArrayTypePCopy(roles_datum); policy->qual = copyObject(qual_expr); policy->with_check_qual = copyObject(with_check_qual); @@ -796,6 +804,7 @@ CreatePolicy(CreatePolicyStmt *stmt) values[Anum_pg_policy_polname - 1] = DirectFunctionCall1(namein, CStringGetDatum(stmt->policy_name)); values[Anum_pg_policy_polcmd - 1] = CharGetDatum(polcmd); + values[Anum_pg_policy_polpermissive - 1] = BoolGetDatum(stmt->permissive); values[Anum_pg_policy_polroles - 1] = PointerGetDatum(role_ids); /* Add qual if present. 
*/ diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 1877fb4..4fc9525 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -4150,6 +4150,7 @@ _copyCreatePolicyStmt(const CreatePolicyStmt *from) COPY_STRING_FIELD(policy_name); COPY_NODE_FIELD(table); COPY_STRING_FIELD(cmd_name); + COPY_SCALAR_FIELD(permissive); COPY_NODE_FIELD(roles); COPY_NODE_FIELD(qual); COPY_NODE_FIELD(with_check); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 448e1a9..3e4e15b 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -2122,6 +2122,7 @@ _equalCreatePolicyStmt(const CreatePolicyStmt *a, const CreatePolicyStmt *b) COMPARE_STRING_FIELD(policy_name); COMPARE_NODE_FIELD(table); COMPARE_STRING_FIELD(cmd_name); + COMPARE_SCALAR_FIELD(permissive); COMPARE_NODE_FIELD(roles); COMPARE_NODE_FIELD(qual); COMPARE_NODE_FIELD(with_check); diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index cb5cfc4..a79a1e6 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -4633,11 +4633,26 @@ CreatePolicyStmt: n->policy_name = $3; n->table = $5; n->cmd_name = $6; + n->permissive = true; n->roles = $7; n->qual = $8; n->with_check = $9; $$ = (Node *) n; } + | CREATE RESTRICT POLICY name ON qualified_name RowSecurit
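To make the "AND"d-versus-"OR"d distinction concrete, here is a small self-contained sketch of how a row's visibility would combine under the two policy kinds. It mirrors only the semantics described above (permissive policies OR'd together, restrictive policies AND'd on top); it is not the rewriter code, and it assumes that restrictive policies can only filter rows, never admit them on their own:

#include <stdbool.h>
#include <stdio.h>

static bool
row_passes(const bool *permissive, int nperm,
		   const bool *restrictive, int nrestr)
{
	bool		ok = false;
	int			i;

	/* permissive policies are OR'd: any one of them can admit the row */
	for (i = 0; i < nperm; i++)
		ok = ok || permissive[i];

	/* restrictive policies are AND'd: each one can only filter further */
	for (i = 0; i < nrestr; i++)
		ok = ok && restrictive[i];

	return ok;
}

int
main(void)
{
	bool		perm[] = {false, true};	/* one permissive policy matches */
	bool		restr[] = {false};		/* but a restrictive one does not */

	printf("visible? %d\n", row_passes(perm, 2, restr, 1));	/* prints 0 */
	return 0;
}

Whether a table carrying only restrictive policies admits anything at all is left to the patch; the sketch above assumes it does not.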
Re: [HACKERS] Logical Replication WIP
On 2016-09-01 01:04, Erik Rijkers wrote: On 2016-08-31 22:51, Petr Jelinek wrote: Here are some small changes to logical-replication.sgml ... and other .sgml files. Erik Rijkers--- doc/src/sgml/ref/alter_publication.sgml.orig 2016-09-01 07:20:03.280295807 +0200 +++ doc/src/sgml/ref/alter_publication.sgml 2016-09-01 07:34:43.448625900 +0200 @@ -42,15 +42,15 @@ The first variant of this command listed in the synopsis can change - all of the publication attributes that can be specified in + all of the publication attributes specified in . Attributes not mentioned in the command retain their previous settings. Database superusers can change any of these settings for any role. - The other variants this command deal with table membership in the - publication. The FOR TABLE subscommand will replace + The other variants of this command deal with table membership in the + publication. The FOR TABLE subcommand will replace the current list of tables in the publication with the specified one. The FOR ALL TABLES variant will mark the publication as one that replicates all tables in the database (including the future --- doc/src/sgml/ref/drop_publication.sgml.orig 2016-09-01 07:43:26.599160896 +0200 +++ doc/src/sgml/ref/drop_publication.sgml 2016-09-01 07:44:16.051797277 +0200 @@ -21,7 +21,7 @@ -DROP PUBLCATION name [, ...] +DROP PUBLICATION name [, ...] @@ -29,7 +29,7 @@ Description - DROP PUBLCATION removes publications from the database. + DROP PUBLICATION removes publications from the database. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for changes to recovery.conf API
On Wed, Aug 31, 2016 at 05:15:59PM +0100, Simon Riggs wrote: > Recover mode starts the server as a standby server which > expects to receive changes from a primary/master server using physical > streaming replication or is used for performing a recovery from > backup. I understand where this is coming from, but wouldn't "standby", "replication" or something be more generally understood than "recover"? Streaming replication got bolted on to recovery, but that seems like an implementation detail by now. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for changes to recovery.conf API
On Thu, Sep 1, 2016 at 1:15 AM, Simon Riggs wrote: > This is a summary of proposed changes to the recovery.conf API for > v10. These are based in part on earlier discussions, and represent a > minimal modification to current usage. > > Proposed changes (with reference to patches from Abhijit Menon-Sen and myself) > > * pg_ctl start -M (normal|recover|continue) > pg_ctl can now start the server directly as a standby, similar to the > way pg_ctl promote works. The internal implementation is also similar, > with pg_ctl writing a "recovery.trigger" file that initiates recovery > in the same way recovery.conf used to do. It is still possible to > manually add a file called "recovery.trigger" and have that work if > users desire that mechanism. > Different startup methods can be selected with the -M > option. Normal mode starts the server for read/write, > overriding any previous use in Recover mode. > Recover mode starts the server as a standby server which > expects to receive changes from a primary/master server using physical > streaming replication or is used for > performing a recovery from backup. Continue mode is > the default and will startup the server in whatever mode it was in at > last proper shutdown, or as modified by any trigger files present. > (Patch: recovery_startup_r10_api.v1b.patch) So you basically just set ArchiveRecoveryRequested based on the presence of recovery.trigger instead of recovery.conf. And the interface of pg_ctl: - recover mode => creates recovery.trigger - continue mode => does nothing - normal mode => removes recovery.trigger That looks like a sound design. > * Recovery parameters would now be part of the main postgresql.conf > infrastructure > Any parameters set in $DATADIR/recovery.conf will be read after the > main parameter file, similar to the way that postgresql.conf.auto is > read. > (Abhijit) > * pg_basebackup -R will continue to generate a parameter file called > recovery.conf as it does now, but will also create a file named > recovery.trigger. > (This part is WIP; patch doesn't include implementation for tar format yet) Or we could just throw away this dependency with recovery.conf, simply. I see no need to keep it if recovery is triggered depending on recovery.trigger, nor recommend its use. We could instead add include_if_exists 'recovery.conf' at the bottom of postgresql.conf and let the infrastructure do the rest to simplify the patch. > * Parameters > All of the parameters formerly set in recovery.conf can be set in > postgresql.conf using RELOAD > These parameters will have no defaults in postgresql.conf.sample > Setting them has no effect during normal running, or once recovery ends. > https://www.postgresql.org/docs/devel/static/archive-recovery-settings.html > https://www.postgresql.org/docs/devel/static/recovery-target-settings.html > https://www.postgresql.org/docs/devel/static/standby-settings.html > (Abhijit) Hm. I think that what would make sense here is a new GUC category, meaning that once recovery is launched the new parameters are not taken into account once again. Even updating primary_conninfo would need a restart of the WAL receiver, so we could just make them GUC_POSTMASTER and be done with it. > Related cleanup > * Promotion signal file is now called "promote.trigger" rather than > just "promote" If that's not strictly necessary this renaming is not mandatory. > * Remove user configurable "trigger_file" mechanism - use > "promote.trigger" for all cases Ugh. I am -1 on that. 
There are likely backup tools and users that rely on this option, particularly to be able to trigger promotion using a file that is on a different partition than PGDATA. > * Remove Fallback promote mechanism, so all promotion is now "fast" in xlog.c No problem with that, though others will surely have different opinions. That could be addressed as a separate patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
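For the pg_ctl side of the proposal, a minimal sketch of what the three -M modes would do with the trigger file; the file name comes from the proposal, while the paths and error handling are purely illustrative:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * recover  -> create $PGDATA/recovery.trigger before starting the server
 * normal   -> remove it, overriding any previous "recover" request
 * continue -> leave any trigger files alone (the default)
 */
static int
apply_start_mode(const char *datadir, const char *mode)
{
	char		path[4096];
	FILE	   *f;

	snprintf(path, sizeof(path), "%s/recovery.trigger", datadir);

	if (strcmp(mode, "recover") == 0)
	{
		if ((f = fopen(path, "w")) == NULL)
			return -1;
		fclose(f);
	}
	else if (strcmp(mode, "normal") == 0)
		unlink(path);

	return 0;
}

int
main(void)
{
	return apply_start_mode("/tmp/pgdata", "recover");
}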
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 1 September 2016 at 13:08, Craig Ringer wrote: > On 29 August 2016 at 15:53, Craig Ringer wrote: > >> Said better approach attached in revised series. Thanks. > > Here's another minor update to the txid_status() and > txid_convert_if_recent() patches. The only change is moving > get_xid_in_recent_past from src/backend/utils/adt/txid.c to > src/backend/access/transam/xlog.c to permit its use by other code. > Specifically, I think it'll be needed for logical decoding on standby. Ahem, wrong patches. Attached correctly now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 1e81a5a7c8a450925919903ab68b70926af9d69d Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Fri, 19 Aug 2016 14:44:15 +0800 Subject: [PATCH 1/3] Introduce txid_status(bigint) to get status of an xact If an appliation is disconnected while a COMMIT request is in flight, the backend crashes mid-commit, etc, then an application may not be sure whether or not a commit completed successfully or was rolled back. While two-phase commit solves this it does so at a considerable overhead, so introduce a lighter alternative. txid_status(bigint) lets an application determine the status of a a commit based on an xid-with-epoch as returned by txid_current() or similar. Status may be committed, aborted, in-progress (including prepared xacts) or null if the xact is too old for its commit status to still be retained because it has passed the wrap-around epoch boundary. Applications must call txid_current() in their transactions to make much use of this since PostgreSQL does not automatically report an xid to the client when one is assigned. --- doc/src/sgml/func.sgml | 31 + src/backend/access/transam/clog.c | 23 - src/backend/access/transam/xlog.c | 62 ++ src/backend/utils/adt/txid.c | 58 src/include/access/clog.h | 23 + src/include/access/xlog.h | 2 ++ src/include/catalog/pg_proc.h | 2 ++ src/include/utils/builtins.h | 1 + src/test/regress/expected/txid.out | 68 ++ src/test/regress/sql/txid.sql | 38 + 10 files changed, 285 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5148095..d8b086f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17143,6 +17143,10 @@ SELECT collation for ('foo' COLLATE "de_DE"); txid_visible_in_snapshot + +txid_status + + The functions shown in provide server transaction information in an exportable form. The main @@ -17193,6 +17197,11 @@ SELECT collation for ('foo' COLLATE "de_DE"); boolean is transaction ID visible in snapshot? (do not use with subtransaction ids) + + txid_status(bigint) + txid_status + report the status of the given xact - committed, aborted, in progress, or NULL if the xid is too old + @@ -17263,6 +17272,28 @@ SELECT collation for ('foo' COLLATE "de_DE"); +txid_status(bigint) reports the commit status of a recent +transaction. Applications may use it to determine whether a transaction +committed or aborted when the application and/or database server crashed or +lost connection while a COMMIT command was in progress. +The status of a transaction will be reported as one of: + + 'in progress' + 'committed' + 'aborted' + NULL if xid too old + +PostgreSQL discards the commit status transactions after no references to +the transaction survive in other active transactions, tables, replication +slots, etc. This means that the status of older transactions cannot be +determined. txid_status(bigint) returns NULL if a +transaction is too old to look up. 
Prepared transactions are reported as +in progress; applications must check pg_prepared_xacts if they +need to determine if the xid is a prepared transaction. + + + The functions shown in provide information about transactions that have been already committed. These functions mainly provide information about when the transactions diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 2634476..1a6e26d 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -41,29 +41,6 @@ #include "miscadmin.h" #include "pg_trace.h" -/* - * Defines for CLOG page sizes. A page is the same BLCKSZ as is used - * everywhere else in Postgres. - * - * Note: because TransactionIds are 32 bits and wrap around at 0x, - * CLOG page numbering also wraps around at 0x/CLOG_XACTS_PER_PAGE, - * and CLOG segment numbering at - * 0x/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no - * explicit notice of th
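From the application side, the intended use of txid_status() after a connection is lost mid-COMMIT might look roughly like the libpq sketch below. The xid value is assumed to have been saved from txid_current() earlier in the transaction, as the documentation above requires; connection handling is deliberately minimal:

#include <stdio.h>
#include <libpq-fe.h>

static void
report_outcome(PGconn *conn, const char *xid_with_epoch)
{
	const char *params[1] = {xid_with_epoch};
	PGresult   *res;

	res = PQexecParams(conn, "SELECT txid_status($1)",
					   1, NULL, params, NULL, NULL, 0);

	if (PQresultStatus(res) != PGRES_TUPLES_OK)
		fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
	else if (PQgetisnull(res, 0, 0))
		printf("xid too old, commit status no longer retained\n");
	else
		printf("transaction is %s\n", PQgetvalue(res, 0, 0)); /* committed, aborted, in progress */

	PQclear(res);
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* settings taken from the environment */

	if (PQstatus(conn) == CONNECTION_OK)
		report_outcome(conn, "12345");	/* value previously saved from txid_current() */
	PQfinish(conn);
	return 0;
}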
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 9:43 AM, Peter Geoghegan wrote: > On Wed, Aug 31, 2016 at 8:26 PM, Amit Kapila wrote: >>> As far as I am understanding things, we are aiming at something that >>> could be used on production systems. >>> >> >> I don't think you can enable it by default in production systems. >> Enabling it will lead to significant performance drop as it writes the >> whole page after each record for most type of RMGR ID's. >> >>> And, honestly, any people >>> enabling it would just do it for all RMGRs because that's a >>> no-brainer. >> >> Agreed, but remember enabling it for all is not free. > > I have sympathy for the idea that this should be as low overhead as > possible, even if that means adding complexity to the interface -- > within reason. I would like to hear a practical example of where this > RMGR id interface could be put to good use, when starting with little > initial information about a problem. > One example that comes to mind is for the cases where the problem reproduces only under high concurrency or some stress test. Now assume the problem is with index, enabling it for all rmgr's could reduce the probability of problem due to it's performance impact. The second advantage which I have already listed is it helps in future development like the one I am doing now for hash indexes (making them logged). > And, ideally, we'd also have some > indication of how big a difference that would make, it terms of > measurable performance impact. > Yes, that's a valid point. I think we can do some tests to see the difference. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/08/25 16:15, Ashutosh Bapat wrote: > On Thu, Aug 25, 2016 at 12:22 PM, Amit Langote wrote: >> b) >> when accumulating append subpaths, do not flatten a subpath that is itself >> an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with >> non-NULL partitioning info.Is the latter somehow necessary for >> pairwise-join considerations? > > I don't think you need to do anything in the path creation code for this. > As is it flattens all AppendPath hierarchies whether for partitioning or > inheritance or subqueries. We should leave it as it is. I thought it would be convenient for pairwise join code to work with the hierarchy intact even within the AppendPath tree. If it turns out to be so, maybe that patch can take care of it. >> I think I can manage to squeeze in (a) in the next version patch and will >> also start working on (b), mainly the part about RelOptInfo getting some >> partitioning info. > > I am fine with b, where you would include some partitioning information in > RelOptInfo. But you don't need to do what you said in (b) above. > > In a private conversation Robert Haas suggested a way slightly different > than what my patch for partition-wise join does. He suggested that the > partitioning schemes i.e strategy, number of partitions and bounds of the > partitioned elations involved in the query should be stored in PlannerInfo > in the form of a list. Each partitioning scheme is annotated with the > relids of the partitioned relations. RelOptInfo of the partitioned relation > will point to the partitioning scheme in PlannerInfo. Along-with that each > RelOptInfo will need to store partition keys for corresponding relation. > This simplifies matching the partitioning schemes of the joining relations. > Also it reduces the number of copies of partition bounds floating around as > we expect that a query will involve multiple partitioned tables following > similar partitioning schemes. May be you want to consider this idea while > working on (b). So IIUC, a partitioned relation's (baserel or joinrel) RelOptInfo has only the information about partition keys. They will be matched with query restriction quals pruning away any unneeded partitions which happens individually for each such parent baserel (within set_append_rel_size() I suppose). Further, two joining relations are eligible to be considered for pairwise joining if they have identical partition keys and query equi-join quals match the same. The resulting joinrel will have the same partition key (as either joining relation) and will have as many partitions as there are in the intersection of sets of partitions of joining rels (intersection proceeds by matching partition bounds). "Partition scheme" structs go into a PlannerInfo list member, one corresponding to each partitioned relation - baserel or joinrel, right? As you say, each such struct has the following pieces of information: strategy, num_partitions, bounds (and other auxiliary info). Before make_one_rel() starts, the list has one for each partitioned baserel. After make_one_rel() has formed baserel pathlists and before make_rel_from_joinlist() is called, are the partition scheme structs of processed baserels marked with some information about the pruning activity that occurred so far? Then as we build successively higher levels of joinrels, new entries will be made for those joinrels for which we added pairwise join paths, with relids matching the corresponding joinrels. Does that make sense? 
Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
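A rough sketch of the bookkeeping described above, i.e. a list of partitioning schemes hanging off PlannerInfo, with each partitioned RelOptInfo pointing at one scheme and carrying its own partition keys. All type and field names here are assumptions made for illustration, not the eventual structures:

/* one entry in a hypothetical PlannerInfo->part_schemes list */
typedef struct PartitionSchemeData
{
	char		strategy;		/* list or range partitioning */
	int			nparts;			/* number of partitions */
	void	  **bounds;			/* canonical partition bounds, shared by all
								 * relations that follow this scheme */
	unsigned long relids;		/* which relations reference this scheme */
} PartitionSchemeData;

/* what a partitioned RelOptInfo (base rel or join rel) would carry */
typedef struct PartitionedRelInfo
{
	PartitionSchemeData *part_scheme;	/* points into the PlannerInfo list */
	int			partnatts;				/* number of partition key columns */
	int		   *partattrs;				/* this relation's partition key attnos */
} PartitionedRelInfo;

Two relations would then be candidates for a pairwise join when their part_scheme pointers match and the equi-join quals line up with the partition keys, which is exactly the matching step the shared list is meant to simplify.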
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 29 August 2016 at 15:53, Craig Ringer wrote: > Said better approach attached in revised series. Thanks. Here's another minor update to the txid_status() and txid_convert_if_recent() patches. The only change is moving get_xid_in_recent_past from src/backend/utils/adt/txid.c to src/backend/access/transam/xlog.c to permit its use by other code. Specifically, I think it'll be needed for logical decoding on standby. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 44f1ddfcf43bb83ce1478c0290fbbaa916fdc22e Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Fri, 19 Aug 2016 14:44:15 +0800 Subject: [PATCH 1/3] Introduce txid_status(bigint) to get status of an xact If an appliation is disconnected while a COMMIT request is in flight, the backend crashes mid-commit, etc, then an application may not be sure whether or not a commit completed successfully or was rolled back. While two-phase commit solves this it does so at a considerable overhead, so introduce a lighter alternative. txid_status(bigint) lets an application determine the status of a a commit based on an xid-with-epoch as returned by txid_current() or similar. Status may be committed, aborted, in-progress (including prepared xacts) or null if the xact is too old for its commit status to still be retained because it has passed the wrap-around epoch boundary. Applications must call txid_current() in their transactions to make much use of this since PostgreSQL does not automatically report an xid to the client when one is assigned. --- doc/src/sgml/func.sgml | 31 + src/backend/access/transam/clog.c | 23 - src/backend/access/transam/xlog.c | 61 ++ src/backend/utils/adt/txid.c | 58 src/include/access/clog.h | 23 + src/include/access/xlog.h | 2 ++ src/include/catalog/pg_proc.h | 2 ++ src/include/utils/builtins.h | 1 + src/test/regress/expected/txid.out | 68 ++ src/test/regress/sql/txid.sql | 38 + 10 files changed, 284 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5148095..d8b086f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17143,6 +17143,10 @@ SELECT collation for ('foo' COLLATE "de_DE"); txid_visible_in_snapshot + +txid_status + + The functions shown in provide server transaction information in an exportable form. The main @@ -17193,6 +17197,11 @@ SELECT collation for ('foo' COLLATE "de_DE"); boolean is transaction ID visible in snapshot? (do not use with subtransaction ids) + + txid_status(bigint) + txid_status + report the status of the given xact - committed, aborted, in progress, or NULL if the xid is too old + @@ -17263,6 +17272,28 @@ SELECT collation for ('foo' COLLATE "de_DE"); +txid_status(bigint) reports the commit status of a recent +transaction. Applications may use it to determine whether a transaction +committed or aborted when the application and/or database server crashed or +lost connection while a COMMIT command was in progress. +The status of a transaction will be reported as one of: + + 'in progress' + 'committed' + 'aborted' + NULL if xid too old + +PostgreSQL discards the commit status transactions after no references to +the transaction survive in other active transactions, tables, replication +slots, etc. This means that the status of older transactions cannot be +determined. txid_status(bigint) returns NULL if a +transaction is too old to look up. 
Prepared transactions are reported as +in progress; applications must check pg_prepared_xacts if they +need to determine if the xid is a prepared transaction. + + + The functions shown in provide information about transactions that have been already committed. These functions mainly provide information about when the transactions diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 2634476..1a6e26d 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -41,29 +41,6 @@ #include "miscadmin.h" #include "pg_trace.h" -/* - * Defines for CLOG page sizes. A page is the same BLCKSZ as is used - * everywhere else in Postgres. - * - * Note: because TransactionIds are 32 bits and wrap around at 0x, - * CLOG page numbering also wraps around at 0x/CLOG_XACTS_PER_PAGE, - * and CLOG segment numbering at - * 0x/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no - * explicit notice of that fact in this module, except when comparing segment - * and page numbers in TruncateCLOG (see CLOGPagePrecedes)
[HACKERS] Re: [COMMITTERS] pgsql: Add putenv support for msvcrt from Visual Studio 2013
On Wed, Apr 27, 2016 at 2:39 AM, Christian Ullrich wrote: > * Christian Ullrich wrote: > >> wrong even without considering the debug/release split. If we load a >> compiled extension built with a CRT we have not seen yet, _after_ the >> first call to pgwin32_putenv(), that module's CRT's view of its >> environment will be frozen because we will never attempt to update it. > > > Four patches attached: > > master --- 0001 --- 0002 --- 0003 > \ > `- 0004 > > 0001 is just some various fixes to set the stage. > > 0002 fixes this "load race" by not remembering a negative result anymore. > However, ... >From 0001, which does not apply anymore on HEAD because of the integration with MS2015: if (rtmodules[i].putenvFunc == NULL) { - CloseHandle(rtmodules[i].hmodule); rtmodules[i].hmodule = INVALID_HANDLE_VALUE; continue; } Nice catch. This portion is a bug and should be backpatched. As far as I can read from MS docs, GetModuleHandle() retrieves an existing handle so there is no need to free it. Or that would fail. And actually, by looking at those patches, isn't it a dangerous practice to be able to load multiple versions of the same DLL routines in the same workspace? I have personally very bad souvenirs with that, and particularly loading multiple versions of msvcr into the same workspace can cause unwanted crashes and corruptions of processes. In short I mean this thing: https://en.wikipedia.org/wiki/DLL_Hell. So, shouldn't we first make the DLL list a little bit more severe depending on the value of _MSC_VER? I would mean something like that: #ifdef _MSC_VER >= 1900 {"ucrtbase",NULL, NULL}, #elif _MSC_VER >= 1800 {"msvcr120",NULL, NULL}, #elif etc, etc. [...] #endif This would require modules to be built using the same msvc version as the core server, but that's better than just plain crash if loaded DLLs corrupt the stack. Am I missing something? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
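Spelled out as actual preprocessor logic, the dispatch needs #if/#elif rather than #ifdef, since #ifdef only tests whether a macro is defined. The CRT DLL names per compiler version follow the mapping sketched above and should be treated as illustrative:

#if defined(_MSC_VER) && _MSC_VER >= 1900
static const char *crt_names[] = {"ucrtbase", NULL};	/* VS2015 */
#elif defined(_MSC_VER) && _MSC_VER >= 1800
static const char *crt_names[] = {"msvcr120", NULL};	/* VS2013 */
#elif defined(_MSC_VER) && _MSC_VER >= 1700
static const char *crt_names[] = {"msvcr110", NULL};	/* VS2012 */
#else
static const char *crt_names[] = {"msvcr100", NULL};	/* older toolchains */
#endif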
[HACKERS] pg_recvlogical --endpos
Hi all Here's a rebased version of my pg_recvlogical --endpos patch from the 9.5 series, updated to incoroprate Álvaro's changes. This will be mainly useful for recovery tests as we start adding more logical decoding testing. See original post for more detail: https://www.postgresql.org/message-id/CAMsr+YHBm3mUtXb2_RD=qsnupdt0dr8k-+gtbbgprdyuzfm...@mail.gmail.com -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 464e487e6235e1f86d06aa82a4a57716ff188579 Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Thu, 1 Sep 2016 12:37:40 +0800 Subject: [PATCH] Add an optional --endpos LSN argument to pg_reclogical MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit pg_recvlogical usually just runs until cancelled or until the upstream server disconnects. For some purposes, especially testing, it's useful to have the ability to stop receive at a specified LSN without having to parse the output and deal with buffering issues, etc. Add a --endpos parameter that takes the LSN at which no further messages should be written and receive should stop. Craig Ringer, Álvaro Herrera --- doc/src/sgml/ref/pg_recvlogical.sgml | 35 + src/bin/pg_basebackup/pg_recvlogical.c | 137 + 2 files changed, 157 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml index b35881f..00829a7 100644 --- a/doc/src/sgml/ref/pg_recvlogical.sgml +++ b/doc/src/sgml/ref/pg_recvlogical.sgml @@ -155,6 +155,41 @@ PostgreSQL documentation + -E lsn + --endpos=lsn + + +In --start mode, automatically stop replication +and exit with normal exit status 0 when receiving reaches the +specified LSN. If specified when not in --start +mode, an error is raised. + + + +Note the following points: + + + + If there's a record with LSN exactly equal to lsn, + the record will not be output. If you want to receive up to and + including a given LSN, specify LSN + 1 as the desired stop point. + + + + + The --endpos option is not aware of transaction + boundaries and may truncate output partway through a transaction. + Any partially output transaction will not be consumed and will be + replayed again when the slot is next read from. Individual messages + are never truncated. 
+ + + + + + + + --if-not-exists diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index 4c6cf70..5108222 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -37,6 +37,7 @@ static int noloop = 0; static int standby_message_timeout = 10 * 1000; /* 10 sec = default */ static int fsync_interval = 10 * 1000; /* 10 sec = default */ static XLogRecPtr startpos = InvalidXLogRecPtr; +static XLogRecPtr endpos = InvalidXLogRecPtr; static bool do_create_slot = false; static bool slot_exists_ok = false; static bool do_start_slot = false; @@ -60,6 +61,9 @@ static XLogRecPtr output_fsync_lsn = InvalidXLogRecPtr; static void usage(void); static void StreamLogicalLog(void); static void disconnect_and_exit(int code); +static bool flushAndSendFeedback(PGconn *conn, TimestampTz *now); +static void prepareToTerminate(PGconn *conn, XLogRecPtr endpos, + bool keepalive, XLogRecPtr lsn); static void usage(void) @@ -78,6 +82,7 @@ usage(void) " time between fsyncs to the output file (default: %d)\n"), (fsync_interval / 1000)); printf(_(" --if-not-existsdo not error if slot already exists when creating a slot\n")); printf(_(" -I, --startpos=LSN where in an existing slot should the streaming start\n")); + printf(_(" -E, --endpos=LSN exit upon receiving the specified LSN\n")); printf(_(" -n, --no-loop do not loop on connection lost\n")); printf(_(" -o, --option=NAME[=VALUE]\n" " pass option NAME with optional value VALUE to the\n" @@ -278,6 +283,7 @@ StreamLogicalLog(void) int bytes_written; int64 now; int hdr_len; + XLogRecPtr cur_record_lsn = InvalidXLogRecPtr; if (copybuf != NULL) { @@ -451,6 +457,7 @@ StreamLogicalLog(void) int pos; bool replyRequested; XLogRecPtr walEnd; + bool endposReached = false; /* * Parse the keepalive message, enclosed in the CopyData message. @@ -473,18 +480,32 @@ StreamLogicalLog(void) } replyRequested = copybuf[pos]; - /* If the server requested an immediate reply, send one. */ - if (replyRequested) + if (endpos != InvalidXLogRecPtr && walEnd >= endpos) { -/* fsync data, so we send a rec
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 8:26 PM, Amit Kapila wrote: >> As far as I am understanding things, we are aiming at something that >> could be used on production systems. >> > > I don't think you can enable it by default in production systems. > Enabling it will lead to significant performance drop as it writes the > whole page after each record for most type of RMGR ID's. > >> And, honestly, any people >> enabling it would just do it for all RMGRs because that's a >> no-brainer. > > Agreed, but remember enabling it for all is not free. I have sympathy for the idea that this should be as low overhead as possible, even if that means adding complexity to the interface -- within reason. I would like to hear a practical example of where this RMGR id interface could be put to good use, when starting with little initial information about a problem. And, ideally, we'd also have some indication of how big a difference that would make, it terms of measurable performance impact. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Logical decoding timeline following take II
Hi all Attached is a rebased and updated logical decoding timeline following patch for 10.0. This is a pre-requisite for the pending work on logical decoding on standby servers and simplified failover of logical decoding. Restating the commit message: __ Follow timeline switches in logical decoding When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines. Otherwise decoding fails after failover to a physical replica because the oldest still-needed archives are in the historical timeline. Supporting logical decoding timeline following is a pre-requisite for logical decoding on physical standby servers. It also makes it possible to promote a replica with logical slots to a master and replay from those slots, allowing logical decoding applications to follow physical failover. Logical slots cannot actually be created on a replica without use of the low-level C slot management APIs so this is mostly foundation work for subsequent changes to enable logical decoding on standbys. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Note that an earlier version of logical decoding timeline following was committed to 9.5 as 24c5f1a103ce, 3a3b309041b0, 82c83b337202, and f07d18b6e94d. It was then reverted by c1543a81a7a8 just after 9.5 feature freeze when issues were discovered too late to safely fix them in the 9.5 release cycle. The prior approach failed to consider that a record could be split across pages that are on different segments, where the new segment contains the start of a new timeline. In that case the old segment might be missing or renamed with a .partial suffix. This patch reworks the logic to be page-based and in the process simplify how the last timeline for a segment is looked up. Slot timeline following only works in a backend. Frontend support can be aded separately, where it could be useful for pg_xlogdump etc once support for timeline.c, List, etc is added for frontend code. __ I'm hoping to find time to refactor timeline following so that we avoid passing timeline information around the xlogreader using globals, but that'd be a separate change that can be made after this. I've omitted the --endpos changes for pg_recvlogical, which again can be added separately. The test harness code will become unnecessary when proper support for logical failover or logical decoding on standby is added, so I'm not really sure it should be committed. Prior threads: * https://www.postgresql.org/message-id/camsr+yg_1fu_-l8qwsk6okft4jt8dpory2rhxdymy0b5zfk...@mail.gmail.com * https://www.postgresql.org/message-id/CAMsr+YH-C1-X_+s=2nzapnr0wwqja-rumvhsyyzansn93mu...@mail.gmail.com * http://www.postgresql.org/message-id/20160503165812.GA29604@alvherre.pgsql -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 10d5f4652689fbcaf5b14fe6bd991c98dbf60e00 Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Thu, 1 Sep 2016 10:16:55 +0800 Subject: [PATCH] Follow timeline switches in logical decoding When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines. Otherwise decoding fails after failover to a physical replica because the oldest still-needed archives are in the historical timeline. 
Supporting logical decoding timeline following is a pre-requisite for logical decoding on physical standby servers. It also makes it possible to promote a replica with logical slots to a master and replay from those slots, allowing logical decoding applications to follow physical failover. Logical slots cannot actually be created on a replica without use of the low-level C slot management APIs so this is mostly foundation work for subsequent changes to enable logical decoding on standbys. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Note that an earlier version of logical decoding timeline following was committed to 9.5 as 24c5f1a103ce, 3a3b309041b0, 82c83b337202, and f07d18b6e94d. It was then reverted by c1543a81a7a8 just after 9.5 feature freeze when issues were discovered too late to safely fix them in the 9.5 release cycle. The prior approach failed to consider that a record could be split across pages that are on different segments, where the new segment contains the start of a new timeline. In that case the old segment might be missing or renamed with a .partial suffix. This patch reworks the logic to be page-based and in the process simplify how the last timeline for a segment is
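The essence of the page-based lookup described in the commit message can be shown with a self-contained helper: given the timeline history, where each entry records a timeline and the LSN at which it ends, find the timeline a given position belongs to, so the reader opens that timeline's segment (possibly one renamed with a .partial suffix) instead of assuming the current timeline. This is a simplification for illustration only; the patch itself has to make the decision per WAL page and cope with records split across segment boundaries:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
	unsigned	tli;	/* timeline ID */
	XLogRecPtr	end;	/* first LSN that is no longer on this timeline */
} TimeLineEntry;

static unsigned
timeline_of_lsn(XLogRecPtr lsn, const TimeLineEntry *history, int n,
				unsigned current_tli)
{
	int			i;

	/* history is ordered oldest timeline first; entry ends are exclusive */
	for (i = 0; i < n; i++)
		if (lsn < history[i].end)
			return history[i].tli;

	return current_tli;	/* past every switch point: still on the current TLI */
}

int
main(void)
{
	TimeLineEntry history[] = {{1, 0x3000000}, {2, 0x5000000}};

	printf("%u\n", timeline_of_lsn(0x4000000, history, 2, 3));	/* prints 2 */
	return 0;
}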
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 8:02 AM, Amit Kapila wrote: > On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >> On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >> > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ What does this mean? (No docs) >>> >>> I was using this parameter as a masking integer to indicate the >>> operations(rmgr list) for which we need this feature to be enabled. >>> Since, this could be confusing, I've changed it accordingly so that it >>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) >> >> Why would we want that? >> > > It would be easier to test and develop the various modules separately. > As an example, if we develop a new AM which needs WAL facility or > adding WAL capability to an existing system (say Hash Index), we can > just test that module, rather than whole system. I think it can help > us in narrowing down the problem, if we have facility to enable it at > RMGR ID level. Having said that, I think this must have the facility > to enable it for all the RMGR ID's (say ALL) and probably that should > be default. > oops, I think having an option of specifying 'ALL' is good, but that shouldn't be default, because it could have serious performance implications. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 8:30 AM, Michael Paquier wrote: > On Thu, Sep 1, 2016 at 11:32 AM, Amit Kapila wrote: >> On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >>> On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >>> >> * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ > > What does this mean? (No docs) I was using this parameter as a masking integer to indicate the operations(rmgr list) for which we need this feature to be enabled. Since, this could be confusing, I've changed it accordingly so that it accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) >>> >>> Why would we want that? >> >> It would be easier to test and develop the various modules separately. >> As an example, if we develop a new AM which needs WAL facility or >> adding WAL capability to an existing system (say Hash Index), we can >> just test that module, rather than whole system. I think it can help >> us in narrowing down the problem, if we have facility to enable it at >> RMGR ID level. Having said that, I think this must have the facility >> to enable it for all the RMGR ID's (say ALL) and probably that should >> be default. > > As far as I am understanding things, we are aiming at something that > could be used on production systems. > I don't think you can enable it by default in production systems. Enabling it will lead to significant performance drop as it writes the whole page after each record for most type of RMGR ID's. > And, honestly, any people > enabling it would just do it for all RMGRs because that's a > no-brainer. Agreed, but remember enabling it for all is not free. > If we are designing something for testing purposes > instead, something is wrong with this patch then. > What is wrong? > Doing filtering at RMGR level for testing and development purposes > will be done by somebody who has the skills to filter out which > records he should look at. > Right, but in that way, if you see many of our guc parameters needs a good level of understanding to set the correct values for them. For example, do you think it is easy for user to set value for "replacement_sort_tuples" without reading the description or understanding the meaning of same. This example might not be the best example, but I think there are other parameters which do require some deeper understanding of system. The main thing is default values for such parameters should be chosen carefully such that it represents most common usage. > Or he'll bump into an existing bump. So I'd > rather keep this thing simple. > It seems to me that having an option of 'ALL' would make it easier for users to set it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 11:32 AM, Amit Kapila wrote: > On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >> On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >> > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ What does this mean? (No docs) >>> >>> I was using this parameter as a masking integer to indicate the >>> operations(rmgr list) for which we need this feature to be enabled. >>> Since, this could be confusing, I've changed it accordingly so that it >>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) >> >> Why would we want that? > > It would be easier to test and develop the various modules separately. > As an example, if we develop a new AM which needs WAL facility or > adding WAL capability to an existing system (say Hash Index), we can > just test that module, rather than whole system. I think it can help > us in narrowing down the problem, if we have facility to enable it at > RMGR ID level. Having said that, I think this must have the facility > to enable it for all the RMGR ID's (say ALL) and probably that should > be default. As far as I am understanding things, we are aiming at something that could be used on production systems. And, honestly, any people enabling it would just do it for all RMGRs because that's a no-brainer. If we are designing something for testing purposes instead, something is wrong with this patch then. Doing filtering at RMGR level for testing and development purposes will be done by somebody who has the skills to filter out which records he should look at. Or he'll bump into an existing bump. So I'd rather keep this thing simple. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing checks when malloc returns NULL...
On Thu, Sep 1, 2016 at 12:14 AM, Tom Lane wrote: > I still think it'd be better to fix that as attached, because it > represents a net reduction not net addition of code, and it provides > a defense against future repetitions of the same omission. If only > 4 out of 11 existing calls were properly checked --- some of them > adjacent to calls with checks --- that should tell us that we *will* > have more instances of the same bug if we don't fix it centrally. > > I also note that your patch missed checks for two ShmemAlloc calls in > InitShmemAllocation and ShmemInitStruct. Admittedly, since those are > the very first such calls, it's highly unlikely they'd fail; but I think > this exercise is not about dismissing failures as improbable. Almost > all of these failures are improbable, given that we precalculate the > shmem space requirement. OK, that looks fine to me after review. Also, we could take one extra step forward then, and just introduce ShmemAllocExtended that holds two flags as per the attached: - SHMEM_ALLOC_ZERO that zeros all the fields - SHMEM_ALLOC_NO_OOM that does not fail Or we could just put a call to MemSet directly in ShmemAlloc(), but I'd rather keep the base routines extensible. What do you think about the attached? One other possibility would be to take your patch, but use MemSet unconditionally on it as this should not cause any overhead. -- Michael malloc-nulls-v8.patch Description: invalid/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
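For clarity, the proposed interface would presumably end up looking something like the sketch below. The signature and the two flag names are the ones suggested here; the non-throwing low-level allocator is written as a hypothetical ShmemAllocNoError(), and the body assumes the usual backend primitives (Size, MemSet, ereport):

#define SHMEM_ALLOC_ZERO	0x01	/* zero-fill the returned chunk */
#define SHMEM_ALLOC_NO_OOM	0x02	/* return NULL instead of erroring out */

void *
ShmemAllocExtended(Size size, int flags)
{
	void	   *ptr = ShmemAllocNoError(size);	/* hypothetical: never throws */

	if (ptr == NULL && (flags & SHMEM_ALLOC_NO_OOM) == 0)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of shared memory")));

	if (ptr != NULL && (flags & SHMEM_ALLOC_ZERO) != 0)
		MemSet(ptr, 0, size);

	return ptr;
}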
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: > On 27 August 2016 at 12:09, Kuntal Ghosh wrote: > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ >>> >>> What does this mean? (No docs) >> >> I was using this parameter as a masking integer to indicate the >> operations(rmgr list) for which we need this feature to be enabled. >> Since, this could be confusing, I've changed it accordingly so that it >> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) > > Why would we want that? > It would be easier to test and develop the various modules separately. As an example, if we develop a new AM which needs WAL facility or adding WAL capability to an existing system (say Hash Index), we can just test that module, rather than whole system. I think it can help us in narrowing down the problem, if we have facility to enable it at RMGR ID level. Having said that, I think this must have the facility to enable it for all the RMGR ID's (say ALL) and probably that should be default. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
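As a concrete illustration of the filtering under discussion: with the original masking-integer form, the per-record test is just a bit test against the record's resource manager ID, while the rmgr-name-list form would be parsed once into a boolean array indexed by rmgr ID. The ID values below are illustrative; the real list lives in rmgrlist.h:

#include <stdbool.h>
#include <stdio.h>

#define RM_XLOG_ID		0
#define RM_HEAP_ID		10

static int	wal_consistency_mask = 511;	/* bits 0..8 => rmgr IDs 0..8 */

static bool
consistency_check_enabled(int rmid)
{
	return (wal_consistency_mask & (1 << rmid)) != 0;
}

int
main(void)
{
	printf("xlog checked? %d\n", consistency_check_enabled(RM_XLOG_ID));	/* 1 */
	printf("heap checked? %d\n", consistency_check_enabled(RM_HEAP_ID));	/* 0 */
	return 0;
}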
Re: [HACKERS] Replace use malloc() & friend by memory contexts for plperl and pltcl
On Thu, Sep 1, 2016 at 8:57 AM, Tom Lane wrote: > I wrote: >> Michael Paquier writes: >>> Attached are a set of patches to replace those memory system calls by >>> proper memory contexts: >>> - 0001 does the cleanup work for pltcl >>> - 0002 does this cleanup for plperl > >> Off to look at 0002 next. > > Pushed 0002 as well. The main thing missing there was to use a PG_TRY > block to replace the bulky-and-yet-incomplete assorted invocations of > free_plperl_function. Thanks. That's neat! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] restoration after crash slowness, any way to improve?
-hackers, So this is more of a spit balling thread than anything. As I understand it, if we have a long running transaction or a large number of wal logs and we crash, we then have to restore those logs on restart to the last known good transaction. No problem. I recently ran a very long transaction. I was building up a large number of rows into a two column table to test index performance. I ended up having to kill the connection and thus the transaction after I realized I had an extra zero in my generate_series(). (Side note: Amazing the difference a single zero can make ;)). When coming back up, I watched the machine and I was averaging anywhere from 60MBs to 97MBs on writes. That in itself isn't that bad over a single thread and a single SSD, doing what it is doing. However, since I know this machine can get well over 400MBs when using multiple connections I can't help but wonder if there is anything we can do to make restoration more efficient without sacrificing the purpose of what it is doing? Can we have multiple readers pull transaction logs into shared_buffers (on recovery only), sort the good transactions and then push them back to the walwriter or bgwriter to the pages? Anyway, like I said, spitballing and I thought I would start the thread. Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replace use malloc() & friend by memory contexts for plperl and pltcl
I wrote: > Michael Paquier writes: >> Attached are a set of patches to replace those memory system calls by >> proper memory contexts: >> - 0001 does the cleanup work for pltcl >> - 0002 does this cleanup for plperl > Off to look at 0002 next. Pushed 0002 as well. The main thing missing there was to use a PG_TRY block to replace the bulky-and-yet-incomplete assorted invocations of free_plperl_function. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
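For readers not following the commits, the shape of the change is the familiar backend idiom sketched below: run the compilation steps inside PG_TRY() and perform cleanup once in PG_CATCH(), rather than sprinkling free_plperl_function() calls in front of every error exit; once the function's state lives in its own memory context, that cleanup can largely be a single MemoryContextDelete(). The helper names here are placeholders, not the committed code:

	PG_TRY();
	{
		/* validate arguments, compile the Perl body, fill in prodesc ... */
		compile_plperl_function_body(prodesc);
	}
	PG_CATCH();
	{
		/* one cleanup path for every error raised above */
		free_prodesc(prodesc);	/* e.g. MemoryContextDelete() of its context */
		PG_RE_THROW();
	}
	PG_END_TRY();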
Re: [HACKERS] Logical Replication WIP
On 2016-08-31 22:51, Petr Jelinek wrote: Hi, and one more version with bug fixes, improved code docs and couple more tests, some general cleanup and also rebased on current master for the start of CF. Clear, well-written docs, thanks. Here are some small changes to logical-replication.sgml Erik Rijkers--- doc/src/sgml/logical-replication.sgml.orig 2016-09-01 00:04:11.484729675 +0200 +++ doc/src/sgml/logical-replication.sgml 2016-09-01 00:54:07.817799915 +0200 @@ -22,11 +22,11 @@ cascading replication or more complex configurations. -Logical replication typically starts with snapshot of the data on -the publisher database. Once that is done, the changes on publisher -are sent to subscriber as they occur in real-time. The subscriber -applies the data in same order as the publisher so that the -transactional consistency is guaranteed for the publications within +Logical replication typically starts with a snapshot of the data on +the publisher database. Once that is done, the changes on the publisher +are sent to the subscriber as they occur in real-time. The subscriber +applies the data in the same order as the publisher so that +transactional consistency is guaranteed for publications within a single subscription. This method of data replication is sometimes referred to as transactional replication. @@ -54,23 +54,23 @@ -Replicating between different major versions of the PostgreSQL +Replicating between different major versions of PostgreSQL -Giving access to the replicated data to different groups of +Giving access to replicated data to different groups of users. -Sharing subset of the database between multiple databases. +Sharing a subset of the database between multiple databases. -The subscriber database behaves in a same way as any other +The subscriber database behaves in the same way as any other PostgreSQL instance and can be used as a publisher for other databases by defining its own publications. When the subscriber is treated as read-only by application, there will be no conflicts from @@ -83,9 +83,9 @@ Publication A publication object can be defined on any physical -replication master, the node where publication is deined is referred to -as publisher. Only superusers or members of -REPLICATION role can define publication. A publication is +replication master. The node where a publication is defined is referred +to as publisher. Only superusers or members of the +REPLICATION role can define a publication. A publication is a set of changes generated from a group of tables, and might also be described as a change set or replication set. Each publication exists in only one database. @@ -93,7 +93,7 @@ Publications are different from table schema and do not affect how the table is accessed. Each table can be added to multiple -publications if needed. Publications may currenly only contain +publications if needed. Publications may currently only contain tables. Objects must be added explicitly, except when a publication is created for ALL TABLES. There is no default name for a publication which specifies all tables. @@ -103,8 +103,8 @@ any combination of INSERT, UPDATE, DELETE and TRUNCATE in a similar way to the way triggers are fired by particular event types. Only -tables with REPLICA IDENTITY index can be added to -publication which replicate UPDATE and DELETE +tables with a REPLICA IDENTITY index can be added to a +publication which replicates UPDATE and DELETE operation. 
@@ -129,20 +129,20 @@ Subscription -A subscription is the downstream side of the logical +A subscription is the downstream side of logical replication. The node where subscription is defined is referred to as -a subscriber. Subscription defines the connection to +the subscriber. Subscription defines the connection to another database and set of publications (one or more) to which it wants to be subscribed. -The subscriber database behaves in a same way as any other +The subscriber database behaves in the same way as any other PostgreSQL instance and can be used as a publisher for other databases by defining its own publications. A subscriber may have multiple subscriptions if desired. It is -possible to define multiple subscriptions between single +possible to define multiple subscriptions between a single publisher-subscriber pair, in which case extra care must be taken to ensure that the subscribed publication objects don't overlap. @@ -153,17 +153,17 @@ of pre-existing table data. -Subscriptions are not dumped by pg_dump by default,
Re: [HACKERS] Notice lock waits
On Tue, Aug 9, 2016 at 5:17 PM, Jim Nasby wrote: > On 8/5/16 12:00 PM, Jeff Janes wrote: > >> So I created a new guc, notice_lock_waits, which acts like >> log_lock_waits but sends the message as NOTICE so it will show up on >> interactive connections like psql. >> > > I would strongly prefer that this accept a log level instead of being > hard-coded to NOTICE. The reason is that I find the NOTICE chatter from > many DDL commands to be completely worthless (looking at you %TYPE), Perhaps we should do something about those notices? In 9.3 we removed ones about adding implicit unique indexes to implement primary keys, and I think that that was a pretty good call. > so I normally set client_min_messages to WARNING in DDL scripts. I can > work on that patch; would it essentially be a matter of changing > notice_lock_waits to int lock_wait_level? How would it be turned off? Is there an error level which would work for that? And what levels would non-superusers be allowed to set it to? And, I'd be happy if you were to work on a patch to implement it. Cheers, Jeff
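(Purely as a sketch of what the suggested change could look like: an enum GUC in the style guc.c already uses. The name lock_wait_level, the value list, and the PGC_SUSET choice are all invented here for illustration, not taken from any posted patch.)

    static int lock_wait_level = 0;        /* invented variable */

    static const struct config_enum_entry lock_wait_level_options[] = {
        {"none", 0, false},                /* feature turned off */
        {"notice", NOTICE, false},
        {"warning", WARNING, false},
        {"log", LOG, false},
        {NULL, 0, false}
    };

    /* ... and in ConfigureNamesEnum[]: */
    {
        {"lock_wait_level", PGC_SUSET, LOCK_MANAGEMENT,
            gettext_noop("Message level used to report waiting longer than deadlock_timeout for a lock."),
            NULL
        },
        &lock_wait_level,
        0, lock_wait_level_options,
        NULL, NULL, NULL
    },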
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On Wed, Aug 31, 2016 at 6:07 PM, Andres Freund wrote: > > In my experience pg_attribute is usually the worst affected. Many temp > tables won't even have stats entries... > > Mine too. One database currently has a 400GB pg_attribute table, because we chew through temp tables like popcorn.
Re: [HACKERS] _mdfd_getseg can be expensive
On Wed, Aug 31, 2016 at 3:08 PM, Andres Freund wrote: > On August 31, 2016 3:06:23 PM PDT, Peter Geoghegan wrote: > >>In other painfully pedantic news, I should point out that >>sizeof(size_t) isn't necessarily word size (the most generic >>definition of word size for the architecture), contrary to my reading >>of the 0002-* patch comments. I'm mostly thinking about x86_64 >>here, of course. > > Uh? Sorry, I really should not have said anything. It is true that x86_64 word size is sometimes reported as 16 and/or 32 bits [1], because of legacy issues. [1] https://en.wikipedia.org/wiki/Word_(computer_architecture)#Table_of_word_sizes -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On 08/31/2016 03:09 PM, Joel Jacobson wrote: > On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova > wrote: >> >> On 30 August 2016 at 23:10, Joel Jacobson wrote: >>> >>> There should be a way to within the session and/or txn permanently >>> block autonomous transactions. >>> >> >> This will defeat one of the use cases of autonomous transactions: auditing > > My idea on how to deal with this would be to mark the function to be > "AUTONOMOUS" similar to how a function is marked to be "PARALLEL > SAFE", > and to throw an error if a caller that has blocked autonomous > transactions tries to call a function that is marked to be autonomous. > > That way none of the code that needs to be audited would ever get executed. Part of what people want this for is to audit what people *try* to do. We can already audit what they've actually done. With your solution, we still wouldn't know when an unauthorized attempt to do something happened. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] _mdfd_getseg can be expensive
On August 31, 2016 3:06:23 PM PDT, Peter Geoghegan wrote: >In other painfully pedantic news, I should point out that >sizeof(size_t) isn't necessarily word size (the most generic >definition of word size for the architecture), contrary to my reading >of the 0002-* patch comments. I'm mostly thinking about x86_64 >here, of course. Uh? -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On August 31, 2016 3:00:15 PM PDT, Tomas Vondra wrote: > > >On 08/31/2016 11:43 PM, Andres Freund wrote: >> On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote: >>> It's an improvement (and it's pretty much exactly what I proposed >>> upthread). But it does not solve the problems with pg_statistic for >>> example (each backend needs it's own statistics. So we'd either >bloat >>> the pg_statistic (if we manage to solve the problem that the table >has >>> the same oid in all backends), or we would need in-memory tuples >(just >>> like discussed in the thread so far). >> >> Creating a session private version of pg_statistic would be fairly >> simple. >Sure. I'm just saying it's not as simple as overriding relpath. > >ISTM we only need the pg_statistics (as other catalogs are connected to >the pg_class entry), which does not have the dependency issues. Or do >we >need other catalogs? In my experience pg_attribute is usually the worst affected. Many temp tables won't even have stats entries... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] _mdfd_getseg can be expensive
On Wed, Aug 31, 2016 at 2:37 PM, Andres Freund wrote: >> This looks good. > > Thanks for looking! No problem. In other painfully pedantic news, I should point out that sizeof(size_t) isn't necessarily word size (the most generic definition of word size for the architecture), contrary to my reading of the 0002-* patch comments. I'm mostly thinking about x86_64 here, of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 10:32 PM, Simon Riggs wrote: > On 27 August 2016 at 12:09, Kuntal Ghosh wrote: > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ >>> >>> What does this mean? (No docs) >> >> I was using this parameter as a masking integer to indicate the >> operations(rmgr list) for which we need this feature to be enabled. >> Since, this could be confusing, I've changed it accordingly so that it >> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) > > Why would we want that? I am still in for just an on/off switch instead of this complication. An all-or-nothing feature is what we are looking at here. Still a list is an improvement compared to a bitmap. 1. Add support for other Resource Managers. >>> >>> We probably need to have a discussion as to why you think this should >>> be Rmgr dependent? >>> Code comments would help there. >>> >>> If it does, then you should probably do this by extending RmgrTable >>> with an rm_check, so you can call it like this... >>> >>> RmgrTable[record->xl_rmid].rm_check >> >> +1. >> I'm modifying it accordingly. I'm calling this function after >> RmgrTable[record->xl_rmid].rm_redo. >> 5. Generalize the page type identification technique. >>> >>> Why not do this first? >>> >> >> At present, I'm using special page size and page ID to identify page >> type. But, I've noticed some cases where the entire page is >> initialized to zero (Ex: hash_xlog_squeeze_page). RmgrID and info bit >> can help us to identify those pages. > > I'd prefer a solution that was not dependent upon RmgrID at all. So you'd rather identify the page types by looking at pd_special? That seems worse to me but.. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
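(To make the rm_check suggestion above concrete, a rough sketch follows. The guard variable wal_consistency_enabled and the exact hook signature are illustrative assumptions, not code from the posted patch.)

    typedef struct RmgrData
    {
        const char *rm_name;
        void        (*rm_redo) (XLogReaderState *record);
        /* ... existing rm_desc/rm_identify/rm_startup/rm_cleanup ... */
        void        (*rm_check) (XLogReaderState *record);   /* new */
    } RmgrData;

    /* in the main redo loop, after replaying each record: */
    RmgrTable[record->xl_rmid].rm_redo(xlogreader);
    if (wal_consistency_enabled &&                            /* invented GUC name */
        RmgrTable[record->xl_rmid].rm_check != NULL)
        RmgrTable[record->xl_rmid].rm_check(xlogreader);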
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 08/31/2016 11:43 PM, Andres Freund wrote: > On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote: >> It's an improvement (and it's pretty much exactly what I proposed >> upthread). But it does not solve the problems with pg_statistic for >> example (each backend needs it's own statistics. So we'd either bloat >> the pg_statistic (if we manage to solve the problem that the table has >> the same oid in all backends), or we would need in-memory tuples (just >> like discussed in the thread so far). > > Creating a session private version of pg_statistic would be fairly > simple. Sure. I'm just saying it's not as simple as overriding relpath. ISTM we only need the pg_statistics (as other catalogs are connected to the pg_class entry), which does not have the dependency issues. Or do we need other catalogs? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_basebackup stream xlog to tar
Attached patch adds support for -X stream to work with .tar and .tar.gz file formats. If tar mode is specified, a separate pg_xlog.tar (or .tar.gz) file is created and the data is streamed into it. Regular mode is (should not) see any changes in how it works. The implementation creates a "walmethod" for directory and one for tar, which is basically a set of function pointers that we pass around as part of the StreamCtl structure. All calls to modify the files are sent through the current method, using the normal open/read/write calls as it is now for directories, and the more complicated method for tar and targz. The tar method doesn't support all things that are required for pg_receivexlog, but I don't think it makes any real sense to have support for pg_receivexlog in tar mode. But it does support all the things that pg_basebackup needs. Some smaller pieces of functionality like unlinking files on failure and padding files have been moved into the walmethod because they have to be differently implemented (we cannot pre-pad a compressed file -- the size will depend on the compression ration anyway -- for example). AFAICT we never actually documented that -X stream doesn't work with tar in the manpage of current versions. Only in the error message. We might want to fix that in backbranches. In passing this also fixes an XXX comment about not re-lseeking on the WAL file all the time -- the walmethod now tracks the current position in the file in a variable. Finally, to make this work, the pring_tar_number() function is now exported from port/tar.c along with the other ones already exported from there. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 03615da..981d201 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -180,7 +180,8 @@ PostgreSQL documentation target directory, the tar contents will be written to standard output, suitable for piping to for example gzip. This is only possible if -the cluster has no additional tablespaces. +the cluster has no additional tablespaces and transaction +log streaming is not used. @@ -323,6 +324,10 @@ PostgreSQL documentation If the log has been rotated when it's time to transfer it, the backup will fail and be unusable. + +The transaction log files will be written to + the base.tar file. + @@ -339,6 +344,9 @@ PostgreSQL documentation client can keep up with transaction log received, using this mode requires no extra transaction logs to be saved on the master. + The transactionn log files are written to a separate file +called pg_xlog.tar. 
+ diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile index fa1ce8b..52ac9e9 100644 --- a/src/bin/pg_basebackup/Makefile +++ b/src/bin/pg_basebackup/Makefile @@ -19,7 +19,7 @@ include $(top_builddir)/src/Makefile.global override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq -OBJS=receivelog.o streamutil.o $(WIN32RES) +OBJS=receivelog.o streamutil.o walmethods.o $(WIN32RES) all: pg_basebackup pg_receivexlog pg_recvlogical diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 351a420..58c0821 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -365,7 +365,7 @@ typedef struct { PGconn *bgconn; XLogRecPtr startptr; - char xlogdir[MAXPGPATH]; + char xlog[MAXPGPATH]; /* directory or tarfile depending on mode */ char *sysidentifier; int timeline; } logstreamer_param; @@ -383,9 +383,13 @@ LogStreamerMain(logstreamer_param *param) stream.standby_message_timeout = standby_message_timeout; stream.synchronous = false; stream.mark_done = true; - stream.basedir = param->xlogdir; stream.partial_suffix = NULL; + if (format == 'p') + stream.walmethod = CreateWalDirectoryMethod(param->xlog); + else + stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel); + if (!ReceiveXlogStream(param->bgconn, &stream)) /* @@ -395,6 +399,14 @@ LogStreamerMain(logstreamer_param *param) */ return 1; + if (!stream.walmethod->finish()) + { + fprintf(stderr, +_("%s: could not finish writing WAL files: %s\n"), +progname, strerror(errno)); + return 1; + } + PQfinish(param->bgconn); return 0; } @@ -445,22 +457,25 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) /* Error message already written in GetConnection() */ exit(1); - snprintf(param->xlogdir, sizeof(param->xlogdir), "%s/pg_xlog", b
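(The walmethod described above is essentially a small table of function pointers. A simplified sketch of the shape of that interface follows; the member names are illustrative, and the real patch has a few more callbacks, e.g. for syncing and error reporting.)

    typedef struct WalWriteMethod
    {
        /* create/open a WAL file (or tar member), optionally pre-padded */
        void      *(*open_for_write) (const char *pathname,
                                      const char *temp_suffix,
                                      size_t pad_to_size);
        ssize_t    (*write) (void *f, const void *buf, size_t count);
        int        (*close) (void *f, bool mark_done);
        bool       (*existsfile) (const char *pathname);
        /* flush and finalize everything (e.g. write the tar trailer) */
        bool       (*finish) (void);
    } WalWriteMethod;

    WalWriteMethod *CreateWalDirectoryMethod(const char *basedir);
    WalWriteMethod *CreateWalTarMethod(const char *tarbase, int compression);

pg_basebackup then just points stream.walmethod at one or the other, and the streaming code no longer needs to care whether it is writing into a directory or into a tar archive.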
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote: > It's an improvement (and it's pretty much exactly what I proposed > upthread). But it does not solve the problems with pg_statistic for > example (each backend needs it's own statistics. So we'd either bloat > the pg_statistic (if we manage to solve the problem that the table has > the same oid in all backends), or we would need in-memory tuples (just > like discussed in the thread so far). Creating a session private version of pg_statistic would be fairly simple. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 08/31/2016 09:20 PM, Vik Fearing wrote: > On 08/24/2016 06:16 PM, Robert Haas wrote: >> On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra >> wrote: >>> Could someone please explain how the unlogged tables are supposed to fix the >>> catalog bloat problem, as stated in the initial patch submission? We'd still >>> need to insert/delete the catalog rows when creating/dropping the temporary >>> tables, causing the bloat. Or is there something I'm missing? >> >> No, not really. Jim just asked if the idea of partitioning the >> columns was completely dead in the water, and I said, no, you could >> theoretically salvage it. Whether that does you much good is another >> question. >> >> IMV, the point here is that you MUST have globally visible dependency >> entries for this to work sanely. If they're not in a catalog, they >> have to be someplace else, and backend-private memory isn't good >> enough, because that's not globally visible. Until we've got a >> strategy for that problem, this whole effort is going nowhere - even >> though in other respects it may be a terrific idea. > > Why not just have a regular-looking table, with a "global temporary" > relpersistence (I don't care which letter it gets) and when a backend > tries to access it, it uses its own private relfilenode instead of > whatever is in pg_class, creating one if necessary. That way the > structure of the table is fixed, with all the dependencies and whatnot, > but the content is private to each backend. What's wrong with this idea? > It's an improvement (and it's pretty much exactly what I proposed upthread). But it does not solve the problems with pg_statistic for example (each backend needs it's own statistics. So we'd either bloat the pg_statistic (if we manage to solve the problem that the table has the same oid in all backends), or we would need in-memory tuples (just like discussed in the thread so far). -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] _mdfd_getseg can be expensive
On 2016-08-31 14:09:47 -0700, Peter Geoghegan wrote: > On Thu, Aug 18, 2016 at 5:26 PM, Andres Freund wrote: > > Rebased version attached. A review would be welcome. Plan to push this > > forward otherwise in the not too far away future. > > This looks good. Thanks for looking! > The only thing that stuck out to any degree is that we don't grow the > "reln->md_seg_fds[forknum]" array within the new _fdvec_resize() > function geometrically. In other words, we don't separately keep track > of the array size and allocated size, and grow the allocated size > using the doubling strategy that you see in many places. I can't really see that being worth the code here. The cost of open()/lseek()'ing etc. is going to dwarf the cost of this. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP
On Sun, Nov 22, 2015 at 7:29 PM, Andreas Karlsson wrote: > Sorry for dropping this patch, but now I have started looking at it again. Any chance of picking this up again soon, Andreas? I think it's an important project. I would like to review it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replace use malloc() & friend by memory contexts for plperl and pltcl
Michael Paquier writes: > Cleanup $subject has been raised a couple of times, like one year ago here: > https://www.postgresql.org/message-id/cab7npqrxvq+q66ufzd9wa5uaftyn4wauadbjxkfrync96kf...@mail.gmail.com > And more recently here while working on the NULL checks for malloc(): > https://www.postgresql.org/message-id/CAB7nPqR7ozfCqc6C=2+ctCcnuehTZ-vTDQ8MMhY=bqyet0h...@mail.gmail.com > Attached are a set of patches to replace those memory system calls by > proper memory contexts: > - 0001 does the cleanup work for pltcl > - 0002 does this cleanup for plperl I looked at 0001. It seemed to me that it wasn't that useful to add a context unless we did something about actually freeing it; otherwise we're just increasing the amount of memory leaked after a function redefinition. However, it proved pretty easy to shoehorn in a refcounting mechanism like plpgsql has, so I did that and pushed it. Off to look at 0002 next. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] _mdfd_getseg can be expensive
On Wed, Aug 31, 2016 at 2:09 PM, Peter Geoghegan wrote: > The only thing that stuck out to any degree is that we don't grow the > "reln->md_seg_fds[forknum]" array within the new _fdvec_resize() > function geometrically. That new function looks like this: > +static void > +_fdvec_resize(SMgrRelation reln, > + ForkNumber forknum, > + int nseg) > { > - return (MdfdVec *) MemoryContextAlloc(MdCxt, sizeof(MdfdVec)); > + if (nseg == 0) > + { > + if (reln->md_num_open_segs[forknum] > 0) > + { > + pfree(reln->md_seg_fds[forknum]); > + reln->md_seg_fds[forknum] = NULL; > + } > + } > + else if (reln->md_num_open_segs[forknum] == 0) > + { > + reln->md_seg_fds[forknum] = > + MemoryContextAlloc(MdCxt, sizeof(MdfdVec) * nseg); > + } > + else > + { > + reln->md_seg_fds[forknum] = > + repalloc(reln->md_seg_fds[forknum], > +sizeof(MdfdVec) * nseg); > + } > + > + reln->md_num_open_segs[forknum] = nseg; > } Another tiny tweak that you might consider occurs to me here: It couldn't hurt to "Assert(reln->md_seg_fds[forknum] == NULL)" within the "else if (reln->md_num_open_segs[forknum] == 0)" path here, prior to the MemoryContextAlloc(). If it's worth setting it to NULL when there are 0 segs (i.e., "reln->md_seg_fds[forknum] = NULL"), then perhaps it's worth checking that things are found that way later. I guess that this is at odds with remarks in my last mail about considering geometric allocations for the "reln->md_seg_fds[forknum]" array. Both feedback items are more or less just things that bothered me ever so slightly, which I don't necessarily expect you to act on, but assume you'd like to hear. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] _mdfd_getseg can be expensive
On Thu, Aug 18, 2016 at 5:26 PM, Andres Freund wrote: > Rebased version attached. A review would be welcome. Plan to push this > forward otherwise in the not too far away future. This looks good. The only thing that stuck out to any degree is that we don't grow the "reln->md_seg_fds[forknum]" array within the new _fdvec_resize() function geometrically. In other words, we don't separately keep track of the array size and allocated size, and grow the allocated size using the doubling strategy that you see in many places. Now, I freely admit that that probably doesn't matter, given what that array tracks. I'm just pointing out that that aspect did give me pause. The struct MdfdVec is small enough that that *might* be worthwhile. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
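(For clarity, the doubling strategy alluded to above would look roughly like this; md_seg_alloced is an invented bookkeeping field, and the initial allocation is elided.)

    if (nseg > reln->md_seg_alloced[forknum])     /* invented allocated-size field */
    {
        int     newalloc = Max(reln->md_seg_alloced[forknum] * 2, 16);

        while (newalloc < nseg)
            newalloc *= 2;
        reln->md_seg_fds[forknum] = repalloc(reln->md_seg_fds[forknum],
                                             sizeof(MdfdVec) * newalloc);
        reln->md_seg_alloced[forknum] = newalloc;
    }
    reln->md_num_open_segs[forknum] = nseg;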
Re: [HACKERS] delta relations in AFTER triggers
I have merged in the changes since v4 (a year and a half ago) and cured all bit-rot I found, to get the attached v5 which runs `make check world` without problem -- including the tests added for this feature. I did remove the contrib code that David Fetter wrote to demonstrate the correctness and performance of the tuplestores as created during the transaction, and how to use them directly from C code, before any API code was written. If we want that to be committed, it should be considered separately after the main feature is in. Thanks to Thomas Munro who took a look at v4 and pointed out a bug (which is fixed in v5) and suggested a way forward for using the parameters. Initial attempts to get that working were not successful,, but I think it is fundamentally the right course, should we reach a consensus to go that way, On Thu, Jul 7, 2016 at 5:07 PM, Robert Haas wrote: > On Fri, May 13, 2016 at 2:37 PM, Kevin Grittner wrote: >> On Fri, May 13, 2016 at 1:02 PM, David Fetter wrote: >>> On Thu, Jan 22, 2015 at 02:41:42PM +, Kevin Grittner wrote: >> [ideas on how to pass around references to ephemeral relations] >>> >>> [almost 17 months later] >>> >>> It seems like now is getting close to the time to get this into >>> master. The patch might have suffered from some bit rot, but the >>> design so far seems sound. >>> >>> What say? >> >> I had a talk with Tom in Brussels about this. As I understood it, >> he wasn't too keen on the suggestion by Heikki (vaguely >> sorta-endorsed by Robert) of passing ephemeral named relations such >> as these tuplestores around in the structures currently used for >> parameter values. He intuitively foresaw the types of problems I >> had run into trying to use the same structure to pass a relation >> (with structure and rows and columns of data) as is used to pass, >> say, an integer. > > See, the thing that disappoints me about this is that I think we were > pretty closed to having the ParamListInfo-based approach working. Maybe, but the thing I would like to do before proceeding down that road is to confirm that we have a consensus that such a course is better than what Is on the branch which is currently working. If that's the consensus here, I'll work on that for the next CF. If not, there may not be a lot left to do before commit. (Notably, we may want to provide a way to free a tuplestore pointer when done with it, but that's not too much work.) Let me describe the API I have working. There are 11 function prototypes modified under src/include, in all cases to add a Tsrcache parameter: 1 createas.h 3 explain.h 1 prepare.h 1 analyze.h 2 tcopprot.h 3 utility.h There are three new function prototypes in SPI. NOTE: This does *not* mean that SPI is required to use named tuplestores in queries, just that there are convenience functions for any queries being run through SPI, so that any code using SPI (including any PLs that do) will find assigning a name to a tuplestore and referencing that within a query about as easy as falling off a log. A tuplestore is registered to the current SPI context and not visible outside that context. This results in a Tsrcache being passed to the functions mentioned above when that context is active, just as any non-SPI code could do. > The thing I liked about that approach is that we already know that any > place where you can provide parameters for a query, there will also be > an opportunity to provide a ParamListInfo. 
And I think that > parameterizing a query by an ephemeral table is conceptually similar > to parameterizing it by a scalar value. If we invent a new mechanism, > it's got to reach all of those same places in the code. Do you see someplace that the patch missed? > One other comment that I would make here is that I think that it's > important, however we pass the data, that the scope of the tuplestores > is firmly lexical and not dynamic. We need to make sure that we don't > set some sort of context via global variables that might get used for > some query other than the one to which it's intended to apply. Is this based on anything actually in the patch? For this CF, the main patch attached is a working version of the feature that people can test, review documentation, etc. Any API changes are not expected to change these visible behaviors, so any feedback on usability or documentation can be directly useful regardless of the API discussion. I have also attached a smaller patch which applies on top of the main one which rips out the Tsrcache API to get to a "no API" version that compiles cleanly and runs fine as long as you don't try to use the feature, in which case it will not recognize the tuplestore names and will give this message: "executor could not find named tuplestore \"%s\"". There may be a little bit left to rip out when adding a parameter-based API, but this is basically where we're moving from if we go that way. I include it both to help isolate the API we're discussing an
Re: [HACKERS] ICU integration
Hi all. I’m new to the PostgreSQL code and the mailing list, but I’ve had a lot of experience with using ICU in a different database product. So while I’m not up to speed on the code yet, I can offer some insights on using ICU. > On Aug 30, 2016, at 9:12 PM, Peter Eisentraut > wrote: >> How stable are the UCU locales? Most importantly, does ICU offer any >> way to "pin" a locale version, so we can say "we want de_DE as it was >> in ICU 4.6" and get consistent behaviour when the user sets up a >> replica on some other system with ICU 4.8? Even if the German >> government has changed its mind (again) about some details of the >> language and 4.8 knows about the changes but 4.4 doesn’t? ICU explicitly does not provide stability in their locales and collations. We pushed them hard to provide this, but between changes to the CLDR data and changes to the ICU code it just wasn’t feasible for them to provide version to version stability. What they do offer is a compile option when building ICU to version all their APIs. So instead of calling icu_foo() you’d call icu_foo46(). (Or something like this - it’s been a few years since I actually worked with the ICU code.) This ultimately allows you to load multiple versions of the ICU library into a single program and provide stability by calling the appropriate version of the library. (Unfortunately, the OS - at least my Linux box - only provides the generic version of ICU and not the version annotated APIs, which means a separate compile of ICU is needed.) The catch with this is that it means you likely want to expose the version information. In another note it was suggested to use something like fr_FR%icu. If you want to pin it to a specific version of ICU, you’ll likely need something like fr_FR%icu46. (There’s nothing wrong with supporting fr_FR%icu to give users an easy way of saying “give me the latest and greatest”, but you’d probably want to harden it to a specific ICU version internally.) > I forgot to mention this, but the patch adds a collversion column that > stores the collation version (provided by ICU). And then when you > upgrade ICU to something incompatible you get > > + if (numversion != collform->collversion) > + ereport(WARNING, > + (errmsg("ICU collator version mismatch"), > +errdetail("The database was created using > version 0x%08X, the library provides version 0x%08X.", > + (uint32) collform->collversion, > (uint32) numversion), > +errhint("Rebuild affected indexes, or build > PostgreSQL with the right version of ICU."))); > > So you still need to manage this carefully, but at least you have a > chance to learn about it. Indexes are the obvious place where collation comes into play, and are relatively easy to address. But consider all the places where string comparisons can be done. For example, check constraints and referential constraints can depend on string comparisons. If the collation rules change because of a new version of ICU, the database can become inconsistent and will need a lot more work than an index rebuild. > Suggestions for refining this are welcome. > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Doug Doole Salesforce -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
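(As background for the collversion discussion: ICU reports a collator's version at runtime, roughly as below. This is just a usage sketch, not the patch's actual code; error handling is abbreviated.)

    #include <unicode/ucol.h>
    #include <unicode/uversion.h>

    UErrorCode      status = U_ZERO_ERROR;
    UCollator      *coll = ucol_open("fr_FR", &status);
    UVersionInfo    version;
    char            versionstr[U_MAX_VERSION_STRING_LENGTH];

    if (U_SUCCESS(status))
    {
        ucol_getVersion(coll, version);
        u_versionToString(version, versionstr);
        /* compare against the version recorded at CREATE COLLATION time,
         * e.g. what pg_collation.collversion would store */
        ucol_close(coll);
    }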
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Aug 31, 2016 at 04:03:29PM -0400, Bruce Momjian wrote: > Why not just remember the tid of chains converted from WARM to HOT, then > use "amrecheck" on an index entry matching that tid to see if the index > matches one of the entries in the chain. (It will match all of them or > none of them, because they are all red.) I don't see a point in > coloring the index entries as red, as later you would have to convert to > blue in the WARM-to-HOT conversion, and a vacuum crash could lead to > inconsistencies. Consider that you can just call "amrecheck" on the few > chains that have converted from WARM to HOT. I believe this is more > crash-safe too. However, if you have converted WARM to HOT in the heap, > but crash during the index entry removal, you could potentially have > duplicates in the index later, which is bad. I think Pavan had the "crash during the index entry removal" fixed via: > During the second heap scan, we fix WARM chain by clearing HEAP_WARM_TUPLE > flag > and also reset Red flag to Blue. so the marking from WARM to HOT only happens after the index has been cleaned. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Aug 31, 2016 at 10:15:33PM +0530, Pavan Deolasee wrote: > Instead, what I would like to propose and the patch currently implements is to > restrict WARM update to once per chain. So the first non-HOT update to a tuple > or a HOT chain can be a WARM update. The chain can further be HOT updated any > number of times. But it can no further be WARM updated. This might look too > restrictive, but it can still bring down the number of regular updates by > almost 50%. Further, if we devise a strategy to convert a WARM chain back to > HOT chain, it can again be WARM updated. (This part is currently not > implemented). A good side effect of this simple strategy is that we know there > can maximum two index entries pointing to any given WARM chain. I like the simplified approach, as long as it doesn't block further improvements. > Headline TPS numbers: > > Master: > > transaction type: update.sql > scaling factor: 700 > query mode: simple > number of clients: 16 > number of threads: 8 > duration: 57600 s > number of transactions actually processed: 65552986 > latency average: 14.059 ms > tps = 1138.072117 (including connections establishing) > tps = 1138.072156 (excluding connections establishing) > > > WARM: > > transaction type: update.sql > scaling factor: 700 > query mode: simple > number of clients: 16 > number of threads: 8 > duration: 57600 s > number of transactions actually processed: 116168454 > latency average: 7.933 ms > tps = 2016.812924 (including connections establishing) > tps = 2016.812997 (excluding connections establishing) These are very impressive results. > Converting WARM chains back to HOT chains (VACUUM ?) > - > > The current implementation of WARM allows only one WARM update per chain. This > simplifies the design and addresses certain issues around duplicate scans. But > this also implies that the benefit of WARM will be no more than 50%, which is > still significant, but if we could return WARM chains back to normal status, > we > could do far more WARM updates. > > A distinct property of a WARM chain is that at least one index has more than > one live index entries pointing to the root of the chain. In other words, if > we > can remove duplicate entry from every index or conclusively prove that there > are no duplicate index entries for the root line pointer, the chain can again > be marked as HOT. I had not thought of how to convert from WARM to HOT yet. > Here is one idea, but more thoughts/suggestions are most welcome. > > A WARM chain has two parts, separated by the tuple that caused WARM update. > All > tuples in each part has matching index keys, but certain index keys may not > match between these two parts. Lets say we mark heap tuples in each part with > a > special Red-Blue flag. The same flag is replicated in the index tuples. For > example, when new rows are inserted in a table, they are marked with Blue flag > and the index entries associated with those rows are also marked with Blue > flag. When a row is WARM updated, the new version is marked with Red flag and > the new index entry created by the update is also marked with Red flag. 
> > > Heap chain: lp [1] [2] [3] [4] > [, ]B -> [, ]B -> [, ]R -> [, ]R > > Index1: ()B points to 1 (satisfies only tuples marked with B) > ()R points to 1 (satisfies only tuples marked with R) > > Index2: ()B points to 1 (satisfies both B and R tuples) > > > It's clear that for indexes with Red and Blue pointers, a heap tuple with Blue > flag will be reachable from Blue pointer and that with Red flag will be > reachable from Red pointer. But for indexes which did not create a new entry, > both Blue and Red tuples will be reachable from Blue pointer (there is no Red > pointer in such indexes). So, as a side note, matching Red and Blue flags is > not enough from index scan perspective. > > During first heap scan of VACUUM, we look for tuples with HEAP_WARM_TUPLE set. > If all live tuples in the chain are either marked with Blue flag or Red flag > (but no mix of Red and Blue), then the chain is a candidate for HOT > conversion. Uh, if the chain is all blue, then there is are WARM entries so it already a HOT chain, so there is nothing to do, right? > We remember the root line pointer and Red-Blue flag of the WARM chain in a > separate array. > > If we have a Red WARM chain, then our goal is to remove Blue pointers and vice > versa. But there is a catch. For Index2 above, there is only Blue pointer > and that must not be removed. IOW we should remove Blue pointer iff a Red > pointer exists. Since index vacuum may visit Red and Blue pointers in any > order, I think we will need another index pass to remove dead > index pointers. So in the first index pass we check which WARM candidates have > 2 index pointers. In the second pass, we remove the dead pointer and reset Red > flag is the sur
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 08/24/2016 06:16 PM, Robert Haas wrote: > On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra > wrote: >> Could someone please explain how the unlogged tables are supposed to fix the >> catalog bloat problem, as stated in the initial patch submission? We'd still >> need to insert/delete the catalog rows when creating/dropping the temporary >> tables, causing the bloat. Or is there something I'm missing? > > No, not really. Jim just asked if the idea of partitioning the > columns was completely dead in the water, and I said, no, you could > theoretically salvage it. Whether that does you much good is another > question. > > IMV, the point here is that you MUST have globally visible dependency > entries for this to work sanely. If they're not in a catalog, they > have to be someplace else, and backend-private memory isn't good > enough, because that's not globally visible. Until we've got a > strategy for that problem, this whole effort is going nowhere - even > though in other respects it may be a terrific idea. Why not just have a regular-looking table, with a "global temporary" relpersistence (I don't care which letter it gets) and when a backend tries to access it, it uses its own private relfilenode instead of whatever is in pg_class, creating one if necessary. That way the structure of the table is fixed, with all the dependencies and whatnot, but the content is private to each backend. What's wrong with this idea? -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
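(A loose sketch of the idea above, just to make it concrete. RELPERSISTENCE_GLOBAL_TEMP and rd_local_relfilenode are invented names; the real work would live in relcache/storage code such as RelationInitPhysicalAddr().)

    /* when computing the physical address of a "global temporary" table */
    if (relation->rd_rel->relpersistence == RELPERSISTENCE_GLOBAL_TEMP)   /* invented */
    {
        if (!OidIsValid(relation->rd_local_relfilenode))                  /* invented field */
            relation->rd_local_relfilenode =
                GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
                                  RELPERSISTENCE_TEMP);
        relation->rd_node.relNode = relation->rd_local_relfilenode;
        relation->rd_backend = MyBackendId;   /* backend-private storage */
    }
    else
        relation->rd_node.relNode = relation->rd_rel->relfilenode;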
Re: [HACKERS] Proposal for changes to recovery.conf API
At 2016-08-31 17:15:59 +0100, si...@2ndquadrant.com wrote: > > * Recovery parameters would now be part of the main postgresql.conf > infrastructure > Any parameters set in $DATADIR/recovery.conf will be read after the > main parameter file, similar to the way that postgresql.conf.auto is > read. > (Abhijit) > > * Parameters > All of the parameters formerly set in recovery.conf can be set in > postgresql.conf using RELOAD > These parameters will have no defaults in postgresql.conf.sample > Setting them has no effect during normal running, or once recovery ends. > https://www.postgresql.org/docs/devel/static/archive-recovery-settings.html > https://www.postgresql.org/docs/devel/static/recovery-target-settings.html > https://www.postgresql.org/docs/devel/static/standby-settings.html > (Abhijit) I've attached a WIP patch for the above (recovery_guc_v20160831.patch). This was based on the unite_recoveryconf_postgresqlconf_v3.patch posted by Fujii Masao. Unfortunately, some parts conflict with the patch that Simon just posted (e.g., his patch removes trigger_file altogether, whereas mine converts it into a GUC along the lines of the original patch). Rather than trying to untangle that right now, I'm posting what I have as-is, and I'll post an updated version tomorrow. -- Abhijit commit 999c0c2632f0d4c20d20b9ac30cd258305f74bac Author: Abhijit Menon-Sen Date: Wed Aug 31 22:18:07 2016 +0530 Convert recovery.conf settings into GUCs Based on unite_recoveryconf_postgresqlconf_v3.patch by Fujii Masao. diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c index e4136f9..8fcb85c 100644 --- a/contrib/pg_standby/pg_standby.c +++ b/contrib/pg_standby/pg_standby.c @@ -520,7 +520,7 @@ usage(void) printf(" -w MAXWAITTIME max seconds to wait for a file (0=no limit) (default=0)\n"); printf(" -?, --help show this help, then exit\n"); printf("\n" - "Main intended use as restore_command in recovery.conf:\n" + "Main intended use as restore_command in postgresql.conf:\n" " restore_command = 'pg_standby [OPTION]... ARCHIVELOCATION %%f %%p %%r'\n" "e.g.\n" " restore_command = 'pg_standby /mnt/server/archiverdir %%f %%p %%r'\n"); diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 0f09d82..f43e41e 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1174,8 +1174,15 @@ SELECT pg_stop_backup(); - Create a recovery command file recovery.conf in the cluster - data directory (see ). You might + Set up recovery parameters in postgresql.conf (see + and + ). + + + + + Create a recovery trigger file recovery.trigger + in the cluster data directory. You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery was successful. @@ -1187,7 +1194,7 @@ SELECT pg_stop_backup(); recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename - recovery.conf to recovery.done (to prevent + recovery.trigger to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations. @@ -1203,12 +1210,11 @@ SELECT pg_stop_backup(); -The key part of all this is to set up a recovery configuration file that -describes how you want to recover and how far the recovery should -run. You can use recovery.conf.sample (normally -located in the installation's share/ directory) as a -prototype. 
The one thing that you absolutely must specify in -recovery.conf is the restore_command, +The key part of all this is to set up recovery parameters that +specify how you want to recover and how far the recovery should +run. The one thing that you absolutely must specify in +postgresql.conf to recover from the backup is +the restore_command, which tells PostgreSQL how to retrieve archived WAL file segments. Like the archive_command, this is a shell command string. It can contain %f, which is @@ -1270,7 +1276,7 @@ restore_command = 'cp /mnt/server/archivedir/%f %p' If you want to recover to some previous point in time (say, right before the junior DBA dropped your main transaction table), just specify the -required stopping point in recovery.conf. You can specify +required stopping point in postgresql.conf. You can specify the stop point, known as the recovery target, either by date/time, named restore point or by completion of a specific transaction ID. As of this writing only the date/time and named restore point options @@ -1367,8 +1373,9 @@ restore_command = 'cp /mnt/server/archivedir/%f %p' The default behavior of recovery is to recover along the same timeline that was current
Re: [HACKERS] autonomous transactions
> On Aug 31, 2016, at 6:46 AM, Greg Stark wrote: > > Using a background worker mean that the autonomous transaction can't > access any state from the process memory. Parameters in plpgsql are a > symptom of this but I suspect there will be others. What happens if a > statement timeout occurs during an autonomous transaction? What > happens if you use a pl language in the autonomous transaction and if > it tries to use non-transactional information such as prepared > statements? > +1 on this. The proposed solution loosely matches what was done in DB2 9.7 and it runs into the same complexity. Passing local variable or session level variables back and forth became a source of grief. At SFDC PG we have taken a different tack: 1. Gather up all the transaction state that is scattered across global variables into a struct 2. backup/restore transaction state when an autonomous transaction is invoked. This allows full access to all non-transactional state. The downside is that full access also includes uncommitted DDL (shared recache). So we had to restrict DDL in the parent transaction prior to the spawning of the child. If there is interest in exploring this kind of solution as an alternative I can elaborate. Cheers Serge -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
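(A very rough sketch of the backup/restore approach described above. Every name here is invented; the real struct would have to cover everything xact.c and friends keep in file-level state.)

    typedef struct SavedTransactionContext
    {
        ResourceOwner   resourceOwner;     /* CurrentResourceOwner, etc. */
        Snapshot        activeSnapshot;    /* top of the active snapshot stack */
        /* ... plus xact.c's private transaction stack and the rest of the
         *     scattered transaction-scoped globals ... */
    } SavedTransactionContext;

    void
    BeginAutonomousTransaction(SavedTransactionContext *save)
    {
        SaveTransactionContext(save);      /* invented: stash parent state */
        StartTransactionCommand();         /* fresh, independent transaction */
    }

    void
    EndAutonomousTransaction(SavedTransactionContext *save)
    {
        CommitTransactionCommand();
        RestoreTransactionContext(save);   /* invented: parent resumes as before */
    }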
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 14:23:41 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-08-31 13:59:48 -0400, Tom Lane wrote: > >> You are ignoring the performance costs associated with eating 100x more > >> shared buffer space than necessary. > > > I doubt that's measurable in any real-world scenario. You seldomly have > > hundreds of thousands of sequences that you all select from at a high > > rate. > > If there are only a few sequences in the database, cross-sequence > contention is not going to be a big issue anyway. Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, and they'd now be contending on a single lock. If you have hundreds of thousands of active sequences, I doubt individual page locks would become a point of contention. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Andres Freund writes: > On 2016-08-31 13:59:48 -0400, Tom Lane wrote: >> You are ignoring the performance costs associated with eating 100x more >> shared buffer space than necessary. > I doubt that's measurable in any real-world scenario. You seldomly have > hundreds of thousands of sequences that you all select from at a high > rate. If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway. I think most of the point of making this change at all is to make things work better when you do have a boatload of sequences. Also, we could probably afford to add enough dummy padding to the sequence tuples so that they're each exactly 64 bytes, thereby having only one or two active counters per cache line. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
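(The padding idea, spelled out; the real sequence tuple has more columns than shown, so this is only to illustrate the 64-byte target.)

    typedef struct SequenceTupleData
    {
        int64   last_value;
        int64   log_cnt;
        bool    is_called;
        /* pad so each sequence occupies exactly one 64-byte cache line,
         * keeping neighbouring sequences on the same page from sharing one */
        char    pad[64 - 2 * sizeof(int64) - sizeof(bool)];
    } SequenceTupleData;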
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On Wed, Aug 24, 2016 at 12:39 PM, Andres Freund wrote: > > > On August 24, 2016 9:32:48 AM PDT, Tomas Vondra < > tomas.von...@2ndquadrant.com> wrote: > > > > > >On 08/24/2016 12:20 AM, Andres Freund wrote: > >> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote: > >>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra > >>> wrote: > Could someone please explain how the unlogged tables are supposed > >to fix the > catalog bloat problem, as stated in the initial patch submission? > >We'd still > need to insert/delete the catalog rows when creating/dropping the > >temporary > tables, causing the bloat. Or is there something I'm missing? > >> > >> Beats me. > >> > > > >Are you puzzled just like me, or are you puzzled why I'm puzzled? > > Like you. I don't think this addresses the problem to a significant enough > degree to care. > > Andres > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > Ok, here's a wild idea, and it probably depends on having native partitioning implemented. Propagate relpersistence, or a boolean flag on (relpersistence = 't') from pg_class into the child pg_attribute records. Partition the tables pg_class and pg_attribute first by relpersistence, and then by oid. The partitions holding data on persistent objects would basically stay as-is, but the partition wouldn't have much activity and no temp-table churn. The temporary ones, however, would fall into essentially a rotating set of partitions. Pick enough partitions such that the active transactions only cover some of the partitions. The rest can be safely truncated by vacuum. It would mitigate the bloat, existing dictionary queries would still work, but the additional lookup cost might not be worth it.
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Aug 31, 2016 at 10:38 PM, Claudio Freire wrote: > > On Wed, Aug 31, 2016 at 1:45 PM, Pavan Deolasee > wrote: > >> We discussed a few ideas to address the "Duplicate Scan" problem. For >> example, we can teach Index AMs to discard any duplicate (key, CTID) insert >> requests. Or we could guarantee uniqueness by either only allowing updates >> in one lexical order. While the former is a more complete solution to avoid >> duplicate entries, searching through large number of keys for non-unique >> indexes could be a drag on performance. The latter approach may not be >> sufficient for many workloads. Also tracking increment/decrement for many >> indexes will be non-trivial. >> >> There is another problem with allowing many index entries pointing to the >> same WARM chain. It will be non-trivial to know how many index entries are >> currently pointing to the WARM chain and index/heap vacuum will throw up >> more challenges. >> >> Instead, what I would like to propose and the patch currently implements >> is to restrict WARM update to once per chain. So the first non-HOT update >> to a tuple or a HOT chain can be a WARM update. The chain can further be >> HOT updated any number of times. But it can no further be WARM updated. >> This might look too restrictive, but it can still bring down the number of >> regular updates by almost 50%. Further, if we devise a strategy to convert >> a WARM chain back to HOT chain, it can again be WARM updated. (This part is >> currently not implemented). A good side effect of this simple strategy is >> that we know there can maximum two index entries pointing to any given WARM >> chain. >> > > We should probably think about coordinating with my btree patch. > > From the description above, the strategy is quite readily "upgradable" to > one in which the indexam discards duplicate (key,ctid) pairs and that would > remove the limitation of only one WARM update... right? > Yes, we should be able to add further optimisations on lines you're working on, but what I like about the current approach is that a) it reduces complexity of the patch and b) having thought about cleaning up WARM chains, limiting number of index entries per root chain to a small number will simplify that aspect too. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 13:59:48 -0400, Tom Lane wrote: > and...@anarazel.de (Andres Freund) writes: > > On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote: > >> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, > >> but while we're at it, we could as well find a way to make it as best as > >> we can. And allowing multiple seqs per page seems a much better > >> situation, so let's try to get there. > > > It's not really that simple. Having independent sequence rows closer > > together will have its own performance cost. > > You are ignoring the performance costs associated with eating 100x more > shared buffer space than necessary. I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that are all selected from at a high rate. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
and...@anarazel.de (Andres Freund) writes: > On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote: >> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, >> but while we're at it, we could as well find a way to make it as best as >> we can. And allowing multiple seqs per page seems a much better >> situation, so let's try to get there. > It's not really that simple. Having independent sequence rows closer > together will have its own performance cost. You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Use static inline functions for float <-> Datum conversions.
Heikki Linnakangas writes: > On 08/31/2016 08:27 PM, Tom Lane wrote: >> We had a bunch of similar problems in the recent work on exact degree trig >> functions, and eventually found that storing values into volatile float8 >> variables was the most reliable way to force rounding to the expected >> storage width. I propose to replace the above hack with declaring >> newelemorder as volatile, and see if that makes things better. > Sounds reasonable. None of this is supposed to be necessary, we're just > working around broken compilers, so whatever works. > I'll prepare a patch for that tomorrow, unless you're on it already. On it already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
On Tue, Aug 30, 2016 at 7:46 PM, Peter Eisentraut wrote: > In initdb, I initialize the default collation set as before from the > `locale -a` output, but also add all available ICU locales with a "%icu" > appended (so "fr_FR%icu"). I suppose one could create a configuration > option perhaps in initdb to change the default so that, say, "fr_FR" > uses ICU and "fr_FR%posix" uses the old stuff. I suspect that we'd be better off adding a mechanism for adding a new collation after initdb runs, on a live production instance. Maybe that part can come later. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Use static inline functions for float <-> Datum conversions.
On 08/31/2016 08:27 PM, Tom Lane wrote: Heikki Linnakangas writes: Use static inline functions for float <-> Datum conversions. Hmm, it looks like narwhal for one is not happy with this: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2016-08-31%2016%3A00%3A03 I suspect the failure traces to this bit in pg_enum.c: /* * On some machines, newelemorder may be in a register that's * wider than float4. We need to force it to be rounded to float4 * precision before making the following comparisons, or we'll get * wrong results. (Such behavior violates the C standard, but * fixing the compilers is out of our reach.) */ newelemorder = DatumGetFloat4(Float4GetDatum(newelemorder)); If you suppose that inlining those macros allows gcc to decide that the assignment is a no-op, the observed failure would be explained. Ugh. We had a bunch of similar problems in the recent work on exact degree trig functions, and eventually found that storing values into volatile float8 variables was the most reliable way to force rounding to the expected storage width. I propose to replace the above hack with declaring newelemorder as volatile, and see if that makes things better. Sounds reasonable. None of this is supposed to be necessary, we're just working around broken compilers, so whatever works. I'll prepare a patch for that tomorrow, unless you're on it already. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
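(To spell out the trick under discussion: either declare the variable itself volatile, as proposed, or bounce the value through a volatile temporary. A sketch:)

    /* fragile: once the conversion macros are inlined, the compiler may treat
     * this round-trip as a no-op and keep the value in a wider register */
    newelemorder = DatumGetFloat4(Float4GetDatum(newelemorder));

    /* more robust: force the value through a volatile float4 so the narrowing
     * store actually happens */
    volatile float4 tmp = newelemorder;
    newelemorder = tmp;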
Re: [HACKERS] ICU integration
On Tue, Aug 30, 2016 at 7:46 PM, Peter Eisentraut wrote: > Here is a patch I've been working on to allow the use of ICU for sorting > and other locale things. I'm really happy that you're working on this. This is more important than is widely appreciated, and very important overall. In a world where ICU becomes the defacto standard (i.e. is used on major platforms by default), what remaining barriers are there to documenting and enforcing the binary compatibility of replicas? I may be mistaken, but offhand I can't think of any. Being able to describe exactly what works and what doesn't is very important. After all, failure to adhere to "the rules" today, such as they are, can leave you with a subtly broken replica. I'd like to make that scenario mechanically impossible, by locking everything down. > I'm not sure how well it will work to replace all the bits of LIKE and > regular expressions with ICU API calls. One problem is that ICU likes > to do case folding as a whole string, not by character. I need to do > more research about that. My guess is that there are cultural reasons why it wants to operate on a whole string, at least in some cases. > Also note that ICU locales are encoding-independent and don't support a > separate collcollate and collctype, so the existing catalog structure is > not optimal. That makes more sense to me, personally. ICU very explicitly decouples technical issues (like the representation of strxfrm() keys, and, I gather, encoding) from cultural issues (the actual user-visible behaviors). This allows us to use strxfrm()-style binary keys in indexes directly, since they're versioned independently from their underlying collation; they can add a new optimization to strxfrm()-key generation to the next ICU version, and we can detect that and require a REINDEX, even when the collation version itself (the user-visible behaviors) are unchanged. I'm getting ahead of myself here, but that does seem very useful. The Unicode collation algorithm [1] that ICU is directly based on knows plenty about the requirements of indexing. It contains guidance about equivalence vs. equality that we learned the hard way in commit 656beff5, for example. > Where it gets really interesting is what to do with the database > locales. They just set the global process locale. So in order to port > that to ICU we'd need to check every implicit use of the process locale > and tweak it. We could add a datcollprovider column or something. But > we also rely on the datctype setting to validate the encoding of the > database. Maybe we wouldn't need that anymore, but it sounds risky. Not sure about that. Whatever we come up with here needs to mesh well with the existing conventions around collation versioning that ICU has, in the context of various operating system packages in particular. We can arrange it so that in practice, an ICU upgrade doesn't often break your indexes due to a collation rule change; ICU is happy to have multiple versions of a collation at a time, and you'll probably retain the old collation version in ICU. Even if your old collation version isn't available in a new ICU release (which I think is unlikely in practice), or you downgrade ICU, it might be possible to give guidance on how to download a "Collation Resource Bundle" [2][3] that *does* have the right collation version, which presumably satisfies the requirement immediately. Firebird already uses ICU. Maybe we have something to learn from them here. 
In particular, where does it (by which I mean the ICU version that Firebird links to) get its collations from in practice? I think that the CLDR Data collations were at one time not even distributed with ICU source. It might be a matter of individual OS packagers of ICU deciding what exact CLDR data to use, which may or may not be of any significant consequence in practice. [1] http://unicode.org/reports/tr10 [2] http://site.icu-project.org/design/size/collation [3] http://userguide.icu-project.org/icudata -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
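For readers following the strxfrm()-style-binary-keys point above, here is a hedged, self-contained sketch of the ICU C API calls that discussion refers to; it is not part of the posted patch, and the locale name, buffer sizes and test string are arbitrary. It opens a collator for a locale, reads out its collation version (which an index could record and re-check after an ICU upgrade), and produces a binary sort key that can be compared with memcmp(). Link against ICU's common and i18n libraries.

#include <stdint.h>
#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/ustring.h>

int
main(void)
{
	UErrorCode	status = U_ZERO_ERROR;
	UCollator  *coll = ucol_open("fr_FR", &status);
	UVersionInfo version;
	char		version_str[U_MAX_VERSION_STRING_LENGTH];
	UChar		ustr[64];
	uint8_t		sortkey[128];
	int32_t		keylen;

	if (U_FAILURE(status))
		return 1;

	/* collation version: something an index could store and compare later */
	ucol_getVersion(coll, version);
	u_versionToString(version, version_str);

	/* binary, memcmp()-comparable key for one string */
	u_strFromUTF8(ustr, 64, NULL, "côté", -1, &status);
	keylen = ucol_getSortKey(coll, ustr, -1, sortkey, (int32_t) sizeof(sortkey));

	printf("collation version %s, sort key length %d\n", version_str, keylen);
	ucol_close(coll);
	return U_FAILURE(status) ? 1 : 0;
}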
Re: [HACKERS] [COMMITTERS] pgsql: Use static inline functions for float <-> Datum conversions.
Heikki Linnakangas writes: > Use static inline functions for float <-> Datum conversions. Hmm, it looks like narwhal for one is not happy with this: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2016-08-31%2016%3A00%3A03 I suspect the failure traces to this bit in pg_enum.c: /* * On some machines, newelemorder may be in a register that's * wider than float4. We need to force it to be rounded to float4 * precision before making the following comparisons, or we'll get * wrong results. (Such behavior violates the C standard, but * fixing the compilers is out of our reach.) */ newelemorder = DatumGetFloat4(Float4GetDatum(newelemorder)); If you suppose that inlining those macros allows gcc to decide that the assignment is a no-op, the observed failure would be explained. We had a bunch of similar problems in the recent work on exact degree trig functions, and eventually found that storing values into volatile float8 variables was the most reliable way to force rounding to the expected storage width. I propose to replace the above hack with declaring newelemorder as volatile, and see if that makes things better. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > Hi, > > > > On 2016-08-31 12:53:30 -0400, Tom Lane wrote: > > > Improving on the space wastage is exactly the point IMO. If it's still > > > going to be 8k per sequence on disk (*and* in shared buffers, remember), > > > I'm not sure it's worth all the work to change things at all. > > > > A separate file is a heck lot more heavyweight than another 8 kb in an > > existing file. > > Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, > but while we're at it, we could as well find a way to make it as best as > we can. And allowing multiple seqs per page seems a much better > situation, so let's try to get there. It's not really that simple. Having independent sequence rows closer together will have its own performance cost. Suddenly independent sequences will compete for the same page level lock, NUMA systems will have to transfer the page/cacheline around even if it's independent sequences being accessed in different backends, we'll have to take care about cacheline padding... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Andres Freund wrote: > Hi, > > On 2016-08-31 12:53:30 -0400, Tom Lane wrote: > > Improving on the space wastage is exactly the point IMO. If it's still > > going to be 8k per sequence on disk (*and* in shared buffers, remember), > > I'm not sure it's worth all the work to change things at all. > > A separate file is a heck lot more heavyweight than another 8 kb in an > existing file. Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Hi, On 2016-08-31 12:53:30 -0400, Tom Lane wrote: > Improving on the space wastage is exactly the point IMO. If it's still > going to be 8k per sequence on disk (*and* in shared buffers, remember), > I'm not sure it's worth all the work to change things at all. A separate file is a heck lot more heavyweight than another 8 kb in an existing file. > Another idea would be to have nominally per-sequence LWLocks (or > spinlocks?) controlling nextval's nontransactional accesses to the catalog > rows, but to map those down to some fixed number of locks in a way similar > to the current fallback implementation for spinlocks, which maps them onto > a fixed number of semaphores. You'd trade off shared memory against > contention while choosing the underlying number of locks. If we could rely on spinlocks to actually be spinlocks, we could just put the spinlock besides the actual state data... Not entirely pretty, but probably pretty simple. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
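A hypothetical sketch of the "spinlock beside the state data" idea floated above, with all field names assumed (it presumes backend headers such as c.h for int64 and storage/s_lock.h for slock_t). Each nextval() would take only this per-sequence spinlock rather than a page-level buffer lock:

typedef struct SequenceSharedState
{
	slock_t		mutex;			/* protects the fields below */
	int64		last_value;		/* last value handed out */
	int64		cached_upto;	/* values pre-allocated and WAL-logged */
	bool		is_called;
} SequenceSharedState;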
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Aug 31, 2016 at 1:45 PM, Pavan Deolasee wrote: > We discussed a few ideas to address the "Duplicate Scan" problem. For > example, we can teach Index AMs to discard any duplicate (key, CTID) insert > requests. Or we could guarantee uniqueness by either only allowing updates > in one lexical order. While the former is a more complete solution to avoid > duplicate entries, searching through large number of keys for non-unique > indexes could be a drag on performance. The latter approach may not be > sufficient for many workloads. Also tracking increment/decrement for many > indexes will be non-trivial. > > There is another problem with allowing many index entries pointing to the > same WARM chain. It will be non-trivial to know how many index entries are > currently pointing to the WARM chain and index/heap vacuum will throw up > more challenges. > > Instead, what I would like to propose and the patch currently implements > is to restrict WARM update to once per chain. So the first non-HOT update > to a tuple or a HOT chain can be a WARM update. The chain can further be > HOT updated any number of times. But it can no further be WARM updated. > This might look too restrictive, but it can still bring down the number of > regular updates by almost 50%. Further, if we devise a strategy to convert > a WARM chain back to HOT chain, it can again be WARM updated. (This part is > currently not implemented). A good side effect of this simple strategy is > that we know there can maximum two index entries pointing to any given WARM > chain. > We should probably think about coordinating with my btree patch. >From the description above, the strategy is quite readily "upgradable" to one in which the indexam discards duplicate (key,ctid) pairs and that would remove the limitation of only one WARM update... right?
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 3:11 PM, Craig Ringer wrote: > > I suspect that there'll be way too much code that relies on stashing > xact-scoped stuff in globals for that to be viable. Caches alone. > Peter will be able to explain more, I'm sure. > > We'd probably need a new transaction data object that everything > xact-scoped hangs off, so we can pass it everywhere or swap it out of > some global. The mechanical refactoring alone would be pretty scary, > not to mention the complexity of actually identifying all the less > obvious places that need changing. Well this is the converse of the same problem. Today process state and transaction are tied together. One way or another you're trying to split that -- either by having two processes share state or by having one process manage two transactions. I suppose we already have the infrastructure for parallel query so there's at least some shared problem space there. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Andres Freund writes: > On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote: >> I was thinking that nextval could grab a shared buffer lock and release >> immediately, just to ensure no one holds exclusive buffer lock >> concurrently (which would be used for things like dropping one seq tuple >> from the page, when a sequence is dropped); then control access to each >> sequence tuple using LockDatabaseObject. This is a HW lock, heavier >> than a buffer's LWLock, but it seems better than wasting a full 8kb for >> each sequence. > That's going to go be a *lot* slower, I don't think that's ok. I've a > hard time worrying about the space waste here; especially considering > where we're coming from. Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all. We're already dealing with taking a heavyweight lock for each sequence (the relation AccessShareLock). I wonder whether it'd be possible to repurpose that effort somehow. Another idea would be to have nominally per-sequence LWLocks (or spinlocks?) controlling nextval's nontransactional accesses to the catalog rows, but to map those down to some fixed number of locks in a way similar to the current fallback implementation for spinlocks, which maps them onto a fixed number of semaphores. You'd trade off shared memory against contention while choosing the underlying number of locks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
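A hedged sketch of the "map nominally per-sequence locks down to a fixed number of locks" idea above; the pool size, names and the trivial modulo mapping are assumptions rather than anything from a posted patch (it presumes storage/lwlock.h, and the lock array would be ShmemInitStruct()'d once at startup):

/* assumed pool size: trades shared memory against contention */
#define NUM_SEQUENCE_LOCKS	128

static LWLockPadded *SequenceLocks;		/* allocated once in shared memory */

static inline LWLock *
SequenceLockFor(Oid seqrelid)
{
	/* a real implementation might hash the OID instead of plain modulo */
	return &SequenceLocks[seqrelid % NUM_SEQUENCE_LOCKS].lock;
}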
Re: [HACKERS] proposal: psql \setfileref
2016-08-31 18:24 GMT+02:00 Corey Huinker : > On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule > wrote: > >> Hi >> >> I propose a new type of psql variables - file references. The content of >> file reference is specified by referenced file. It allows simple inserting >> large data without necessity of manual escaping or using LO api. >> >> When I wrote the patch, I used parametrized queries for these data >> instead escaped strings - the code is not much bigger, and the error >> messages are much more friendly if query is not bloated by bigger content. >> The text mode is used only - when escaping is not required, then content is >> implicitly transformed to bytea. By default the content of file is bytea. >> When use requires escaping, then he enforces text escaping - because it has >> sense only for text type. >> >> postgres=# \setfileref a ~/test2.xml >> postgres=# \setfileref b ~/avatar.gif >> postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b); >> -- xml is passed as bytea >> postgres=# insert into test values(:'a', :b); -- xml is passed via >> unknown text value >> >> The content of file reference variables is not persistent in memory. >> >> Comments, notes? >> >> Regards >> >> Pavel >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> >> > Clearly jumping ahead on this one, but if the fileref is essentially a > pipe to "cat /path/to/file.name", is there anything stopping us from > setting pipes? > > My interest is primarily in ways that COPY could use this. > I don't see a reason why it should not be possible - the current code can't do it, but with 20 lines more, it should be possible There is one disadvantage against copy - the content should be fully loaded to memory, but any other functionality should be same. Regards Pavel
Re: [HACKERS] autonomous transactions
On 31 August 2016 at 14:09, Joel Jacobson wrote: > On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova > wrote: >> >> On 30 August 2016 at 23:10, Joel Jacobson wrote: >> > >> > There should be a way to within the session and/or txn permanently >> > block autonomous transactions. >> > >> >> This will defeat one of the use cases of autonomous transactions: auditing > > My idea on how to deal with this would be to mark the function to be > "AUTONOMOUS" similar to how a function is marked to be "PARALLEL > SAFE", > and to throw an error if a caller that has blocked autonomous > transactions tries to call a function that is marked to be autonomous. > > That way none of the code that needs to be audited would ever get executed. Not sure I see why you would want to turn off execution for only some functions. What happens if your function calls some other function with side-effects? How would you roll that back? How would you mark functions for the general case? Functions with side effects can't be tested with simple unit tests; that has nothing to do with autonomous transactions. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: psql \setfileref
On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule wrote: > Hi > > I propose a new type of psql variables - file references. The content of > file reference is specified by referenced file. It allows simple inserting > large data without necessity of manual escaping or using LO api. > > When I wrote the patch, I used parametrized queries for these data instead > escaped strings - the code is not much bigger, and the error messages are > much more friendly if query is not bloated by bigger content. The text mode > is used only - when escaping is not required, then content is implicitly > transformed to bytea. By default the content of file is bytea. When use > requires escaping, then he enforces text escaping - because it has sense > only for text type. > > postgres=# \setfileref a ~/test2.xml > postgres=# \setfileref b ~/avatar.gif > postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b); > -- xml is passed as bytea > postgres=# insert into test values(:'a', :b); -- xml is passed via unknown > text value > > The content of file reference variables is not persistent in memory. > > Comments, notes? > > Regards > > Pavel > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > Clearly jumping ahead on this one, but if the fileref is essentially a pipe to "cat /path/to/file.name", is there anything stopping us from setting pipes? My interest is primarily in ways that COPY could use this.
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote: > I was thinking that nextval could grab a shared buffer lock and release > immediately, just to ensure no one holds exclusive buffer lock > concurrently (which would be used for things like dropping one seq tuple > from the page, when a sequence is dropped); then control access to each > sequence tuple using LockDatabaseObject. This is a HW lock, heavier > than a buffer's LWLock, but it seems better than wasting a full 8kb for > each sequence. That's going to go be a *lot* slower, I don't think that's ok. I've a hard time worrying about the space waste here; especially considering where we're coming from. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal for changes to recovery.conf API
This is a summary of proposed changes to the recovery.conf API for v10. These are based in part on earlier discussions, and represent a minimal modification to current usage. Proposed changes (with reference to patches from Abhijit Menon-Sen and myself) * pg_ctl start -M (normal|recover|continue) pg_ctl can now start the server directly as a standby, similar to the way pg_ctl promote works. The internal implementation is also similar, with pg_ctl writing a "recovery.trigger" file that initiates recovery in the same way recovery.conf used to do. It is still possible to manually add a file called "recovery.trigger" and have that work if users desire that mechanism. Different startup methods can be selected with the -M option. Normal mode starts the server for read/write, overriding any previous use in Recover mode. Recover mode starts the server as a standby server which expects to receive changes from a primary/master server using physical streaming replication or is used for performing a recovery from backup. Continue mode is the default and will startup the server in whatever mode it was in at last proper shutdown, or as modified by any trigger files present. (Patch: recovery_startup_r10_api.v1b.patch) * $DATADIR/recovery.conf no longer triggers recovery (Patch: recovery_startup_r10_api.v1b.patch) * Recovery parameters would now be part of the main postgresql.conf infrastructure Any parameters set in $DATADIR/recovery.conf will be read after the main parameter file, similar to the way that postgresql.conf.auto is read. (Abhijit) * pg_basebackup -R will continue to generate a parameter file called recovery.conf as it does now, but will also create a file named recovery.trigger. (This part is WIP; patch doesn't include implementation for tar format yet) * Parameters All of the parameters formerly set in recovery.conf can be set in postgresql.conf using RELOAD These parameters will have no defaults in postgresql.conf.sample Setting them has no effect during normal running, or once recovery ends. https://www.postgresql.org/docs/devel/static/archive-recovery-settings.html https://www.postgresql.org/docs/devel/static/recovery-target-settings.html https://www.postgresql.org/docs/devel/static/standby-settings.html (Abhijit) Related cleanup * Promotion signal file is now called "promote.trigger" rather than just "promote" * Remove user configurable "trigger_file" mechanism - use "promote.trigger" for all cases * Remove Fallback promote mechanism, so all promotion is now "fast" in xlog.c * Rename CheckForStandbyTrigger() to CheckForPromoteTrigger() for clarity (Patch: recovery_startup_r10_api.v1b.patch) -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services recovery_startup_r10_api.v1b.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing aggregates
On 2016-08-31 19:07:00 +0300, Heikki Linnakangas wrote: > On 08/31/2016 06:51 PM, Andres Freund wrote: > > I've first combined the projection for all the aggregates, ordered set, > > or not, into one projetion. That got rid of a fair amount of overhead > > when you have multiple aggregates. I attached an, probably out of date, > > WIP version of that patch. > > A-ha, I also considered doing just that! I also considered a variant where > we call ExecProject once for all non-ordered aggregates, and a separate > ExecProject() for each ordered one. But just switching back to straight > ExecEvalExprs seemed easier. The issue is that might, I think, end up iteratively deforming the underlying tuple. The projection machinery takes care of that, if we do it in one go. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
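For illustration of what "deforming in one go" means (this is not from the attached patch, and the function name is invented; it assumes executor/tuptable.h): slot_getsomeattrs() walks the stored tuple once up to the highest attribute any aggregate argument uses, so per-argument evaluation afterwards finds the columns already extracted instead of re-walking the tuple.

static void
prepare_agg_input(TupleTableSlot *slot, int max_attno_used)
{
	/* one pass over the tuple's attributes */
	slot_getsomeattrs(slot, max_attno_used);

	/*
	 * Subsequent ExecEvalExpr() calls for each aggregate argument now hit
	 * already-populated slot->tts_values[] / tts_isnull[] entries.
	 */
}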
Re: [HACKERS] Optimizing aggregates
On 08/31/2016 06:51 PM, Andres Freund wrote: On 2016-08-31 17:47:18 +0300, Heikki Linnakangas wrote: We actually used to call ExecEvalExpr() directly for each argument, but that was changed by the patch that added support for ordered set aggregates. It looks like that was a bad idea, from a performance point of view. I complained about that as well http://archives.postgresql.org/message-id/20160519175727.ymv2y5tye4qgcmqx%40alap3.anarazel.de Ah, missed that! I propose that we go back to calling ExecEvalExpr() directly, for non-ordered aggregates, per the attached patch. That makes that example query about 10% faster on my laptop, which is in line with the fact that ExecProject() accounted for about 13% of the CPU time. My approach is a bit different. I've first combined the projection for all the aggregates, ordered set, or not, into one projetion. That got rid of a fair amount of overhead when you have multiple aggregates. I attached an, probably out of date, WIP version of that patch. A-ha, I also considered doing just that! I also considered a variant where we call ExecProject once for all non-ordered aggregates, and a separate ExecProject() for each ordered one. But just switching back to straight ExecEvalExprs seemed easier. Secondly, I'm working on overhauling expression evaluation to be faster. Even without the ExecProject overhead, the computations very quickly become the bottleneck. During that I pretty much merged ExecProject and ExecEvalExpr into one - they're really not that different, and the distinction serves no purpose, except to increase the number of function calls. The reason I'm working on getting rid of targetlist SRFs is precisely that. A proof of concept of that is attached to http://archives.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c%40alap3.anarazel.de Cool, yes, all that should help. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Andres Freund wrote: > On 2016-08-31 11:23:27 -0400, Tom Lane wrote: > > Another issue is what is the low-level interlock between nextvals > > in different processes. Right now it's the buffer lock on the > > sequence's page. With a scheme like this, if we just kept doing > > that, we'd have a single lock covering probably O(100) different > > sequences which might lead to contention problems. We could probably > > improve on that with some thought. > > I was thinking of forcing the rows to be spread to exactly one page per > sequence... I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing aggregates
Hi, On 2016-08-31 17:47:18 +0300, Heikki Linnakangas wrote: > # .. . > > # > 25.70% 0.00% postmaster [unknown] [k] > 14.23%13.75% postmaster postgres [.] ExecProject > ExecProject stands out. I find that pretty surprising. > > We're using ExecProject to extract the arguments from the input tuples, to > pass to the aggregate transition functions. It looks like that's a pretty > expensive way of doing it, for a typical aggregate that takes only one > argument. > > We actually used to call ExecEvalExpr() directly for each argument, but that > was changed by the patch that added support for ordered set aggregates. It > looks like that was a bad idea, from a performance point of view. I complained about that as well http://archives.postgresql.org/message-id/20160519175727.ymv2y5tye4qgcmqx%40alap3.anarazel.de > I propose that we go back to calling ExecEvalExpr() directly, for > non-ordered aggregates, per the attached patch. That makes that example > query about 10% faster on my laptop, which is in line with the fact that > ExecProject() accounted for about 13% of the CPU time. My approach is a bit different. I've first combined the projection for all the aggregates, ordered set, or not, into one projetion. That got rid of a fair amount of overhead when you have multiple aggregates. I attached an, probably out of date, WIP version of that patch. Secondly, I'm working on overhauling expression evaluation to be faster. Even without the ExecProject overhead, the computations very quickly become the bottleneck. During that I pretty much merged ExecProject and ExecEvalExpr into one - they're really not that different, and the distinction serves no purpose, except to increase the number of function calls. The reason I'm working on getting rid of targetlist SRFs is precisely that. A proof of concept of that is attached to http://archives.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c%40alap3.anarazel.de Greetings, Andres Freund >From 384845bea72d28952d88e58e55f81aaa5addd930 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 12 Jul 2016 01:01:28 -0700 Subject: [PATCH] WIP: Only perform one projection in aggregation. --- src/backend/executor/nodeAgg.c | 112 - 1 file changed, 88 insertions(+), 24 deletions(-) diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index f655aec..4499d5f 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -210,6 +210,9 @@ typedef struct AggStatePerTransData */ int numInputs; + /* offset of input columns in Aggstate->evalslot */ + int inputoff; + /* * Number of aggregated input columns to pass to the transfn. 
This * includes the ORDER BY columns for ordered-set aggs, but not for plain @@ -836,14 +839,20 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) int setno = 0; int numGroupingSets = Max(aggstate->phase->numsets, 1); int numTrans = aggstate->numtrans; + TupleTableSlot *slot = aggstate->evalslot; + AggStatePerTrans pertrans; - for (transno = 0; transno < numTrans; transno++) + /* compute input for all aggregates */ + if (aggstate->evalproj) + ExecProjectIntoSlot(aggstate->evalproj, aggstate->evalslot); + + for (transno = 0, pertrans = aggstate->pertrans; transno < numTrans; + transno++, pertrans++) { - AggStatePerTrans pertrans = &aggstate->pertrans[transno]; ExprState *filter = pertrans->aggfilter; int numTransInputs = pertrans->numTransInputs; int i; - TupleTableSlot *slot; + int inputoff = pertrans->inputoff; /* Skip anything FILTERed out */ if (filter) @@ -857,13 +866,10 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) continue; } - /* Evaluate the current input expressions for this aggregate */ - slot = ExecProject(pertrans->evalproj, NULL); - if (pertrans->numSortCols > 0) { /* DISTINCT and/or ORDER BY case */ - Assert(slot->tts_nvalid == pertrans->numInputs); + Assert(slot->tts_nvalid >= pertrans->numInputs); /* * If the transfn is strict, we want to check for nullity before @@ -876,7 +882,7 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) { for (i = 0; i < numTransInputs; i++) { - if (slot->tts_isnull[i]) + if (slot->tts_isnull[i + inputoff]) break; } if (i < numTransInputs) @@ -888,10 +894,22 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) /* OK, put the tuple into the tuplesort object */ if (pertrans->numInputs == 1) tuplesort_putdatum(pertrans->sortstates[setno], - slot->tts_values[0], - slot->tts_isnull[0]); + slot->tts_values[inputoff], + slot->tts_isnull[inputoff]); else - tuplesort_puttupleslot(pertrans->sortstates[setno],
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 11:23:27 -0400, Tom Lane wrote: > Another issue is what is the low-level interlock between nextvals > in different processes. Right now it's the buffer lock on the > sequence's page. With a scheme like this, if we just kept doing > that, we'd have a single lock covering probably O(100) different > sequences which might lead to contention problems. We could probably > improve on that with some thought. I was thinking of forcing the rows to be spread to exactly one page per sequence... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: psql \setfileref
Hi I propose a new type of psql variables - file references. The content of file reference is specified by referenced file. It allows simple inserting large data without necessity of manual escaping or using LO api. When I wrote the patch, I used parametrized queries for these data instead escaped strings - the code is not much bigger, and the error messages are much more friendly if query is not bloated by bigger content. The text mode is used only - when escaping is not required, then content is implicitly transformed to bytea. By default the content of file is bytea. When use requires escaping, then he enforces text escaping - because it has sense only for text type. postgres=# \setfileref a ~/test2.xml postgres=# \setfileref b ~/avatar.gif postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b); -- xml is passed as bytea postgres=# insert into test values(:'a', :b); -- xml is passed via unknown text value The content of file reference variables is not persistent in memory. Comments, notes? Regards Pavel commit 077c71b1f8ae24ccf2f3723e1e4ca5bf05bca0d3 Author: Pavel Stehule Date: Wed Aug 31 17:15:33 2016 +0200 initial diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 4aaf657..3150510 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1355,6 +1355,32 @@ exec_command(const char *cmd, free(envval); } + /* \setfileref - set variable by reference on file */ + else if (strcmp(cmd, "setfileref") == 0) + { + char *name = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + char *ref = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + success = false; + + if (!name || !ref) + { + psql_error("\\%s: missing required argument\n", cmd); + success = false; + } + else + { + if (!SetFileRef(pset.vars, name, ref)) + { +psql_error("\\%s: error while setting variable\n", cmd); +success = false; + } + } + } + /* \sf -- show a function's source code */ else if (strcmp(cmd, "sf") == 0 || strcmp(cmd, "sf+") == 0) { diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 7399950..3c1db17 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -32,7 +32,6 @@ static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec); static bool command_no_begin(const char *query); static bool is_select_command(const char *query); - /* * openQueryOutputFile --- attempt to open a query output file * @@ -108,6 +107,123 @@ setQFout(const char *fname) return true; } +void +psql_reset_query_params(void) +{ + int i; + + for (i = 0; i < pset.nparams; i++) + if (pset.params[i] != NULL) + { + PQfreemem(pset.params[i]); + pset.params[i] = NULL; + } + + pset.nparams = 0; +} + +/* + * Load a content of the file_ref related file to query params buffer. + * When escaping is requested, then the text content is expected. + * Without escaping the bytea content is expected and related bytea + * escaping is processed. 
+ */ +#define BYTEAOID 17 +#define UNKNOWNOID 0 + +static char * +get_file_ref_content(const char *value, bool escape, bool as_ident) +{ + PQExpBufferData buffer; + FILE *fd = NULL; + char *fname; + char *escaped_value; + charline[1024]; + size_tsize; + + fname = pstrdup(value); + + expand_tilde(&fname); + if (!fname) + { + psql_error("missing valid path to file\n"); + return NULL; + } + + canonicalize_path(fname); + + fd = fopen(fname, PG_BINARY_R); + if (!fd) + { + psql_error("%s: %s\n", fname, strerror(errno)); + PQfreemem(fname); + return NULL; + } + + /* can append another parameter */ + if (pset.nparams >= MAX_BINARY_PARAMS) + { + psql_error("too much binary parameters"); + PQfreemem(fname); + return NULL; + } + + if (!pset.db) + { + psql_error("cannot escape without active connection\n"); + PQfreemem(fname); + return NULL; + } + + initPQExpBuffer(&buffer); + + while ((size = fread(line, 1, sizeof(line), fd)) > 0) + appendBinaryPQExpBuffer(&buffer, line, size); + + if (ferror(fd)) + { + psql_error("%s: %s\n", fname, strerror(errno)); + PQfreemem(fname); + termPQExpBuffer(&buffer); + return NULL; + } + + if (escape) + { + if (as_ident) + escaped_value = +PQescapeIdentifier(pset.db, buffer.data, buffer.len); + else + escaped_value = +PQescapeLiteral(pset.db, buffer.data, buffer.len); + pset.paramTypes[pset.nparams] = UNKNOWNOID; + } + else + { + escaped_value = (char *) +PQescapeByteaConn(pset.db, + (const unsigned char *) buffer.data, buffer.len, &size); + pset.paramTypes[pset.nparams] = BYTEAOID; + } + + /* fname, buffer is not necessary longer */ + PQfreemem(fname); + termPQExpBuffer(&buffer); + + if (escaped_value == NULL) + { + const char *error = PQerrorMessage(pset.db); + + psql_error("%s", error); + return NULL; + } + + pset.params[pset.nparams] = escaped_value; + + snprintf(line, sizeof(line) - 1, "$%d", ++pset.nparams); + + return pstrdup(lin
Re: [HACKERS] pg_sequence catalog
Craig Ringer writes: > On 31 August 2016 at 22:01, Tom Lane wrote: >> Personally, my big beef with the current approach to sequences is that >> we eat a whole relation (including a whole relfilenode) per sequence. >> I wish that we could reduce a sequence to just a single row in a >> catalog, including the nontransactional state. > It sounds like you're thinking of something like a normal(ish) heap > tuple where we just overwrite some fields in-place without fiddling > xmin/xmax and making a new row version. Right? Like we currently > overwrite the lone Form_pg_sequence on the 1-page sequence relations. That would be what to do with the nontransactional state. If I recall previous discussions correctly, there's a stumbling block if you want to treat ALTER SEQUENCE changes as transactional --- but maybe that doesn't make sense anyway. If we did want to try that, maybe we need two auxiliary catalogs, one for the transactionally-updatable sequence fields and one for the nontransactional fields. > It feels intuitively pretty gross to effectively dirty-read and write > a few fields of a tuple. But that's what we do all the time with > xmin/xmax etc, it's not really that different. True. I think two rows would work around that, but maybe we don't have to. Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences which might lead to contention problems. We could probably improve on that with some thought. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing checks when malloc returns NULL...
Michael Paquier writes: > Which means that processes have an escape window when initializing > shared memory by cleaning up the index if an entry cannot be found and > then cannot be created properly. I don't think that it is a good idea > to change that by forcing ShmemAlloc to fail. So I would tend to just > have the patch attached and add those missing NULL-checks on all the > existing ShmemAlloc() calls. > Opinions? I still think it'd be better to fix that as attached, because it represents a net reduction not net addition of code, and it provides a defense against future repetitions of the same omission. If only 4 out of 11 existing calls were properly checked --- some of them adjacent to calls with checks --- that should tell us that we *will* have more instances of the same bug if we don't fix it centrally. I also note that your patch missed checks for two ShmemAlloc calls in InitShmemAllocation and ShmemInitStruct. Admittedly, since those are the very first such calls, it's highly unlikely they'd fail; but I think this exercise is not about dismissing failures as improbable. Almost all of these failures are improbable, given that we precalculate the shmem space requirement. regards, tom lane diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index 1efe020..cc3af2d 100644 *** a/src/backend/storage/ipc/shmem.c --- b/src/backend/storage/ipc/shmem.c *** InitShmemAllocation(void) *** 163,177 /* * ShmemAlloc -- allocate max-aligned chunk from shared memory * ! * Assumes ShmemLock and ShmemSegHdr are initialized. * ! * Returns: real pointer to memory or NULL if we are out ! * of space. Has to return a real pointer in order ! * to be compatible with malloc(). */ void * ShmemAlloc(Size size) { Size newStart; Size newFree; void *newSpace; --- 163,194 /* * ShmemAlloc -- allocate max-aligned chunk from shared memory * ! * Throws error if request cannot be satisfied. * ! * Assumes ShmemLock and ShmemSegHdr are initialized. */ void * ShmemAlloc(Size size) { + void *newSpace; + + newSpace = ShmemAllocNoError(size); + if (!newSpace) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of shared memory (%zu bytes requested)", + size))); + return newSpace; + } + + /* + * ShmemAllocNoError -- allocate max-aligned chunk from shared memory + * + * As ShmemAlloc, but returns NULL if out of space, rather than erroring. + */ + void * + ShmemAllocNoError(Size size) + { Size newStart; Size newFree; void *newSpace; *** ShmemAlloc(Size size) *** 206,216 SpinLockRelease(ShmemLock); ! if (!newSpace) ! ereport(WARNING, ! (errcode(ERRCODE_OUT_OF_MEMORY), ! errmsg("out of shared memory"))); ! Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); return newSpace; --- 223,229 SpinLockRelease(ShmemLock); ! /* note this assert is okay with newSpace == NULL */ Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); return newSpace; *** ShmemInitHash(const char *name, /* table *** 293,299 * The shared memory allocator must be specified too. */ infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size); ! infoP->alloc = ShmemAlloc; hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ --- 306,312 * The shared memory allocator must be specified too. */ infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size); ! 
infoP->alloc = ShmemAllocNoError; hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ *** ShmemInitStruct(const char *name, Size s *** 364,375 */ Assert(shmemseghdr->index == NULL); structPtr = ShmemAlloc(size); - if (structPtr == NULL) - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("not enough shared memory for data structure" - " \"%s\" (%zu bytes requested)", - name, size))); shmemseghdr->index = structPtr; *foundPtr = FALSE; } --- 377,382 *** ShmemInitStruct(const char *name, Size s *** 410,416 else { /* It isn't in the table yet. allocate and initialize it */ ! structPtr = ShmemAlloc(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ --- 417,423 else { /* It isn't in the table yet. allocate and initialize it */ ! structPtr = ShmemAllocNoError(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c index 7cdb355..4064b20 100644 *** a/src/backend/storage/lmgr/predicate.c --- b/src/backend/storage/lmgr/predicate.c *** InitPredicateLocks(void) *** 1184,1195
Re: [HACKERS] [GENERAL] C++ port of Postgres
[trimmed cc list because of big attachments] On 8/16/16 4:22 PM, Jim Nasby wrote: > Joy, do you have an idea what a *minimally invasive* patch for C++ > support would look like? That's certainly the first step here. I developed a minimally invasive patch for C++ support a few years ago shortly after I wrote that blog post. Since there appears to have been some interest here now, I have updated that and split it up into logical chunks. So here you go. To build this, you need to configure with g++ <= version 5. (4.x works, too.) g++ version 6 does not work yet because of the issues described in patch 0013. Then you also need to edit src/Makefile.custom and set COPT = -fpermissive -Wno-sign-compare -Wno-write-strings The -W options are optional just to reduce some noise. Cleaning up those warnings can be a separate project that might also have some benefit under C. The -fpermissive option is a g++ specific option that reduces some errors to warnings. (So this won't work with clang or other compilers at all at this point.) In particular, C++ does not allow casting from or to void pointers without a cast, but -fpermissive allows that. The step from this to "real" C++ would be adding a bunch of casts around things like malloc and palloc and other places. That would be mostly busy work, so I have excluded that here. The patches are numbered approximately in increasing order of dubiosity. So 0001 is probably a straight bug fix, 0002 and 0003 are arguably minor bug fixes as well. The patches through 0012 can probably be considered for committing in some form. After that it gets a bit hackish. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 86dbd4a5a0ab3212cd340e1fa56f03e864aa8e1a Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 30 Aug 2016 12:00:00 -0400 Subject: [PATCH 01/27] Fix use of offsetof() Using offsetof() with a run-time computed argument is not allowed in either C or C++. Apparently, gcc allows it, but g++ doesn't. --- contrib/bloom/blutils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c index debf4f4..b68a0d1 100644 --- a/contrib/bloom/blutils.c +++ b/contrib/bloom/blutils.c @@ -75,7 +75,7 @@ _PG_init(void) bl_relopt_tab[i + 1].optname = MemoryContextStrdup(TopMemoryContext, buf); bl_relopt_tab[i + 1].opttype = RELOPT_TYPE_INT; - bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[i]); + bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[0]) + sizeof(int) * i; } } -- 2.9.3 From d3c7c1fbb346fd5c040a7a379d971db7b5129581 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 30 Aug 2016 12:00:00 -0400 Subject: [PATCH 02/27] Use return instead of exit() in configure Using exit() requires stdlib.h, which is not included. Use return instead. Also add return type for main(). --- config/c-compiler.m4 | 4 +++- config/c-library.m4 | 4 +++- configure| 12 +--- 3 files changed, 15 insertions(+), 5 deletions(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index a7f6773..7d901e1 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -71,8 +71,10 @@ int does_int64_work() return 0; return 1; } + +int main() { - exit(! does_int64_work()); + return (! 
does_int64_work()); }])], [Ac_cachevar=yes], [Ac_cachevar=no], diff --git a/config/c-library.m4 b/config/c-library.m4 index 50d068d..56658b5 100644 --- a/config/c-library.m4 +++ b/config/c-library.m4 @@ -204,8 +204,10 @@ int does_int64_snprintf_work() return 0; /* either multiply or snprintf is busted */ return 1; } + +int main() { - exit(! does_int64_snprintf_work()); + return (! does_int64_snprintf_work()); }]])], [pgac_cv_snprintf_long_long_int_modifier=$pgac_modifier; break], [], diff --git a/configure b/configure index 45c8eef..36d9a54 100755 --- a/configure +++ b/configure @@ -13563,8 +13563,10 @@ int does_int64_work() return 0; return 1; } + +int main() { - exit(! does_int64_work()); + return (! does_int64_work()); } _ACEOF if ac_fn_c_try_run "$LINENO"; then : @@ -13645,8 +13647,10 @@ int does_int64_work() return 0; return 1; } + +int main() { - exit(! does_int64_work()); + return (! does_int64_work()); } _ACEOF if ac_fn_c_try_run "$LINENO"; then : @@ -13739,8 +13743,10 @@ int does_int64_snprintf_work() return 0; /* either multiply or snprintf is busted */ return 1; } + +int main() { - exit(! does_int64_snprintf_work()); + return (! does_int64_snprintf_work()); } _ACEOF if ac_fn_c_try_run "$LINENO"; then : -- 2.9.3 From 54ed07be5a29d4955a9485316e68da5c6896797b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 30 Aug 2016 12:00:00 -0400 Subject: [PATCH 03/27] Add missing include files to configure tests atoi() needs stdlib.h strcmp() needs string.h --- config/c-library.m4 |
[HACKERS] less expensive pg_buffercache on big shmem
Hi hackers, I have recently finished my work on a patch for the pg_buffercache contrib module, so I think it's time to share the results.

Introduction
I want to offer an implementation that decreases the system workload by partially sacrificing snapshot consistency. Sometimes we do not need fully consistent data, for example when doing a quantitative rather than qualitative analysis of memory contents, or when we want to catch insufficient memory resources or see how often a relation is used.

Problem description ===
Currently, pg_buffercache v1.1 and earlier takes an exclusive lock on all shared buffers, which greatly affects system performance. We usually run pg_buffercache to find out why the database is working slower than expected or to examine what occupies the entire memory, so running it on such a system makes it even slower.

Implementation ==
The vanilla implementation contains a loop that collects statistics from the whole of shared memory, acquiring, reading and releasing spinlocks one by one, page by page, while holding an LWLock. The v1.2 implementation contains a flexible loop that can collect shared memory statistics using three different methods: 1) holding the LWLock on only one partition of shared memory at a time (semiconsistent method), 2) without LWLocks (nonconsistent method), 3) or in the vanilla way (consistent method). This allows pg_buffercache to be launched in three different ways, each with advantages and disadvantages:
Consistent: + 100% consistent snapshot of shared memory - slows the system down by holding an exclusive lock over all of shared memory
Semiconsistent: + faster than the consistent method + mostly does not affect the system load - taking the snapshot is slow
Nonconsistent: + the fastest + does not noticeably affect the system - <3% loss of snapshot consistency

What works ==
It actually works well even under heavy load, but of course there might be things I've overlooked.
VIEW pg_buffercache_cons VIEW pg_buffercache_noncons VIEW pg_buffercache_semicons
Examples from the docs, using the new views:
SELECT c.relname, count(*) AS buffers FROM pg_buffercache_noncons b INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;
SELECT c.relname, count(*) AS buffers FROM pg_buffercache_semicons b INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;

Testing the implementation ==
How the implementation was tested: 1) Start the server 2) Generate load with pgbench: pgbench -c 250 -s 1000 -T 200 -P1 3) Compare how the tps sags under load while running: SELECT count(*) FROM pg_buffercache_cons; SELECT count(*) FROM pg_buffercache_semicons; SELECT count(*) FROM pg_buffercache_noncons;
The test was made on a server with: Model name: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz CPU(s): 144 Socket(s): 4 Shared_buffers: 200GB

Results of testing ==
Our DBA team obtained the following results:
Nonconsistent: * 10% faster than the consistent method * does not noticeably affect the system * the maximum loss of accuracy was less than 3%* (in most situations this is an acceptable loss of accuracy)
Semiconsistent: * 5 times slower than nonconsistent * affects the system less than the consistent method
Overall results: Our clients were pleased with this implementation.
The implementation is backward compatible, so existing pg_buffercache v1.1 queries will keep working. The semiconsistent method shows results approaching the nonconsistent one on SELECT-only workloads.
* these values were obtained from our DBA tests.

What can be better ===
It is unclear how to optimize the semiconsistent method to make it faster and to reduce a temporary effect that appears from time to time.

I will be glad to see your feedback!

--- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile index 065d3d6..8813c50 100644 --- a/contrib/pg_buffercache/Makefile +++ b/contrib/pg_buffercache/Makefile @@ -4,7 +4,7 @@ MODULE_big = pg_buffercache OBJS = pg_buffercache_pages.o $(WIN32RES) EXTENSION = pg_buffercache -DATA = pg_buffercache--1.1.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql +DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time" ifdef USE_PGXS diff --git a/contrib/pg_buffercache/README b/contrib/pg_buffercache/
Re: [HACKERS] make async slave to wait for lsn to be replayed
On 31 August 2016 at 22:16, Ivan Kartyshov wrote: > Our clients who deal with 9.5 and use asynchronous master-slave replication, > asked to make the wait-mechanism on the slave side to prevent the situation > when slave handles query which needs data (LSN) that was received, flushed, > but still not replayed. I like the broad idea - I've wanted something like it for a while. BDR has pg_xlog_wait_remote_receive() and pg_xlog_wait_remote_apply() for use in tests for this reason, but they act on the *upstream* side, waiting until the downstream has acked the data. Not as useful for ensuring that apps connected to both master and one or more replicas get a consistent view of data. How do you get the commit LSN to watch for? Grab pg_current_xlog_insert_location() just after the commit and figure that replaying to that point guarantees you get the commit? Some time ago[1] I raised the idea of reporting commit LSN on the wire to clients. That didn't go anywhere due to compatibility and security concerns. I think those were resolvable, but it wasn't enough of a priority to push hard on at the time. A truly "right" solution has to wait for a protocol bump, but I think good-enough solutions are possible now. So you might want to read that thread. It also mentions hesitations about exposing LSN to clients even more. I think we're *way* past that now - we have replication origins and replication slots relying on it, it's exposed in a pg_lsn datatype, a bunch of views expose it, etc. But it might be reasonable to ask "should the client instead be expected to wait for the confirmed commit of a 64-bit epoch-extended xid, like that returned by txid_current()?" . One advantage of using xid is that you can get it while you're still in the xact, so there's no race between commit and checking the lsn after commit. Are you specifically trying to ensure "this commit has replayed on the replica before we run queries on it" ? Or something else? (Also, on a side note, Kevin mentioned that it may be possible to use SSI data to achieve SERIALIZABLE read-only queries on replicas, where they get the same protection from commit-order related anomalies as queries on the master. You might want to look more deeply into that too at some stage, if you're trying to ensure the app can do read only queries on the master and expect fully consistent results). [1] https://www.postgresql.org/message-id/flat/53E41EC1.5050603%402ndquadrant.com#53e41ec1.5050...@2ndquadrant.com -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
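For what it's worth, a hedged sketch of the client-side pattern described above, using only functions that exist today (9.6 naming): commit on the master, grab the current WAL insert location, then poll the standby until replay has passed it. Connection strings, the polling interval and error handling are all placeholders, not a recommendation.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *master = PQconnectdb("host=master dbname=app");
	PGconn	   *standby = PQconnectdb("host=standby dbname=app");
	PGresult   *res;
	char		lsn[64];

	/* ... run and COMMIT the writing transaction on master here ... */

	/* capture a WAL location that is certainly past the commit */
	res = PQexec(master, "SELECT pg_current_xlog_insert_location()");
	snprintf(lsn, sizeof(lsn), "%s", PQgetvalue(res, 0, 0));
	PQclear(res);

	/* poll the standby until replay reaches the captured location */
	for (;;)
	{
		char		query[128];

		snprintf(query, sizeof(query),
				 "SELECT pg_last_xlog_replay_location() >= '%s'::pg_lsn", lsn);
		res = PQexec(standby, query);
		if (strcmp(PQgetvalue(res, 0, 0), "t") == 0)
		{
			PQclear(res);
			break;
		}
		PQclear(res);
		usleep(10000);			/* 10 ms between polls; arbitrary */
	}

	/* read-your-writes queries against the standby are now safe */
	PQfinish(master);
	PQfinish(standby);
	return 0;
}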
Re: [HACKERS] autonomous transactions
On 31/08/16 16:11, Craig Ringer wrote: On 31 August 2016 at 21:46, Greg Stark wrote: On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote: - A API interface to open a "connection" to a background worker, run queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(), AutonomousSessionExecute(), etc. The communication happens using the client/server protocol. Peter, you mention "Oracle-style autonomous transaction blocks". What are the semantics to be expected of those with regards to: - Accessing objects exclusively locked by the outer xact or where the requested lockmode conflicts with a lock held by the outer xact - Visibility of data written by the outer xact That would be my question as well. Also, is it intended (outside the plpgsql interface) that the autonomous xact can proceed concurrently/interleaved with a local backend xact? i.e. the local backend xact isn't suspended and you're allowed to do things on the local backend as well? If so, what handling do you have in mind for deadlocks between the local backend xact and the bgworker with the autonomous xact? I'd expect the local backend to always win, killing the autonomous xact every time. I would expect that in PLs it's handled by them, if you misuse this on C level that's your problem? I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid. Or perhaps save its existing information in a separate pgproc slot (or stack of slots) and restoring it after the autonomous transaction commits. I suspect that there'll be way too much code that relies on stashing xact-scoped stuff in globals for that to be viable. Caches alone. Peter will be able to explain more, I'm sure. I can also see some advantages in bgworker approach. For example it could be used for "fire and forget" type of interface in the future, where you return as soon as you send exec and don't care about waiting for result. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Optimizing aggregates
I've been profiling simple aggregate queries, looking for any low-hanging fruit. For this query: -- setup create table floats as select g::float8 as a, g::float8 as b, g::float8 as c from generate_series(1, 1000) g; vacuum freeze floats; -- query select sum(a), sum(b+c) from floats; perf report says: # Children Self Command Shared Object Symbol # .. . # 25.70% 0.00% postmaster [unknown] [k] 14.23%13.75% postmaster postgres [.] ExecProject 11.18%10.57% postmaster postgres [.] slot_deform_tuple 9.58% 9.04% postmaster postgres [.] advance_aggregates 8.96% 0.00% postmaster [unknown] [.] 0x000298d4 8.77% 8.42% postmaster postgres [.] ExecMakeFunctionResultNoSets 7.78% 0.00% postmaster [unknown] [.] 0x01d38260 6.63% 6.15% postmaster postgres [.] advance_transition_function 6.61% 0.00% postmaster [unknown] [.] 0x01e99e40 6.47% 0.00% postmaster libc-2.23.so [.] __GI___libc_read 6.24% 5.88% postmaster postgres [.] heap_getnext 4.62% 4.62% postmaster [kernel.kallsyms] [k] copy_user_enhanced_fast_string 3.91% 3.82% postmaster postgres [.] slot_getsomeattrs 3.29% 3.18% postmaster postgres [.] slot_getattr 3.06% 3.00% postmaster postgres [.] ExecClearTuple 2.59% 0.00% postmaster [unknown] [.] 0x01e9a370 2.57% 2.45% postmaster postgres [.] ExecScan 2.56% 2.37% postmaster postgres [.] float8pl 2.54% 2.43% postmaster postgres [.] heapgetpage 2.25% 2.17% postmaster postgres [.] ExecAgg 2.10% 1.96% postmaster postgres [.] ExecStoreTuple 2.00% 1.91% postmaster postgres [.] ExecProcNode ExecProject stands out. I find that pretty surprising. We're using ExecProject to extract the arguments from the input tuples, to pass to the aggregate transition functions. It looks like that's a pretty expensive way of doing it, for a typical aggregate that takes only one argument. We actually used to call ExecEvalExpr() directly for each argument, but that was changed by the patch that added support for ordered set aggregates. It looks like that was a bad idea, from a performance point of view. I propose that we go back to calling ExecEvalExpr() directly, for non-ordered aggregates, per the attached patch. That makes that example query about 10% faster on my laptop, which is in line with the fact that ExecProject() accounted for about 13% of the CPU time. Another idea is that maybe we should add a fast-path to ExecProject(), for these trivial cases. - Heikki From 106c5742fde2ec83576323db74a7249d7d85101f Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 31 Aug 2016 17:27:52 +0300 Subject: [PATCH] Skip ExecProject for non-ordered aggregates. When support for ordered aggregates was added, we started using ExecProject to compute the arguments for the transient function. However, it turns out that the old way of just calling ExecEvalExpr() directly for each argument is somewhat faster. At least for typical aggregates that only take one or two arguments. So go back to using ExecEvalExpr() for non-ordered aggregates. diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index ce2fc28..e32b140 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -229,9 +229,8 @@ typedef struct AggStatePerTransData /* Oid of state value's datatype */ Oid aggtranstype; - /* ExprStates of the FILTER and argument expressions. */ + /* ExprStates of the FILTER and direct-argument expressions. 
*/ ExprState *aggfilter; /* state of FILTER expression, if any */ - List *args; /* states of aggregated-argument expressions */ List *aggdirectargs; /* states of direct-argument expressions */ /* @@ -288,17 +287,21 @@ typedef struct AggStatePerTransData transtypeByVal; /* - * Stuff for evaluation of inputs. We used to just use ExecEvalExpr, but - * with the addition of ORDER BY we now need at least a slot for passing - * data to the sort object, which requires a tupledesc, so we might as - * well go whole hog and use ExecProject too. + * Stuff for evaluation of inputs. + * + * For non-ordered aggregates, we call ExecEvalExpr for each argument, + * represented by the expression trees in transInputs. For ordered + * aggregates, we need at least a slot for passing data to the sort + * object, which requires a tupledesc, so we might as well go whole hog + * and use ExecProject to evaluate the arguments, too. */ + ExprState **transInputs; /* express
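A minimal sketch of the evaluation loop the patch effectively restores for non-ordered aggregates (names are approximate, following the truncated diff above; the real change lives in the advance_aggregates/advance_transition_function path):

    /*
     * Evaluate each aggregated argument directly into the transition
     * function's FunctionCallInfo, instead of building a tuple via ExecProject.
     */
    for (i = 0; i < pertrans->numTransInputs; i++)
    {
        fcinfo->arg[i] = ExecEvalExpr(pertrans->transInputs[i], econtext,
                                      &fcinfo->argnull[i], NULL);
    }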
Re: [HACKERS] some requests on auditing
2016-08-31 16:00 GMT+02:00 David Steele : > On 8/31/16 9:39 AM, David Steele wrote: > >> On 8/30/16 10:12 AM, Pavel Stehule wrote: >> > > #3 is not likely without changes to logging in Postgres. However, there >> are plenty of tools for log analysis (e.g. ELK) that might help and a >> Postgres extension that allows log messages to be directed elsewhere >> (can't remember the name but Gabrielle or Simon would know). >> > > Here's the extension I was thinking of: > > https://github.com/2ndquadrant-it/redislog > > This one is more general purpose: > > https://github.com/mpihlak/pg_logforward > many thanks for this information - I'll check it. Regards Pavel > -- > -David > da...@pgmasters.net >
Re: [HACKERS] pg_sequence catalog
On 31/08/16 16:10, Tom Lane wrote: I wrote: Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO. BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around. FWIW if I was going to continue with the sequence AM API, the next patch would have included split of sequence metadata and sequence state into separate catalogs, so from that point this actually seems like an improvement (I didn't look at the code though). As a side note, I don't plan to resurrect the seqam patch at least until we have reasonable built-in logical replication functionality. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
On 31 August 2016 at 22:01, Tom Lane wrote: > Personally, my big beef with the current approach to sequences is that > we eat a whole relation (including a whole relfilenode) per sequence. > I wish that we could reduce a sequence to just a single row in a > catalog, including the nontransactional state. I'd be happy to see incremental improvement in this space as Peter has suggested, though I certainly see the value of something like seqam too. It sounds like you're thinking of something like a normal(ish) heap tuple where we just overwrite some fields in-place without fiddling xmin/xmax and making a new row version. Right? Like we currently overwrite the lone Form_pg_sequence on the 1-page sequence relations. I initially thought that TRUNCATE ... RESTART IDENTITY would be somewhat of a problem with this. We effectively have a temporary "timeline" fork in the sequence value where it's provisionally restarted and we start using values from the restarted sequence within the xact that restarted it. But actually, it'd fit pretty well. TRUNCATE ... RESTART IDENTITY would write a new row version with a new xmin, and set xmax on the old sequence row. nextval(...) within the truncating xact would update the new row's non-transactional fields when it allocated new sequence chunks. On commit, everyone starts using the new row due to normal transactional visibility rules. On rollback everyone ignores it like they would any other dead tuple from an aborted act and uses the old tuple's nontransactional fields. It Just Works(TM). nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE ... RESTART IDENTITY takes AccessExclusiveLock. So we can never have nextval(...) advancing the "old" timeline in other xacts at the same time as we consume values on the restarted sequence inside the xact that did the restarting. We still need the new "timeline" though, because we have to retain the old value for rollback. It feels intuitively pretty gross to effectively dirty-read and write a few fields of a tuple. But that's what we do all the time with xmin/xmax etc, it's not really that different. It'd certainly make sequence decoding easier too. A LOT easier. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
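The "timeline fork" behaviour described above can be seen with a plain serial column; a sketch (exact values can shift a little because of sequence caching):

    CREATE TABLE t (id serial, v text);
    INSERT INTO t (v) VALUES ('a'), ('b'), ('c');  -- t_id_seq has handed out 1..3

    BEGIN;
    TRUNCATE t RESTART IDENTITY;
    INSERT INTO t (v) VALUES ('x');                -- provisional restart: gets id = 1
    ROLLBACK;

    INSERT INTO t (v) VALUES ('d');                -- old "timeline" survives: gets id = 4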
[HACKERS] make async slave to wait for lsn to be replayed
Hi hackers, Few days earlier I've finished my work on WAITLSN statement utility, so I’d like to share it. Introduction Our clients who deal with 9.5 and use asynchronous master-slave replication, asked to make the wait-mechanism on the slave side to prevent the situation when slave handles query which needs data (LSN) that was received, flushed, but still not replayed. Problem description === The implementation: Must handle the wait-mechanism using pg_sleep() in order not to load system Must avoid race conditions if different backend want to wait for different LSN Must not take snapshot of DB, to avoid troubles with sudden minXID change Must have optional timeout parameter if LSN traffic has stalled. Must release on postmaster’s death or interrupts. Implementation == To avoid troubles with snapshots, WAITLSN was implemented as a utility statement, this allows us to circumvent the snapshot-taking mechanism. We tried different variants and the most effective way was to use Latches. To handle interprocess interaction all Latches are stored in shared memory and to cope with race conditions, each Latch is protected by a Spinlock. Timeout was made optional parameter, it is set in milliseconds. What works == Actually, it works well even with significant timeout or wait period values, but of course there might be things I've overlooked. How to use it == WAITLSN ‘LSN’ [, timeout in ms]; #Wait until LSN 0/303EC60 will be replayed, or 10 second passed. WAITLSN ‘0/303EC60’, 1; #Or same without timeout. WAITLSN ‘0/303EC60’; Notice: WAITLSN will release on PostmasterDeath or Interruption events if they come earlier then LSN or timeout. Testing the implementation == The implementation was tested with testgres and unittest python modules. How to test this implementation: Start master server Make table test, insert tuple 1 Make asynchronous slave replication (9.5 wal_level = standby, 9.6 or higher wal_level = replica) Slave: START TRANSACTION ISOLATION LEVEL REPEATABLE READ ; SELECT * FROM test; Master: delete tuple + make vacuum + get new LSN Slave: WAITLSN ‘newLSN’, 6; Waitlsn finished with FALSE “LSN doesn`t reached” Slave: COMMIT; WAITLSN ‘newLSN’, 6; Waitlsn finished with success (without NOTICE message) The WAITLSN as expected wait LSN, and interrupts on PostmasterDeath, interrupts or timeout. Your feedback is welcome! --- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index 77667bd..72c5390 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory. + diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml new file mode 100644 index 000..6a8bdca --- /dev/null +++ b/doc/src/sgml/ref/waitlsn.sgml @@ -0,0 +1,108 @@ + + + + + WAITLSN + + + + WAITLSN + 7 + SQL - Language Statements + + + + WAITLSN + wait when target LSN been replayed + + + + +WAITLSN 'LSN' [ , delay ] + + + + + Description + + + The WAITLSN wait till target LSN will + be replayed with an optional delay (milliseconds by default + infinity) to be wait for LSN to replayed. + + + + WAITLSN provides a simple + interprocess LSN wait mechanism for a backends on slave + in master-slave replication scheme on PostgreSQL database. + + + + + Parameters + + + +LSN + + + Target log sequence number to be wait for. + + + + +delay + + + Time in miliseconds to waiting for LSN to be replayed. 
+ + + + + + + + Notes + + + Delay time to waiting for LSN to be replayed must be integer. For + default it is infinity. Waiting can be interupped using Ctl+C, or + by Postmaster death. + + + + + Examples + + + Configure and execute a waitlsn from + psql: + + +WAITLSN '0/3F07A6B1', 1; +NOTICE: LSN is not reached. Try to make bigger delay. +WAITLSN + +WAITLSN '0/3F07A611'; +WAITLSN + +WAITLSN '0/3F0FF791', 50; +^CCancel request sent +NOTICE: LSN is not reached. Try to make bigger delay. +ERROR: canceling statement due to user request + + + + + + Compatibility + + + There is no WAITLSN statement in the SQL + standard. + + + diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index 8acdff1..3733ad9 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -200,6 +200,7 @@ &update; &vacuum; &values; + &waitlsn; diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f13f9c1..609c83e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -39,6 +39,7 @@ #inclu
Re: [HACKERS] INSERT .. SET syntax
Hello hello, Here's a rebased and updated patch for $SUBJECT for the September commit fest. .m *** a/doc/src/sgml/ref/insert.sgml --- b/doc/src/sgml/ref/insert.sgml *** *** 22,33 PostgreSQL documentation [ WITH [ RECURSIVE ] with_query [, ...] ] ! INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ] ! { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query } [ ON CONFLICT [ conflict_target ] conflict_action ] [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] ! where conflict_target can be one of: ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ] ON CONSTRAINT constraint_name --- 22,42 [ WITH [ RECURSIVE ] with_query [, ...] ] ! INSERT INTO table_name [ AS alias ] ! { ! [ column_list ] VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | ! [ column_list ] query | ! DEFAULT VALUES | ! SET column_name = { expression | DEFAULT } [, ...] ! } [ ON CONFLICT [ conflict_target ] conflict_action ] [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] ! where column_list is: ! ! ( column_name [, ...] ) ! ! and conflict_target can be one of: ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ] ON CONSTRAINT constraint_name *** *** 53,65 INSERT INTO table_name [ AS !The target column names can be listed in any order. If no list of !column names is given at all, the default is all the columns of the !table in their declared order; or the first N column !names, if there are only N columns supplied by the !VALUES clause or query. The values !supplied by the VALUES clause or query are !associated with the explicit or implicit column list left-to-right. --- 62,87 !The target column names in a column_list can be !listed in any order. If no column_list is given at !all (and the SET syntax is not used), the default is all !the columns of the table in their declared order; or the first !N column names, if there are only N !columns supplied by the VALUES clause or !query. The values supplied by the VALUES !clause or query are associated with the explicit or !implicit column list left-to-right. ! ! ! ! Instead of a column_list and a VALUES ! clause, a SET clause similar to that of an ! UPDATE can be used instead. The advantage of the ! SET clause is that instead of matching the elements in ! the two lists by ordinal position, the column name and the ! expression to assign to that column are visually next to each other. ! This can make long column assignment lists significantly more ! readable. *** *** 690,702 INSERT INTO distributors (did, dname) VALUES (10, 'Conrad International') INSERT conforms to the SQL standard, except that the RETURNING clause is a !PostgreSQL extension, as is the ability !to use WITH with INSERT, and the ability to !specify an alternative action with ON CONFLICT. !Also, the case in !which a column name list is omitted, but not all the columns are !filled from the VALUES clause or query, !is disallowed by the standard. --- 712,724 INSERT conforms to the SQL standard, except that the RETURNING clause is a !PostgreSQL extension, as is the !SET clause when used instead of a VALUES clause, the !ability to use WITH with INSERT, and the !ability to specify an alternative action with ON !CONFLICT. Also, the case in which a column name list is omitted, !but not all the columns are filled from the VALUES clause !or query, is disallowed by the standard. 
*** a/src/backend/parser/analyze.c --- b/src/backend/parser/analyze.c *** *** 467,474 transformInsertStmt(ParseState *pstate, InsertStmt *stmt) stmt->onConflictClause->action == ONCONFLICT_UPDATE); /* ! * We have three cases to deal with: DEFAULT VALUES (selectStmt == NULL), ! * VALUES list, or general SELECT input. We special-case VALUES, both for * efficiency and so we can handle DEFAULT specifications. * * The grammar allows attaching ORDER BY, LIMIT, FOR UPDATE, or WITH to a --- 467,475 stmt->onConflictClause->action == ONCONFLICT_UPDATE); /* ! * We have four cases to deal with: DEFAULT VALUES (selectStmt == NULL and ! * cols == NIL), SET syntax (selectStmt == NULL but cols != NIL), VALUES ! * list, or general SELECT input. We special-case VALUES, both for * efficiency and so we can handle DEFAULT specifications. * * The grammar allows attaching ORDER BY, LIMIT, FOR UPDATE, or WITH to a *** *** 523,529 transfo
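Per the proposed grammar in the patch, the two statements below would be equivalent; the SET spelling keeps each column name next to its value, which is the readability gain being claimed (the distributors example is taken from the existing INSERT documentation):

    -- existing syntax
    INSERT INTO distributors (did, dname) VALUES (10, 'Conrad International');

    -- proposed syntax
    INSERT INTO distributors SET did = 10, dname = 'Conrad International';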
Re: [HACKERS] autonomous transactions
On 31 August 2016 at 21:46, Greg Stark wrote: > On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut > wrote: >> - A API interface to open a "connection" to a background worker, run >> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(), >> AutonomousSessionExecute(), etc. The communication happens using the >> client/server protocol. Peter, you mention "Oracle-style autonomous transaction blocks". What are the semantics to be expected of those with regards to: - Accessing objects exclusively locked by the outer xact or where the requested lockmode conflicts with a lock held by the outer xact - Visibility of data written by the outer xact ? Also, is it intended (outside the plpgsql interface) that the autonomous xact can proceed concurrently/interleaved with a local backend xact? i.e. the local backend xact isn't suspended and you're allowed to do things on the local backend as well? If so, what handling do you have in mind for deadlocks between the local backend xact and the bgworker with the autonomous xact? I'd expect the local backend to always win, killing the autonomous xact every time. > I'm surprised by the background worker. I had envisioned autonomous > transactions being implemented by allowing a process to reserve a > second entry in PGPROC with the same pid. Or perhaps save its existing > information in a separate pgproc slot (or stack of slots) and > restoring it after the autonomous transaction commits. I suspect that there'll be way too much code that relies on stashing xact-scoped stuff in globals for that to be viable. Caches alone. Peter will be able to explain more, I'm sure. We'd probably need a new transaction data object that everything xact-scoped hangs off, so we can pass it everywhere or swap it out of some global. The mechanical refactoring alone would be pretty scary, not to mention the complexity of actually identifying all the less obvious places that need changing. Consider invalidation callbacks. They're always "fun", and so simple to get right -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
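A concrete version of the deadlock Craig is worried about might look like the following (the accounts table is invented, and the autonomous-transaction syntax is only sketched in comments since the final form isn't settled):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- outer xact holds the row lock

    -- an autonomous xact started from this same backend now runs in the bgworker:
    --   UPDATE accounts SET balance = 0 WHERE id = 1;
    -- it blocks on the outer xact's row lock, while the outer xact is itself blocked
    -- waiting for the autonomous call to return: a cross-backend deadlock that the
    -- deadlock detector has to be taught about, presumably resolved by killing the
    -- autonomous side as suggested above.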
Re: [HACKERS] pg_sequence catalog
I wrote: > Personally, my big beef with the current approach to sequences is that > we eat a whole relation (including a whole relfilenode) per sequence. > I wish that we could reduce a sequence to just a single row in a > catalog, including the nontransactional state. Not sure how feasible > that is either, but accomplishing it would move the benefits of making > a change out of the "debatable whether it's worth it" category, IMO. BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Leftover member in openssl part of Port struct
> On 31 Aug 2016, at 15:12, Tom Lane wrote: > > Daniel Gustafsson writes: >> When SSL renegotiation was removed in 426746b9 the only consumer of the >> openssl >> specific count member in the Port struct was removed, but the member was left >> together with a few updates to it which are unused. Attached patch removes >> the >> leftovers which now serves no purpose unless I’m missing something. > > Hm, well, we can't remove those case labels as control would then end > up in the default case which throws an error. Doh, I’ll go stand in the corner. > But otherwise this seems > sound. Without renegotiation the count is not very useful anyway > since it's likely to overflow (at least if long is 32 bits). Yeah. Thanks for applying. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Craig Ringer writes: > On 31 August 2016 at 21:17, Peter Eisentraut > wrote: >> I don't know if this is a net improvement. Maybe this introduces as >> many new issues as it removes. But I figured I'll post this, so that at >> least we can discuss it. > This will change behaviour subtly. Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties. In previous discussions of this idea, we had speculated about turning sequences into, effectively, views, so that we could still return all the same columns --- only now some of them would be coming from a catalog rather than the non-transactional per-sequence storage. I'm not sure how feasible that is, but it's worth thinking about. Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
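The breakage Tom has in mind is clients that introspect a sequence relation directly, e.g. (column set as it stands in 9.6; my_seq is just an example name):

    SELECT increment_by, min_value, max_value, is_cycled FROM my_seq;
    -- or simply:
    SELECT * FROM my_seq;

If the metadata moves into a catalog, the sequence relation no longer exposes those columns, unless sequences are made view-like as suggested.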
Re: [HACKERS] some requests on auditing
On 8/31/16 9:39 AM, David Steele wrote: On 8/30/16 10:12 AM, Pavel Stehule wrote: #3 is not likely without changes to logging in Postgres. However, there are plenty of tools for log analysis (e.g. ELK) that might help and a Postgres extension that allows log messages to be directed elsewhere (can't remember the name but Gabrielle or Simon would know). Here's the extension I was thinking of: https://github.com/2ndquadrant-it/redislog This one is more general purpose: https://github.com/mpihlak/pg_logforward -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote: > - A API interface to open a "connection" to a background worker, run > queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(), > AutonomousSessionExecute(), etc. The communication happens using the > client/server protocol. > I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid. Or perhaps save its existing information in a separate pgproc slot (or stack of slots) and restore it after the autonomous transaction commits. Using a background worker means that the autonomous transaction can't access any state from the process memory. Parameters in plpgsql are a symptom of this but I suspect there will be others. What happens if a statement timeout occurs during an autonomous transaction? What happens if you use a pl language in the autonomous transaction and if it tries to use non-transactional information such as prepared statements? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use static inline functions for Float <-> Datum conversions
Heikki Linnakangas writes: > On 08/31/2016 02:38 PM, Tom Lane wrote: >> I wonder whether there is a compiler-dependent way of avoiding the union >> trick ... or maybe gcc is already smart enough that it doesn't matter? > It seems to compile into a single instruction, so it can't get any > better from a performance point of view. Yeah, confirmed here. On my not-real-new gcc (version 4.4.7, which ships with RHEL6), these test functions: Datum compare_int8(PG_FUNCTION_ARGS) { int64 x = PG_GETARG_INT64(0); int64 y = PG_GETARG_INT64(1); PG_RETURN_BOOL(x < y); } Datum compare_float8(PG_FUNCTION_ARGS) { double x = PG_GETARG_FLOAT8(0); double y = PG_GETARG_FLOAT8(1); PG_RETURN_BOOL(x < y); } compile into this (at -O2): compare_int8: .cfi_startproc movq40(%rdi), %rax cmpq%rax, 32(%rdi) setl%al movzbl %al, %eax ret .cfi_endproc compare_float8: .cfi_startproc movsd 40(%rdi), %xmm0 xorl%eax, %eax ucomisd 32(%rdi), %xmm0 seta%al ret .cfi_endproc (Not sure why the compiler does the widening of the comparison result differently, but it doesn't look like it matters.) Before this patch, that looked like: compare_float8: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq%rdi, %rbx subq$16, %rsp .cfi_def_cfa_offset 32 movq32(%rdi), %rdi callDatumGetFloat8 movq40(%rbx), %rdi movsd %xmm0, 8(%rsp) callDatumGetFloat8 xorl%eax, %eax ucomisd 8(%rsp), %xmm0 seta%al addq$16, %rsp .cfi_def_cfa_offset 16 popq%rbx .cfi_def_cfa_offset 8 ret .cfi_endproc Nice. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
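For reference, the committed approach boils down to roughly the following on a 64-bit, float8-by-value build (a sketch, not the exact source), which the compiler reduces to the single-instruction conversion shown in the assembly above:

    static inline Datum
    Float8GetDatum(float8 X)
    {
        union { float8 f; int64 i; } u;   /* reinterpret the bits, no function call */

        u.f = X;
        return Int64GetDatum(u.i);
    }

    static inline float8
    DatumGetFloat8(Datum X)
    {
        union { int64 i; float8 f; } u;

        u.i = DatumGetInt64(X);
        return u.f;
    }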
Re: [HACKERS] pg_sequence catalog
On 31 August 2016 at 21:17, Peter Eisentraut wrote: > While I was hacking around sequence stuff, I felt the urge to look into > an old peeve: That sequence metadata is not stored in a proper catalog. > Right now in order to find out some metadata about sequences (min, max, > increment, etc.), you need to look into the sequence. That is like > having to query a table in order to find out about its schema. > > There are also known issues with the current storage such as that we > can't safely update the sequence name stored in the sequence when we > rename, so we just don't. ... and don't have a comment warning the poor confused reader about that, either. As I discovered when doing my first pass at sequence decoding. > I don't know if this is a net improvement. Maybe this introduces as > many new issues as it removes. But I figured I'll post this, so that at > least we can discuss it. This will change behaviour subtly. Probably not in ways we care much about, but I'd rather that be an informed decision than an implicit "oops we didn't think about that" one. Things stored in the Form_pg_sequence are affected immediately, non-transactionally, by ALTER SEQUENCE. If you: BEGIN; ALTER SEQUENCE myseq INTERVAL 10; ROLLBACK; then the *next call to nextval* after the ALTER will step by 10. Well, roughly, there's some slush there due to caching etc. Rolling back has no effect. Yet other effects of ALTER SEQUENCE, notably RENAME, are transactional and are subject to normal locking, visibility and rollback rules. Even more fun, ALTER SEQUENCE ... RESTART is immediate and non-transactional but TRUNCATE ... RESTART IDENTITY *is* transactional and takes effect only at commit. ALTER SEQUENCE writes a new Form_pg_sequence with the new value to the existing relfilenode. TRUNCATE instead updates pg_class for the sequence with a new relfilenode and writes its changes to the new relfilenode. So even two operations that seem the same are very different. If understand the proposed change correctly, this change will make previously non-transactional ALTER SEQUENCE operations transactional and subject to normal rules, since the relevant information is now in a proper catalog. Personally I think that's a big improvement. The current situation is warts upon warts. Prior proposals to move sequences away from a one-relation-and-one-filenode-per-sequence model have fallen down in early planning stages. This seems like a simpler, more sensible step to take, and it won't block later cleanups of how we store sequences if we decide to go there. It'll also make it slightly easier to handle logical decoding of sequence advances, since it'll be possible to send *just* the new sequence value. Right now we can't tell if interval, etc, also got changed, and have to send most of the Form_pg_sequence on the wire for every sequence advance, which is a little sucky. This change won't solve the problem I outlined in the other thread though, that sequences are transactional sometimes and not other times. So +1 for the idea from me. It'll just need relnotes warning of the subtle behaviour change. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
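Craig's first example spelled out (the actual syntax is INCREMENT BY; behaviour as of 9.6, modulo the caching "slush" he mentions):

    CREATE SEQUENCE myseq;
    SELECT nextval('myseq');             -- 1

    BEGIN;
    ALTER SEQUENCE myseq INCREMENT BY 10;
    ROLLBACK;

    SELECT nextval('myseq');             -- 11: the ALTER took effect despite the rollback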
Re: [HACKERS] some requests on auditing
On 8/30/16 10:12 AM, Pavel Stehule wrote: I am working on pgaudit customization for one my customer. There are few requests: 1. flat format without complex types, without nesting - CSV is ideal. 2. all important attributes should be separated - is not possible to search in original queries: table name, database name, role name, rights. 3. if it is possible - own log file 4. one statement can have more rows (flat format is required), but it should be logged only once success/failed 5. any activity should be logged You may want to take a look at pgaudit_analyze which I think addresses #1, #2, and #4: https://github.com/pgaudit/pgaudit/tree/master/analyze #3 is not likely without changes to logging in Postgres. However, there are plenty of tools for log analysis (e.g. ELK) that might help and a Postgres extension that allows log messages to be directed elsewhere (can't remember the name but Gabrielle or Simon would know). As for #5, which activities aren't being logged? -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >>> * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ >> >> What does this mean? (No docs) > > I was using this parameter as a masking integer to indicate the > operations(rmgr list) for which we need this feature to be enabled. > Since, this could be confusing, I've changed it accordingly so that it > accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) Why would we want that? >>> 1. Add support for other Resource Managers. >> >> We probably need to have a discussion as to why you think this should >> be Rmgr dependent? >> Code comments would help there. >> >> If it does, then you should probably do this by extending RmgrTable >> with an rm_check, so you can call it like this... >> >> RmgrTable[record->xl_rmid].rm_check > > +1. > I'm modifying it accordingly. I'm calling this function after > RmgrTable[record->xl_rmid].rm_redo. > >>> 5. Generalize the page type identification technique. >> >> Why not do this first? >> > > At present, I'm using special page size and page ID to identify page > type. But, I've noticed some cases where the entire page is > initialized to zero (Ex: hash_xlog_squeeze_page). RmgrID and info bit > can help us to identify those pages. I'd prefer a solution that was not dependent upon RmgrID at all. If there are various special cases that we need to cater for, ISTM they would be flaws in the existing WAL implementation rather than anything we would want to perpetuate. I hope we'll spend time fixing them rather than add loads of weird code to work around the imperfections. Underdocumented special case code is going to be unbelievably difficult to get right in the long term. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
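To make the rm_check suggestion concrete, a hypothetical shape (all names are illustrative only; this is not existing code):

    /* extend the resource-manager table with a consistency-check callback */
    typedef struct RmgrData
    {
        const char *rm_name;
        void        (*rm_redo) (XLogReaderState *record);
        /* ... existing callbacks ... */
        void        (*rm_check) (XLogReaderState *record); /* verify page images after redo */
    } RmgrData;

    /* in the recovery loop, right after replaying a record */
    RmgrTable[record->xl_rmid].rm_redo(xlogreader);
    if (wal_consistency_check)                      /* hypothetical GUC */
        RmgrTable[record->xl_rmid].rm_check(xlogreader);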
Re: [HACKERS] autonomous transactions
2016-08-31 15:09 GMT+02:00 Joel Jacobson : > On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova > wrote: > > > > On 30 August 2016 at 23:10, Joel Jacobson wrote: > > > > > > There should be a way to within the session and/or txn permanently > > > block autonomous transactions. > > > > > > > This will defeat one of the use cases of autonomous transactions: > auditing > > My idea on how to deal with this would be to mark the function to be > "AUTONOMOUS" similar to how a function is marked to be "PARALLEL > SAFE", > and to throw an error if a caller that has blocked autonomous > transactions tries to call a function that is marked to be autonomous. > > That way none of the code that needs to be audited would ever get executed. > I like this idea - it allows better (cleaner) snapshot isolation. Regards Pavel > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
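Joel's idea, sketched with completely invented syntax (neither the AUTONOMOUS marker nor the GUC exists; this only shows the intended failure mode):

    CREATE FUNCTION audit_action() RETURNS void
        LANGUAGE plpgsql AUTONOMOUS           -- hypothetical marker, analogous to PARALLEL SAFE
        AS $$ BEGIN INSERT INTO audit_log VALUES (now()); END $$;

    SET block_autonomous_transactions = on;   -- hypothetical setting
    SELECT audit_action();                    -- ERROR: autonomous transactions are blocked in this session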
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_receivexlog --synchronous
On 29 August 2016 at 12:34, Tom Lane wrote: > Simon Riggs writes: >> Fix pg_receivexlog --synchronous > > The buildfarm says you broke the 9.5 branch. > > In general, pushing inessential patches just a few hours before a wrap > deadline is a dangerous business. Pushing them without any testing > is close to irresponsible. Sorry about that everybody. Thanks to Alvaro for doing that in my absence. I pushed to 9.5 because I misunderstood the author as saying they had also tested it on 9.5. It was not knowingly untested, but the responsibility and the mistake were mine for not confirming that with my own eyes before pushing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_sequence catalog
While I was hacking around sequence stuff, I felt the urge to look into an old peeve: That sequence metadata is not stored in a proper catalog. Right now in order to find out some metadata about sequences (min, max, increment, etc.), you need to look into the sequence. That is like having to query a table in order to find out about its schema. There are also known issues with the current storage such as that we can't safely update the sequence name stored in the sequence when we rename, so we just don't. This patch introduces a new catalog pg_sequence that stores the sequence metadata. The sequences themselves now only store the counter and supporting information. I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index 1ce7610..cbf0d79 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -41,7 +41,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\ pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \ pg_foreign_table.h pg_policy.h pg_replication_origin.h \ pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \ - pg_collation.h pg_range.h pg_transform.h \ + pg_collation.h pg_range.h pg_transform.h pg_sequence.h \ toasting.h indexing.h \ ) diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 04d7840..8e1e1ac 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -66,6 +66,7 @@ #include "commands/proclang.h" #include "commands/schemacmds.h" #include "commands/seclabel.h" +#include "commands/sequence.h" #include "commands/trigger.h" #include "commands/typecmds.h" #include "nodes/nodeFuncs.h" @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags) else heap_drop_with_catalog(object->objectId); } +if (relKind == RELKIND_SEQUENCE) + DeleteSequenceTuple(object->objectId); break; } diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 00550eb..182d2d0 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -1535,15 +1535,16 @@ CREATE VIEW sequences AS CAST(64 AS cardinal_number) AS numeric_precision, CAST(2 AS cardinal_number) AS numeric_precision_radix, CAST(0 AS cardinal_number) AS numeric_scale, - CAST(p.start_value AS character_data) AS start_value, - CAST(p.minimum_value AS character_data) AS minimum_value, - CAST(p.maximum_value AS character_data) AS maximum_value, - CAST(p.increment AS character_data) AS increment, - CAST(CASE WHEN p.cycle_option THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option -FROM pg_namespace nc, pg_class c, LATERAL pg_sequence_parameters(c.oid) p + CAST(s.seqstart AS character_data) AS start_value, + CAST(s.seqmin AS character_data) AS minimum_value, + CAST(s.seqmax AS character_data) AS maximum_value, + CAST(s.seqincrement AS character_data) AS increment, + CAST(CASE WHEN s.seqcycle THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option +FROM pg_namespace nc, pg_class c, pg_sequence s WHERE c.relnamespace = nc.oid AND c.relkind = 'S' AND (NOT pg_is_other_temp_schema(nc.oid)) + AND c.oid = s.seqrelid AND (pg_has_role(c.relowner, 'USAGE') OR has_sequence_privilege(c.oid, 'SELECT, UPDATE, USAGE') ); diff --git 
a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index c98f981..95bd172 100644 --- a/src/backend/commands/sequence.c +++ b/src/backend/commands/sequence.c @@ -22,8 +22,10 @@ #include "access/xloginsert.h" #include "access/xlogutils.h" #include "catalog/dependency.h" +#include "catalog/indexing.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/pg_sequence.h" #include "catalog/pg_type.h" #include "commands/defrem.h" #include "commands/sequence.h" @@ -74,7 +76,7 @@ typedef struct SeqTableData int64 cached; /* last value already cached for nextval */ /* if last != cached, we have not used up all the cached values */ int64 increment; /* copy of sequence's increment field */ - /* note that increment is zero until we first do read_seq_tuple() */ + /* note that increment is zero until we first do nextval_internal() */ } SeqTableData; typedef SeqTableData *SeqTable; @@ -92,10 +94,11 @@ static int64 nextval_internal(Oid relid); static Relation open_share_lock(SeqTable seq); static void create_seq_hashtable(void); static void init_sequence(Oid relid, SeqTable *p_elm,
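With the patch, sequence metadata would be readable from the new catalog rather than from the sequence relation itself, e.g. (column names as they appear in the patch; myseq is just an example):

    SELECT seqstart, seqincrement, seqmin, seqmax, seqcycle
    FROM pg_sequence
    WHERE seqrelid = 'myseq'::regclass;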
Re: [HACKERS] Exclude schema during pg_restore
Hi, Am Mittwoch, den 31.08.2016, 07:59 -0300 schrieb Fabrízio de Royes Mello: > Please add it to the next open commitfest. I had done so already: https://commitfest.postgresql.org/10/762/ Regards, Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers