[HACKERS] Customized Options Threshold Error
Hi, When we specify a value that exceeds the valid range for a customized option, the behavior differs from parameter interaction via the configuration file. With parameter interaction via the configuration file, the server finishes with a FATAL error when it hits a range error. But with customized options, it just prints a WARNING and continues with the default value. The manual (18.16. Customized Options) says the server will issue warnings for any unrecognized placeholders that begin with an extension name. So when the extension name is correct but the value exceeds the valid range, I think it should end with a FATAL error, just like parameter interaction via the configuration file. Thoughts? The following are the range-error logs: one is the customized-options case and the other is the configuration-file case. Customized Options Threshold Error WARNING: invalid value for parameter pg_receivexlog_mng.monitor_interval: 2147483648 HINT: Value exceeds integer range. LOG: registering background worker pg_receivexlog_mng Parameter Interaction via Configuration File Threshold Error FATAL: configuration file /dbfp/data/postgresql.conf contains errors Regards, -- Furuya Osamu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alter user/role CURRENT_USER
Hi, this is the revised version. Kyotaro HORIGUCHI wrote: - Storage for new information The new struct NameId stores an identifier which tells what it logically is, using the new enum NameIdTypes. I think NameId is a bad name for this. My point is that NameId, as it stands, might be a name for anything, not just a role; and the object it identifies is not an Id either. Maybe RoleSpec? Yeah! I felt it was no good even as a generic type for various names of something or their OIDs. RoleSpec sounds much better. Do we need a public_ok argument to get_nameid_oid() (get a better name for this function too) Maybe get_rolespec_oid(), as a name after its parameter type? so that callers don't have to check for an InvalidOid argument? I think the arrangement you propose is not very convenient; it'd be best to avoid duplicating the check for InvalidOid in all callers of the new function, particularly where there was no check before. I agree that it'd be better to keep away from duplicated InvalidOid checks, but public_ok seems a bit myopic. Since there's no reasonable border between functions accepting 'public' and the others, that kind of solution would not be reasonable. What about checking whether it is PUBLIC or not *before* calling get_rolespec_oid()? The attached patch is modified in the following points: - rename NameId to RoleSpec and NameIdType to RoleSpecTypes. - rename get_nameid_oid() to get_rolespec_oid(). - rename roleNamesToIds() to roleSpecsToIds(). - some struct members are changed, such as authname to authrole. - check whether the rolespec is public or not before calling get_rolespec_oid(). - ExecAlterDefaultPrivilegesStmt and ExecuteGrantStmt do slightly different things about ACL_ID_PUBLIC, but I unified them to the latter. - rebased to the current master. regards, -- Kyotaro Horiguchi NTT Open Source Software Center CreateStmt->authrole = NULL = ? 
From 307249654c97b6449261febbfd84190fbad9111d Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp Date: Fri, 14 Nov 2014 17:37:22 +0900 Subject: [PATCH] ALTER USER CURRENT_USER v3 --- src/backend/catalog/aclchk.c | 30 +++--- src/backend/commands/alter.c | 2 +- src/backend/commands/extension.c | 2 +- src/backend/commands/foreigncmds.c | 57 +-- src/backend/commands/schemacmds.c | 30 +- src/backend/commands/tablecmds.c | 4 +- src/backend/commands/tablespace.c | 2 +- src/backend/commands/user.c | 86 + src/backend/nodes/copyfuncs.c | 37 +--- src/backend/nodes/equalfuncs.c | 35 --- src/backend/parser/gram.y | 190 +++-- src/backend/parser/parse_utilcmd.c | 4 +- src/backend/utils/adt/acl.c | 34 +++ src/include/commands/user.h | 2 +- src/include/nodes/nodes.h | 1 + src/include/nodes/parsenodes.h | 48 +++--- src/include/utils/acl.h | 2 +- 17 files changed, 385 insertions(+), 181 deletions(-) diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index d30612c..24811c6 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -430,13 +430,16 @@ ExecuteGrantStmt(GrantStmt *stmt) foreach(cell, stmt->grantees) { PrivGrantee *grantee = (PrivGrantee *) lfirst(cell); + Oid grantee_uid = ACL_ID_PUBLIC; - if (grantee->rolname == NULL) - istmt.grantees = lappend_oid(istmt.grantees, ACL_ID_PUBLIC); - else - istmt.grantees = -lappend_oid(istmt.grantees, - get_role_oid(grantee->rolname, false)); + /* public is mapped to ACL_ID_PUBLIC */ + if (grantee->role->roltype != ROLESPEC_PUBLIC) + { + grantee_uid = get_rolespec_oid(grantee->role, false); + if (!OidIsValid(grantee_uid)) +grantee_uid = ACL_ID_PUBLIC; + } + istmt.grantees = lappend_oid(istmt.grantees, grantee_uid); } /* @@ -913,13 +916,16 @@ ExecAlterDefaultPrivilegesStmt(AlterDefaultPrivilegesStmt *stmt) foreach(cell, action->grantees) { PrivGrantee *grantee = (PrivGrantee *) lfirst(cell); + Oid grantee_uid = ACL_ID_PUBLIC; - if (grantee->rolname == NULL) - iacls.grantees = 
lappend_oid(iacls.grantees, ACL_ID_PUBLIC); - else - iacls.grantees = -lappend_oid(iacls.grantees, - get_role_oid(grantee->rolname, false)); + /* public is mapped to ACL_ID_PUBLIC */ + if (grantee->role->roltype != ROLESPEC_PUBLIC) + { + grantee_uid = get_rolespec_oid(grantee->role, false); + if (!OidIsValid(grantee_uid)) +grantee_uid = ACL_ID_PUBLIC; + } + iacls.grantees = lappend_oid(iacls.grantees, grantee_uid); } /* diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index c9a9baf..c53d4e5 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -678,7 +678,7 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid) Oid ExecAlterOwnerStmt(AlterOwnerStmt *stmt) { - Oid newowner = get_role_oid(stmt->newowner, false); + Oid
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 14 November 2014 20:37, Jim Nasby jim.na...@bluetreble.com wrote: On 11/12/14, 1:54 AM, David Rowley wrote: We'd also need to add some infrastructure to merge aggregate states together for this to work properly. This means that could also work for avg() and stddev etc. For max() and min() the merge functions would likely just be the same as the transition functions. Sanity check: what % of a large aggregate query fed by a seqscan is actually spent in the aggregate functions? Even if you look strictly at CPU cost, isn't there more code involved in getting data to the aggregate function than in the aggregation itself, except maybe for numeric? You might be right, but that sounds like it would need all the parallel workers to send each matching tuple to a queue to be processed by some aggregate node. I guess this would have more advantage for wider tables or tables with many dead tuples, or if the query has quite a selective where clause, as less data would make it onto that queue. Perhaps I've taken one step too far forward here. I had been thinking that each worker would perform the partial seqscan and, in the worker context, pass the tuple down to the aggregate node. Then later, once each worker had completed, some other, perhaps new, node type (MergeAggregateStates) would merge all those intermediate agg states into the final agg state (which would then be ready for the final function to be called). Are there any plans for what will be in charge of deciding how many workers would be allocated to a parallel query? Will this be something that's done at planning time? Or should the planner just create a parallel-friendly plan if the plan is costly enough, and then allow the executor to decide how many workers to throw at the job based on how busy the system is with other tasks at execution time? Regards David Rowley
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 14 November 2014 07:37, Jim Nasby jim.na...@bluetreble.com wrote: On 11/12/14, 1:54 AM, David Rowley wrote: On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: This plan type is widely used in reporting queries, so will hit the mainline of BI applications and many Mat View creations. This will allow SELECT count(*) FROM foo to go faster also. We'd also need to add some infrastructure to merge aggregate states together for this to work properly. This means that could also work for avg() and stddev etc. For max() and min() the merge functions would likely just be the same as the transition functions. Sanity check: what % of a large aggregate query fed by a seqscan is actually spent in the aggregate functions? Even if you look strictly at CPU cost, isn't there more code involved in getting data to the aggregate function than in the aggregation itself, except maybe for numeric? Yes, which is why I suggested pre-aggregating before collecting the streams together. The point is not that the aggregation is expensive; it's that the aggregation eats data, so the required bandwidth for later steps is reduced and hence does not become a bottleneck that renders the parallel Seq Scan ineffective. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 14 November 2014 01:51, David Rowley dgrowle...@gmail.com wrote: When I mentioned this, I didn't mean to appear to be placing a road block. I was just bringing to the table the information that COUNT(*) + COUNT(*) works OK for merging COUNT(*)'s subtotals, but AVG(n) + AVG(n) does not. Merge functions should be a simple patch to write. If I thought there was going to be a use for them in this release, I'd put my hand up to put a patch together. The hard part is describing and then agreeing on the semantics. If you raise a separate post on this, copy me in and I will help. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unintended restart after recovery error
Fujii Masao masao.fu...@gmail.com wrote: On Thu, Nov 13, 2014 at 8:30 AM, Robert Haas robertmh...@gmail.com wrote: It's true that if the startup process dies we don't try to restart, but it's also true that if the checkpointer dies we do try to restart. I'm not sure why this specific situation should be an exception to that general rule. My distinction was during recovery vs. outside recovery, rather than startup process vs. checkpointer. But I'm not sure it's easy enough to recognize that the checkpointer (maybe also the bgwriter) no longer participates in recovery. 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule for the hot-standby case. I didn't fully understand the purpose of this condition until I saw the commit message. Thanks for pointing it out. Maybe *during crash recovery* (i.e., when hot standby should not be enabled) it's better to treat the crash of the startup process as a catastrophic crash. Yes, that's what I thought too. But I think the current StartupXLOG() does not always (need to) determine the exact boundary between crash and archive recovery. I'd need to think more about whether the end of crash recovery can be safely identified during the replay and signaled to the postmaster. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
[HACKERS] PostgreSQL doesn't stop properly when the --slot option is specified with pg_receivexlog.
Hi, pg_ctl stop doesn't work properly if the --slot option is specified, because WAL is flushed only when it has been switched. These processes still continue even after the postmaster exits: pg_receivexlog, walsender and logger. How to reproduce: 1. Start PostgreSQL 2. Create a slot 3. Specify the --slot option to pg_receivexlog and start it 4. Stop PostgreSQL Example commands for reproducing: 1. pg_ctl start 2. psql -c "select pg_create_physical_replication_slot('test');" 3. pg_receivexlog --slot='test' -D ./testdir 4. pg_ctl stop This happens because the --slot option causes a flush location to be set in the feedback messages, but WAL is flushed only when it has been switched, so walsender waits forever for the WAL to be replicated. When the --slot option is not specified, 'invalid' is set as the flush location and the write location is used to check replication, so the process can stop properly. So I thought that if we set 'invalid' as the flush location in this case as well, this problem would be solved. Does the --slot option need to set the flush location? Thoughts? Regards, -- Furuya Osamu
Re: [HACKERS] Size of regression database
Tom Lane wrote: I was testing backwards compatibility of pg_dumpall just now, and was somewhat astonished to notice the size of the output for the regression database compared to what it was not too long ago: -rw-rw-r--. 1 tgl tgl 4509135 Nov 13 16:19 dumpall.83 -rw-rw-r--. 1 tgl tgl 4514441 Nov 13 16:24 dumpall.84 -rw-rw-r--. 1 tgl tgl 4666917 Nov 13 16:15 dumpall.90 -rw-rw-r--. 1 tgl tgl 4681235 Nov 13 16:15 dumpall.91 -rw-rw-r--. 1 tgl tgl 5333587 Nov 13 16:15 dumpall.92 -rw-rw-r--. 1 tgl tgl 5409083 Nov 13 16:15 dumpall.93 -rw-rw-r--. 1 tgl tgl 5493686 Nov 13 16:15 dumpall.94 -rw-rw-r--. 1 tgl tgl 27151777 Nov 13 16:21 dumpall.head A quick eyeball check says that that quintupling of the database size is all in BRIN index tests. Could we dial that back to something a bit saner please? Oops. Sure, will see about this. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 14 November 2014 07:37, Jim Nasby jim.na...@bluetreble.com wrote: On 11/12/14, 1:54 AM, David Rowley wrote: On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: This plan type is widely used in reporting queries, so will hit the mainline of BI applications and many Mat View creations. This will allow SELECT count(*) FROM foo to go faster also. We'd also need to add some infrastructure to merge aggregate states together for this to work properly. This means that could also work for avg() and stddev etc. For max() and min() the merge functions would likely just be the same as the transition functions. Sanity check: what % of a large aggregate query fed by a seqscan is actually spent in the aggregate functions? Even if you look strictly at CPU cost, isn't there more code involved in getting data to the aggregate function than in the aggregation itself, except maybe for numeric? Yes, which is why I suggested pre-aggregating before collecting the streams together. The point is not that the aggregation is expensive; it's that the aggregation eats data, so the required bandwidth for later steps is reduced and hence does not become a bottleneck that renders the parallel Seq Scan ineffective. I'd like to throw community folks a question: has there been any discussion of the challenge of aggregate push-down across relation joins in the past? It potentially reduces the number of rows to be joined. If there has been, I'd like to look up the discussion from that time. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) -> Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent: when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. Eliding the Limit node gave a further slight win. I wasn't convinced the use-case was common enough to justify replacing quicksort (despite having consistently fewer compares, the merge sort was slightly slower; I never understood why) - but I never asked. Is there any appetite for supporting alternate sort algorithms? -- Cheers, Jeremy
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Fri, Nov 14, 2014 at 1:27 PM, Simon Riggs si...@2ndquadrant.com wrote: On 12 November 2014 00:54, Robert Haas robertmh...@gmail.com wrote: Interestingly, I have a fairly solid idea of what proisparallel is, but I have no clear idea what CONTAINS NO SQL is or why it's relevant. I would imagine that srandom() contains no SQL under any reasonable definition of what that means, but it ain't parallel-safe. What is wrong with generating random numbers in parallel? I was just watching Robert's talk on parallel query on YouTube... I think the answer is at 41:09; the link below should take you there: https://www.youtube.com/watch?v=3klfarKEtMQ#t=2469 Regards David Rowley
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
* Tom Lane (t...@sss.pgh.pa.us) wrote: Noah Misch n...@leadboat.com writes: On Thu, Nov 13, 2014 at 08:24:36PM -0500, Stephen Frost wrote: Agreed. I'll take care of both and we'll make sure the new role attributes being added will do the same for upgrades also. That would make pg_dumpall less faithful for every role other than the bootstrap superuser, a net loss. It would be defensible to do this for BOOTSTRAP_SUPERUSERID only. Huh? It seems difficult to argue that it's less faithful to do this when the previous version didn't have the concept at all. I believe what Noah is pointing out is that we'll end up adding attributes which weren't there already for superusers created by users. You're correct that we currently enable all attributes for the bootstrap superuser, and therefore a dump/restore upgrade looks different from an initdb unless the dump includes all new attributes for the bootstrap superuser. There are a couple of ways to address this. One: stop enabling all the role attribute bits for the bootstrap superuser, in which case it'd look a lot more like other superusers that a user might create (at least, in my experience, no one bothers to set the role attributes beyond superuser in real environments). Or two: reflect actual capability in what is viewed through the catalog. This might actually dovetail nicely with the role attribute representation change which is also being discussed, were we to make pg_authid a view which called 'has_rolX_privilege' to get the value for each attribute. What's actually in the bitmask might end up being different, but at least what's seen in pg_authid (and hopefully by all client tools) would make sense. Of course, this also has the downside that if the superuser bit is removed later, we'd revert to whatever is actually in the catalog for the user, and that'd potentially be different for the bootstrap superuser vs. user-created superusers. Personally, I'm leaning towards the first, as it's less clutter in the output of psql. 
If we add the role attributes proposed and continue to enable all of them explicitly for the bootstrap superuser, the 'Attributes' column is going to get mighty wide. I don't really see the explicit list of attributes as helping de-mystify what is going on for users, as it's akin to root anyway - doing an 'ls' as root doesn't show all the file permissions based on what root can do. Thanks! Stephen
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
On Fri, Nov 14, 2014 at 08:35:25AM -0500, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Noah Misch n...@leadboat.com writes: On Thu, Nov 13, 2014 at 08:24:36PM -0500, Stephen Frost wrote: Agreed. I'll take care of both and we'll make sure the new role attributes being added will do the same for upgrades also. That would make pg_dumpall less faithful for every role other than the bootstrap superuser, a net loss. It would be defensible to do this for BOOTSTRAP_SUPERUSERID only. Huh? It seems difficult to argue that it's less faithful to do this when the previous version didn't have the concept at all. I believe what Noah is pointing out is that we'll end up adding attributes which weren't there already for superusers created by users. Yes. There's a couple ways to address this- Stop enabling all the role attribute bits for the bootstrap superuser, in which case it'd look a lot more like other superusers that a user might create (at least, in my experience, no one bothers to set the role attributes beyond superuser in real environments). or [...] Personally, I'm leaning towards the first as it's less clutter in the output of psql. I'd agree for a new design, but I see too little to gain from changing it now. Today's behavior is fine.
Re: [HACKERS] Customized Options Threshold Error
furu...@pm.nttdata.co.jp writes: When we specify a value which exceeds valid range in Customized Options , its behavior is different from Parameter Interaction via Configuration File behavior. In case of Parameter Interaction via Configuration File, it finish with FATAL error when it get threshold error. But in Customized Options, it just printout WARNING and continue the process with default value. I think this is based on a misunderstanding. Bad values in postgresql.conf only result in a FATAL error at initial postmaster startup; if you change them and SIGHUP the server, an erroneous new value just results in a WARNING. This is because we don't really want the server crashing once it's up. The case you're showing with registering a new background worker is something that happens after the server is up, so it's consistent for it to produce a WARNING not a fatal error. regards, tom lane
Re: [HACKERS] Teaching pg_dump to use NOT VALID constraints
Peter Eisentraut wrote: On 11/13/14 5:07 AM, Simon Riggs wrote: On 13 November 2014 00:20, Jim Nasby jim.na...@bluetreble.com wrote: Isn't the real use-case here that if constraints were valid when you dumped then we shouldn't have to do *any* re-validation when we load? (Though, we'd have to be careful of that with CHECK because that can call user code...) Yes, that is the use case this patch would improve considerably. But your patch doesn't really address that. It just leaves the constraints as invalid, and someone will have to revalidate them later (how?). What Jim was describing was a mode that creates the constraints as valid but doesn't actually validate them. I can see both sides of that kind of feature. This might lead to users introducing invalid data by way of declaring constraints as valid but not checked by the system; if they turn out to be invalid, the final state is a mess. I would only buy such a feature if we had some way to pass down the knowledge of the constraint being valid in the original system through some other means; say, emit a CRC of the copy data in the pg_dump output that can be verified while loading, and only allow unvalidated constraints to be marked VALID if the sum matches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Jeremy Harris j...@wizmail.org writes: On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) -> Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. I'm not entirely following. The top-N heapsort approach already makes use of the limit info. If the limit is so large that the sort spills to disk, then we stop thinking about the limit. But I'm finding it doubtful either that that's a case worthy of extra code or that you could get very much win if you did add code for it. regards, tom lane
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2014-11-14 02:04:00 -0600, Jim Nasby wrote: On 11/13/14, 3:50 PM, Andres Freund wrote: Having been responsible for a site where downtime was a 6-figure dollar amount per hour, I've spent a LOT of time worrying about lock problems. The really big issue here isn't grabbing an exclusive lock; it's grabbing one at some random time when no one is there to actively monitor what's happening. (If you can't handle *any* exclusive locks, that also means you can never do an ALTER TABLE ADD COLUMN either.) With that in mind, would it be possible to set this up so that the time-consuming process of building the new index file happens first, and then (optionally) some sort of DBA action is required to actually do the relfilenode swap? I realize that's not the most elegant solution, but it's WAY better than this feature not hitting 9.5 and people having to hand-code a solution. I don't think having a multi-step version of the feature and it not making it into 9.5 are synonymous. And I really don't want to make it even more complex before we have the basic version in. I think a split like your: Possible syntax: REINDEX CONCURRENTLY -- Does what current patch does REINDEX CONCURRENT BUILD -- Builds new files REINDEX CONCURRENT SWAP -- Swaps new files in could make sense, but it's really an additional feature on top. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
* Noah Misch (n...@leadboat.com) wrote: On Fri, Nov 14, 2014 at 08:35:25AM -0500, Stephen Frost wrote: Personally, I'm leaning towards the first as it's less clutter in the output of psql. I'd agree for a new design, but I see too little to gain from changing it now. Today's behavior is fine. To clarify- you mean with the changes described- using usesuper for rolreplication and rolbypassrls instead of 'false' when dumping from older versions, correct? Note for all- rolreplication goes back to 9.1. Are we thinking that change should be backpatched? Thanks, Stephen
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
Stephen Frost sfr...@snowman.net writes: * Noah Misch (n...@leadboat.com) wrote: I'd agree for a new design, but I see too little to gain from changing it now. Today's behavior is fine. To clarify- you mean with the changes described- using usesuper for rolreplication and rolbypassrls instead of 'false' when dumping from older versions, correct? I think Noah is arguing for leaving the pg_dumpall queries as they stand. I disagree, but he's entitled to his opinion. Note for all- rolreplication goes back to 9.1. Are we thinking that change should be backpatched? I would say it's not really worth back-patching. regards, tom lane
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Noah Misch (n...@leadboat.com) wrote: I'd agree for a new design, but I see too little to gain from changing it now. Today's behavior is fine. To clarify- you mean with the changes described- using usesuper for rolreplication and rolbypassrls instead of 'false' when dumping from older versions, correct? I think Noah is arguing for leaving the pg_dumpall queries as they stand. I disagree, but he's entitled to his opinion. Yes, that. (Adopt Gilles Darold's fix, of course.)
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2014-11-13 11:41:18 -0500, Robert Haas wrote: On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote: But I think it won't work realistically. We have a *lot* of infrastructure that refers to indexes using its primary key. I don't think we want to touch all those places to also disambiguate on some other factor. All the relevant APIs are either just passing around oids or relcache entries. I'm not quite following this. The whole point is to AVOID having two indexes. You have one index which may at times have two sets of physical storage. Right. But how are we going to refer to these different relfilenodes? All the indexing infrastructure just uses oids and/or Relation pointers to refer to the index. How would you hand down the knowledge of which of the relfilenodes is supposed to be used in some call chain? There are ugly solutions, like having a flag such as 'bool rd_useotherfilenode' inside struct RelationData, but even ignoring the ugliness I don't think that'd work well - what if some function called inside the index code again starts an index lookup? I think I might just not be getting your idea here? And all the indexing infrastructure can't deal with that without having separate oids and relcache entries. Hopefully explained above? Greetings, Andres Freund
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
* Noah Misch (n...@leadboat.com) wrote: On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Noah Misch (n...@leadboat.com) wrote: I'd agree for a new design, but I see too little to gain from changing it now. Today's behavior is fine. To clarify- you mean with the changes described- using usesuper for rolreplication and rolbypassrls instead of 'false' when dumping from older versions, correct? I think Noah is arguing for leaving the pg_dumpall queries as they stand. I disagree, but he's entitled to his opinion. Yes, that. Ah, ok. I'm impartial, but I do note that we're currently inconsistent and so I'd prefer to go one way or the other. rolcreaterole uses usesuper, while rolreplication and rolbypassrls do not. Noah- would you argue that we should change rolcreaterole, which has this behavior in all released branches (though, of course, it's only relevant when upgrading from a pre-8.1 server where we didn't have rolcreaterole)? What are your thoughts on the additional role attributes which are being discussed? (Adopt Gilles Darold's fix, of course.) That's been done already. Thanks! Stephen
Re: [HACKERS] alternative model for handling locking in parallel groups
Hi, On 2014-11-13 15:59:11 -0500, Robert Haas wrote: Discussion of my incomplete group locking patch seems to have converged around two points: (1) Everybody agrees that undetected deadlocks are unacceptable. (2) Nobody agrees with my proposal to treat locks held by group members as mutually non-conflicting. As was probably evident from the emails on the other thread, it was not initially clear to me why you'd EVER want heavyweight locks held by different group members to mutually conflict, but after thinking it over for a while, I started to think of cases where you would definitely want that: 1. Suppose two or more parallel workers are doing a parallel COPY. Each time the relation needs to be extended, one backend or the other will need to take the relation extension lock in Exclusive mode. Clearly, taking this lock needs to exclude both workers in the same group and also unrelated processes. 2. Suppose two or more parallel workers are doing a parallel sequential scan, with a filter condition of myfunc(a.somecol), and that myfunc(a.somecol) updates a tuple in some other table. Access to update that tuple will be serialized using tuple locks, and it's no safer for two parallel workers to do this at the same time than it would be for two unrelated processes to do it at the same time. On the other hand, I think there are also some cases where you pretty clearly DO want the locks among group members to be mutually non-conflicting, such as: 3. Parallel VACUUM. VACUUM takes ShareUpdateExclusiveLock, so that only one process can be vacuuming a relation at the same time. Now, if you've got several processes in a group that are collaborating to vacuum that relation, they clearly need to avoid excluding each other, but they still need to exclude other people. And in particular, nobody else should get to start vacuuming that relation until the last member of the group exits. 
So what you want is a ShareUpdateExclusiveLock that is, in effect, shared across the whole group, so that it's only released when the last process exits. Note that you'd definitely not want to do this naively - currently there are baked-in assumptions in the vacuum code that only one backend is doing parts of it. I think there's 4. Parallel query on a locked relation. Parallel query should work on a table created in the current transaction, or one explicitly locked by user action. It's not acceptable for that to just randomly deadlock, and skipping parallelism altogether, while it'd probably be acceptable for a first version, is not going to be a good long-term solution. FWIW, I think it's perfectly acceptable to refuse to work in parallel in that scenario. And not just for now. The biggest argument I can see to that is parallel index creation. It also sounds buggy and fragile for the query planner to try to guess whether the lock requests in the parallel workers will succeed or fail when issued. I don't know. Checking whether we hold a self exclusive lock on that relation doesn't sound very problematic to me. Figuring such details out is the job of the lock manager or the parallelism infrastructure, not the query planner. I don't think you can sensibly separate the parallelism infrastructure and the query planner. After thinking about these cases for a bit, I came up with a new possible approach to this problem. Suppose that, at the beginning of parallelism, when we decide to start up workers, we grant all of the locks already held by the master to each worker (ignoring the normal rules for lock conflicts). Thereafter, we do everything the same as now, with no changes to the deadlock detector. That allows the lock conflicts to happen normally in the first two cases above, while preventing the unwanted lock conflicts in the second two cases. I don't think that's safe enough. There's e.g. a reason why ANALYZE requires SUE lock. 
It'd definitely not be safe to simply grant the worker a SUE lock, just because the master backend already analyzed it or something like that. You could end up with the master and worker backends ANALYZEing the same relation. That said, I can definitely see use for an infrastructure where we explicitly can grant another backend a lock that'd conflict with one we're already holding. But I think it'll need to be more explicit than just granting all currently held locks at the highest current level. And I think it's not necessarily going to be granting them all the locks at their current levels. What I'm thinking of is basically add a step to execMain.c:InitPlan() that goes through the locks and grants the child process all the locks that are required for the statement to run. But not possibly preexisting higher locks. Greetings, Andres Freund
Re: [HACKERS] tracking commit timestamps
On Thu, Nov 13, 2014 at 6:55 PM, Simon Riggs si...@2ndquadrant.com wrote: On 13 November 2014 21:24, Robert Haas robertmh...@gmail.com wrote: On Thu, Nov 13, 2014 at 8:18 AM, Simon Riggs si...@2ndquadrant.com wrote: Ordering transactions in LSN order is very precisely the remit of the existing logical decoding API. Any user that wishes to see commits in sequence can do so using that API. BDR already does this, as do other users of the decoding API. So Slony already has access to a useful ordering if it wishes it. We do not need to do anything *on this patch* to enable LSNs for Slony or anyone else. I don't see any reason to provide the same facility twice, in two different ways. Perhaps you could respond more specifically to concerns expressed upthread, like: http://www.postgresql.org/message-id/blu436-smtp28b68b9312cbe5d9125441dc...@phx.gbl I don't see that as a dumb argument on the face of it. We have a clear must-have argument for timestamps to support replication conflicts. Adding LSNs, when we already have a way to access them, is much more of a nice-to-have. I don't really see it as a particularly nice-to-have either, since the SLRU is in no way ordered. Scope creep is a dangerous thing, so we shouldn't, and elsewhere don't, collect up ideas like a bag of mixed sweets. It's easy to overload things to the point where they don't fly at all. The whole point of this is that we're building something faster than trigger-based systems. I think that's slamming the door closed and nailing it shut behind you. If we add the feature without LSNs, how will someone go back and add that later? It would change the on-disk format of the SLRU, so to avoid breaking pg_upgrade, someone would have to write a conversion utility. Even at that, it would slow pg_upgrade down when the feature has been used. By way of contrast, adding the feature now is quite easy. It just requires storing a few more bytes per transaction. 
I am all in favor of incremental development when possible, but not when it so greatly magnifies the work that needs to be done. People have been asking for the ability to determine the commit ordering for years, and this is a way to do that, very inexpensively, as part of a patch that is needed anyway. We are not talking about loading 20 new requirements on top of this patch; that would be intolerable. We're talking about adding one additional piece of information that has been requested multiple times over the years. The way I see it, there are at least three uses for this information. One, trigger-based replication solutions. While logical decoding will doubtless be preferable, I don't think trigger-based replication solutions will go away completely, and certainly not right away. They've wanted this since forever. Two, for user applications that want to know the commit order for their own purposes, as in Steve's example. Three, for O(1) snapshots. Heikki's patch to make that happen seems to have stalled a bit, but if it's ever to go anywhere it will need something like this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: What's bothering me is that I see this in pg_dumpall output from a 9.4 or earlier database: ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION NOBYPASSRLS; What about leaving out NOBYPASSRLS and letting it go to whatever the default is? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Fri, Nov 14, 2014 at 2:55 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Nov 14, 2014 at 9:11 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Nov 13, 2014 at 10:37 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have mentioned that this can be usable for Linux users as well on that thread, however I think we might want to provide it with an option for linux users. In general, I think it is good to have this patch for Windows users and later if we find that Linux users can also get the benefit with this functionality, we can expose the same with an additional option. Why make it an option instead of just always doing it this way? To avoid extra work during archive recovery if it is not required. I understand that this might not create any measurable difference, but still there is additional I/O involved (read from file) which can be avoided. Yeah, but it's trivial. We're not going to create enough tablespaces on one cluster for the cost of dropping a few extra symlinks in place to matter. OTOH, if that is okay, then I think we can avoid a few #ifdef WIN32 that this patch introduces and can have consistency for this operation on both linux and Windows. Having one code path for everything seems appealing to me, but what do others think? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Nov 13, 2014 at 7:45 AM, Andres Freund and...@2ndquadrant.com wrote: Right. And that's why it's cool that logical decoding can operate through DDL differences. The apply side might not be able to cope with what pops out, but that's not logical decoding's fault, and different apply-sides can adopt different policies as to how to deal with whatever problems crop up. I think pretty much all of the solutions just say oops, you're on your own. And I can't blame them for that. Once there's a schema difference and it causes a problem there's really not much that can be done. Agreed. I don't know exactly what you mean by a normal single master setup. Surely the point of logical decoding is that the replica might not be identical to the master. I actually think that that's not primarily the point if you talk about individual objects. The majority of objects will be exactly the same on all nodes. If you actually want to have differing objects on the nodes you'll have to opt out/in (depending on your solution) of ddl replication for those objects. I'm not saying it's the primary point; I'm just saying that it can happen. And if it isn't, then a command that succeeded on the master might fail on the standby - for example, because an object by that name already exists there, or because a type doesn't exist there. (Even if you replicate a CREATE EXTENSION command, there's no guarantee that the .so exists on the target.) Then what? Sure. There are reasons logical replication isn't always a win. But I don't see why that's a reason not to make it as robust as possible. I agree. Btw, the .so problem exists for wal shipping as well. Not in the same way. You might not be able to access the data on the standby if the .so defining the datatype isn't present, but the replication itself doesn't care. This is basically the same problem as multi-master replication conflicts, except with DDL. 
Resolving replication conflicts is not a very easy thing to get right even if you're only concerned about the rows in the tables. It's probably harder if you're worried about the schema, too. I don't think it's a sane thing to do multimaster with differing schemas for individual relations, except maybe additional nonunique indexes. But that's exactly what you ARE doing, isn't it? I mean, if you replicate in only one direction, nothing keeps someone from modifying things on the replica independently of BDR, and if you replicate in both directions, well that's multi-master. The solution here doesn't force you to do that, does it? It's something that can be used by more than one replication solution? In theory, yes. What's the practical point here? I am quite doubtful about whether there will ever be a second, working implementation, so I see all of this code - and the maintenance effort associated with it - as something that will really only benefit BDR. I understand that you don't see it that way, and I'm not saying that you are offering anything in bad faith, but it looks to me like even with all of this very substantial amount of infrastructure, you're still going to need a big pile of additional code inside BDR to actually make it work, and I don't hear anyone else offering to develop something similar. I think generally logical replication has more failure cases than physical ones. Which you seem to agree with. Yes, I agree with that. Physical replication is basically no-fail because it just says, hey, go write these bytes into this page, and we can pretty much always do that. But statement-based logical replication means basically executing arbitrary chunks of code all over the backend, and there is just no way to guarantee that code won't throw an error. So the best we can do is to hope that those errors will get reported back to the user, which is going to require some kind of distributed transaction. 
Your idea to just run the replicated DDL statements on the standby before committing on the master is one approach to that problem, and probably the simplest one, but not the only one - one can imagine something that resembles true clustering, for example. I think that's generally not what people need for primary/standby cases (of subsets of tables). In practice there aren't many failures like that besides schema differences. And there are just many use cases that aren't doable with physical replication, so we can't advise people doing that. I can't really follow this, especially the first sentence. By the way, the fact that you're planning to do log-based replication of DML and trigger-based replication of DDL scares the crap out of me. I'm not sure how that's going to work at all if the two are interleaved in the same transaction. Maybe that's based on a misunderstanding. All the event trigger does is insert an event into a (local) queue. That's then shipped to the other side using the same replication mechanisms as used for rows. Then, when
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
Robert Haas robertmh...@gmail.com writes: On Fri, Nov 14, 2014 at 2:55 AM, Amit Kapila amit.kapil...@gmail.com wrote: OTOH, if that is okay, then I think we can avoid a few #ifdef WIN32 that this patch introduces and can have consistency for this operation on both linux and Windows. Having one code path for everything seems appealing to me, but what do others think? Generally I'd be in favor of avoiding platform-dependent code where possible, but that doesn't represent a YES vote for this particular patch. It looks pretty messy in a quick look, even granting that the #ifdef WIN32's would all go away. A larger question here is about forward/backward compatibility of the basebackup files. Changing the representation of symlinks like this would break that. Maybe we don't care, not sure (is there already a catversion check for these things?). Changing the file format for only some platforms seems like definitely a bad idea though. regards, tom lane
Re: [HACKERS] Add CREATE support to event triggers
On 2014-11-14 12:38:52 -0500, Robert Haas wrote: This is basically the same problem as multi-master replication conflicts, except with DDL. Resolving replication conflicts is not a very easy thing to get right even if you're only concerned about the rows in the tables. It's probably harder if you're worried about the schema, too. I don't think it's a sane thing to do multimaster with differing schemas for individual relations, except maybe additional nonunique indexes. But that's exactly what you ARE doing, isn't it? I mean, if you replicate in only one direction, nothing keeps someone from modifying things on the replica independently of BDR, and if you replicate in both directions, well that's multi-master. Meh. By that definition any logical replication solution does multi master replication. Either you tell your users that that's not allowed, or you just prevent it by technical means. Absolutely the same is true for table contents. FWIW, in BDR we *do* prevent schemas from being modified independently on different nodes (unless you set the 'running with scissors' guc). I am quite doubtful about whether there will ever be a second, working implementation, so I see all of this code - and the maintenance effort associated with it - as something that will really only benefit BDR. I understand that you don't see it that way, and I'm not saying that you are offering anything in bad faith, but it looks to me like even with all of this very substantial amount of infrastructure, you're still going to need a big pile of additional code inside BDR to actually make it work, and I don't hear anyone else offering to develop something similar. I don't know what to say about this. I don't see any other realistic way to perform DDL replication in logical rep, and nearly all my conversations with users have indicated that they want that. I think it's a good idea to structure independent features in a way that other solutions can reuse them. 
But I sure as hell can't force them to use it - especially as there's unfortunately not too much development going on in the existing logical replication solutions for postgres. Btw, I really think there are quite legitimate use cases for this besides logical replication solutions - e.g. schema tracking is quite a sensible use case. By the way, the fact that you're planning to do log-based replication of DML and trigger-based replication of DDL scares the crap out of me. I'm not sure how that's going to work at all if the two are interleaved in the same transaction. Maybe that's based on a misunderstanding. All the event trigger does is insert an event into a (local) queue. That's then shipped to the other side using the same replication mechanisms as used for rows. Then, when receiving changes in that ddl queue the standby performs those actions before continuing with the replay. That makes the interleaving on the standby pretty much the same as on the primary. OK. But that's also not something that I can really imagine us ever adopting in core. Well, that bit really depends on what the user (e.g. a replication solution, or a schema tracking feature) needs. There's certain things that you can quite easily do as part of core (e.g. insert something into the WAL stream), that you just can't as external code. I don't think there's any external replication solution that won't have some form of internal queue to manipulate its internal state. For an external solution such a queue currently pretty much has to be some table, but for an eventual in core solution it could be done differently. If we were going to do DDL replication in core, I have to believe we'd find a way to put all of the information in the WAL stream, not use triggers. I agree that we might want to represent the transport to standbys differently for something in core. 
That there's many different ways the deparsing output could be used imo is a good reason why that part of the mechanism isn't part of this submission. I don't really understand the arguments against triggers in general though. We're already using them quite extensively - I don't see why DDL replication has to meet some completely different bar than say foreign key checks or deferred uniqueness checks. We easily can add 'implicit' event triggers, that aren't defined inside the catalog if we feel like it. I'm just not sure we really would need/want to. Yes, it's not trivial. And I think there's some commands where you might not want to try but either scream ERROR or just rereplicate the whole relation or such. I very strongly feel that we (as postgres devs) have a *much* higher chance of recognizing these cases than either some random users (that write slonik scripts or similar) or some replication solution authors that aren't closely involved with -hackers. It's clearly the job of the replication solution authors to figure these details out. I'm not going to
Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.
On Fri, Nov 14, 2014 at 7:22 PM, furu...@pm.nttdata.co.jp wrote: Hi, pg_ctl stop doesn't work properly if the --slot option is specified, because WAL is flushed only when the file has switched. These processes still continue even after the postmaster failed: pg_receivexlog, walsender and logger. I could reproduce this problem. At normal shutdown, walsender keeps waiting for the last WAL record to be replicated and flushed in pg_receivexlog. But pg_receivexlog issues the sync command only when the WAL file is switched. Thus, since pg_receivexlog may never flush the last WAL record, walsender may have to keep waiting infinitely. pg_recvlogical handles this problem by calling fsync() when it receives a request for an immediate reply from the server. That is, at shutdown, walsender sends the request, pg_receivexlog receives it, flushes the last WAL record, and sends the flush location back to the server. Since walsender can see that the last WAL record is successfully flushed in pg_receivexlog, it can exit cleanly. One idea to the problem is to introduce the same logic as pg_recvlogical has, to pg_receivexlog. Thought? Regards, -- Fujii Masao
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Fri, Nov 14, 2014 at 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Generally I'd be in favor of avoiding platform-dependent code where possible, but that doesn't represent a YES vote for this particular patch. It looks pretty messy in a quick look, even granting that the #ifdef WIN32's would all go away. Hmm, OK. I have not read the patch. Hopefully that's something that could be fixed. A larger question here is about forward/backward compatibility of the basebackup files. Changing the representation of symlinks like this would break that. Maybe we don't care, not sure (is there already a catversion check for these things?). Changing the file format for only some platforms seems like definitely a bad idea though. What are the practical consequences of changing the file format? I think that an old backup containing symlinks could be made to work on a new server that knows how to create them, and we should probably design it that way, but a physical backup isn't compatible across major versions anyway, so it doesn't have the same kinds of repercussions as changing something like the pg_dump file format. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
Amit Kapila wrote: 2. Symlink file format: oid linkpath 16387 E:\PostgreSQL\tbs Symlink file will contain entries for all the tablespaces under pg_tblspc directory. I have kept the file name as symlink_label (suggestions are welcome if you want some different name for this file). I think symlink_label isn't a very good name. This file is not a label in the sense that backup_label is; it seems more a catalog to me. And it's not, in essence, about symlinks either, but rather about tablespaces. I would name it following the term tablespace catalog or some variation thereof. I know we don't expect that users would have to look at the file or edit it in normal cases, but it seems better to make it be human-readable. I would think that the file needs to have tablespace names too, then, not just OIDs. Maybe we don't use the tablespace name for anything other than documentation purposes if someone decides to look at the file, so perhaps it should look like a comment: oid link path ; tablespace name We already do this in pg_restore -l output IIRC. One use case mentioned upthread is having the clone be created on the same machine as the source server. Does your proposal help with it? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.
On 2014-11-15 03:25:16 +0900, Fujii Masao wrote: On Fri, Nov 14, 2014 at 7:22 PM, furu...@pm.nttdata.co.jp wrote: Hi, pg_ctl stop doesn't work properly if the --slot option is specified, because WAL is flushed only when the file has switched. These processes still continue even after the postmaster failed: pg_receivexlog, walsender and logger. I could reproduce this problem. At normal shutdown, walsender keeps waiting for the last WAL record to be replicated and flushed in pg_receivexlog. But pg_receivexlog issues the sync command only when the WAL file is switched. Thus, since pg_receivexlog may never flush the last WAL record, walsender may have to keep waiting infinitely. Right. pg_recvlogical handles this problem by calling fsync() when it receives a request for an immediate reply from the server. That is, at shutdown, walsender sends the request, pg_receivexlog receives it, flushes the last WAL record, and sends the flush location back to the server. Since walsender can see that the last WAL record is successfully flushed in pg_receivexlog, it can exit cleanly. One idea to the problem is to introduce the same logic as pg_recvlogical has, to pg_receivexlog. Thought? Sounds sane to me. Are you looking into doing that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Size of regression database
Alvaro Herrera wrote: Tom Lane wrote: I was testing backwards compatibility of pg_dumpall just now, and was somewhat astonished to notice the size of the output for the regression database compared to what it was not too long ago:
-rw-rw-r--. 1 tgl tgl  4509135 Nov 13 16:19 dumpall.83
-rw-rw-r--. 1 tgl tgl  4514441 Nov 13 16:24 dumpall.84
-rw-rw-r--. 1 tgl tgl  4666917 Nov 13 16:15 dumpall.90
-rw-rw-r--. 1 tgl tgl  4681235 Nov 13 16:15 dumpall.91
-rw-rw-r--. 1 tgl tgl  5333587 Nov 13 16:15 dumpall.92
-rw-rw-r--. 1 tgl tgl  5409083 Nov 13 16:15 dumpall.93
-rw-rw-r--. 1 tgl tgl  5493686 Nov 13 16:15 dumpall.94
-rw-rw-r--. 1 tgl tgl 27151777 Nov 13 16:21 dumpall.head
A quick eyeball check says that that quintupling of the database size is all in BRIN index tests. Could we dial that back to something a bit saner please? Oops. Sure, will see about this. Done: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=86cf9a565069755189e08290343d2d62afdd1f52 Now a pg_dumpall for me is 5510969 bytes which seems reasonable. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] printing table in asciidoc with psql
On 14 November 2014 20:57, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Is anyone going to submit a new version of this patch? Hi Alvaro, due to family issues I will not be able to work on it for the next 10 days. regards, Szymon
Re: [HACKERS] TODO request: log_long_transaction
On 11/7/14, 1:19 PM, Michael Banck wrote: Am Montag, den 27.10.2014, 19:29 + schrieb Thom Brown: On 27 October 2014 19:21, Josh Berkus j...@agliodbs.com wrote: I just realized that there is one thing we can't log currently: transactions which last more than #ms. This is valuable diagnostic information when looking for issues like causes of bloat and deadlocks. I'd like it to be on the TODO list because it seems like part of a good GSOC project or first-time contribution. So effectively, log_min_duration_transaction? Sounds useful. FWIW, I've also wanted the equivalent of statement_timeout for transactions; the ability to abort a transaction if it runs for too long. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 14:54, Tom Lane wrote: Jeremy Harris j...@wizmail.org writes: On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) -> Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. I'm not entirely following. The top-N heapsort approach already makes use of the limit info. Having gone back to look, you're right. It was Uniq nodes I merged (the sort handles both bounded-output and dedup). -- Cheers, Jeremy
Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc
Michael Paquier wrote: Hi all, The new storage parameter pages_per_range is missing in the documentation of CREATE INDEX. The patch attached corrects that. Thanks! Pushed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On 11/13/14 4:33 PM, Peter Eisentraut wrote: Is this still relevant after this commit? commit fb05f3ce83d225dd0f39f8860ce04082753e9e98 Author: Peter Eisentraut pete...@gmx.net Date: Sat Feb 22 13:38:06 2014 -0500 pg_basebackup: Add support for relocating tablespaces I believe so. The commit only applies to plain output. Amit's complaint is that tar utilities on Windows don't unpack symlinks, so the tar format isn't useful on Windows when tablespaces are used. So he wants the recovery mechanism to restore the symlinks. Um, wouldn't accepting this patch break the above-mentioned tablespace-relocation feature, because pg_basebackup wouldn't see any more symlinks sent down?
[HACKERS] Idle transaction cancel/timeout and SSL revisited
Hello Hackers, After reading up through the archives on the two $subj related TODO items, I'm under the impression that the patches[1,2] didn't make it mainly because of the risk of breaking SSL internals if we try to longjump out of the signal handler in the middle of a blocking SSL read, and/or if we try to report that final transaction/connection aborted notice to the client. What if instead of trying to escape from the signal handler we would set a flag and use it to simulate EOF after the recv() is restarted due to EINTR? From the backend perspective it should look like the client has just hung up. Both SSL and cleartext connections are routed through secure_raw_read, so it seems like a good candidate for checking the flag we would set in StatementCancelHandler(). (Should we also add the same check in interactive_getc()?) Ultimately, in PostgresMain() the ReadCommand() would return EOF and we can either let the existing code shut down the backend, or add special handling that will only abort the transaction and report the case to the client, which was the intent in [1]. And if that works out, the timeout handler in [2] can simply reuse the flag. A potential problem with this is that SSL might perform some eager cleanup when seeing EOF, so the backend might need to reset/restart the connection (is that possible?) I'm attaching a patch (that doesn't compile yet :-p) in the hope it can be useful for the purpose of the discussion. Notably, secure_raw_read() can't know about IdleTransactionCancelPending and I'm not sure what would be the best way to let it see that (is it too much of a shortcut anyway?)
-- Alex

[1] http://www.postgresql.org/message-id/1262268211.19367.10736.camel@ebony
[2] http://www.postgresql.org/message-id/538dc843.2070...@dalibo.com

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
new file mode 100644
index b05364c..377984d
*** a/src/backend/libpq/be-secure-openssl.c
--- b/src/backend/libpq/be-secure-openssl.c
*** rloop:
*** 543,550 ****
               errmsg("SSL error: %s", SSLerrmessage())));
          /* fall through */
      case SSL_ERROR_ZERO_RETURN:
!         errno = ECONNRESET;
!         n = -1;
          break;
      default:
          ereport(COMMERROR,
--- 543,553 ----
               errmsg("SSL error: %s", SSLerrmessage())));
          /* fall through */
      case SSL_ERROR_ZERO_RETURN:
!         if (!IdleTransactionCancelPending) /* HACK */
!         {
!             errno = ECONNRESET;
!             n = -1;
!         }
          break;
      default:
          ereport(COMMERROR,
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
new file mode 100644
index 41ec1ad..44079ff
*** a/src/backend/libpq/be-secure.c
--- b/src/backend/libpq/be-secure.c
*** secure_raw_read(Port *port, void *ptr, s
*** 145,155 ****
  {
      ssize_t     n;

!     prepare_for_client_read();
!     n = recv(port->sock, ptr, len, 0);
!     client_read_ended();

      return n;
  }
--- 145,160 ----
  {
      ssize_t     n;

!     if (IdleTransactionCancelPending)
!         n = 0;              /* simulate EOF */
!     else
!     {
!         prepare_for_client_read();
!         n = recv(port->sock, ptr, len, 0);
!         client_read_ended();
!     }

      return n;
  }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
new file mode 100644
index cc62b2c..37437fc
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*** interactive_getc(void)
*** 310,318 ****
  {
      int         c;

!     prepare_for_client_read();
!     c = getc(stdin);
!     client_read_ended();

      return c;
  }
--- 310,323 ----
  {
      int         c;

!     if (IdleTransactionCancelPending)
!         c = EOF;
!     else
!     {
!         prepare_for_client_read();
!         c = getc(stdin);
!         client_read_ended();
!     }

      return c;
  }
*** StatementCancelHandler(SIGNAL_ARGS)
*** 2629,2642 ****
      if (!proc_exit_inprogress)
      {
          InterruptPending = true;
!         QueryCancelPending = true;

          /*
           * If it's safe to interrupt, and we're waiting for input or a lock,
           * service the interrupt immediately
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0)
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
--- 2634,2655 ----
      if (!proc_exit_inprogress)
      {
          InterruptPending = true;
!
!         if (!DoingCommandRead)
!             QueryCancelPending = true;
!         else if (IsTransactionOrTransactionBlock())
!             IdleTransactionCancelPending = true;

          /*
           * If it's safe to interrupt, and we're waiting for input or a lock,
           * service the interrupt immediately
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 && QueryCancelPending)
!             /*
!              * ProcessInterrupts() does nothing in case of
!              * IdleTransactionCancelPending, it is handled in PostgresMain
!              */
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
*** PostgresMain(int argc, char
Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited
Hi, On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: After reading up through the archives on the two $subj related TODO items I'm under the impression that the patches[1,2] didn't make it mainly because of the risk of breaking SSL internals if we try to longjump out of the signal handler in the middle of a blocking SSL read and/or if we try to report that final transaction/connection aborted notice to the client. I've written a patch that allows that - check http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de I feel pretty confident that that's the way to go. I just need some time to polish it. What if instead of trying to escape from the signal handler we would set a flag and use it to simulate EOF after the recv() is restarted due to EINTR? From the backend perspective it should look like the client has just hung up. That's pretty much what the above does. Except that it actually deals with blocking syscalls by not using them and relying on latches/select instead. Greetings, Andres Freund
Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited
Andres Freund and...@2ndquadrant.com writes: On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: After reading up through the archives on the two $subj related TODO items I'm under the impression that the patches[1,2] didn't make it mainly because of the risk of breaking SSL internals if we try to longjump out of the signal handler in the middle of a blocking SSL read and/or if we try to report that final transaction/connection aborted notice to the client. I've written a patch that allows that - check http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de I feel pretty confident that that's the way to go. I just need some time to polish it. What if instead of trying to escape from the signal handler we would set a flag and use it to simulate EOF after the recv() is restarted due to EINTR? From the backend perspective it should look like the client has just hung up. That's pretty much what the above does. Except that it actually deals with blocking syscalls by not using them and relying on latches/select instead. Neat, I must have missed it. -- Alex
Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
On Fri, Nov 14, 2014 at 2:51 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: What's bothering me is that I see this in pg_dumpall output from a 9.4 or earlier database: ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION NOBYPASSRLS; What about leaving out NOBYPASSRLS and letting it go to whatever the default is? I'd be fine with that- but would we want to do it for the other role attributes also? Specifically rolcreaterole and rolreplication for older server versions. I'm still of the opinion that we should just drop the explicit true for all the role attributes for the bootstrap superuser and then go with this suggestion to let it go to the default for upgrades from older versions. Not sure; I was just brainstorming. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] On the warpath again about ill-considered inclusion nests
All, * Stephen Frost (sfr...@snowman.net) wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Well, if you *only* move RowSecurityDesc and not RowSecurityPolicy, okay, but that seems a bit useless/inconsistent if I'm reading it right that RowSecurityDesc contains a List of RowSecurityPolicy structs. Yes, good point. What seems possibly saner is to just remove the header inclusion in rel.h and declare the new Relation field similarly to the way we handle rd_fdwroutine and some other fields there: /* use struct here to avoid needing to include rowsecurity.h: */ struct RowSecurityDesc *rsdesc; /* Row-security policy, or NULL */ Makes sense to me. And while you are at it, how about renaming rsdesc to rd_rsdesc? The fact that whoever put in trigdesc didn't get the memo about the naming convention for Relation fields doesn't excuse you from following it. Ok. I tend to be bad and mistakenly consider existing code 'gospel'. Will fix. PS: The comments for struct RowSecurityPolicy could stand to be improved. Understood, will do so. And, done. Thanks! Stephen
Re: [HACKERS] initdb -S and tablespaces
On 2014-10-30 14:30:28 +0530, Abhijit Menon-Sen wrote: Here's a proposed patch to initdb to make initdb -S fsync everything under pg_tblspc. It introduces a new function that calls walkdir on every entry under pg_tblspc. This is only one approach: I could have also changed walkdir to follow links, but that would have required a bunch of #ifdefs for Windows (because it doesn't have symlinks), and I guessed a separate function with two calls might be easier to patch into back branches. I've tested this patch under various conditions on Linux, but it could use some testing on Windows. I've pushed this. The Windows buildfarm animals that run pg_upgrade (and thus --sync-only) will have to tell us whether there's a problem. I sure hope there's none... Thanks for that patch! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
On 2014-10-27 16:09:30 +0530, Abhijit Menon-Sen wrote: At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote: On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote: 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to earlier in StartupXLOG. 2. Inside that function, issue fsync()s for the main forks we create by copying the _init fork. These two imo should definitely be backpatched. I've attached updated versions of these two patches. The only changes are to revise the comments as you suggested. Committed and backpatched (to 9.1) these. That required also backpatching the move of fsync_fname() to fd.c/h to 9.1-9.3. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
On Fri, Nov 14, 2014 at 11:47:49AM -0500, Stephen Frost wrote: On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote: I think Noah is arguing for leaving the pg_dumpall queries as they stand. I disagree, but he's entitled to his opinion. Ah, ok. I'm impartial, but I do note that we're currently inconsistent and so I'd prefer to go one way or the other. rolcreaterole uses usesuper, while rolreplication and rolbypassrls do not. Noah- would you argue that we should change rolcreaterole, which has this behavior in all released branches (though, of course, it's only relevant when upgrading from a pre-8.1 server where we didn't have rolcreaterole)? Setting aside that I wouldn't have argued for any change here, yes. I agree that there's no good reason to handle rolcreaterole unlike rolreplication. The choice between their respective techniques has behavior consequences only if you later use ALTER ROLE x NOSUPERUSER, which I have not seen done apart from explicit ALTER ROLE testing. Having said that, I prefer setting these attributes to false when dumping from a version that did not have them, for these reasons: 1) It's fail-safe. The hypothetical ALTER ROLE x NOSUPERUSER may leave the role with fewer privileges than expected, never with more privileges than expected. 2) It's more consistent with how folks create superuser accounts. I've not seen CREATE USER x SUPERUSER CREATEROLE used. Where SUPERUSER preempts a given role attribute, the norm among sites I've seen is to omit the attribute. (The bootstrap superuser does turn this point on its head.) 3) It's cleaner in \du output. I can't pinpoint a technical argument against your proposal to cease adding excess attributes to the bootstrap superuser at initdb time. It feels like a case of the tail wagging the dog, changing antediluvian initdb behavior to make pg_dumpall slightly more transparent. So, if you desire to make this consistent, I recommend using rolreplication's treatment as the gold standard. 
That is to say, when dumping from an older version, set to false any of these role attributes not existing in that version. Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a reasonable alternative, though less so for pg_dumpall --clean. It would be defensible to send NOBYPASSRLS under --clean and omit the option otherwise; consider that my second choice. What are your thoughts on the additional role attributes which are being discussed? All three of rolcreaterole, rolreplication and rolbypassrls deserve the same dumpRoles() treatment. nm
Re: [HACKERS] tracking commit timestamps
On 14 November 2014 17:12, Robert Haas robertmh...@gmail.com wrote: We are not talking about loading 20 new requirements on top of this patch; that would be intolerable. We're talking about adding one additional piece of information that has been requested multiple times over the years. The requested information is already available, as discussed. Logical decoding adds commit ordering for *exactly* the purpose of using it for replication, available to all solutions. This often-requested feature has now been added and doesn't need to be added twice. So what we are discussing is adding a completely superfluous piece of information. Not including the LSN info does nothing to trigger-based replication, which will no doubt live on happily for many years. But adding LSN will slow down logical replication, for no purpose at all. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Useless dead struct in parse_func.h
It looks like commit 0e99be1c removed the final real use of the struct InhPaths. Attached patch removes it entirely.
-- Peter Geoghegan

diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index b9b06ae..4423bc0 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -18,18 +18,6 @@
 #include "parser/parse_node.h"

-/*
- * This structure is used to explore the inheritance hierarchy above
- * nodes in the type tree in order to disambiguate among polymorphic
- * functions.
- */
-typedef struct _InhPaths
-{
-	int			nsupers;		/* number of superclasses */
-	Oid			self;			/* this class */
-	Oid		   *supervec;		/* vector of superclasses */
-} InhPaths;
-
 /* Result codes for func_get_detail */
 typedef enum
 {
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
* Noah Misch (n...@leadboat.com) wrote: On Fri, Nov 14, 2014 at 11:47:49AM -0500, Stephen Frost wrote: rolcreaterole uses usesuper, while rolreplication and rolbypassrls do not. Noah- would you argue that we should change rolcreaterole, which has this behavior in all released branches (though, of course, it's only relevant when upgrading from a pre-8.1 server where we didn't have rolcreaterole)? Setting aside that I wouldn't have argued for any change here, yes. I agree that there's no good reason to handle rolcreaterole unlike rolreplication. The choice between their respective techniques has behavior consequences only if you later use ALTER ROLE x NOSUPERUSER, which I have not seen done apart from explicit ALTER ROLE testing. Having said that, I prefer setting these attributes to false when dumping from a version that did not have them, for these reasons: 1) It's fail-safe. The hypothetical ALTER ROLE x NOSUPERUSER may leave the role with fewer privileges than expected, never with more privileges than expected. 2) It's more consistent with how folks create superuser accounts. I've not seen CREATE USER x SUPERUSER CREATEROLE used. Where SUPERUSER preempts a given role attribute, the norm among sites I've seen is to omit the attribute. (The bootstrap superuser does turn this point on its head.) 3) It's cleaner in \du output. I agree with these points and my experience matches yours. The bootstrap user is the one anomaly at those sites and makes it stand out rather than look like the majority of the superuser accounts which exist (where there exists more than one or two anyway). I can't pinpoint a technical argument against your proposal to cease adding excess attributes to the bootstrap superuser at initdb time. It feels like a case of the tail wagging the dog, changing antediluvian initdb behavior to make pg_dumpall slightly more transparent. 
For my 2c, it feels like it was simply an attempt to reflect actual capabilities without consideration for what happened later rather than any particularly technical reason, and I don't feel that original reasoning (if that's what it was, anyway) was sound. So, if you desire to make this consistent, I recommend using rolreplication's treatment as the gold standard. That is to say, when dumping from an older version, set to false any of these role attributes not existing in that version. Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a reasonable alternative, though less so for pg_dumpall --clean. It would be defensible to send NOBYPASSRLS under --clean and omit the option otherwise; consider that my second choice. I don't see the point in including them for --clean..? --clean states that DROP commands would be added, not that existing roles would be adjusted in some way. As for using 'always false'- I tend to think Robert actually has it better by using the default for users. Consider rolinherit- that defaults to 'true' and while it would technically be more 'safe' to set it to false, it wouldn't have matched what we provided under the user / group system prior to roles. Doing this would also reduce clutter in pg_dumpall output. What are your thoughts on the additional role attributes which are being discussed? All three of rolcreaterole, rolreplication and rolbypassrls deserve the same dumpRoles() treatment. Agreed. Thanks! Stephen
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
On Fri, Nov 14, 2014 at 08:39:28PM -0500, Stephen Frost wrote: * Noah Misch (n...@leadboat.com) wrote: So, if you desire to make this consistent, I recommend using rolreplication's treatment as the gold standard. That is to say, when dumping from an older version, set to false any of these role attributes not existing in that version. Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a reasonable alternative, though less so for pg_dumpall --clean. It would be defensible to send NOBYPASSRLS under --clean and omit the option otherwise; consider that my second choice. I don't see the point in including them for --clean..? --clean states that DROP commands would be added, not that existing roles would be adjusted in some way. It does state that, but note this comment in dumpRoles(): /* * We dump CREATE ROLE followed by ALTER ROLE to ensure that the role * will acquire the right properties even if it already exists (ie, it * won't hurt for the CREATE to fail). This is particularly important * for the role we are connected as, since even with --clean we will * have failed to drop it. binary_upgrade cannot generate any errors, * so we assume the current role is already created. */ Under --clean, the right properties are those the role would have had if the DROP ROLE had succeeded. Those are necessarily independent of the pre-DROP version of the role. (Otherwise, you potentially get different outcomes depending on which superuser restored the --clean dump.) As for using 'always false'- I tend to think Robert actually has it better by using the default for users. Consider rolinherit- that defaults to 'true' and while it would technically be more 'safe' to set it to false, it wouldn't have matched what we provided under the user / group system prior to roles. Doing this would also reduce clutter in pg_dumpall output. My arguments and conclusion apply only to the permission-like attributes that are subsets of SUPERUSER. rolinherit is indeed not in that category. 
Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS
* Noah Misch (n...@leadboat.com) wrote: On Fri, Nov 14, 2014 at 08:39:28PM -0500, Stephen Frost wrote: I don't see the point in including them for --clean..? --clean states that DROP commands would be added, not that existing roles would be adjusted in some way. It does state that, but note this comment in dumpRoles(): /* * We dump CREATE ROLE followed by ALTER ROLE to ensure that the role * will acquire the right properties even if it already exists (ie, it * won't hurt for the CREATE to fail). This is particularly important * for the role we are connected as, since even with --clean we will * have failed to drop it. binary_upgrade cannot generate any errors, * so we assume the current role is already created. */ Ah, yes, of course. Under --clean, the right properties are those the role would have had if the DROP ROLE had succeeded. Those are necessarily independent of the pre-DROP version of the role. (Otherwise, you potentially get different outcomes depending on which superuser restored the --clean dump.) Agreed, and in this case we'd need to set any attributes not set back to the default, which would include having NOBYPASSRLS. As for using 'always false'- I tend to think Robert actually has it better by using the default for users. Consider rolinherit- that defaults to 'true' and while it would technically be more 'safe' to set it to false, it wouldn't have matched what we provided under the user / group system prior to roles. Doing this would also reduce clutter in pg_dumpall output. My arguments and conclusion apply only to the permission-like attributes that are subsets of SUPERUSER. rolinherit is indeed not in that category. Understood. Thanks! Stephen
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Sat, Nov 15, 2014 at 2:21 AM, Peter Eisentraut pete...@gmx.net wrote: On 11/13/14 4:33 PM, Peter Eisentraut wrote: Is this still relevant after this commit? commit fb05f3ce83d225dd0f39f8860ce04082753e9e98 Author: Peter Eisentraut pete...@gmx.net Date: Sat Feb 22 13:38:06 2014 -0500 pg_basebackup: Add support for relocating tablespaces I believe so. The commit only applies to plain output. Amit's complaint is that tar utilities on Windows don't unpack symlinks, so the tar format isn't useful on Windows when tablespaces are used. So he wants the recovery mechanism to restore the symlinks. Um, wouldn't accepting this patch break the above-mentioned tablespace-relocation feature, because pg_basebackup wouldn't see any more symlinks sent down? No, the new feature is implemented only for the tar format, and the above feature works only with the plain format. It will still send the symlink information as before for the plain format. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Michael Paquier wrote: Btw, perhaps this diff should be pushed as a different patch as this is a rather different thing:

- if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+ if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
      !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM)

As this is a correctness fix, it does not seem necessary to back-patch it: the parent relation always has the same persistence as its indexes. There was an argument for doing it this way that only applies if this patch went in, but I can't remember now what it was. Anyway I pushed the patch after some slight additional changes. Thanks! -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc
Michael Paquier wrote: Hi all, The new storage parameter pages_per_range is missing in the documentation of CREATE INDEX. The patch attached corrects that. Ah BTW I had pushed this earlier today. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Useless dead struct in parse_func.h
Peter Geoghegan wrote: It looks like commit 0e99be1c removed the final real use of the struct InhPaths. Attached patch removes it entirely. I didn't verify the claim about 0e99be1c, but I pushed it anyway. Thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On 11/14/2014 08:21 PM, Simon Riggs wrote: The requested information is already available, as discussed. Logical decoding adds commit ordering for *exactly* the purpose of using it for replication, available to all solutions. This often requested feature has now been added and doesn't need to be added twice. So what we are discussing is adding a completely superfluous piece of information. Not including the LSN info does nothing to trigger based replication, which will no doubt live on happily for many years. But adding LSN will slow down logical replication, for no purpose at all. Simon, The use cases I'm talking about aren't really replication related. Often I have come across systems that want to do something such as 'select * from orders where X > the_last_row_I_saw order by X' and then do further processing on the order. This is kind of awkward to do today because you don't have a good candidate for 'X' to order on. Using either a sequence or insert-row timestamp doesn't work well, because a transaction with a lower value for X might end up committing after the higher value is already in a query result. Yes, you could set up a logical wal slot and listen on the stream of inserts into your order table, but that's a lot of administration overhead compared to just issuing an SQL query for what really is a query type operation. Using the commit timestamp for my X sounded very tempting but could allow duplicates. One could argue that this patch is about replication features, and providing commit ordering for query purposes should be a separate patch to add that on top of this infrastructure. I see merit to smaller, more focused patches, but that requires leaving the door open to easily extending things later. It could also be that I'm the only one who wants to order and filter queries in this manner (but that would surprise me). If the commit lsn has limited appeal and we decide we don't want it at all then we shouldn't add it. I've seen this type of requirement in a number of different systems at a number of different companies. I've generally seen it dealt with by either selecting rows behind the last now() timestamp seen and then filtering out already-processed rows, or by tracking the 'processed' state of each row individually (i.e. performing an update on each row once it's been processed), which performs poorly. Steve
Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc
On Sat, Nov 15, 2014 at 1:27 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: Hi all, The new storage parameter pages_per_range is missing in the documentation of CREATE INDEX. The patch attached corrects that. Ah BTW I had pushed this earlier today. Thanks. -- Michael
Re: [HACKERS] WAL format and API changes (9.5)
On Fri, Nov 14, 2014 at 5:31 PM, Michael Paquier michael.paqu...@gmail.com wrote: 3) pg_xlogdump does not seem to work: $ pg_xlogdump 0001000D pg_xlogdump: FATAL: could not find a valid record after 0/D00 This one was a mistake on my side. Please forget this comment. -- Michael
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: 2. Symlink file format:

oid linkpath
16387 E:\PostgreSQL\tbs

The symlink file will contain entries for all the tablespaces under the pg_tblspc directory. I have kept the file name as symlink_label (suggestions are welcome if you want some different name for this file). I think symlink_label isn't a very good name. This file is not a label in the sense that backup_label is; it seems more a catalog to me. And it's not, in essence, about symlinks either, but rather about tablespaces. I would name it following the term tablespace catalog or some variation thereof. This file is going to provide the symlink path for each tablespace, so it would not be bad to have that in the file name. I agree with you that it's more about tablespaces. So how about: tablespace_symlink, symlink_tablespace, or tablespace_info? I know we don't expect that users would have to look at the file or edit it in normal cases, but it seems better to make it human-readable. I would think that the file needs to have tablespace names too, then, not just OIDs. Maybe we don't use the tablespace name for anything other than documentation purposes if someone decides to look at the file, so perhaps it should look like a comment:

oid link path ; tablespace name

We already do this in pg_restore -l output IIRC. Okay, I will take care of doing this in the next version of the patch if no one objects to this idea or more people are in favour of doing so. One use case mentioned upthread is having the clone be created in the same machine as the source server. Does your proposal help with it? Sorry, but I am not sure which proposal exactly you are referring to here; could you explain in more detail? In general, if a user took the backup (in tar format) using pg_basebackup, this patch will make it possible to restore such a backup even on the same server. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Sat, Nov 15, 2014 at 12:01 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Nov 14, 2014 at 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Generally I'd be in favor of avoiding platform-dependent code where possible, but that doesn't represent a YES vote for this particular patch. It looks pretty messy in a quick look, even granting that the #ifdef WIN32's would all go away. Hmm, OK. I have not read the patch. Hopefully that's something that could be fixed. A larger question here is about forward/backward compatibility of the basebackup files. Changing the representation of symlinks like this would break that. Maybe we don't care, not sure (is there already a catversion check for these things?). Changing the file format for only some platforms seems like definitely a bad idea though. What are the practical consequences of changing the file format? I think that an old backup containing symlinks could be made to work on a new server that knows how to create them, So if I understand correctly, by *old backup* you mean a backup created by a pre-9.5 version and by *new server* you mean a 9.5 server; if so, the current design should handle it. However, if the backup is created on version 9.5 using pg_basebackup of the same version, trying to restore it with a server older than 9.5 won't work, because that server won't have the information about the symlinks. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com