Re: [HACKERS] Add support for restrictive RLS policies
Hello Stephen, On Tue, Sep 27, 2016 at 12:57 AM, Stephen Frost wrote: > Jeevan, > > * Jeevan Chalke (jeevan.cha...@enterprisedb.com) wrote: > > I have started reviewing this patch and here are couple of points I have > > observed so far: > > > > 1. Patch applies cleanly > > 2. make / make install / initdb all good. > > 3. make check (regression) FAILED. (Attached diff file for reference). > > I've re-based my patch on top of current head and still don't see the > failures which you are getting during the regression tests. Is it > possible you were doing the tests without a full rebuild of the source > tree..? > > Can you provide details of your build/test environment and the full > regression before and after output? > I still get the same failures with the latest sources and the new patch. Here are a few details of my setup. Let me know if I missed any. $ uname -a Linux centos7 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux HEAD at commit 51c3e9fade76c12e4aa37bffdf800bbf74fb3fb1 configure switches: ./configure --with-openssl --with-tcl --with-perl --with-python --with-ossp-uuid --with-ldap --with-pam --with-zlib --with-pgport=5432 --enable-depend --enable-debug --enable-cassert --prefix=`pwd`/install CFLAGS="-g -O0" The regression tests FAILED; the regression diff is the same as the previous one. Without the patch I don't get any regression failures. Well, I could not stop myself from debugging this mystery, and was finally able to find the reason why this is failing. It was strange that it did not crash and simply gave different results. With this patch, the pg_policy catalog now has seven columns; however, Natts_pg_policy is still set to 6. It should be updated to 7 now. After doing this, the regression tests pass. I am reviewing the latest patch in detail now and will post my review comments later. Thanks > > Thanks! > > Stephen > -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
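For anyone following along, the fix described here amounts to a one-line constant bump in src/include/catalog/pg_policy.h. A minimal sketch follows; the name and position of the new attribute are assumptions for illustration, not taken from the patch:

/*
 * Sketch only: Natts_pg_policy must match the number of columns in
 * pg_policy; a mismatch matches the "different results, no crash"
 * symptom described above.
 */
#define Natts_pg_policy                 7   /* was 6 before the patch */
#define Anum_pg_policy_polname          1
#define Anum_pg_policy_polrelid         2
#define Anum_pg_policy_polcmd           3
#define Anum_pg_policy_polpermissive    4   /* assumed new column */
#define Anum_pg_policy_polroles         5
#define Anum_pg_policy_polqual          6
#define Anum_pg_policy_polwithcheck     7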
Re: [HACKERS] Fix some corner cases that cube_in rejects
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Note for committer: There are unnecessary files (cube_1.out, cube_2.out & cube_3.out) in the contrib directory that need to be removed at code check-in. Other than this concern, I think this patch looks pretty reasonable. Thanks. Regards, Amul The new status of this patch is: Ready for Committer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On 09/24/2016 02:34 PM, Peter Geoghegan wrote: On Tue, Sep 20, 2016 at 8:55 AM, Heikki Linnakangas wrote: Likewise, in ExecInsertIndexTuples(), this makes the deferred-index case work slightly differently from speculative insertion. It's not a big difference, but it again forces you to keep one more scenario in mind, when reading the code This actually does have useful practical consequences, although I didn't point that out earlier (though I should have). To see what I mean, consider the complaint in this recent thread, which is based on an actual user application problem: https://www.postgresql.org/message-id/flat/1472134358.649524...@f146.i.mail.ru#1472134358.649524...@f146.i.mail.ru I go on to explain how this patch represents a partial solution to that [1]. That's what I mean by "useful practical consequences". As I say in [1], I think we could even get a full solution, if we applied this patch and *also* made the ordering in which the relcache returns a list of index OIDs more useful (it should still be based on something stable, to avoid deadlocks, but more than just OID order. Instead, relcache.c can sort indexes such that we insert into primary keys first, then unique indexes, then all other indexes. This also avoids bloat if there is a unique violation, by getting unique indexes out of the way first during ExecInsert()). Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically to fix that. I'm not convinced that we need all the complications of this patch, to get that fixed. (In particular, indexam's still wouldn't need to care about the different between CHECK_UNIQUE_PARTIAL and CHECK_UNIQUE_SPECULATIVE, I think.) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On Tue, Sep 27, 2016 at 9:10 AM, Heikki Linnakangas wrote: > Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically to > fix that. I'm not convinced that we need all the complications of this > patch, to get that fixed. (In particular, indexam's still wouldn't need to > care about the different between CHECK_UNIQUE_PARTIAL and > CHECK_UNIQUE_SPECULATIVE, I think.) But you are convinced that everything else I describe would do it? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On 09/27/2016 11:22 AM, Peter Geoghegan wrote: On Tue, Sep 27, 2016 at 9:10 AM, Heikki Linnakangas wrote: Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically to fix that. I'm not convinced that we need all the complications of this patch, to get that fixed. (In particular, indexam's still wouldn't need to care about the different between CHECK_UNIQUE_PARTIAL and CHECK_UNIQUE_SPECULATIVE, I think.) But you are convinced that everything else I describe would do it? I didn't think very hard about it, but yeah, think so.. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On Tue, Sep 27, 2016 at 9:25 AM, Heikki Linnakangas wrote: > I didn't think very hard about it, but yeah, think so.. Do you think it's worth a backpatch? Or, too early to tell? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Sorry about the delay in replying. On 2016/09/15 21:58, Ashutosh Bapat wrote: > Hi Amit, > > It looks like there is some problem while creating paramterized paths > for multi-level partitioned tables. Here's a longish testcase > > [ ... ] > > Please check if you are able to reproduce these errors in your > repository. I made sure that I cleaned up all partition-wise join code > before testing this, but ... . Thanks for the test case. I can reproduce the same. > I tried to debug the problem somewhat. In set_append_rel_pathlist(), > it finds that at least one child has a parameterized path as the > cheapest path, so it doesn't create an unparameterized path for append > rel. At the same time there is a parameterization common to all the > children, so it doesn't create any path. There seem to be two problems > here > 1. The children from second level onwards may not be getting > parameterized for lateral references. That seems unlikely but > possible. > 2. Reparameterization should have corrected this, but > reparameterize_path() does not support AppendPaths. Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is certainly to be blamed here; if I revert the patch, the problem goes away. Based on 2 above, I attempted to add logic for AppendPath in reparameterize_path() as in the attached. It fixes the reported problem and does not break any regression tests. If it's sane to do things this way, I will incorporate the attached into patch 0005 mentioned above. Thoughts? Thanks, Amit commit 86c8fc2c268d17fb598bf6879cfef2b2a2f42417 Author: amit Date: Tue Sep 27 16:02:54 2016 +0900 Handle AppendPath in reparameterize_path(). diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 2a49639..750f0ea 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1570,6 +1570,40 @@ cost_sort(Path *path, PlannerInfo *root, } /* + * cost_append + * Determines and returns the cost of a Append node + * + * Compute rows and costs as sums of subplan rows and costs. We charge + * nothing extra for the Append itself, which perhaps is too optimistic, + * but since it doesn't do any selection or projection, it is a pretty + * cheap node. + */ +void +cost_append(AppendPath *apath, Relids required_outer) +{ + ListCell *l; + + apath->path.rows = 0; + apath->path.startup_cost = 0; + apath->path.total_cost = 0; + foreach(l, apath->subpaths) + { + Path *subpath = (Path *) lfirst(l); + + apath->path.rows += subpath->rows; + + if (l == list_head(apath->subpaths)) /* first node? */ + apath->path.startup_cost = subpath->startup_cost; + apath->path.total_cost += subpath->total_cost; + apath->path.parallel_safe = apath->path.parallel_safe && + subpath->parallel_safe; + + /* All child paths must have same parameterization */ + Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer)); + } +} + +/* * cost_merge_append * Determines and returns the cost of a MergeAppend node. 
* diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index abb7507..bf8fb55 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1206,7 +1206,6 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer, int parallel_workers) { AppendPath *pathnode = makeNode(AppendPath); - ListCell *l; pathnode->path.pathtype = T_Append; pathnode->path.parent = rel; @@ -1220,32 +1219,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer, * unsorted */ pathnode->subpaths = subpaths; - /* - * We don't bother with inventing a cost_append(), but just do it here. - * - * Compute rows and costs as sums of subplan rows and costs. We charge - * nothing extra for the Append itself, which perhaps is too optimistic, - * but since it doesn't do any selection or projection, it is a pretty - * cheap node. - */ - pathnode->path.rows = 0; - pathnode->path.startup_cost = 0; - pathnode->path.total_cost = 0; - foreach(l, subpaths) - { - Path *subpath = (Path *) lfirst(l); - - pathnode->path.rows += subpath->rows; - - if (l == list_head(subpaths)) /* first node? */ - pathnode->path.startup_cost = subpath->startup_cost; - pathnode->path.total_cost += subpath->total_cost; - pathnode->path.parallel_safe = pathnode->path.parallel_safe && - subpath->parallel_safe; - - /* All child paths must have same parameterization */ - Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer)); - } + cost_append(pathnode, required_outer); return pathnode; } @@ -3204,6 +3178,41 @@ reparameterize_path(PlannerInfo *root, Path *path, spath->path.pathkeys, required_outer); } + case T_Append: + { +AppendPath *apath = (AppendPath *) path; +AppendPath *newpath; +List *new_subpaths = NIL; +ListCell *lc; + +foreach(lc, apath->subpaths) +{ +
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On 09/27/2016 11:38 AM, Peter Geoghegan wrote: On Tue, Sep 27, 2016 at 9:25 AM, Heikki Linnakangas wrote: I didn't think very hard about it, but yeah, think so.. Do you think it's worth a backpatch? Or, too early to tell? Too early to tell.. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
>> >> Please check if you are able to reproduce these errors in your >> repository. I made sure that I cleaned up all partition-wise join code >> before testing this, but ... . > > Thanks for the test case. I can reproduce the same. > >> I tried to debug the problem somewhat. In set_append_rel_pathlist(), >> it finds that at least one child has a parameterized path as the >> cheapest path, so it doesn't create an unparameterized path for append >> rel. At the same time there is a parameterization common to all the >> children, so it doesn't create any path. There seem to be two problems >> here >> 1. The children from second level onwards may not be getting >> parameterized for lateral references. That seems unlikely but >> possible. Did you check this? We may be missing on creating index scan paths with parameterization. If we fix this, we don't need to re-parameterize Append. >> 2. Reparameterization should have corrected this, but >> reparameterize_path() does not support AppendPaths. > > Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is > certainly to be blamed here; if I revert the patch, the problem goes away. > > Based on 2 above, I attempted to add logic for AppendPath in > reparameterize_path() as in the attached. It fixes the reported problem > and does not break any regression tests. If it's sane to do things this > way, I will incorporate the attached into patch 0005 mentioned above. > Thoughts? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/09/27 15:44, Ashutosh Bapat wrote: >> By the way, I fixed one thinko in your patch as follows: >> >> -result->oids[i] = oids[mapping[i]]; >> +result->oids[mapping[i]] = oids[i]; > > While I can not spot any problem with this logic, when I make that > change and run partition_join testcase in my patch, it fails because > wrong partitions are matched for partition-wise join of list > partitions. In that patch, RelOptInfo of partitions are saved in > RelOptInfo of the parent by matching their OIDs. They are saved in the > same order as corresponding OIDs. Partition-wise join simply joins the > RelOptInfos at the same positions from both the parent RelOptInfos. I > can not spot an error in this logic too. OTOH, using the original logic makes tuple routing put tuples into the wrong partitions. When debugging why that was happening I discovered this and hence the proposed change. You mean that partition RelOptInfo's are placed using the canonical ordering of OIDs instead of catalog-scan-driven order, right? If that's true, then there is no possibility of wrong pairing happening, even with the new ordering of OIDs in the partition descriptor (ie, the ordering that would be produced by my proposed method above). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
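For readers trying to keep the direction of the mapping straight, here is a small standalone illustration (not code from either patch) of why the two assignments quoted above are not interchangeable, assuming mapping[i] gives the canonical position of the i-th partition returned by the catalog scan:

#include <stdio.h>

int
main(void)
{
    const char *oids[] = {"B", "C", "A"};   /* catalog-scan order */
    int         mapping[] = {1, 2, 0};      /* canonical slot of each scanned entry */
    const char *scatter[3];
    const char *gather[3];
    int         i;

    for (i = 0; i < 3; i++)
    {
        scatter[mapping[i]] = oids[i];      /* result->oids[mapping[i]] = oids[i] */
        gather[i] = oids[mapping[i]];       /* result->oids[i] = oids[mapping[i]] */
    }

    /* scatter ends up as A B C (canonical order); gather ends up as C A B */
    for (i = 0; i < 3; i++)
        printf("%s %s\n", scatter[i], gather[i]);
    return 0;
}

The two forms agree only when the mapping is its own inverse, so swapping them silently changes which slot each OID lands in.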
Re: [HACKERS] Declarative partitioning - another take
On Tue, Sep 27, 2016 at 2:46 PM, Amit Langote wrote: > On 2016/09/27 15:44, Ashutosh Bapat wrote: >>> By the way, I fixed one thinko in your patch as follows: >>> >>> -result->oids[i] = oids[mapping[i]]; >>> +result->oids[mapping[i]] = oids[i]; >> >> While I can not spot any problem with this logic, when I make that >> change and run partition_join testcase in my patch, it fails because >> wrong partitions are matched for partition-wise join of list >> partitions. In that patch, RelOptInfo of partitions are saved in >> RelOptInfo of the parent by matching their OIDs. They are saved in the >> same order as corresponding OIDs. Partition-wise join simply joins the >> RelOptInfos at the same positions from both the parent RelOptInfos. I >> can not spot an error in this logic too. > > OTOH, using the original logic makes tuple routing put tuples into the > wrong partitions. When debugging why that was happening I discovered this > and hence the proposed change. > > You mean that partition RelOptInfo's are placed using the canonical > ordering of OIDs instead of catalog-scan-driven order, right? If that's > true, then there is no possibility of wrong pairing happening, even with > the new ordering of OIDs in the partition descriptor (ie, the ordering > that would be produced by my proposed method above). right! I don't know what's wrong, will debug my changes. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] inconsistent behaviour of set-returning functions in sub-query with random()
Good to know and I agree that it is not an urgent case. I think this practice might be more common in the POSTGIS community where there are plenty of set-returning-functions used in this way. My use was taking a random sample of a pointcloud distrubution. I took the liberty to post your answer at stackexchange. thanks, Tom On Mon, 26 Sep 2016 at 21:38 Tom Lane wrote: > Tom van Tilburg writes: > > I'm often using the WHERE clause random() > 0.5 to pick a random subset > of > > my data. Now I noticed that when using a set-returning function in a > > sub-query, I either get the whole set or none (meaning that the WHERE > > random() > 0.5 clause is interpreted *before* the set is being > generated). > > e.g.: > > > > SELECT num FROM ( > > SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num) AS foo WHERE > random() > 0.5; > > Hmm, I think this is an optimizer bug. There are two legitimate behaviors > here: > > SELECT * FROM unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5; > > should (and does) re-evaluate the WHERE for every row output by unnest(). > > SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5; > > should evaluate WHERE only once, since that happens before expansion of the > set-returning function in the targetlist. (If you're an Oracle user and > you imagine this query as having an implicit "FROM dual", the WHERE should > be evaluated for the single row coming out of the FROM clause.) > > In the case you've got here, given the placement of the WHERE in the outer > query, you'd certainly expect it to be evaluated for each row coming out > of the inner query. But the optimizer is deciding it can push the WHERE > clause down to become a WHERE of the sub-select. That is legitimate in a > lot of cases, but not when there are SRF(s) in the sub-select's > targetlist, because that pushes the WHERE to occur before the SRF(s), > analogously to the change between the two queries I wrote. > > I'm a bit hesitant to change this in existing releases. Given the lack > of previous complaints, it seems more likely to break queries that were > behaving as-expected than to make people happy. But we could change it > in v10 and up, especially since some other corner-case changes in > SRF-in-tlist behavior are afoot. > > In the meantime, you could force it to work as you wish by inserting the > all-purpose optimization fence "OFFSET 0" in the sub-select: > > =# SELECT num FROM ( > SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num OFFSET 0) AS foo WHERE > random() > 0.5; > num > - >1 >4 >7 >9 > (4 rows) > > > regards, tom lane >
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Sun, 25 Sep 2016 17:31:53 +0530 Mithun Cy wrote: > I have some more comments on libpq-failover-8.patch > > + /* FIXME need to check that port is numeric */ > > Is this still applicable?. > Unfortunately, it was. I've fixed this problem in 9-th version of patch (attached) > > I think we need comments to know why we change default value based on > number of elements in connection string. why default in not “any" > when node count > 1. Fixed. > > + /* loop over all the host specs in the node variable */ > > + for (node = nodes; node->host != NULL || node->port != NULL; node++) > > { > > I think instead of loop if we can move these code into a separate > function say pg_add_to_addresslist(host, port, &addrs) this helps in > removing temp variables like "struct node info” and several heap > calls around it. For some reason DNS resolving was concentrated in one place before my changes. So, I've tried to not change this decision. > 3) > > +static char * > > +get_port_from_addrinfo(struct addrinfo * ai) > > > Need some comments for this function. Done. > We use strdup in many places no where we handle memory allocation > failure. Added some more memory allocation error checks. > > Comments not in sink with code. Fixed. diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 4e34f00..f0df877 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10 The general form for a connection URI is: -postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...] +postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1&...] @@ -809,6 +809,7 @@ postgresql://localhost/mydb postgresql://user@localhost postgresql://user:secret@localhost postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp +postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&target_server_type=any Components of the hierarchical part of the URI can also be given as parameters. For example: @@ -831,7 +832,9 @@ postgresql:///mydb?host=localhost&port=5433 For improved compatibility with JDBC connection URIs, instances of parameter ssl=true are translated into -sslmode=require. +sslmode=require and +loadBalanceHosts=true into +hostorder=random. @@ -841,6 +844,10 @@ postgresql:///mydb?host=localhost&port=5433 postgresql://[2001:db8::1234]/database + + There can be serveral host specifications, optionally accompanied + with port, separated by comma. + The host component is interpreted as described for the parameter PostgreSQL was built). On machines without Unix-domain sockets, the default is to connect to localhost. + + There can be more than one host parameter in + the connect string. In this case these hosts would be considered + alternate entries into same database and if connect to first one + fails, library would try to connect second etc. This can be used + for high availability cluster or for load balancing. See +parameter. + + + Network host name can be accompanied with port number, separated by + colon. If so, this port number is used only when connected to + this host. If there is no port number, port specified in the +parameter would be used. + @@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - + + + hostorder + + + Specifies how to choose host from list of alternate hosts, + specified in the parameter. 
+ + + If value of this argument is sequential (the + default) library connects to the hosts in order they specified, + and tries to connect second one only if connection to the first + fails. + + + If value is random host to connect is randomly + picked from the list. It allows to balance load between several + cluster nodes. However, currently PostgreSQL doesn't support + multimaster clusters. So, without use of third-party products, + only read-only connections can take advantage from the + load-balancing. See + + + + + target_server_type + + + If this parameter is master upon successful connection + library checks if host is in recovery state, and if it is so, + tries next host in the connect string. If this parameter is + any, connection to standby nodes are considered + successful. + + + + port @@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - connect_timeout @@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - + + failover_t
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat wrote: > On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada > wrote: >> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat >> wrote: >>> My original patch added code to manage the files for 2 phase >>> transactions opened by the local server on the remote servers. This >>> code was mostly inspired from the code in twophase.c which manages the >>> file for prepared transactions. The logic to manage 2PC files has >>> changed since [1] and has been optimized. One of the things I wanted >>> to do is see, if those optimizations are applicable here as well. Have >>> you considered that? >>> >>> >> >> Yeah, we're considering it. >> After these changes are committed, we will post the patch incorporated >> these changes. >> >> But what we need to do first is the discussion in order to get consensus. >> Since current design of this patch is to transparently execute DCL of >> 2PC on foreign server, this code changes lot of code and is >> complicated. > > Can you please elaborate. I am not able to understand what DCL is > involved here. According to [1], examples of DCL are GRANT and REVOKE > command. I meant transaction management command such as PREPARE TRANSACTION and COMMIT/ABORT PREPARED command. The web page I refered might be wrong, sorry. >> Another approach I have is to push down DCL to only foreign servers >> that support 2PC protocol, which is similar to DML push down. >> This approach would be more simpler than current idea and is easy to >> use by distributed transaction manager. > > Again, can you please elaborate, how that would be different from the > current approach and how does it simplify the code. > The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK PREPARED to foreign servers that support 2PC. With this idea, the client need to do following operation when foreign server is involved with transaction. BEGIN; UPDATE parent_table SET ...; -- update including foreign server PREPARE TRANSACTION 'xact_id'; COMMIT PREPARED 'xact_id'; The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed down to foreign server. That is, the client needs to execute PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED explicitly. In this idea, I think that we don't need to do followings, * Providing the prepare id of 2PC. Current patch adds new API prepare_id_provider() but we can use the prepare id of 2PC that is used on parent server. * Keeping track of status of foreign servers. Current patch keeps track of status of foreign servers involved with transaction but this idea is just to push down transaction management command to foreign server. So I think that we no longer need to do that. * Adding max_prepared_foreign_transactions parameter. It means that the number of transaction involving foreign server is the same as max_prepared_transactions. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
Hi, I apologize in advance that the comments in this message might one of the ideas discarded in the past thread.. I might not grasp the discussion completely X( The attached patches are rebased to the master and additional one mentioned below. At Wed, 18 May 2016 17:57:49 -0400, Michael Paquier wrote in > A couple of months back is has been reported to pgsql-bugs that WAL > segments were always switched with a low value of archive_timeout even > if a system is completely idle: > http://www.postgresql.org/message-id/20151016203031.3019.72...@wrigleys.postgresql.org > In short, a closer look at the problem has showed up that the logic in > charge of checking if a checkpoint should be skipped or not is > currently broken, because it completely ignores standby snapshots in > its calculation of the WAL activity. So a checkpoint always occurs > after checkpoint_timeout on an idle system since hot_standby has been > introduced as wal_level. This did not get better from 9.4, since > standby snapshots are logged every 15s by the background writer > process. In 9.6, since wal_level = 'archive' and 'hot_standby' > actually has the same meaning, the skip logic that worked with > wal_level = 'archive' does not do its job anymore. > > One solution that has been discussed is to track the progress of WAL > activity when doing record insertion by being able to mark some > records as not updating the progress of WAL. Standby snapshot records > enter in this category, making the checkpoint skip logic more robust. > Attached is a patch implementing a solution for it, by adding in If I understand the old thread correctly, this still doesn't solve the main issue, excessive checkpoint and segment switching. The reason is the interaction between XLOG_SWITCH and checkpoint as mentioned there. I think we may treat XLOG_SWITCH as NO_PROGRESS, since we can recover to the lastest state without it. If it is not wrong, the second patch attached (v12-2) inserts XLOG_SWITCH as NO_PROGRESS and skips segment switching when no progress took place for the round. > WALInsertLock a new field that gets updated for each record to track > the LSN progress. This allows to reliably skip the generation of > standby snapshots in the bgwriter or checkpoints on an idle system. WALInsertLock doesn't seem to me to be a good place for progressAt even considering the discussion about adding few instructions (containing a branch) in the hot-section. BackgroundWriterMain blocks all running XLogInsertRecord every 200 ms, not 15 or 30 seconds (only for replica, though). If this is correct, the Amit's suggestion below will have more significance, and I rather agree with it. XLogCtl is more suitable place for progressAt for the case. https://www.postgresql.org/message-id/caa4ek1lb9hdq+f8lw8bgrqx6sw42xaikx1uq2dzk+auegbf...@mail.gmail.com Amit> Taking and releasing locks again and again (which is done in patch) Amit> matters much more than adding few instructions under it and I think Amit> if you would have written the code in-a-way as in patch in some of Amit> the critical path, it would have been regressed badly, but because Amit> checkpoint doesn't happen often, reproducing regression is difficult. https://www.postgresql.org/message-id/cab7npqso6hvj0t6lut84pcy+_ihitdt64ze2d+sjrhzy72y...@mail.gmail.com > > Also, I think it is worth to once take the performance data for > > write tests (using pgbench 30 minute run or some other way) with > > minimum checkpoint_timeout (i.e 30s) to see if the additional locking > > has any impact on performance. 
I think taking locks at intervals > > of 15s or 30s should not matter much, but it is better to be safe. > > I don't think performance will be impacted, but there are no reasons > to not do any measurements either. I'll try to get some graphs > tomorrow with runs on my laptop, mainly looking for any effects of > this patch on TPS when checkpoints show up. I don't think the impact is measurable on a laptop, where 2 to 4 cores available at most. > Per discussion with Andres at PGcon, we decided that this is an > optimization, only for 9.7~ because this has been broken for a long > time. I have also changed XLogIncludeOrigin() to use a more generic > routine to set of status flags for a record being inserted: > XLogSetFlags(). This routine can use two flags: > - INCLUDE_ORIGIN to decide if the origin should be logged or not > - NO_PROGRESS to decide at insertion if a record should update the LSN > progress or not. > Andres mentioned me that we'd want to have something similar to > XLogIncludeOrigin, but while hacking I noticed that grouping both > things under the same umbrella made more sense. This looks reasonable. regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From 686e4981c0d7ab3dd9e919f8b203aeb475f89a3b Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 27 Sep 2016 10:19:55 +0900 Subject: [PATCH 1/3] hs-checkpoints-v12-1 Rebased version of v11-1 --- src/backend/access/hea
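For readers who have not opened the patch, the insertion pattern under discussion looks roughly like the sketch below. XLogSetFlags() and the NO_PROGRESS flag are the names used in this thread; the helper and the record type chosen here are hypothetical, and the final API may differ:

/*
 * Sketch: insert a record that should not count as "WAL activity" for
 * the purposes of the checkpoint-skip test.
 */
static XLogRecPtr
log_unimportant_record(char *rdata, int len)    /* hypothetical helper */
{
    XLogBeginInsert();
    XLogRegisterData(rdata, len);

    /* do not advance the WAL-progress LSN consulted by the checkpointer */
    XLogSetFlags(XLOG_NO_PROGRESS);

    return XLogInsert(RM_XLOG_ID, XLOG_NOOP);
}

Standby-snapshot records logged by the bgwriter would be marked this way, so an otherwise idle system stops triggering checkpoints and segment switches.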
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 21, 2016 at 8:47 AM, Dilip Kumar wrote: > Summary: > -- > At 32 clients no gain, I think at this workload Clog Lock is not a problem. > At 64 Clients we can see ~10% gain with simple update and ~5% with TPCB. > At 128 Clients we can see > 50% gain. > > Currently I have tested with synchronous commit=off, later I can try > with on. I can also test at 80 client, I think we will see some > significant gain at this client count also, but as of now I haven't > yet tested. > > With above results, what we think ? should we continue our testing ?

I have done further testing with the TPCB workload to see the impact on the performance gain when increasing the scale factor. Again, at 32 clients there is no gain, but at 64 clients the gain is 12% and at 128 clients it is 75%; this shows that the improvement with the group lock is better at a higher scale factor (at scale factor 300 the gain was 5% at 64 clients and 50% at 128 clients).

8 socket machine (kernel 3.10), 10 min run (median of 3 runs), synchronous_commit=off, scale factor = 1000, shared buffers = 40GB

Test results:

client    head     group lock
32        27496    27178
64        31275    35205
128       20656    34490

LWLOCK_STATS approx. block count on ClogControl Lock ("lwlock main 11"):

client    head    group lock
32        8       6
64        15      10
128       14      7

Note: These are approximate block counts; I have the detailed LWLOCK_STATS results in case someone wants to look into them. LWLOCK_STATS shows that the ClogControl lock block count is reduced by 25% at 32 clients, 33% at 64 clients and 50% at 128 clients.

Conclusion: 1. I think both the LWLOCK_STATS and the performance data show that we get a significant contention reduction on ClogControlLock with the patch. 2. It also shows that although we are not seeing any performance gain at 32 clients, there is a contention reduction with the patch.

I am planning to do some more tests with a higher scale factor (3000 or more). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
On 26 September 2016 at 14:52, Dave Cramer wrote: > > > >> >> This crashes with arrays with non-default lower bounds: >> >> postgres=# SELECT * FROM test_type_conversion_array_int >> 4('[2:4]={1,2,3}'); >> INFO: ([1, 2, ], ) >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> >> Attached patch fixes this bug, and adds a test for it. > >> >> I'd like to see some updates to the docs for this. The manual doesn't >> currently say anything about multi-dimensional arrays in pl/python, but it >> should've mentioned that they're not supported. Now that it is supported, >> should mention that, and explain briefly that a multi-dimensional array is >> mapped to a python list of lists. >> >> If the code passes I'll fix the docs > >> It seems we don't have any mention in the docs about arrays with >> non-default lower-bounds ATM. That's not this patch's fault, but it would >> be good to point out that the lower bounds are discarded when an array is >> passed to python. >> >> I find the loop in PLyList_FromArray() quite difficult to understand. Are >> the comments there mixing up the "inner" and "outer" dimensions? I wonder >> if that would be easier to read, if it was written in a recursive-style, >> rather than iterative with stacks for the dimensions. >> >> Yes, it is fairly convoluted. > > > Dave Cramer > > da...@postgresintl.com > www.postgresintl.com > > > PL-Python-adding-support-for-multi-dimensional-arrays-20160926.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench more operators & functions
Hi, The patch has correct precedence now. Further minor comments: 1. About documentation, I think it will be good idea to arrange the operators table with the precedence and add a line at top: "In decreasing order of precedence". 2. You may want to remove the comment: + /* should it do a lazy evaluation of the branch? */ Regards, Jeevan Ladhe
Re: [HACKERS] Declarative partitioning - another take
On 2016/09/27 18:09, Ashutosh Bapat wrote: >>> I tried to debug the problem somewhat. In set_append_rel_pathlist(), >>> it finds that at least one child has a parameterized path as the >>> cheapest path, so it doesn't create an unparameterized path for append >>> rel. At the same time there is a parameterization common to all the >>> children, so it doesn't create any path. There seem to be two problems >>> here >>> 1. The children from second level onwards may not be getting >>> parameterized for lateral references. That seems unlikely but >>> possible. > > Did you check this? We may be missing on creating index scan paths > with parameterization. If we fix this, we don't need to > re-parameterize Append. You're right. How about the attached patch that fixes the problem along these lines? The problem seems to be that multi-level inheritance sets (partitioned tables) are not handled in create_lateral_join_info(), which means that lateral_relids and lateral_referencers of the root relation are not being propagated to the partitions below level 1. I'm getting concerned about one thing though - for a given *regular* inheritance set, the root->append_rel_list would be scanned only once; But for a *partitioned table* inheritance set, it would be scanned for every partitioned table in the set (ie, the root table and internal partitions). Thanks, Amit commit d69aeabcfcb58f349602a1b7392c611045c37465 Author: amit Date: Tue Sep 27 20:01:44 2016 +0900 Consider multi-level partitioned tables in create_lateral_join_info(). diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 84ce6b3..74734f2 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "catalog/pg_class.h" #include "catalog/pg_type.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" @@ -623,8 +624,19 @@ create_lateral_join_info(PlannerInfo *root) for (rti = 1; rti < root->simple_rel_array_size; rti++) { RelOptInfo *brel = root->simple_rel_array[rti]; + RangeTblEntry *rte = root->simple_rte_array[rti]; - if (brel == NULL || brel->reloptkind != RELOPT_BASEREL) + if (brel == NULL) + continue; + + /* + * If an "other rel" RTE is a "partitioned table", we must propagate + * the lateral info inherited from the parent to its children. That's + * because they are not linked directly with the parent via + * AppendRelInfo's (see expand_inherited_rte_internal()). + */ + if (brel->reloptkind != RELOPT_BASEREL && + rte->relkind != RELKIND_PARTITIONED_TABLE) continue; if (root->simple_rte_array[rti]->inh) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add support for restrictive RLS policies
On Tue, Sep 27, 2016 at 12:45 PM, Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > Hello Stephen, > > I am reviewing the latest patch in detail now and will post my review > comments later. > Here are the review comments: 1. In documentation, we should put both permissive as well as restrictive in the header like permissive|restrictive. Or may be a generic term, say, policy type (like we have command and then options mentioned like all, select etc.), followed by options in the description. Or like we have CASCADE and RESTRICT in drop case. 2. "If the policy is a "permissive" or "restrictive" policy." seems broken as sentence starts with "If" and there is no other part to it. Will it be better to say "Specifies whether the policy is a "permissive" or "restrictive" policy."? 3. " .. a policy can instead by "restrictive"" Do you mean "instead be" here? 4. It will be good if we have an example for this in section "5.7. Row Security Policies" 5. AS ( PERMISSIVE | RESTRICTIVE ) should be '{' and '}' instead of parenthesis. 6. I think following changes are irrelevant for this patch. Should be submitted as separate patch if required. @@ -2133,6 +2139,25 @@ psql_completion(const char *text, int start, int end) /* Complete "CREATE POLICY ON USING (" */ else if (Matches6("CREATE", "POLICY", MatchAny, "ON", MatchAny, "USING")) COMPLETE_WITH_CONST("("); +/* CREATE POLICY ON AS PERMISSIVE|RESTRICTIVE FOR ALL|SELECT|INSERT|UPDATE|DELETE */ +else if (Matches8("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS", MatchAny, "FOR")) +COMPLETE_WITH_LIST5("ALL", "SELECT", "INSERT", "UPDATE", "DELETE"); +/* Complete "CREATE POLICY ON AS PERMISSIVE|RESTRICTIVE FOR INSERT TO|WITH CHECK" */ +else if (Matches9("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS", MatchAny, "FOR", "INSERT")) +COMPLETE_WITH_LIST2("TO", "WITH CHECK ("); +/* Complete "CREATE POLICY ON AS PERMISSIVE|RESTRICTIVE FOR SELECT|DELETE TO|USING" */ +else if (Matches9("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS", MatchAny, "FOR", "SELECT|DELETE")) +COMPLETE_WITH_LIST2("TO", "USING ("); +/* CREATE POLICY ON AS PERMISSIVE|RESTRICTIVE FOR ALL|UPDATE TO|USING|WITH CHECK */ +else if (Matches9("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS", MatchAny, "FOR", "ALL|UPDATE")) +COMPLETE_WITH_LIST3("TO", "USING (", "WITH CHECK ("); +/* Complete "CREATE POLICY ON AS PERMISSIVE|RESTRICTIVE TO " */ +else if (Matches8("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS", MatchAny, "TO")) +COMPLETE_WITH_QUERY(Query_for_list_of_grant_roles); +/* Complete "CREATE POLICY ON AS PERMISSIVE|RESTRICTIVE USING (" */ +else if (Matches8("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS", MatchAny, "USING")) +COMPLETE_WITH_CONST("("); 7. Natts_pg_policy should be updated to 7 now. 8. In following error, $2 and @2 should be used to correctly display the option and location. ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("unrecognized row security option \"%s\"", $1), errhint("Only PERMISSIVE or RESTRICTIVE policies are supported currently."), parser_errposition(@1))); You will see following result otherwise: postgres=# CREATE POLICY my_policy ON foo AS a123; ERROR: unrecognized row security option "as" LINE 1: CREATE POLICY my_policy ON foo AS a123; ^ HINT: Only PERMISSIVE or RESTRICTIVE policies are supported currently. I think adding negative test to test this error should be added in regression. 9. Need to update following comments in gram.y to reflect new changes. *QUERIES: *CREATE POLICY name ON table [FOR cmd] [TO role, ...] 
*[USING (qual)] [WITH CHECK (with_check)] 10. ALTER POLICY has no changes for this. Any reason why that is not considered here. 11. Will it be better to use boolean for polpermissive in _policyInfo? And then set that appropriately while getting the policies. So that later we only need to test the boolean avoiding string comparison. 12. Since PERMISSIVE is default, we should dump only "RESTRICTIVE" when appropriate, like other default cases. strcmp(polinfo->polpermissive,"t") == 0 ? "PERMISSIVE" : "RESTRICTIVE" 13. Since PERMISSIVE is default, do we need changes like below? -\QCREATE POLICY p1 ON test_table FOR ALL TO PUBLIC \E +\QCREATE POLICY p1 ON test_table AS PERMISSIVE FOR ALL TO PUBLIC \E I am not familiar or used TAP framework. So no idea about these changes. 14. While displaying policy details in permissionsList, per syntax, we should display (RESTRICT) before the command option. Also will it be better to use (RESTRICTIVE) instead of (RESTRICT)? 15. Similarly in describeOneTableDetails() too, can we have RESTRICTIVE after
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada wrote: > On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat > wrote: >> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada >> wrote: >>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat >>> wrote: My original patch added code to manage the files for 2 phase transactions opened by the local server on the remote servers. This code was mostly inspired from the code in twophase.c which manages the file for prepared transactions. The logic to manage 2PC files has changed since [1] and has been optimized. One of the things I wanted to do is see, if those optimizations are applicable here as well. Have you considered that? >>> >>> Yeah, we're considering it. >>> After these changes are committed, we will post the patch incorporated >>> these changes. >>> >>> But what we need to do first is the discussion in order to get consensus. >>> Since current design of this patch is to transparently execute DCL of >>> 2PC on foreign server, this code changes lot of code and is >>> complicated. >> >> Can you please elaborate. I am not able to understand what DCL is >> involved here. According to [1], examples of DCL are GRANT and REVOKE >> command. > > I meant transaction management command such as PREPARE TRANSACTION and > COMMIT/ABORT PREPARED command. > The web page I refered might be wrong, sorry. > >>> Another approach I have is to push down DCL to only foreign servers >>> that support 2PC protocol, which is similar to DML push down. >>> This approach would be more simpler than current idea and is easy to >>> use by distributed transaction manager. >> >> Again, can you please elaborate, how that would be different from the >> current approach and how does it simplify the code. >> > > The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK > PREPARED to foreign servers that support 2PC. > With this idea, the client need to do following operation when foreign > server is involved with transaction. > > BEGIN; > UPDATE parent_table SET ...; -- update including foreign server > PREPARE TRANSACTION 'xact_id'; > COMMIT PREPARED 'xact_id'; > > The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed > down to foreign server. > That is, the client needs to execute PREPARE TRANSACTION and > > In this idea, I think that we don't need to do followings, > > * Providing the prepare id of 2PC. > Current patch adds new API prepare_id_provider() but we can use the > prepare id of 2PC that is used on parent server. > > * Keeping track of status of foreign servers. > Current patch keeps track of status of foreign servers involved with > transaction but this idea is just to push down transaction management > command to foreign server. > So I think that we no longer need to do that. > COMMIT/ROLLBACK PREPARED explicitly. The problem with this approach is same as one previously stated. If the connection between local and foreign server is lost between PREPARE and COMMIT the prepared transaction on the foreign server remains dangling, none other than the local server knows what to do with it and the local server has lost track of the prepared transaction on the foreign server. So, just pushing down those commands doesn't work. > > * Adding max_prepared_foreign_transactions parameter. > It means that the number of transaction involving foreign server is > the same as max_prepared_transactions. > That isn't true exactly. 
max_prepared_foreign_transactions indicates how many transactions can be prepared on the foreign server, which in the method you propose should have a cap of max_prepared_transactions * number of foreign servers. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Generic type subscription
On Fri, 9 Sep 2016 18:29:23 +0700 Dmitry Dolgov <9erthali...@gmail.com> wrote: > Hi, > > Regarding to the previous conversations [1], [2], here is a patch > (with some improvements from Nikita Glukhov) for the generic type > subscription. It allows > to define type-specific subscription logic for any data type and has > implementations for the array and jsonb types. There are following > changes in this > patch: I've tried to compile this patch with current state of master (commit 51c3e9fade76c12) and found out that, when configured with --enable-cassert, it doesn't pass make check. LOG: server process (PID 4643) was terminated by signal 6: Aborted DETAIL: Failed process was running: update test_jsonb_subscript set test_json['a'] = '{"b": 1}'::jsonb; -- Victor -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Some information about the small portion of source code of postgreSQL
Hi, I am working in the optimizer module of PostgreSQL 9.4.1. My project is concerned with the plan returned by the optimizer. I am trying to return a subplan for a query instead of the full plan. For this I need to return a path from RelOptInfo *standard_join_search() in allpaths.c other than from the last level of root->join_rel_level. While doing so I am getting the error "variable not found in subplan target lists" in the function fix_join_expr_mutator() in setrefs.c. It would be great if you could give me some idea of why this error is happening; since the error occurs in that function, please give me more information about that portion of the code. Also, I want to know what a targetlist stands for. Thanks, Davinder Singh
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
Heikki Linnakangas writes: > On 09/24/2016 02:34 PM, Peter Geoghegan wrote: >> I go on to explain how this patch represents a partial solution to >> that [1]. That's what I mean by "useful practical consequences". As I >> say in [1], I think we could even get a full solution, if we applied >> this patch and *also* made the ordering in which the relcache returns >> a list of index OIDs more useful (it should still be based on >> something stable, to avoid deadlocks, but more than just OID order. >> Instead, relcache.c can sort indexes such that we insert into primary >> keys first, then unique indexes, then all other indexes. This also >> avoids bloat if there is a unique violation, by getting unique indexes >> out of the way first during ExecInsert()). > Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically > to fix that. I'm not convinced that we need all the complications of > this patch, to get that fixed. (In particular, indexam's still wouldn't > need to care about the different between CHECK_UNIQUE_PARTIAL and > CHECK_UNIQUE_SPECULATIVE, I think.) I can see the value of processing unique indexes before non-unique ones. I'm pretty suspicious of trying to prioritize primary keys first, though, because (a) it's not clear why bother, and (b) PG is a tad squishy about whether an index is a primary key or not, so that I'd be worried about different backends sometimes choosing different orders. I'd simplify this to "uniques in OID order then non-uniques in OID order". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
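A minimal sketch of the comparator that ordering implies, using a hypothetical (OID, uniqueness) pair per index; RelationGetIndexList() does not work on such a struct today, this is only to pin down the proposed sort order:

typedef struct IndexListEntry       /* hypothetical */
{
    Oid         indexoid;
    bool        isunique;
} IndexListEntry;

static int
index_list_cmp(const void *a, const void *b)
{
    const IndexListEntry *ia = (const IndexListEntry *) a;
    const IndexListEntry *ib = (const IndexListEntry *) b;

    /* unique indexes sort before non-unique ones */
    if (ia->isunique != ib->isunique)
        return ia->isunique ? -1 : 1;

    /* within each group, keep a stable, deadlock-safe OID order */
    if (ia->indexoid < ib->indexoid)
        return -1;
    if (ia->indexoid > ib->indexoid)
        return 1;
    return 0;
}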
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On Tue, Sep 27, 2016 at 9:01 PM, Heikki Linnakangas wrote: > * Added error-handling for OOM and other errors in liybpq > * In libpq, added check that the server sent back the same client-nonce > * Turned ERRORs into COMMERRORs and removed DEBUG4 lines (they could reveal > useful information to an attacker) > * Improved comments Thanks! > * A source of random values. This currently uses PostmasterRandom() > similarly to how the MD5 salt is generated, in the server, but plain old > random() in the client. If built with OpenSSL, we should probably use > RAND_bytes(). But what if the client is built without OpenSSL? I believe the > protocol doesn't require cryptographically strong randomness for the nonces, > i.e. it's OK if they're predictable, but they should be different for each > session. And what if we just replace PostmasterRandom()? pgcrypto is a useful source of inspiration here. If the server is built with OpenSSL we use RAND_bytes all the time. If not, let's use /dev/urandom. If urandom is not there, we fallback to /dev/random. For WIN32, there is CryptGenRandom(). This could just be done as an independent patch with a routine in src/common/ for example to allow both frontend and backend to use it. Do you think that this is a requirement for this patch? I think not really for the first shot. > * Nonce and salt lengths. The patch currently uses 10 bytes for both, but I > think I just pulled number that out of thin air. The spec doesn't say > anything about nonce and salt lengths AFAICS. What do other implementations > use? Is 10 bytes enough? Good question, but that seems rather short to me now that you mention it. Mongo has implemented already SCRAM-SHA-1 and they are using 3 uint64 so that's 24 bytes (sasl_scramsha1_client_conversation.cpp for example). For the salt I am seeing a reference to a string "salt" only, which is too short. > * The spec defines a final "server-error" message that the server sends on > authentication failure, or e.g. if a required extension is not supported. > The patch just uses FATAL for those. Should we try to send a server-error > message instead, or before, the elog(FATAL) ? It seems to me that sending back the error while the context is still alive, aka before the FATAL would be the way to go. That could be nicely done with an error callback while the exchange is happening. I missed that while going through the spec. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
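To pin down the fallback order being described, a minimal sketch of a routine that could live in src/common/ and be shared by backend and frontend. The function name is hypothetical; the Windows CryptGenRandom() branch and detailed error reporting are omitted:

#include <fcntl.h>
#include <unistd.h>
#ifdef USE_OPENSSL
#include <openssl/rand.h>
#endif

/* fill buf with len random bytes; returns 1 on success, 0 on failure */
static int
pg_strong_random(void *buf, size_t len)     /* hypothetical name */
{
#ifdef USE_OPENSSL
    if (RAND_bytes(buf, (int) len) == 1)
        return 1;
#endif
    {
        const char *sources[] = {"/dev/urandom", "/dev/random"};
        int         i;

        for (i = 0; i < 2; i++)
        {
            int         fd = open(sources[i], O_RDONLY);

            if (fd >= 0)
            {
                ssize_t     got = read(fd, buf, len);

                close(fd);
                if (got == (ssize_t) len)
                    return 1;
            }
        }
    }
    return 0;                   /* no usable randomness source */
}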
[HACKERS] Make flex/bison checks stricter in Git trees
When running ./configure on a system without Flex/Bison it’s easy to miss the warning that flies past and then run into compilation error instead. When it happened to a colleague yesterday a brief discussion came to the conclusion that it would be neat it the flex and bison checks took the existence of the generated files into consideration. Attached patch scans for the generated files and iff flex or bison isn’t found in a non-cross compilation build, errors out in case the generated filed don’t exist while retaining the warning in case they do. This maintains the current warning message for downloaded distribution trees while it make the check strict on Git trees. It does add a hardcoded list of files which it can be argued how nice that is even though the list rarely change (there might be a cleaner way but I couldn’t see one). Also includes a tiny copy-edit fix on the flex warning to make it match the bison one since that seemed better. If this is at all of interest I can add to the commitfest. cheers ./daniel test_generated_files.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench more operators & functions
Hello Jeevan. 1. About documentation, I think it will be good idea to arrange the operators table with the precedence and add a line at top: "In decreasing order of precedence". Done, see attached. 2. You may want to remove the comment: + /* should it do a lazy evaluation of the branch? */ Ok. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 285608d..bec3228 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -821,9 +821,8 @@ pgbench options dbname The expression may contain integer constants such as 5432, double constants such as 3.14159, references to variables :variablename, - unary operators (+, -) and binary operators - (+, -, *, /, - %) with their usual precedence and associativity, + operators + with their usual precedence and associativity, function calls, and parentheses. @@ -909,6 +908,84 @@ pgbench options dbname + + Built-In Operators + + + The arithmetic, bitwise, comparison and logical operators listed in +are built into pgbench + and may be used in expressions appearing in + \set. + + + + pgbench Operators by increasing precedence + + + + Operator Category + Result Type + List of Operators + + + + + binary logical + boolean (0/1) + + or/||, + xor/^^, + and/&& + + + + binary bitwise + integer + |, #, & + + + comparison + boolean (0/1) + + =/==, <>/!=, + <, <=, >, >= + + + + shifts + integer + <<, >> + + + binary arithmetic + integer or double + +, -, *, / + + + binary arithmetic + integer only + % + + + unary logical + boolean (0/1) + not/! + + + unary bitwise + integer + ~ + + + unary arithmetic + integer or double + +, - + + + + + + Built-In Functions @@ -955,6 +1032,13 @@ pgbench options dbname 5432.0 + exp(x) + double + exponential + exp(1.0) + 2.718281828459045 + + greatest(a [, ... ] ) double if any a is double, else integer largest value among arguments @@ -962,6 +1046,13 @@ pgbench options dbname 5 + if(c,e1,e2) + same as e* + if c is not zero then e1 else e2 + if(0,1.0,2.0) + 2.0 + + int(x) integer cast to int @@ -976,6 +1067,13 @@ pgbench options dbname 2.1 + ln(x) + double + natural logarithm + ln(2.718281828459045) + 1.0 + + pi() double value of the constant PI diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index 0cc665b..f8dbbaf 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -52,11 +52,21 @@ static PgBenchExpr *make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList * %type VARIABLE FUNCTION %token INTEGER_CONST DOUBLE_CONST VARIABLE FUNCTION - -/* Precedence: lowest to highest */ +%token AND_OP OR_OP XOR_OP NE_OP LE_OP GE_OP LS_OP RS_OP + +/* Precedence: lowest to highest, taken from C */ +%left OR_OP +%left XOR_OP +%left AND_OP +%left '|' +%left '#' +%left '&' +%left '=' NE_OP +%left '<' '>' LE_OP GE_OP +%left LS_OP RS_OP %left '+' '-' %left '*' '/' '%' -%right UMINUS +%right UNARY %% @@ -68,14 +78,32 @@ elist: { $$ = NULL; } ; expr: '(' expr ')' { $$ = $2; } - | '+' expr %prec UMINUS { $$ = $2; } - | '-' expr %prec UMINUS { $$ = make_op(yyscanner, "-", + | '+' expr %prec UNARY { $$ = $2; } + | '-' expr %prec UNARY { $$ = make_op(yyscanner, "-", make_integer_constant(0), $2); } + | '~' expr %prec UNARY { $$ = make_op(yyscanner, "#", + make_integer_constant(-1), $2); } + | '!' 
expr %prec UNARY { $$ = make_op(yyscanner, "^^", + make_integer_constant(1), $2); } | expr '+' expr { $$ = make_op(yyscanner, "+", $1, $3); } | expr '-' expr { $$ = make_op(yyscanner, "-", $1, $3); } | expr '*' expr { $$ = make_op(yyscanner, "*", $1, $3); } | expr '/' expr { $$ = make_op(yyscanner, "/", $1, $3); } | expr '%' expr { $$ = make_op(yyscanner, "%", $1, $3); } + | expr '<' expr { $$ = make_op(yyscanner, "<", $1, $3); } + | expr LE_OP expr { $$ = make_op(yyscanner, "<=", $1, $3); } + | expr '>' expr { $$ = make_op(yyscanner, "<", $3, $1); } + | expr GE_OP expr { $$ = make_op(yyscanner, "<=", $3, $1); } + | expr '=' expr { $$ = make_op(yyscanner, "=", $1, $3); } + | expr NE_OP expr { $$ = make_op(yyscanner, "<>", $1, $3); } + | expr '&' expr { $$ = make_op(yyscanner, "&", $1, $3); } + | expr '|' expr { $$ = make_op(yyscanner, "|", $1, $3); } + | e
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On 09/27/2016 04:19 PM, Michael Paquier wrote: On Tue, Sep 27, 2016 at 9:01 PM, Heikki Linnakangas wrote: * A source of random values. This currently uses PostmasterRandom() similarly to how the MD5 salt is generated, in the server, but plain old random() in the client. If built with OpenSSL, we should probably use RAND_bytes(). But what if the client is built without OpenSSL? I believe the protocol doesn't require cryptographically strong randomness for the nonces, i.e. it's OK if they're predictable, but they should be different for each session. And what if we just replace PostmasterRandom()? pgcrypto is a useful source of inspiration here. If the server is built with OpenSSL we use RAND_bytes all the time. If not, let's use /dev/urandom. If urandom is not there, we fall back to /dev/random. For WIN32, there is CryptGenRandom(). This could just be done as an independent patch with a routine in src/common/ for example to allow both frontend and backend to use it. Yeah, if built with OpenSSL, we probably should just always use RAND_bytes(). Without OpenSSL, we have to think a bit harder. The server-side code in the patch is probably good enough. After all, we use the same mechanism for the MD5 salt today. The libpq-side is not. Just calling random() won't do. We haven't needed random numbers in libpq before, but now we do. Is the pgcrypto solution portable enough that we can use it in libpq? Do you think that this is a requirement for this patch? I think not really for the first shot. We need something for libpq. We can't just call random(), as that's not random unless you also do srandom(), and we don't want to do that because the application might have a different idea of what the seed should be. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
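A minimal sketch of the kind of src/common/ routine being discussed, assuming the name pg_strong_random and the OpenSSL-then-/dev/urandom ordering (neither is taken from the posted patch):

/*
 * Hedged sketch only: prefer OpenSSL's RAND_bytes() when available,
 * otherwise read from /dev/urandom.  Name and error handling are
 * assumptions, not the actual patch.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef USE_OPENSSL
#include <openssl/rand.h>
#endif

bool
pg_strong_random(void *buf, size_t len)
{
#ifdef USE_OPENSSL
	return RAND_bytes(buf, (int) len) == 1;
#else
	FILE	   *f = fopen("/dev/urandom", "rb");
	size_t		nread = 0;

	if (f != NULL)
	{
		nread = fread(buf, 1, len, f);
		fclose(f);
	}
	return nread == len;
#endif
}

A Windows build would presumably substitute CryptGenRandom() in the non-OpenSSL branch.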
Re: [HACKERS] Make flex/bison checks stricter in Git trees
Daniel Gustafsson writes: > When running ./configure on a system without Flex/Bison it’s easy to miss > the > warning that flies past and then run into compilation error instead. When it > happened to a colleague yesterday a brief discussion came to the conclusion > that it would be neat if the flex and bison checks took the existence of the > generated files into consideration. > Attached patch scans for the generated files and iff flex or bison isn’t > found > in a non-cross compilation build, errors out in case the generated files > don’t > exist while retaining the warning in case they do. Not exactly convinced this is a good idea. What if the files exist but are out of date? More generally, how much advantage is there really in failing at configure rather than build time? The subtext here is that I'm disinclined to load more behavior into configure while we're waiting to see if cmake conversion happens. That job is tough enough without the autoconf sources being more of a moving target than they have to be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On Tue, Sep 27, 2016 at 10:42 PM, Heikki Linnakangas wrote: > The libpq-side is not. Just calling random() won't do. We haven't needed > random numbers in libpq before, but now we do. Is the pgcrypto solution > portable enough that we can use it in libpq? Do you think that urandom would be enough then? The last time I took a look at that, I saw urandom on all modern platforms, even on those: OpenBSD, NetBSD, Solaris, SunOS. For Windows the CryptGen stuff would be nice enough I guess.. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup
On 9/26/16 2:36 AM, Michael Paquier wrote: > > Just a nit: > > - postmaster.pid > + postmaster.pid and postmaster.opts > > Having one block for each file would be better. I grouped these because they are related and because there are now other bullets where multiple files/dirs are listed. Are you proposing to also breakup pg_replslot, pg_dynshmem, pg_stat_tmp, pg_notify, pg_serial, pg_snapshots, and pg_subtrans? Also backup_label and tablespace_map? -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Showing parallel status in \df+
On Mon, Sep 26, 2016 at 3:06 PM, Stephen Frost wrote: > I feel like we're getting wrapped around the axle as it regards who is > perceived to be voting for what. True. It's not very clear; thanks for trying to shed some light on it. > I don't particularly care for it either, primarily because \sf could be > improved upon, as suggested by Peter, to avoid the need to have the same > information displayed by both \df+ and \sf. IMHO, we've had \dWHATEVER as the way to find out about things for so long that we should just stick with it. I think users are used to remembering which character they need to stick after \d to get information on the object type in which they are currently interested; I know I am. If we move this all over to \sf people will have trouble finding it. I'll get used to it because I "work here" and so will you, but I think most users will just type \df and then \df+ and then say ... well where the %@#! did they put it? >> If you do want to see all of the output, you'll appreciate not having >> it indented by 60 or 80 columns any more. There's really no >> circumstance under which it's worse than what we're doing today. > > That doesn't mean, at least to me, that we should forgo considering > better alternatives. I don't think so, either, but if we could agree that "Tom's patch > doing nothing" then he could commit it and we could debate whether there's something even better. > We often reject patches which only improve a bit on the status quo > because we wish for a better overall solution, particularly when we're > talking about user interfaces that we don't want to change between every > release. Sure, that's true. In this case, however, I believe that the amount of improvement that's possible is pretty limited. Super-wide lines that wrap repeatedly are bad; we can probably all agree on that. Whether or not it's better to adjust \df+ as Tom has done or introduce \df++ or enhance \sf or something else entirely is debatable; different people prefer different things for different reasons - or for no reason, as some of this is surely down to personal preference. If I thought Tom's patch solved 20% of the problem while kicking 80% of it down the road, I'd probably agree that we ought not to adopt it; but in fact I think it's more like the reverse -- at least in the narrow sense of keeping \df+ output readable, which I think is about as ambitious as we should make our goal for a thread that started out being about showing parallel status in \df+ output. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On Tue, Sep 27, 2016 at 2:13 PM, Tom Lane wrote: > I can see the value of processing unique indexes before non-unique ones. > I'm pretty suspicious of trying to prioritize primary keys first, though, > because (a) it's not clear why bother, and (b) PG is a tad squishy about > whether an index is a primary key or not, so that I'd be worried about > different backends sometimes choosing different orders. I'd simplify > this to "uniques in OID order then non-uniques in OID order". I see your point. A more considered ordering of indexes for processing by the executor (prepared for it by the relcache), including something more that goes further than your proposal is useful in the context of fixing the bug I mentioned [1], but for non-obvious reasons. I would like to clarify what I meant there specifically. I am repeating myself, but maybe I just wasn't clear before. The theory of putting the PK first there is that we then have a well-defined (uh, better defined) user-visible ordering *across unique indexes* such that the problem case would *reliably* be fixed. With only this refactoring patch applied (and no change to the relcache ordering thing), it is then only a sheer accident of OID assignment ordering that the INSERT ON CONFLICT problem case happens to take the alternative path on the OP's *inferred* index (which, as it happens, was the PK for him), rather than the other unique index that was involved (the one that is not actually inferred, and yet is equivalent to the PK, UPSERT-semantics-wise). So, the reporter of the bug [1] is happy with his exact case now working, at least. You might now counter: "But why prefer one convention over the other? Prioritizing the PK would reliably fix that particular problem case, but that's still pretty arbitrary." It's true that it's somewhat arbitrary to always (speculatively) insert into the PK first. But, I think that it's more likely that the PK is inferred in general, and so it's more likely that users will fall on the right side of that in practice. Besides, at least we now have a consistent behavior. You might also reasonably ask: "But what if there are multiple unique indexes, none of which happen to be the PK? Isn't that subject to the same vagaries of OID ordering anyway?" I must admit that it is. But I don't really know where to draw the line here. Is it worth contemplating a more complicated scheme still? For example, trigger-style ordering; a sort order that considers index name as a "secondary attribute", in order to ensure perfectly consistent behavior? I must admit that I don't really have a clue whether or not that's a good idea. It's an idea. [1] https://www.postgresql.org/message-id/CAM3SWZTFTbK_Y%3D7uWJaXYLHnYQ99pV4KFpmjTKbNJR5_%2BQThzA%40mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
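To make the ordering Peter describes concrete, a comparator along the following lines would put the primary key first, then other unique indexes, then everything else, with OID as the stable tie-breaker. The struct and names here are assumptions for illustration, not code from any posted patch:

#include "postgres.h"

/* illustration-only entry; relcache.c would use its own data */
typedef struct IndexOrderEntry
{
	Oid			indexoid;
	bool		isprimary;
	bool		isunique;
} IndexOrderEntry;

static int
index_order_cmp(const void *a, const void *b)
{
	const IndexOrderEntry *ia = (const IndexOrderEntry *) a;
	const IndexOrderEntry *ib = (const IndexOrderEntry *) b;

	if (ia->isprimary != ib->isprimary)
		return ia->isprimary ? -1 : 1;	/* primary key sorts first */
	if (ia->isunique != ib->isunique)
		return ia->isunique ? -1 : 1;	/* then other unique indexes */
	if (ia->indexoid != ib->indexoid)
		return (ia->indexoid < ib->indexoid) ? -1 : 1;	/* stable tie-break */
	return 0;
}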
Re: [HACKERS] pageinspect: Hash index support
On 09/26/2016 10:45 PM, Peter Eisentraut wrote: On 9/26/16 1:39 PM, Jesper Pedersen wrote: Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally readable in this case. But, I can change the patch if needed. The point is that to use BuildTupleFromCStrings() you need to convert numbers to strings, and then they are converted back. This is not a typical way to write row-returning functions. Ok. Changed: * BuildTupleFromCStrings -> xyzGetDatum * 'type' field: char -> text w/ full description * Removed 'type' information from documentation Best regards, Jesper >From d6dd73ffef9deafc938b9fe7119db0928494cbf9 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 408 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 346 ++ contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 146 - 7 files changed, 955 insertions(+), 295 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql create mode 100644 contrib/pageinspect/pageinspect--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..a76b683 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,408 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_metapage_info); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + char *type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page, bool metap) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + if (metap) + { + if (pageopaque->hasho_flag != LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a metadata page"), + errdetail("Expected %08x, got %08x.", + LH_META_PAGE, pageopaque->hasho_flag))); + } + else + { + if (pageopaque->hasho_flag == LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is a metadata page"), + errdetail("Flags %08x.", + pageopaque->hasho_flag
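To illustrate the row-returning style Peter Eisentraut asks for above, a fragment of such a function body could form the result tuple from typed Datums directly. The variable names (pagestat, tupleDesc) and the exact column set are assumptions, not the patch's actual code:

/*
 * Illustration only: return the row by forming it from Datums rather
 * than printing every value to a string for BuildTupleFromCStrings().
 */
Datum		values[4];
bool		nulls[4] = {false, false, false, false};
HeapTuple	resultTuple;

values[0] = UInt32GetDatum(pagestat->live_items);
values[1] = UInt32GetDatum(pagestat->dead_items);
values[2] = UInt32GetDatum(pagestat->free_size);
values[3] = CStringGetTextDatum(pagestat->type);	/* e.g. "bucket" */

resultTuple = heap_form_tuple(tupleDesc, values, nulls);
PG_RETURN_DATUM(HeapTupleGetDatum(resultTuple));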
Re: [HACKERS] Showing parallel status in \df+
Robert Haas writes: > On Mon, Sep 26, 2016 at 3:06 PM, Stephen Frost wrote: >> That doesn't mean, at least to me, that we should forgo considering >> better alternatives. > I don't think so, either, but if we could agree that "Tom's patch > > doing nothing" then he could commit it and we could debate whether > there's something even better. I think the debate is more about whether moving the source display functionality over to \sf is a better solution than rearranging \df+ output. (If we had consensus to do that, I'd be happy to go code it, but I'm not going to invest the effort when it seems like we don't.) If we'd had \sf all along, I think it's likely that we would never have put source-code display into \df. But of course we didn't, and what would have been best in a green field is not necessarily what's best or achievable given existing reality. Both Robert and Peter have put forward the argument that people are used to finding this info in \df+ output, and I think that deserves a whole lot of weight. The \sf solution might be cleaner, but it's not so much better that it can justify forcing people to relearn their habits. So I think that rearranging \df+ output is really what we ought to be doing here. I'm not necessarily wedded to any of the precise details of what I did in my patch --- for instance, maybe function bodies ought to be indented one tab stop? But we've not gotten to the merits of such points, for lack of agreement about whether this is the basic approach to take. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Mon, Sep 26, 2016 at 3:40 PM, Peter Geoghegan wrote: > On Mon, Sep 26, 2016 at 6:58 PM, Robert Haas wrote: >>> That requires some kind of mutual exclusion mechanism, like an LWLock. >> >> No, it doesn't. Shared memory queues are single-reader, single-writer. > > The point is that there is a natural dependency when merging is > performed eagerly within the leader. One thing needs to be in lockstep > with the others. That's all. I don't know what any of that means. You said we need something like an LWLock, but I think we don't. The workers just write the results of their own final merges into shm_mqs. The leader can read from any given shm_mq until no more tuples can be read without blocking, just like nodeGather.c does, or at least it can do that unless its own queue fills up first. No mutual exclusion mechanism is required for any of that, as far as I can see - not an LWLock, and not anything similar. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS]
On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas wrote: > I don't know what any of that means. You said we need something like > an LWLock, but I think we don't. The workers just write the results > of their own final merges into shm_mqs. The leader can read from any > given shm_mq until no more tuples can be read without blocking, just > like nodeGather.c does, or at least it can do that unless its own > queue fills up first. No mutual exclusion mechanism is required for > any of that, as far as I can see - not an LWLock, and not anything > similar. I'm confused about the distinction you're making. Isn't the shm_mqs queue itself a mutual exclusion mechanism? The leader cannot really do anything without some input to process, because of the dependency that naturally exists. ISTM that everything else we've discussed is semantics, and/or about particular mechanisms in Postgres. At a high level, based on my intuition about performance characteristics, I have reservations about eager merging in the leader. That's all I mean. There is a trade-off to be made between eager and lazy merging. As I pointed out, the author of the Volcano paper (Goetz Graefe) went on at length about this particular trade-off for parallel sort almost 30 years ago. So, it's not clear that that would be an easy win, or even a win. That's all I meant. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Covering + unique indexes.
On Mon, Sep 26, 2016 at 11:17 AM, Anastasia Lubennikova wrote: >> Is there any fundamental problem in storing them in ordered way? I >> mean to say, you need to anyway store all the column values on leaf >> page, so why can't we find the exact location for the complete key. >> Basically use truncated key to reach to leaf level and then use the >> complete key to find the exact location to store the key. I might be >> missing some thing here, but if we can store them in ordered fashion, >> we can use them even for queries containing ORDER BY (where ORDER BY >> contains included columns). > > I'd say that the reason for not using included columns in any > operations which require comparisons, is that they don't have opclass. > Let's go back to the example of points. > This data type don't have any opclass for B-tree, because of fundamental > reasons. > And we can not apply _bt_compare() and others to this attribute, so > we don't include it to scan key. > > create table t (i int, i2 int, p point); > create index idx1 on (i) including (i2); > create index idx2 on (i) including (p); > create index idx3 on (i) including (i2, p); > create index idx4 on (i) including (p, i2); > > You can keep tuples ordered in idx1, but not for idx2, partially ordered for > idx3, but not for idx4. Yeah, I think we shouldn't go there. I mean, once you start ordering by INCLUDING columns, then you're going to need to include them in leaf pages because otherwise you can't actually guarantee that they are in the right order. And then you have to wonder why an INCLUDING column is any different from a non-INCLUDING column. It seems best to make a firm rule that INCLUDING columns are there only for the values, not for ordering. That rule is simple and clear, which is a good thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Showing parallel status in \df+
* Tom Lane (t...@sss.pgh.pa.us) wrote: > I think the debate is more about whether moving the source display > functionality over to \sf is a better solution than rearranging \df+ > output. (If we had consensus to do that, I'd be happy to go code it, > but I'm not going to invest the effort when it seems like we don't.) Right, that's the main question. > If we'd had \sf all along, I think it's likely that we would never > have put source-code display into \df. But of course we didn't, Indeed. > and what would have been best in a green field is not necessarily > what's best or achievable given existing reality. Both Robert and > Peter have put forward the argument that people are used to finding > this info in \df+ output, and I think that deserves a whole lot of > weight. The \sf solution might be cleaner, but it's not so much > better that it can justify forcing people to relearn their habits. > > So I think that rearranging \df+ output is really what we ought to > be doing here. Alright, given that Robert's made it clear what his preference is and you're in agreement with that, I'll remove my objection to moving down that path. I agree that it's better than the current situation. If we do end up improving \sf (which seems like a good idea, in general), then we may wish to consider a display option to control if the source is included in \df+ or not, but that doesn't need to bar this patch from going in. The earlier comments on the thread hadn't been as clear with regard to who held what opinions regarding the options and I'm glad that we were able to reach a point where it was much clearer that there was strong support for keeping the source in \df+. > I'm not necessarily wedded to any of the precise details of what I did > in my patch --- for instance, maybe function bodies ought to be indented > one tab stop? But we've not gotten to the merits of such points, for > lack of agreement about whether this is the basic approach to take. As for this, I wouldn't indent or change the source at all. For starters, indentation actually matters for some PLs, and I can certainly see people wanting to be able to copy/paste from the output, now that it'll be possible to reasonably do from the \df+ output. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup
On 9/26/16 2:36 AM, Michael Paquier wrote: > Just a nit: > > - postmaster.pid > + postmaster.pid and postmaster.opts > > Having one block for each file would be better. OK, changed. > +const char *excludeFile[] = > excludeFiles[]? > > +# Move pg_replslot out of $pgdata and create a symlink to it > +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot") > + or die "unable to move $pgdata/pg_replslot"; > +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot"); > This will blow up on Windows. Those tests need to be included in the > SKIP block. Even if your code needs to support junction points on > Windows, as perl does not offer an equivalent for it we just cannot > test it on this platform. Fixed. -- -David da...@pgmasters.net diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 0f09d82..a8daa07 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1090,6 +1090,22 @@ SELECT pg_stop_backup(); +The contents of the pg_dynshmem/, pg_stat_tmp/, +pg_notify/, pg_serial/, +pg_snapshots/, and pg_subtrans/ directories can +be omitted from the backup as they will be initialized on postmaster +startup. If the is set and is +under the database cluster directory then the contents of the directory +specified by can also be omitted. + + + +Any file or directory beginning with pgsql_tmp can be +omitted from the backup. These files are removed on postmaster start and +the directories will be recreated as needed. + + + The backup label file includes the label string you gave to pg_start_backup, as well as the time at which pg_start_backup was run, and diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 68b0941..d65687f 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2059,17 +2059,26 @@ The commands accepted in walsender mode are: - postmaster.pid + postmaster.pid and postmaster.opts - postmaster.opts + postgresql.auto.conf.tmp - various temporary files created during the operation of the PostgreSQL server + backup_label and tablespace_map. If these + files exist they belong to an exclusive backup and are not applicable + to the base backup. + + + + + Various temporary files and directories created during the operation of + the PostgreSQL server, i.e. any file or directory beginning with + pgsql_tmp. @@ -2082,7 +2091,11 @@ The commands accepted in walsender mode are: - pg_replslot is copied as an empty directory. + pg_replslot, pg_dynshmem, + pg_stat_tmp, pg_notify, + pg_serial, pg_snapshots, and + pg_subtrans are copied as empty directories (even if they + are symbolic links). diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 9f1eae1..984ea5b 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -610,10 +610,8 @@ PostgreSQL documentation The backup will include all files in the data directory and tablespaces, including the configuration files and any additional files placed in the - directory by third parties. But only regular files and directories are - copied. Symbolic links (other than those used for tablespaces) and special - device files are skipped. (See for - the precise details.) + directory by third parties, with certain exceptions. (See +for the complete list of exceptions.) 
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index da9b7a6..8412472 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -30,6 +30,7 @@ #include "replication/basebackup.h" #include "replication/walsender.h" #include "replication/walsender_private.h" +#include "storage/dsm_impl.h" #include "storage/fd.h" #include "storage/ipc.h" #include "utils/builtins.h" @@ -55,8 +56,10 @@ static int64 sendDir(char *path, int basepathlen, bool sizeonly, static bool sendFile(char *readfilename, char *tarfilename, struct stat * statbuf, bool missing_ok); static void sendFileWithContent(const char *filename, const char *content); -static void _tarWriteHeader(const char *filename, const char *linktarget, - struct stat * statbuf); +static int64 _tarWriteHeader(const char *filename, const char *linktarget, + struct stat * statbuf, bool sizeonly); +static int64 _tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf, + bool sizeonly); static void send_int8_string(StringInfo
Re: [HACKERS]
You seem to have erased the subject line from this email somehow. On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan wrote: > On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas wrote: >> I don't know what any of that means. You said we need something like >> an LWLock, but I think we don't. The workers just write the results >> of their own final merges into shm_mqs. The leader can read from any >> given shm_mq until no more tuples can be read without blocking, just >> like nodeGather.c does, or at least it can do that unless its own >> queue fills up first. No mutual exclusion mechanism is required for >> any of that, as far as I can see - not an LWLock, and not anything >> similar. > > I'm confused about the distinction you're making. Isn't the shm_mqs > queue itself a mutual exclusion mechanism? The leader cannot really do > anything without some input to process, because of the dependency that > naturally exists. ISTM that everything else we've discussed is > semantics, and/or about particular mechanisms in Postgres. No, a shm_mq is not a mutual exclusion mechanism. It's a queue! A data dependency and a lock aren't the same thing. If your point in saying "we need something like an LWLock" was actually "the leader will have to wait if it's merged all tuples from a worker and no more are immediately available" then I think that's a pretty odd way to make that point. To me, the first statement appears to be false, while the second is obviously true. > At a high level, based on my intuition about performance > characteristics, I have reservations about eager merging in the > leader. That's all I mean. There is a trade-off to be made between > eager and lazy merging. As I pointed out, the author of the Volcano > paper (Goetz Graefe) went on at length about this particular trade-off > for parallel sort almost 30 years ago. So, it's not clear that that > would be an easy win, or even a win. > > That's all I meant. OK, well I'm not taking any position on whether what Heikki is proposing will turn out to be good from a performance point of view. My intuitions about sorting performance haven't turned out to be particularly good in the past. I'm only saying that if you do decide to queue the tuples passing from worker to leader in a shm_mq, you shouldn't read from the shm_mq objects in a round robin, but rather read multiple tuples at a time from the same worker whenever that is possible without blocking. If you read tuples from workers one at a time, you get a lot of context-switch thrashing, because the worker wakes up, writes one tuple (or even part of a tuple) and immediately goes back to sleep. Then it shortly thereafter does it again. Whereas if you drain the worker's whole queue each time you read from that queue, then the worker can wake up, refill it completely, and go back to sleep again. So you incur fewer context switches for the same amount of work. Or at least, that's how it worked out for Gather, and what I am saying is that it will probably work out the same way for sorting, if somebody chooses to try to implement this. I am also saying that using the shm_mq code here does not require the introduction of an LWLock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
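A minimal sketch of the reading pattern Robert describes, draining each worker's queue until it would block rather than taking one tuple per worker in a round robin; the merge_one_tuple() consumer and the overall state layout are assumptions, not code from any posted patch:

#include "postgres.h"
#include "storage/shm_mq.h"

/* hypothetical consumer of one worker tuple, assumed for illustration */
extern void merge_one_tuple(int worker, void *data, Size nbytes);

static void
drain_worker_queues(shm_mq_handle **queues, int nqueues)
{
	int			i;

	for (i = 0; i < nqueues; i++)
	{
		for (;;)
		{
			void	   *data;
			Size		nbytes;
			shm_mq_result res;

			/* nowait = true: take everything available, then move on */
			res = shm_mq_receive(queues[i], &nbytes, &data, true);
			if (res != SHM_MQ_SUCCESS)
				break;			/* would block, or the worker detached */
			merge_one_tuple(i, data, nbytes);
		}
	}
}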
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy wrote: > At work we use several major versions of PostgreSQL, and developers > use non-local clusters for developing and debugging. > We do dump/restore schemas/data via custom/dir formats and we have to > keep several client versions for 9.2, 9.4 and 9.5 versions on local > workstations because after pg_restore95 connects to 9.2, it fails when > it sets run-time parameters via SET: > > vitaly@work 01:21:26 ~ $ pg_restore95 --host DEV_HOST_9_2 -d DBNAME > --data-only -e -1 -Fc arhcivefile > Password: > pg_restore95: [archiver (db)] Error while INITIALIZING: > pg_restore95: [archiver (db)] could not execute query: ERROR: > unrecognized configuration parameter "lock_timeout" > Command was: SET lock_timeout = 0; I think our policy is that a newer pg_dump needs to work with an older version of the database, but I don't think we make any similar guarantee for other tools, such as pg_restore. It's an interesting question whether we should try a little harder to do that, but I suspect it might require more changes than what you've got in this patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS]
On Tue, Sep 27, 2016 at 3:31 PM, Robert Haas wrote: > You seem to have erased the subject line from this email somehow. I think that that was Gmail. Maybe that's new? Generally, I have to go out of my way to change the subject line, so it seems unlikely that I fat-fingered it. I wish that they'd stop changing things... > On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan wrote: > No, a shm_mq is not a mutual exclusion mechanism. It's a queue! > > A data dependency and a lock aren't the same thing. If your point in > saying "we need something like an LWLock" was actually "the leader > will have to wait if it's merged all tuples from a worker and no more > are immediately available" then I think that's a pretty odd way to > make that point. To me, the first statement appears to be false, > while the second is obviously true. Okay. Sorry for being unclear. > OK, well I'm not taking any position on whether what Heikki is > proposing will turn out to be good from a performance point of view. > My intuitions about sorting performance haven't turned out to be > particularly good in the past. I'm only saying that if you do decide > to queue the tuples passing from worker to leader in a shm_mq, you > shouldn't read from the shm_mq objects in a round robin, but rather > read multiple tuples at a time from the same worker whenever that is > possible without blocking. If you read tuples from workers one at a > time, you get a lot of context-switch thrashing, because the worker > wakes up, writes one tuple (or even part of a tuple) and immediately > goes back to sleep. Then it shortly thereafter does it again. > Whereas if you drain the worker's whole queue each time you read from > that queue, then the worker can wake up, refill it completely, and go > back to sleep again. So you incur fewer context switches for the same > amount of work. Or at least, that's how it worked out for Gather, and > what I am saying is that it will probably work out the same way for > sorting, if somebody chooses to try to implement this. Maybe. This would involve the overlapping of multiple worker merges that are performed in parallel with a single leader merge. I don't think it would be all that great. There would be a trade-off around disk bandwidth utilization too, so I'm particularly doubtful that the complexity would pay for itself. But, of course, I too have been wrong about sorting performance characteristics in the past, so I can't be sure. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] assert violation in logical messages serialization
Hello. During processing of logical messages in ReorderBufferSerializeChange(), the pointer to the ondisk structure isn’t updated after a possible reorder buffer realloc by ReorderBufferSerializeReserve(). The actual reason was spotted by Konstantin Knizhnik. reorderbuffer.patch Description: Binary data -- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
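Schematically, the class of bug (and fix) being described looks like the following; this is an assumed shape of the code path, not the literal contents of reorderbuffer.patch:

/* assumed-shape fragment, for illustration only */
ReorderBufferDiskChange *ondisk;

ondisk = (ReorderBufferDiskChange *) rb->outbuf;	/* pointer taken early */

ReorderBufferSerializeReserve(rb, sz);	/* may repalloc rb->outbuf */

/* the fix: re-derive the pointer once the buffer can no longer move */
ondisk = (ReorderBufferDiskChange *) rb->outbuf;
ondisk->size = sz;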
Re: [HACKERS] Redesigning parallel dump/restore's wait-for-workers logic
This patch applies with a few minor offsets, compiles without warning, and passes all regression tests including `make check-world` with TAP tests enabled. This and the related patch dealing with the parallel API definitely make things cleaner and easier to follow. I find it disappointing that ACLs continue to be held back as a fourth step outside the framework of the "normal" ordering. That is an ugly hack, which makes it impossible to, for example, create a fifth step to create indexes on materialized views and refresh them in anything resembling a clean fashion. Would it make sense to deal with the ACL ordering hack in one of these patches, or should that be left for later? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On 9/26/16 2:02 AM, Michael Paquier wrote: > On Mon, Sep 26, 2016 at 2:15 AM, David Steele wrote: > > Thanks for the review and the comments! > >> I notice that the copyright from pgcrypto/sha1.c was carried over but >> not the copyright from pgcrypto/sha2.c. I'm no expert on how this >> works, but I believe the copyright from sha2.c must be copied over. > > Right, those copyright bits are missing: > - * AUTHOR: Aaron D. Gifford > [...] > - * Copyright (c) 2000-2001, Aaron D. Gifford > The license block being the same, it seems to me that there is no need > to copy it over. The copyright should be enough. Looks fine to me. >> Also, are there any plans to expose these functions directly to the user >> without loading pgcrypto? Now that the functionality is in core it >> seems that would be useful. In addition, it would make this patch stand >> on its own rather than just being a building block. > > There have been discussions about avoiding enabling those functions by > default in the distribution. We'd rather not do that... OK. >> * [PATCH 2/8] Move encoding routines to src/common/ >> >> I wonder if it is confusing to have two of encode.h/encode.c. Perhaps >> they should be renamed to make them distinct? > > Yes it may be a good idea to rename that, like encode_utils.[c|h] for > the new files. I like that better. >> Couldn't md5_crypt_verify() be made more general and take the hash type? >> For instance, password_crypt_verify() with the last param as the new >> password type enum. > > This would mean incorporating the whole SASL message exchange into > this routine because the password string is part of the scram > initialization context, and it seems to me that it is better to just > do once a lookup at the entry in pg_authid. So we'd finish with a more > confusing code I am afraid. At least that's the conclusion I came up > with when doing that.. md5_crypt_verify does only the work on a > received password. Ah, yes, I see now. I missed that when I reviewed patch 6. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix some corner cases that cube_in rejects
Amul Sul writes: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation:not tested > Note for committer : There are unnecessary files (cube_1.out, cube_2.out & > cube_3.out) in contrib directory, that need to be removed at code checkin, > other than this concern, I think this patch looks pretty reasonable. Thanks. Thanks for the review --- pushed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
Robert Haas writes: > On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy > wrote: >> We do dump/restore schemas/data via custom/dir formats and we have to >> keep several client versions for 9.2, 9.4 and 9.5 versions on local >> workstations because after pg_restore95 connects to 9.2, it fails when >> it sets run-time parameters via SET: > I think our policy is that a newer pg_dump needs to work with an older > version of the database, but I don't think we make any similar > guarantee for other tools, such as pg_restore. It's an interesting > question whether we should try a little harder to do that, but I > suspect it might require more changes than what you've got in this > patch The general policy has always been that pg_dump output is only expected to restore without errors into a server that's the same or newer version as pg_dump (regardless of the server version being dumped from). If you try to restore into an older server version, you may get some errors, which we try to minimize the scope of when possible. But it will never be possible to promise none at all. I think the correct advice here is simply "don't use pg_restore -e -1 when trying to restore into an older server version". Taking a step back, there are any number of ways that you might get errors during a pg_restore into a DB that's not set up exactly as pg_dump expected. Missing roles or tablespaces, for example. Again, the dump output is constructed so that you can survive those problems and bull ahead ... but not with "-e -1". I don't see a very good reason why older-server-version shouldn't be considered the same type of problem. The patch as given seems rather broken anyway --- won't it change text output from pg_dump as well as on-line output from pg_restore? (That is, it looks to me like the SETs emitted by pg_dump to text format would depend on the source server version, which they absolutely should not. Either that, or the patch is overwriting pg_dump's idea of what the source server version is at the start of the output phase, which is likely to break all kinds of stuff when dumping from an older server.) It's possible that we could alleviate this specific symptom by arranging for the BEGIN caused by "-1" to come out after the initial SETs, but I'm not sure how complicated that would be or whether it's worth the trouble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning parallel dump/restore's wait-for-workers logic
Kevin Grittner writes: > This patch applies with a few minor offsets, compiles without > warning, and passes all regression tests including `make > check-world` with TAP tests enabled. This and the related patch > dealing with the parallel API definitely make things cleaner and > easier to follow. OK, thanks for reviewing! > I find it disappointing that ACLs continue to be held back as a > fourth step outside the framework of the "normal" ordering. That > is an ugly hack, which makes it impossible to, for example, create > a fifth step to create indexes on materialized views and refresh > them in anything resembling a clean fashion. Would it make sense > to deal with the ACL ordering hack in one of these patches, or > should that be left for later? Seems like material for some other patch. It would be greatly more invasive than what I have here, I fear, and probably require some fundamental design work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki wrote: > Credit: This patch is based on Thomas Munro's one. How are they different? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misdesigned command/status APIs for parallel dump/restore
This patch also applied with minor offsets, compiled without warning, and passes regression tests, including `make check-world` with TAP tests enabled. It looks good to me, presenting a cleaner API. I really want to get this to the point where we can dump and restore nested materialized views better, and this doesn't get us to the point where that's immediately possible; but with a cleaner API it should be easier to get there, so this is a step along the way. Setting to Ready for Committer. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
On 9/27/16, Tom Lane wrote: > Robert Haas writes: >> On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy >> wrote: >>> We do dump/restore schemas/data via custom/dir formats and we have to >>> keep several client versions for 9.2, 9.4 and 9.5 versions on local >>> workstations because after pg_restore95 connects to 9.2, it fails when >>> it sets run-time parameters via SET: > >> I think our policy is that a newer pg_dump needs to work with an older >> version of the database, but I don't think we make any similar >> guarantee for other tools, such as pg_restore. It's an interesting >> question whether we should try a little harder to do that, but I >> suspect it might require more changes than what you've got in this >> patch Well... I'm not inclined to add support for restoring from a higher major version to a lower one, because it can lead to security issues (e.g. with RLS). But my observation is that all new features supported by pg_dump are "incremental"; e.g. the latest one, "parallel" for functions: "PARALLEL UNSAFE" (which is the default) is absent from 9.6 dumps and likewise absent from pre-9.6 dumps. That behavior allows newer versions of pg_restore to use dumps from databases of older versions, because the grammar of new features is simply not there. With the patch I'm able to use pg_dump96/pg_restore96 for our 9.2 database (dump from 9.2 and restore to 9.2). > The general policy has always been that pg_dump output is only expected to > restore without errors into a server that's the same or newer version as > pg_dump (regardless of the server version being dumped from). If you try > to restore into an older server version, you may get some errors, which we > try to minimize the scope of when possible. But it will never be possible > to promise none at all. I think the correct advice here is simply "don't > use pg_restore -e -1 when trying to restore into an older server version". Why can't I use it if pg_dump92/pg_restore92 have the same behavior as pg_dump96/pg_restore96 except for the SET block? The patch does not guarantee a successful restore; it just keeps pg_restore from setting unsupported parameters, the same way pg_dump does. The other issues are left to the user who wants to restore to a DB of an older version. > Taking a step back, there are any number of ways that you might get > errors during a pg_restore into a DB that's not set up exactly as pg_dump > expected. Missing roles or tablespaces, for example. That does not depend on the pg_dump/pg_restore version and can be solved by using command line options: --no-tablespaces --no-owner --no-privileges > Again, the dump output is constructed so that you can survive those problems > and bull ahead ... but not with "-e -1". I think "-e -1" was invented specifically for that --- stop restoring if something goes wrong. Wasn't it? > I don't see a very good reason why > older-server-version shouldn't be considered the same type of problem. > > The patch as given seems rather broken anyway --- won't it change text > output from pg_dump as well as on-line output from pg_restore? My opinion --- no (and I wrote that in the initial letter): because it is impossible to know what version of a database the plain text output will be used with. Users who output to plain text can use sed (or something else) to delete unwanted rows. > (That is, > it looks to me like the SETs emitted by pg_dump to text format would > depend on the source server version, which they absolutely should not. > Either that, or the patch is overwriting pg_dump's idea of what the > source server version is at the start of the output phase, which is > likely to break all kinds of stuff when dumping from an older server.) I agree; that's why I left the current behavior as is for the plain text output. > It's possible that we could alleviate this specific symptom by arranging > for the BEGIN caused by "-1" to come out after the initial SETs, but > I'm not sure how complicated that would be or whether it's worth the > trouble. That would change the ERRORs into WARNINGs, but the current behavior discussed above would stay the same. Why not just avoid SET parameters when we know for sure they are not supported by the server to which pg_restore is connected? -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Fri, Sep 9, 2016 at 4:25 PM, Kevin Grittner wrote: > On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli wrote: > These patches apply and build on top of 5c609a74 with no problems, > but `make check` finds differences per the attached. Please > investigate why the regression tests are failing and what the > appropriate response is. >> I am not much experienced in C. If you think that inline functions >> are better suited, I can rework the patches. > > I suspect that they will be as fast or faster, and they eliminate > the hazard of multiple evaluation, where a programmer might not be > aware of the multiple evaluation or of some side-effect of an > argument. Emre, are you going to address the above? It would have to be Real Soon Now. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
Vitaly Burovoy writes: > On 9/27/16, Tom Lane wrote: >> The general policy has always been that pg_dump output is only expected to >> restore without errors into a server that's the same or newer version as >> pg_dump (regardless of the server version being dumped from). > Why can't I use it if pg_dump92/pg_restore92 have the same behavior as > pg_dump96/pg_restore96 except the SET block? That's a pretty large "if", and not one I'm prepared to commit to. Before you assert that it's true, you might want to reflect on the size of the diff between 9.2 and 9.6 pg_dump (hint: it's over 20K lines in -u format). >> Either that, or the patch is overwriting pg_dump's idea of what the >> source server version is at the start of the output phase, which is >> likely to break all kinds of stuff when dumping from an older server.) > I agree, that's why I left current behavior as is for the plain text output. I'm not exactly convinced that you did. There's only one copy of Archive->remoteVersion, and you're overwriting it long before the dump process is over. That'd break anything that consulted it to find the source server's version after RestoreArchive starts. It might not be a bad thing to clearly distinguish source server version from (expected?) target server version, but pg_dump doesn't do that currently, and making it do so is probably not entirely trivial. I think your patch confuses that distinction further, which is not going to be helpful in the long run. I could get behind a patch that split remoteVersion into sourceVersion and targetVersion and made sure that all the existing references to remoteVersion were updated to the correct one of those. After that we could do something like what you have here in a less shaky fashion. As Robert noted, there'd probably still be a long way to go before it'd really work to use a newer pg_dump with an older target version, but at least we'd not be building on quicksand. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
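To make the suggested split concrete, a rough sketch of the gating might look like this; the function name emit_fixed_output_state and the targetVersion argument are assumptions, not existing pg_dump code:

#include "pg_backup_archiver.h"

/*
 * Assumed-shape sketch: once source and target versions are tracked
 * separately, only the target version decides which SETs get emitted.
 */
static void
emit_fixed_output_state(ArchiveHandle *AH, int targetVersion)
{
	ahprintf(AH, "SET statement_timeout = 0;\n");

	/* lock_timeout does not exist before 9.3, so skip it for older targets */
	if (targetVersion >= 90300)
		ahprintf(AH, "SET lock_timeout = 0;\n");
}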
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
On 09/27/2016 02:04 PM, Dave Cramer wrote: On 26 September 2016 at 14:52, Dave Cramer wrote: This crashes with arrays with non-default lower bounds: postgres=# SELECT * FROM test_type_conversion_array_int 4('[2:4]={1,2,3}'); INFO: ([1, 2, ], ) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Attached patch fixes this bug, and adds a test for it. I spent some more time massaging this: * Changed the loops from iterative to recursive style. I think this indeed is slightly easier to understand. * Fixed another segfault, with too deeply nested lists: CREATE or replace FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$ return [[1]] $$ LANGUAGE plpythonu; * Also, in PLySequence_ToArray(), we must check that the 'len' of the array doesn't overflow. * Fixed reference leak in the loop in PLySequence_ToArray() to count the number of dimensions. I'd like to see some updates to the docs for this. The manual doesn't currently say anything about multi-dimensional arrays in pl/python, but it should've mentioned that they're not supported. Now that it is supported, should mention that, and explain briefly that a multi-dimensional array is mapped to a python list of lists. If the code passes I'll fix the docs Please do, thanks! - Heikki >From 6bcff9d1787fc645118a3ecbb71d1ef7561a5bfd Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 27 Sep 2016 21:53:17 +0300 Subject: [PATCH 1/1] WIP: Multi-dimensional arrays in PL/python --- src/pl/plpython/expected/plpython_types.out | 151 +- src/pl/plpython/expected/plpython_types_3.out | 151 +- src/pl/plpython/plpy_typeio.c | 272 +- src/pl/plpython/sql/plpython_types.sql| 86 4 files changed, 608 insertions(+), 52 deletions(-) diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out index f0b6abd..947244e 100644 --- a/src/pl/plpython/expected/plpython_types.out +++ b/src/pl/plpython/expected/plpython_types.out @@ -537,9 +537,133 @@ INFO: (None, ) (1 row) SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]); -ERROR: cannot convert multidimensional array to Python list -DETAIL: PL/Python only supports one-dimensional arrays. 
-CONTEXT: PL/Python function "test_type_conversion_array_int4" +INFO: ([[1, 2, 3], [4, 5, 6]], ) + test_type_conversion_array_int4 +- + {{1,2,3},{4,5,6}} +(1 row) + +SELECT * FROM test_type_conversion_array_int4(ARRAY[[[1,2,NULL],[NULL,5,6]],[[NULL,8,9],[10,11,12]]]); +INFO: ([[[1, 2, None], [None, 5, 6]], [[None, 8, 9], [10, 11, 12]]], ) + test_type_conversion_array_int4 +--- + {{{1,2,NULL},{NULL,5,6}},{{NULL,8,9},{10,11,12}}} +(1 row) + +SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}'); +INFO: ([1, 2, 3], ) + test_type_conversion_array_int4 +- + {1,2,3} +(1 row) + +CREATE FUNCTION test_type_conversion_array_int8(x int8[]) RETURNS int8[] AS $$ +plpy.info(x, type(x)) +return x +$$ LANGUAGE plpythonu; +SELECT * FROM test_type_conversion_array_int8(ARRAY[[[1,2,NULL],[NULL,5,6]],[[NULL,8,9],[10,11,12]]]::int8[]); +INFO: ([[[1L, 2L, None], [None, 5L, 6L]], [[None, 8L, 9L], [10L, 11L, 12L]]], ) + test_type_conversion_array_int8 +--- + {{{1,2,NULL},{NULL,5,6}},{{NULL,8,9},{10,11,12}}} +(1 row) + +CREATE FUNCTION test_type_conversion_array_float4(x float4[]) RETURNS float4[] AS $$ +plpy.info(x, type(x)) +return x +$$ LANGUAGE plpythonu; +SELECT * FROM test_type_conversion_array_float4(ARRAY[[[1.2,2.3,NULL],[NULL,5.7,6.8]],[[NULL,8.9,9.345],[10.123,11.456,12.6768]]]::float4[]); +INFO: ([[[1.200476837158, 2.29952316284, None], [None, 5.69809265137, 6.80190734863]], [[None, 8.89618530273, 9.345000267028809], [10.123000144958496, 11.456000328063965, 12.676799774169922]]], ) + test_type_conversion_array_float4 +-- + {{{1.2,2.3,NULL},{NULL,5.7,6.8}},{{NULL,8.9,9.345},{10.123,11.456,12.6768}}} +(1 row) + +CREATE FUNCTION test_type_conversion_array_float8(x float8[]) RETURNS float8[] AS $$ +plpy.info(x, type(x)) +return x +$$ LANGUAGE plpythonu; +SELECT * FROM test_type_conversion_array_float8(ARRAY[[[1.2,2.3,NULL],[NULL,5.7,6.8]],[[NULL,8.9,9.345],[10.123,11.456,12.6768]]]::float8[]); +INFO: ([[[1.2, 2.3, None], [None, 5.7, 6.8]], [[None, 8.9, 9.345], [10.123, 11.456, 12.6768]]], ) + test_type_conversion_array_float8 +-- + {{{1.2,2.3,NULL},{NULL,5.7,6.8}},{{NULL,8.
Re: [HACKERS] Hash Indexes
On 09/20/2016 09:02 AM, Amit Kapila wrote: On Fri, Sep 16, 2016 at 11:22 AM, Amit Kapila wrote: I do want to work on it, but it is always possible that due to some other work this might get delayed. Also, I think there is always a chance that while doing that work, we face some problem due to which we might not be able to use that optimization. So I will go with your suggestion of removing hashscan.c and its usage for now, and then if required we will pull it back. If nobody else thinks otherwise, I will update this in the next patch version. In the attached patch, I have removed the support for hashscans. I think it might improve performance by a few percent (especially for single-row fetch transactions), since it removes the registration and destruction of hashscans. I have been running various tests, and applications with this patch together with the WAL v5 patch [1]. As I haven't seen any failures and don't currently have additional feedback, I'm moving this patch to "Ready for Committer" for their feedback. If others have comments, please move the patch status back in the CommitFest application. [1] https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 09/25/2016 01:00 AM, Amit Kapila wrote: Attached patch fixes the problem; now we do perform full-page writes for bitmap pages. Apart from that, I have rebased the patch on the latest concurrent index patch [1]. I have updated the README as well to reflect the WAL logging related information for different operations. With the attached patch, all the review comments or issues found till now are addressed. I have been running various tests, and applications with this patch together with the CHI v8 patch [1]. As I haven't seen any failures and don't currently have additional feedback, I'm moving this patch to "Ready for Committer" for their feedback. If others have comments, please move the patch status back in the CommitFest application. [1] https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Thu, Sep 22, 2016 at 6:41 AM, Ashutosh Bapat wrote: > [ new patch ] This should probably get updated since Rajkumar reported a crash. Meanwhile, here are some comments from an initial read-through: + * Multiple relations may be partitioned in the same way. The relations + * resulting from joining such relations may be partitioned in the same way as + * the joining relations. Similarly, relations derived from such relations by + * grouping, sorting be partitioned in the same as the underlying relations. I think you should change "may be partitioned in the same way" to "are partitioned in the same way" or "can be regarded as partitioned in the same way". The sentence that begins with "Similarly," is not grammatical; it should say something like: ...by grouping or sorting are partitioned in the same way as the underlying relations. @@ -870,20 +902,21 @@ RelationBuildPartitionDesc(Relation rel) result->bounds->rangeinfo = rangeinfo; break; } } } MemoryContextSwitchTo(oldcxt); rel->rd_partdesc = result; } + /* * Are two partition bound collections logically equal? * * Used in the keep logic of relcache.c (ie, in RelationClearRelation()). * This is also useful when b1 and b2 are bound collections of two separate * relations, respectively, because BoundCollection is a canonical * representation of a set partition bounds (for given partitioning strategy). */ bool partition_bounds_equal(PartitionKey key, Spurious hunk. + * For an umpartitioned table, it returns NULL. Spelling. + * two arguemnts and returns boolean. For types, it suffices to match Spelling. + * partition key expression is stored as a single member list to accomodate Spelling. + * For a base relation, construct an array of partition key expressions. Each + * partition key expression is stored as a single member list to accomodate + * more partition keys when relations are joined. How would joining relations result in more partitioning keys getting added? Especially given the comment for the preceding function, which says that a new PartitionScheme gets created unless an exact match is found. +if (!lc) Test lc == NIL instead of !lc. +extern int +PartitionSchemeGetNumParts(PartitionScheme part_scheme) +{ +return part_scheme ? part_scheme->nparts : 0; +} I'm not convinced it's a very good idea for this function to have special handling for when part_scheme is NULL. In try_partition_wise_join() that checks is not needed because it's already been done, and in generate_partition_wise_join_paths it is needed but only because you are initializing nparts too early. If you move this initialization down below the IS_DUMMY_REL() check you won't need the NULL guard. I would ditch this function and let the callers access the structure member directly. +extern int +PartitionSchemeGetNumKeys(PartitionScheme part_scheme) +{ +return part_scheme ? part_scheme->partnatts : 0; +} Similarly here. have_partkey_equi_join should probably have a quick-exit path when part_scheme is NULL, and then num_pks can be set afterwards unconditionally. Same for match_expr_to_partition_keys. build_joinrel_partition_info already has it and doesn't need this double-check. 
+extern Oid * +PartitionDescGetPartOids(PartitionDesc part_desc) +{ +Oid *part_oids; +intcnt_parts; + +if (!part_desc || part_desc->nparts <= 0) +return NULL; + +part_oids = (Oid *) palloc(sizeof(Oid) * part_desc->nparts); +for (cnt_parts = 0; cnt_parts < part_desc->nparts; cnt_parts++) +part_oids[cnt_parts] = part_desc->oids[cnt_parts]; + +return part_oids; +} I may be missing something, but this looks like a bad idea in multiple ways. First, you've got checks for part_desc's validity here that should be in the caller, as noted above. Second, you're copying an array by looping instead of using memcpy(). Third, the one and only caller is set_append_rel_size, which doesn't seem to have any need to copy this data in the first place. If there is any possibility that the PartitionDesc is going to change under us while that function is running, something is deeply broken. Nothing in the planner is going to cope with the table structure changing under us, so it had better not. +/* + * For a partitioned relation, we will save the child RelOptInfos in parent + * RelOptInfo in the same the order as corresponding bounds/lists are + * stored in the partition scheme. + */ This comment seems misplaced; shouldn't it be next to the code that is actually doing this, rather than the code that is merely setting up for it? And, also, the comment implies that we're doing this instead of what we'd normally do, whereas I think we are actually doing something additional. +/* + * Save topmost parent's relid. If the parent itself is a child of some + * other relation, use parent's topm
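For illustration, here is a minimal sketch of the memcpy() variant suggested in the review above. It assumes only the PartitionDesc fields visible in the quoted hunk (oids, nparts); whether the copy is needed at all is a separate question raised above, and the validity checks are assumed to move to the caller.

/*
 * Hypothetical rewrite of the quoted PartitionDescGetPartOids(), copying
 * the OID array in one memcpy() instead of an element-by-element loop.
 */
static Oid *
copy_part_oids(PartitionDesc part_desc)
{
    Oid *part_oids = (Oid *) palloc(sizeof(Oid) * part_desc->nparts);

    memcpy(part_oids, part_desc->oids, sizeof(Oid) * part_desc->nparts);
    return part_oids;
}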
Re: [HACKERS] Make flex/bison checks stricter in Git trees
> On 27 Sep 2016, at 15:49, Tom Lane wrote: > > Daniel Gustafsson writes: >> When running ./configure on a system without Flex/Bison it’s easy to miss the >> warning that flies past and then run into compilation error instead. When it >> happened to a colleague yesterday a brief discussion came to the conclusion >> that it would be neat it the flex and bison checks took the existence of the >> generated files into consideration. > >> Attached patch scans for the generated files and iff flex or bison isn’t >> found >> in a non-cross compilation build, errors out in case the generated filed >> don’t >> exist while retaining the warning in case they do. > > Not exactly convinced this is a good idea. What if the files exist but > are out of date? Wouldn’t that be the same as today if so? > More generally, how much advantage is there really in > failing at configure rather than build time? The error reporting is clearer IMO but it’s a matter of taste. > The subtext here is that I'm disinclined to load more behavior into > configure while we're waiting to see if cmake conversion happens. > That job is tough enough without the autoconf sources being more of > a moving target than they have to be. Fair enough, that’s a very valid argument. Thanks! cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
On 9/27/16, Tom Lane wrote: > Vitaly Burovoy writes: >> On 9/27/16, Tom Lane wrote: >>> The general policy has always been that pg_dump output is only expected >>> to >>> restore without errors into a server that's the same or newer version as >>> pg_dump (regardless of the server version being dumped from). > >> Why can't I use it if pg_dump92/pg_restore92 have the same behavior as >> pg_dump96/pg_restore96 except the SET block? > > That's a pretty large "if", and not one I'm prepared to commit to. > Before you assert that it's true, you might want to reflect on the > size of the diff between 9.2 and 9.6 pg_dump (hint: it's over 20K > lines in -u format). > >>> Either that, or the patch is overwriting pg_dump's idea of what the >>> source server version is at the start of the output phase, which is >>> likely to break all kinds of stuff when dumping from an older server.) > >> I agree, that's why I left current behavior as is for the plain text >> output. > > I'm not exactly convinced that you did. There's only one copy of > Archive->remoteVersion, and you're overwriting it long before the > dump process is over. I'm sorry, I'm not so good at knowing the code base but I see that my patch changes Archive->remoteVersion in the "RestoreArchive" which is called after all schema is fetched to pg_dump's structs and just before output is written, moreover, there is a comment that it is a final stage (9.2 has the same block of code): ... /* * And finally we can do the actual output. *... */ if (plainText) RestoreArchive(fout); CloseArchive(fout); exit_nicely(0); } It does not seem that I'm "overwriting it long before the dump process is over"... Also pg_dump -v proves that changing remoteVersion happens after all "pg_dump: finding *", "pg_dump: reading *" and "pg_dump: saving *". > That'd break anything that consulted it to > find the source server's version after RestoreArchive starts. I'm sorry again, could you or anyone else point me what part of the code use Archive->remoteVersion after schema is read? I set up break point at the line AHX->remoteVersion = 99; and ran pg_dump with plain text output to a file (via "-f" option). When the line was reached I set up break points at all lines I could find by a script: egrep -n -r --include '*.c' 'remoteVersion [><]' src/bin/pg_dump/ | awk -F: '{print "b "$1":"$2}' (there were 217 of them) and continued debuging. All three fired break points were expectedly in _doSetFixedOutputState (at the lines where my patch introduced them) after which the program exited normally without stopping. Also I wonder that the process of "restoring" consults Archive->remoteVersion because currently in the pg_restore to plain text output remoteVersion is zero. It means restoring process would output schema to the oldest supported version, but is not so. > I could get behind a patch that split remoteVersion into sourceVersion > and targetVersion and made sure that all the existing references to > remoteVersion were updated to the correct one of those. After that > we could do something like what you have here in a less shaky fashion. > As Robert noted, there'd probably still be a long way to go before it'd > really work to use a newer pg_dump with an older target version, but at > least we'd not be building on quicksand. It means support of restoring from a newer version to an older one. What's for? If new features are used it is impossible to restore (port) them to an older version, if they are not, restoring process is done (i guess) in 99% of cases. 
-- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] to_date_valid()
Peter Eisentraut writes: > I think using ValidateDate() was the right idea. That is what we use > for checking date validity everywhere else. Note that we've got two different CF entries, and threads, covering fundamentally the same territory here, ie making to_timestamp et al behave more sanely. The other thread is https://commitfest.postgresql.org/10/713/ https://www.postgresql.org/message-id/flat/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com Robert pointed out several times in the other thread that to_timestamp and friends were intended to be Oracle-compatible, which I agree with. It was also pointed out that Oracle does range validity checks on the timestamp component fields, which I verified just now using http://rextester.com/l/oracle_online_compiler (which is what I found by googling after discovering that sqlfiddle isn't working very well, sigh). I got these results: select banner as "oracle version" from v$version Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production PL/SQL Release 11.2.0.2.0 - Production CORE11.2.0.2.0 Production TNS for 64-bit Windows: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production SELECT TO_TIMESTAMP('2016-06-13 99:99:99', '-MM-DD HH24:MI:SS') FROM dual ORA-01850: hour must be between 0 and 23 SELECT TO_TIMESTAMP('2016-06-13 19:99:99', '-MM-DD HH24:MI:SS') FROM dual ORA-01851: minutes must be between 0 and 59 SELECT TO_TIMESTAMP('2016-06-13 19:59:99', '-MM-DD HH24:MI:SS') FROM dual ORA-01852: seconds must be between 0 and 59 SELECT TO_TIMESTAMP('2016-06-13 19:59:59', '-MM-DD HH24:MI:SS') FROM dual 13.06.2016 19:59:59 SELECT TO_TIMESTAMP('2016-06-13 19:59:59', '-MM-DD HH:MI:SS') FROM dual ORA-01849: hour must be between 1 and 12 SELECT TO_TIMESTAMP('2016-02-30 19:59:59', '-MM-DD HH24:MI:SS') FROM dual ORA-01839: date not valid for month specified SELECT TO_TIMESTAMP('2016-02-29 19:59:59', '-MM-DD HH24:MI:SS') FROM dual 29.02.2016 19:59:59 SELECT TO_TIMESTAMP('2015-02-29 19:59:59', '-MM-DD HH24:MI:SS') FROM dual ORA-01839: date not valid for month specified I think this is sufficient evidence to show that we ought to change to_timestamp et al. to apply range checking on the component fields. And while I've not attempted to characterize exactly what Oracle does with extra spaces, non-matching punctuation, etc, it's also clear to me that their behavior is different and probably saner than ours. So I vote for fixing these functions to behave more like Oracle, and forgetting about having a separate family of to_date_valid() functions or optional parameters or anything of the sort. I might've been in favor of that, until I saw how far down the rabbit hole this thread had gotten. Let's just call it a bug and fix it. Having range checking after the field scanning phase also changes the terms of discussion about what we need to do in the scanning phase, since it would catch many (of course not all) of the problems that arise from field boundary issues. So I think we should get the range changing committed first and then fool with the scanner. The 0002 patch that Artur sent for this purpose in https://www.postgresql.org/message-id/22dbe4e0-b550-ca86-8634-adcda0faa...@postgrespro.ru seems like the right approach to me, though I've not read it in detail yet. I propose to work through that and commit it. I'm much less sold on his 0001 patch, but I tend to agree that what we want is to adjust the scanner behavior. 
I do not like the reverse-conversion wrapper that Andreas proposes here; quite aside from micro-efficiency issues, it seems to me that that just begs the question of how you know what the input is supposed to mean. In short, I think we should reject this thread + CF entry altogether and instead move forward with the approach in the other thread. It's probably too late in the September CF to close out the other submission entirely, but we could get the range checking committed and then RWF the other part for reconsideration later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
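For illustration only, a hedged sketch of the kind of component-field range checking being argued for (this is not the patch from the other thread, whose structure may differ). The field names follow struct pg_tm from pgtime.h, and ERRCODE_DATETIME_FIELD_OVERFLOW is assumed to be the appropriate error code; day-of-month validity against month/year would still go through a ValidateDate()-style check.

#include "postgres.h"
#include "pgtime.h"

/* Sketch: range-check the time-of-day fields gathered by the format scanner. */
static void
check_time_of_day_fields(const struct pg_tm *tm)
{
    if (tm->tm_hour < 0 || tm->tm_hour > 23)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
                 errmsg("hour %d must be between 0 and 23", tm->tm_hour)));
    if (tm->tm_min < 0 || tm->tm_min > 59)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
                 errmsg("minutes %d must be between 0 and 59", tm->tm_min)));
    if (tm->tm_sec < 0 || tm->tm_sec > 59)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
                 errmsg("seconds %d must be between 0 and 59", tm->tm_sec)));
}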
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
Vitaly Burovoy writes: > On 9/27/16, Tom Lane wrote: >> I'm not exactly convinced that you did. There's only one copy of >> Archive->remoteVersion, and you're overwriting it long before the >> dump process is over. > It does not seem that I'm "overwriting it long before the dump process > is over"... There's a lot that happens during RestoreArchive. Even if none of it inspects remoteVersion today, I do not think that's a safe assumption to make going forward. The easiest counterexample is that this very bit of code you want to add does so. I really do not want to get into a design that says "remoteVersion means the source server version until we reach RestoreArchive, and the target version afterwards". That way madness lies. If we're going to try altering the emitted SQL based on target version, let's first create a separation between those concepts; otherwise I will bet that we add more bugs than we remove. (The other thing I'd want here is a --target-version option so that you could get the same output alterations in pg_dump or pg_restore to text. Otherwise it's nigh undebuggable, and certainly much harder to test than it needs to be.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
On 9/27/16 6:16 AM, Kyotaro HORIGUCHI wrote: I apologize in advance that the comments in this message might one of the ideas discarded in the past thread.. I might not grasp the discussion completely X( The attached patches are rebased to the master and additional one mentioned below. I tried the attached patch set and noticed an interesting behavior. With archive_timeout=5 whenever I made a change I would get a WAL segment within a few seconds as expected then another one would follow a few minutes later. Database init: 16M Sep 27 20:05 00010001 16M Sep 27 20:09 00010002 Create test table: 16M Sep 27 20:13 00010003 16M Sep 27 20:15 00010004 Insert row into test table: 16M Sep 27 20:46 00010005 16M Sep 27 20:49 00010006 The cluster was completely idle with no sessions connected in between those three commands. Is it possible this is caused by: +* switch segment only when any substantial progress have made from +* the last segment switching by timeout. Segment switching by other +* reasons will cause last_xlog_switch_lsn stay behind but it doesn't +* matter since it just causes possible one excessive segment switch. */ I would like to give Michael a chance to respond to the updated patches before delving deeper. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/26/2016 08:48 PM, Tomas Vondra wrote: On 09/26/2016 07:16 PM, Tomas Vondra wrote: The averages (over the 10 runs, 5 minutes each) look like this:

3.2.80                 1      8     16     32     64    128    192
granular-locking    1567  12146  26341  44188  43263  49590  15042
no-content-lock     1567  12180  25549  43787  43675  51800  16831
group-update        1550  12018  26121  44451  42734  51455  15504
master              1566  12057  25457  42299  42513  42562  10462

4.5.5                  1      8     16     32     64    128    192
granular-locking    3018  19031  27394  29222  32032  34249  36191
no-content-lock     2988  18871  27384  29260  32120  34456  36216
group-update        2960  18848  26870  29025  32078  34259  35900
master              2984  18917  26430  29065  32119  33924  35897

So, I got the results from 3.10.101 (only the pgbench data), and it looks like this:

3.10.101               1      8     16     32     64    128    192
granular-locking    2582  18492  33416  49583  53759  53572  51295
no-content-lock     2580  18666  33860  49976  54382  54012  51549
group-update        2635  18877  33806  49525  54787  54117  51718
master              2630  18783  33630  49451  54104  53199  50497

So 3.10.101 performs even better than 3.2.80 (and much better than 4.5.5), and there's no sign of any of the patches making a difference. It also seems there's a major regression in the kernel, somewhere between 3.10 and 4.5. With 64 clients, 3.10 does ~54k transactions, while 4.5 does only ~32k - that's a helluva difference. I wonder if this might be due to running the benchmark on unlogged tables (and thus not waiting for WAL), but I don't see why that should result in such a drop on a new kernel. In any case, this seems like an issue unrelated to the patch, so I'll post further data in a new thread instead of hijacking this one. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add support for restrictive RLS policies
Jeevan, * Jeevan Chalke (jeevan.cha...@enterprisedb.com) wrote: > Here are the review comments: [... many good comments ...] Many thanks for the review, I believe I agree with pretty much all your comments and will update the patch accordingly. Responses for a couple of them are below. > 6. I think following changes are irrelevant for this patch. > Should be submitted as separate patch if required. Those changes were adding support for tab completion of the new restrictive / permissive options, which certainly seems appropriate for the patch which is adding those options. I realize it looks like a lot for just two new options, but that's because there's a number of ways to get to them. > 10. ALTER POLICY has no changes for this. Any reason why that is not > considered here. Generally speaking, I don't believe it makes sense to flip a policy from permissive to restrictive or vice-versa, they're really quite different things. We also don't support altering the "command" type for a policy. > 12. Since PERMISSIVE is default, we should dump only "RESTRICTIVE" when > appropriate, like other default cases. > strcmp(polinfo->polpermissive,"t") == 0 ? "PERMISSIVE" : "RESTRICTIVE" Sure, we could do that, though I suppose we'd want to do that for all of the defaults for policies. > 13. Since PERMISSIVE is default, do we need changes like below? > -\QCREATE POLICY p1 ON test_table FOR ALL TO PUBLIC \E > +\QCREATE POLICY p1 ON test_table AS PERMISSIVE FOR ALL TO > PUBLIC \E > > I am not familiar or used TAP framework. So no idea about these changes. If we change pg_dump to not output AS PERMISSIVE for permissive policies, then the TAP test will need to be updated to not include AS PERMISSIVE (or FOR ALL TO PUBLIC, if we're going down that route). Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
On 9/27/16, Tom Lane wrote: > Vitaly Burovoy writes: >> On 9/27/16, Tom Lane wrote: >>> I'm not exactly convinced that you did. There's only one copy of >>> Archive->remoteVersion, and you're overwriting it long before the >>> dump process is over. > >> It does not seem that I'm "overwriting it long before the dump process >> is over"... > > There's a lot that happens during RestoreArchive. Even if none of it > inspects remoteVersion today, I do not think that's a safe assumption to > make going forward. And... Khm... Note that even _now_ AHX->remoteVersion is set to a database version pg_restore connects to... So all the code has it during restoring process... > The easiest counterexample is that this very bit of > code you want to add does so. The only change I've done is set remoteVersion to the maximum allowed when output is the plain text format. > I really do not want to get into a design > that says "remoteVersion means the source server version until we reach > RestoreArchive, and the target version afterwards". That way madness lies. It is only if you think about "remoteVersion" as "sourceServerVersion", but even now it is not so. Moreover RestoreArchive is a delimter only for pg_dump and only when output is a plain text. For other modes of the pg_dump RestoreArchive is not called at all. > If we're going to try altering the emitted SQL based on target version, > let's first create a separation between those concepts; I've just found there is _archiveHandle.archiveRemoteVersion. Is it a parameter you were searched for? The pg_restore code does not use it (just as remoteVersion), but it can do so if it is necessary. > otherwise I will bet that we add more bugs than we remove. > > (The other thing I'd want here is a --target-version option so that > you could get the same output alterations in pg_dump or pg_restore to > text. Otherwise it's nigh undebuggable, and certainly much harder > to test than it needs to be.) I thought that way. I'm ready to introduce that parameter, but again, I see now it will influence only SET parameters. Does it worth it? -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
Hi, On 2015-09-07 17:05:10 +0100, Greg Stark wrote: > I feel like I remember hearing about this before but I can't find any > mention of it in my mail archives. It seems pretty simple to add > support for LLVM's Address Sanitizer (asan) by using the hooks we > already have for valgrind. Any plans to pick this up again? > In fact I think this would actually be sufficient. I'm not sure what > the MEMPOOL valgrind stuff is though. I don't think it's relevant to > address sanitizer which only tracks references to free'd or > unallocated pointers. It'd be nice to add msan support, so uninitialized accesses are also tracked. (oh, you suggest that below) I vote for renaming the VALGRIND names etc. to something more tool-neutral. I think it's going to be too confusing otherwise. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] inconsistent behaviour of set-returning functions in sub-query with random()
Tom van Tilburg writes: > Good to know and I agree that it is not an urgent case. > I think this practice might be more common in the POSTGIS community where > there are plenty of set-returning functions used in this way. My use was > taking a random sample of a pointcloud distribution. Fix pushed to HEAD only. Thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
On 9/27/16, Vitaly Burovoy wrote: > On 9/27/16, Tom Lane wrote: >> (The other thing I'd want here is a --target-version option so that >> you could get the same output alterations in pg_dump or pg_restore to >> text. Otherwise it's nigh undebuggable, and certainly much harder >> to test than it needs to be.) > > I thought that way. I'm ready to introduce that parameter, but again, > I see now it will influence only SET parameters. Does it worth it? The only reason I have not implemented it was an attempt to avoid confusing users, who could think that the result of pg_dump (we need it there for the plain text output) or pg_restore can be converted for a target version and restored without any new features (but that assumption would be wrong). -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] hash_create(nelem = 0) does invalid memory accesses
Hi, debugging a citus valgrind bleat I noticed that hash_create() accesses the result of palloc(0) as a hash element: HTAB * hash_create(const char *tabname, long nelem, HASHCTL *info, int flags) { ... if ((flags & HASH_SHARED_MEM) || nelem < hctl->nelem_alloc) { if (!element_alloc(hashp, (int) nelem)) ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), errmsg("out of memory"))); } ...} I.e. we call element_alloc with nelem = 0. There we then do: static bool element_alloc(HTAB *hashp, int nelem) { ... firstElement = (HASHELEMENT *) hashp->alloc(nelem * elementSize); ... firstElement->link = hctlv->freeList; } which means we'll write to the result of palloc(0). Do we consider this an API usage error that we want to fix? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund wrote: > Any plans to pick this up again? Yeah, I was just thinking I should pick this up again. > I vote for renaming the VALGRIND names etc. to something more tool-neutral. I > think it's going to be too confusing otherwise. Hm, the danger there is once I start refactoring things I could get bogged down... I would love to remove all the #ifdef's and have the macros just be no-ops if they're compiled out for example... -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On 2016-09-28 00:23:11 +0100, Greg Stark wrote: > On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund wrote: > > Any plans to pick this up again? > > Yeah, I was just thinking I should pick this up again. > > > I vote for renaming the VALGRIND names etc. to something more tool-neutral. > > I think it's going to be too confusing otherwise. > > Hm, the danger there is once I start refactoring things I could get > bogged down... Meh, adding a neutral name seems pretty harmless. > I would love to remove all the #ifdef's and have the > macros just be no-ops if they're compiled out for example... Don't we pretty much have that? #ifdef USE_VALGRIND #include #else #define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size) do {} while (0) #define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed) do {} while (0) #define VALGRIND_DESTROY_MEMPOOL(context) do {} while (0) #define VALGRIND_MAKE_MEM_DEFINED(addr, size) do {} while (0) #define VALGRIND_MAKE_MEM_NOACCESS(addr, size) do {} while (0) #define VALGRIND_MAKE_MEM_UNDEFINED(addr, size) do {} while (0) #define VALGRIND_MEMPOOL_ALLOC(context, addr, size) do {} while (0) #define VALGRIND_MEMPOOL_FREE(context, addr)do {} while (0) #define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size) do {} while (0) #endif ? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
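To make the tool-neutral renaming idea concrete, here is a hedged sketch of what such wrappers might look like. The pg_memcheck_* names and the USE_ASAN symbol are invented for illustration; only the Valgrind client requests and the <sanitizer/asan_interface.h> macros are existing interfaces, and note that ASAN only tracks addressability, not definedness (that would be MSAN's job).

#if defined(USE_VALGRIND)
#include <valgrind/memcheck.h>
#define pg_memcheck_make_noaccess(addr, size)   VALGRIND_MAKE_MEM_NOACCESS(addr, size)
#define pg_memcheck_make_undefined(addr, size)  VALGRIND_MAKE_MEM_UNDEFINED(addr, size)
#define pg_memcheck_make_defined(addr, size)    VALGRIND_MAKE_MEM_DEFINED(addr, size)
#elif defined(USE_ASAN)
#include <sanitizer/asan_interface.h>
/* ASAN has no notion of "defined", so both undefined and defined just unpoison. */
#define pg_memcheck_make_noaccess(addr, size)   ASAN_POISON_MEMORY_REGION(addr, size)
#define pg_memcheck_make_undefined(addr, size)  ASAN_UNPOISON_MEMORY_REGION(addr, size)
#define pg_memcheck_make_defined(addr, size)    ASAN_UNPOISON_MEMORY_REGION(addr, size)
#else
#define pg_memcheck_make_noaccess(addr, size)   do {} while (0)
#define pg_memcheck_make_undefined(addr, size)  do {} while (0)
#define pg_memcheck_make_defined(addr, size)    do {} while (0)
#endif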
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
Andres Freund writes: > On 2016-09-28 00:23:11 +0100, Greg Stark wrote: >> I would love to remove all the #ifdef's and have the >> macros just be no-ops if they're compiled out for example... > Don't we pretty much have that? I think "((void) 0)" is a more common spelling of a dummy statement. Though there's little wrong with it as it is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash_create(nelem = 0) does invalid memory accesses
Andres Freund writes: > debugging a citus valgrind bleat I noticed that hash_create() accesses > the result of palloc(0) as an hash element: > Do we consider this an API usage error that we want to fix? I think Assert(nelem > 0) would be an appropriate response. There are probably issues in sizing the hashtable quite aside from this one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
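A minimal sketch of the suggested guard, written against the hash_create() excerpt quoted upthread (the "..." elisions mirror that excerpt); any further fixes to the table-sizing math are left open.

HTAB *
hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
{
    /*
     * Sketch: treat a zero (or negative) element estimate as a caller bug,
     * so element_alloc() is never asked to populate a palloc(0) result.
     */
    Assert(nelem > 0);
    ...
}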
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On 2016-09-27 19:31:57 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-09-28 00:23:11 +0100, Greg Stark wrote: > >> I would love to remove all the #ifdef's and have the > >> macros just be no-ops if they're compiled out for example... > > > Don't we pretty much have that? > > I think "((void) 0)" is a more common spelling of a dummy statement. > Though there's little wrong with it as it is. That's already committed code, I didn't propose to add it ;) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] should xlog_outdesc modify its argument?
The function static void xlog_outdesc(StringInfo buf, XLogReaderState *record); in src/backend/access/transam/xlog.c is called by XLogInsertRecord, and after returning a string describing an XLogRecord, it clears the state data in its XLogReaderState argument. That mixes the read-only semantics of "give me a string that describes this argument" and the read-write semantics of "clear out the value in this argument". That seems ok, except that xlog_outdesc is also called by the function static void rm_redo_error_callback(const void *arg); also in xlog.c, which appears to only want the string description of the argument, and not the side effect of clearing out the value. Now, perhaps under exceptional conditions it won't matter one way or the other if the xlog record gets modified; perhaps it's going to be discarded anyway. I didn't look far enough to find out. But this is the only function of all the (void)(void *arg) callback functions in the entire postgresql source tree that modifies its argument. I discovered this while trying to convert all the callback function signatures to (void)(const void *), and this was the only monkey-wrench in the works. Is this a bug? Do I just have to live with the unfortunate lack of orthogonality in the callback mechanisms? At the very least, there should perhaps be some documentation about how all this is intended to work. Thanks, please find my work-in-progress attempt and constify-ing these functions attached. Mark Dilger callback.patch.wip.1 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] casting away const in comparators
Friends, comparators usually take arguments like (const void *a, const void *b) and do something read-only to a and b. In our sources, we typically cast these to something else, like (char *), and do something read-only. This generates a lot of warnings if using -Wcast-qual. To fix that, I have converted the casts to not cast away const. Please find my changes, attached. Mark Dilger comparators.patch.1 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
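As a small, self-contained illustration of the pattern the patch is aimed at (not taken from the patch itself), a qsort()-style comparator can keep const all the way through its casts:

#include <string.h>

/* Compare two fixed-width 16-byte keys without casting away const. */
static int
key_cmp(const void *a, const void *b)
{
    const char *ka = (const char *) a;      /* const-preserving casts */
    const char *kb = (const char *) b;

    return memcmp(ka, kb, 16);              /* 16 is an arbitrary key size for the example */
}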
[HACKERS] typedef FileName not const?
Friends, along the lines of other similar emails from me of late, I tried to avoid casting away const when using the FileName typedef. There are several calls where a (const char *) has to be cast to (char *) due to FileName being typedef'd as non-const. But changing the typedef to const doesn't seem to conflict with any code in the source tree. Since this may be seen as an external API change, I kept these changes in their own patch submission, so that it can be rejected separately if need be. Mark Dilger filename.patch.1 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] typedef FileName not const?
Hi, Can we please keep this topic in one thread? Anybody motivated to apply these isn't going to have an easy time applying things, and everyone else is just having a harder time sorting through the mails. On 2016-09-27 17:08:24 -0700, Mark Dilger wrote: > along the lines of other similar emails from me of late, > I tried to avoid casting away const when using the FileName > typedef. There are several calls where a (const char *) has to > be cast to (char *) due to FileName being typedef'd as > non-const. But changing the typedef to const doesn't seem to > conflict with any code in the source tree. I think the better fix here is to simply remove the typedef. It doesn't seem to have much of a benefit, and makes using correct types harder as demonstrated here. We don't even use it internally in fd.c.. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
> From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas > On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki > wrote: > > Credit: This patch is based on Thomas Munro's one. > > How are they different? As Thomas mentioned, his patch (only win32_shmem.c) might not have been able to compile (though I didn't try.) And it didn't have error processing or documentation. I added error handling, documentation, comments, and a little bit of structural change. The possibly biggest change, though it's only one-liner in pg_ctl.c, is additionally required. I failed to include it in the first patch. The attached patch includes that. Regards Takayuki Tsunakawa win_large_page_v2.patch Description: win_large_page_v2.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gratuitous casting away const
Friends, here is another patch, this time fixing the casting away of const in the regex code. Mark Dilger regex.patch.1 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking wait event for latches
On Mon, Sep 26, 2016 at 7:07 PM, Michael Paquier wrote: > On Mon, Sep 26, 2016 at 1:46 PM, Thomas Munro > wrote: >> If the class really is strictly implied by the WaitEventIdentifier, >> then do we really need to supply it everywhere when calling the >> various wait functions? That's going to be quite a few functions: >> WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some >> more patches out there have legs then also ConditionVariableWait, >> BarrierWait, and possibly further kinds of wait points. And then all >> their call sites will have have to supply the wait class ID, even >> though it is implied by the other ID. Perhaps that array that >> currently holds the names should instead hold { classId, displayName } >> so we could just look it up? > > I considered this reverse-engineering, but arrived at the conclusion > that having a flexible API mattered more to give more flexibility to > module developers. In short this avoids having extra class IDs like > ActivityExtention, TimeoutExtension, etc. But perhaps that's just a > matter of taste.. Ok, if they really are independent then shouldn't we take advantage of that at call sites where we might be idle but we might also be waiting for the network? For example we could do this: /* Sleep until something happens or we time out */ WaitLatchOrSocket(MyLatch, wakeEvents, MyProcPort->sock, sleeptime, pq_is_send_pending() ? WAIT_CLIENT : WAIT_ACTIVITY, WE_WAL_SENDER_MAIN); Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT? Or perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC". Actually, I'm still not sold on "Activity" and "Client". I think "Idle" and "Network" would be better. Everybody knows intuitively what "Idle" means. "Network" is better than "Client" because it avoids confusion about user applications vs replication connections or clients vs servers. + + + Activity: The server process is waiting for some + activity to happen on a socket. This is mainly used system processes + in their main processing loop. wait_event will identify + the type of activity waited for. + + "The server process is waiting for some activity to happen on a socket." Not true: could be a latch, or both. I think what this category is really getting at is that the server process is idle. Everything in it could otherwise go in the IPC or Client categories on the basis that it's mainly waiting for a socket or a latch, but we're deciding to separate the wait points representing "idleness" and put them here. How about: "The server process is idle. This is used by system processes waiting for activity in their main processing loop. wait_event will identify the specific wait point." + + + Client: The server process is waiting for some activity + on a socket from a client process, and that the server expects + something to happen that is independent from its internal processes. + wait_event will identify the type of client activity + waited for. + + Is it worth spelling out that "client process" here doesn't just mean user applications, it's also remote PostgreSQL servers doing replication? "wait_event" doesn't really identify the type of client activity waited for, it identifies the code that is waiting. + + + Extension: The server process is waiting for activity + in a plugin or an extension. This category is useful for plugin + and module developers to track custom waiting points. + + Plugin, extension, module? How about just "extension"? Why developers? 
+ + + IPC: The server process is waiting for some activity + from an auxilliary process. wait_event will identify + the type of activity waited for. + + s/auxilliary/auxiliary/, but wouldn't it be better to say something more general like "from another process in the cluster"? Background workers are not generally called auxiliary processes, and some of these wait points are waiting for those. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Tue, Sep 27, 2016 at 6:24 PM, Masahiko Sawada wrote: > * Providing the prepare id of 2PC. > Current patch adds new API prepare_id_provider() but we can use the > prepare id of 2PC that is used on parent server. And we assume that when this is used across many servers there will be no GID conflict because each server is careful enough to generate unique strings, say with UUIDs? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
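For what it's worth, one hedged sketch of how a coordinating server could build identifiers that are very unlikely to collide across clusters, without resorting to UUIDs: combine the local cluster's system identifier with the local top-level xid and the foreign server's OID. GetSystemIdentifier() and GetTopTransactionId() exist today; the fdw_xact_id() helper, the "px_" prefix, and the buffer size are assumptions for illustration only, not part of the patch.

#include "postgres.h"
#include "access/xact.h"
#include "access/xlog.h"

/* Hypothetical helper: build a GID for PREPARE TRANSACTION on a foreign server. */
static char *
fdw_xact_id(Oid serverid)
{
    char   *gid = palloc(64);

    snprintf(gid, 64, "px_" UINT64_FORMAT "_%u_%u",
             GetSystemIdentifier(),
             GetTopTransactionId(),
             serverid);
    return gid;
}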
Re: [HACKERS] typedef FileName not const?
> I think the better fix here is to simply remove the typedef. It doesn't > seem to have much of a benefit, and makes using correct types harder as > demonstrated here. We don't even use it internally in fd.c.. > Fine by me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Thu, Sep 15, 2016 at 2:41 PM, Thomas Munro wrote: > On Thu, Sep 1, 2016 at 2:16 AM, Ivan Kartyshov > wrote: >> Hi hackers, >> >> Few days earlier I've finished my work on WAITLSN statement utility, so I’d >> like to share it. >> [...] >> Your feedback is welcome! >> >> [waitlsn_10dev.patch] > > Thanks for working on this. Here are some general thoughts on the > feature, and an initial review. Hi Ivan I'm marking the patch Returned with Feedback, since there hasn't been any response or a new patch. I encourage you to keep working on this feature, and I'll be happy to review future patches. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] sloppy handling of pointers
rangetypes_spgist.c contains a call to detoast a pointer which was just returned from a detoast operation. This seems to do no harm, but is to my mind a sloppy thinko. Fix attached. tsgistidx.c contains a detoast of a pointer where detoast of a datum was likely intended. Fix attached. tsrank.c contains some arguably correct casts which I found slightly less correct than an alternative that I've attached. I'm pretty indifferent on this one, and just as happy if it is not included. Mark Dilger tidy.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
> From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane > Allowing SIGQUIT to prompt fast shutdown of the stats collector seems sane, > though. Try to make sure it doesn't leave partly-written stats files > behind. The attached patch based on HEAD does this. I'd like this to be back-patched because one of our important customers uses 9.2. I didn't remove partially written stat files on SIGQUIT for the following reasons. Is this OK? 1. The recovery at the next restart will remove the stat files. 2. SIGQUIT processing should be as fast as possible. 3. If writing stats files took long due to the OS page cache flushing, removing files might be forced to wait likewise. > FWIW, I'm pretty much -1 on messing with the timing of the socket close > actions. I broke that once within recent memory, so maybe I'm gun-shy, > but I think that the odds of unpleasant side effects greatly outweigh any > likely benefit there. Wasn't it related to TouchSocketFiles()? Can I see the discussion on this ML? I don't see any problem looking at the code... Regards Takayuki Tsunakawa 01_pgstat_avoid_writing_on_sigquit.patch Description: 01_pgstat_avoid_writing_on_sigquit.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detect supported SET parameters when pg_restore is run
2016-09-27 23:12 GMT+02:00 Tom Lane : > Vitaly Burovoy writes: > > On 9/27/16, Tom Lane wrote: > >> I'm not exactly convinced that you did. There's only one copy of > >> Archive->remoteVersion, and you're overwriting it long before the > >> dump process is over. > > > It does not seem that I'm "overwriting it long before the dump process > > is over"... > > There's a lot that happens during RestoreArchive. Even if none of it > inspects remoteVersion today, I do not think that's a safe assumption to > make going forward. The easiest counterexample is that this very bit of > code you want to add does so. I really do not want to get into a design > that says "remoteVersion means the source server version until we reach > RestoreArchive, and the target version afterwards". That way madness > lies. If we're going to try altering the emitted SQL based on target > version, let's first create a separation between those concepts; otherwise > I will bet that we add more bugs than we remove. > > (The other thing I'd want here is a --target-version option so that > you could get the same output alterations in pg_dump or pg_restore to > text. Otherwise it's nigh undebuggable, and certainly much harder > to test than it needs to be.) > This option looks like a very good idea. Regards Pavel > > regards, tom lane > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Supporting huge pages on Windows
On Wed, Sep 28, 2016 at 1:26 PM, Tsunakawa, Takayuki wrote: >> From: pgsql-hackers-ow...@postgresql.org >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas >> On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki >> wrote: >> > Credit: This patch is based on Thomas Munro's one. >> >> How are they different? > > As Thomas mentioned, his patch (only win32_shmem.c) might not have been able > to compile (though I didn't try.) And it didn't have error processing or > documentation. I added error handling, documentation, comments, and a little > bit of structural change. The possibly biggest change, though it's only > one-liner in pg_ctl.c, is additionally required. I failed to include it in > the first patch. The attached patch includes that. Right, my patch[1] was untested sketch code. I had heard complaints about poor performance with large shared_buffers on Windows, and then I bumped into some recommendations to turn large pages on for a couple of other RDBMSs if using large buffer pool. So I wrote that patch based on the documentation to try some time in the future if I ever got trapped in a room with Windows, but I posted it just in case the topic would interest other hackers. Thanks for picking it up! > huge_pages=off: 70412 tps > huge_pages=on : 72100 tps Hmm. I guess it could be noise or random code rearrangement effects. I saw your recent post[2] proposing to remove the sentence about the 512MB effective limit and I wondered why you didn't go to larger sizes with a larger database and more run time. But I will let others with more benchmarking experience comment on the best approach to investigate Windows shared_buffers performance. [1] https://www.postgresql.org/message-id/CAEepm=075-bghi_vdt4scamt+o_+1xarap2zh7xwfzvt294...@mail.gmail.com [2] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F5EE995@G01JPEXMBYT05 -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Requesting some information about the small portion of source code of postgreSQL
On Tue, Sep 27, 2016 at 11:43 AM, Srinivas Karthik V wrote: > Dear PostgresSQL Hackers, >I am working in optimizer module of postgreSQL 9.4.1. I am trying to > return a subplan for a query instead of full plan.For this I need to return > an intermediate plan (or path) from the DP lattice (i.e. from RelOptInfo > *standard_join_search() at allpaths.c) instead of the full optimal plan > (which is from the last level of root->join_rel_level()). Why do you want to do that? > while doing so I am getting error :variable not found in subplan target > lists at function fix_join_expr_mutator at setrefs.c. > > It will be great if you can give me some idea about why this error is > happening, since this error is happening at fix_join_expr_mutator function > at setrefs.c. Please give me more information about this portion of code. > Also I would like to know for what targetlist stands for. > setref.c implements the final stage of planner, where the variable/expression references in the parent plan nodes are fixed by pointing them to expressions/parts of expressions/variables from their children. This error means that some of the things that parent node is referring to are not available in the children as expected. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some information about the small portion of source code of postgreSQL
This mail looks exactly same as that from Shrinivas. Please check answers there. On Tue, Sep 27, 2016 at 11:32 AM, davinder singh wrote: > Hi, >I am working in optimizer module of postgreSQL 9.4.1. My project is > concerned with the plan returned by the optimizer. > > I am trying to return a subplan for a query instead of full plan. For this I > need to return a path form RelOptInfo *standard_join_search() at allpaths.c > other than from last level of root->join_rel_level(). > while doing so I am getting error :variable not found in subplan target > lists at function fix_join_expr_mutator at setrefs.c. > > It will be great if you can give me some idea about why this error is > happening,since this error is happening at above function, Please give me > more information about that portion of code. Also I want to know for what > targetlist stands for. > > > Thanks, > Davinder Singh -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous execution
On 24 September 2016 at 06:39, Robert Haas wrote: > Since Kyotaro Horiguchi found that my previous design had a > system-wide performance impact due to the ExecProcNode changes, I > decided to take a different approach here: I created an async > infrastructure where both the requestor and the requestee have to be > specifically modified to support parallelism, and then modified Append > and ForeignScan to cooperate using the new interface. Hopefully that > means that anything other than those two nodes will suffer no > performance impact. Of course, it might have other problems I see that the reason why you re-designed the asynchronous execution implementation is because the earlier implementation showed performance degradation in local sequential and local parallel scans. But I checked that the ExecProcNode() changes were not that significant as to cause the degradation. It will not call ExecAsyncWaitForNode() unless that node supports asynchronism. Do you feel there is anywhere else in the implementation that is really causing this degrade ? That previous implementation has some issues, but they seemed solvable. We could resolve the plan state recursion issue by explicitly making sure the same plan state does not get called again while it is already executing. Thanks -Amit Khandekar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: pg_ctl: Detect current standby state from pg_control
On Tue, Sep 27, 2016 at 9:55 AM, Michael Paquier wrote: > Seems overcomplicated to me. How about returning the control file all > the time and let the caller pfree the result? You could then use > crc_ok in pg_ctl.c's get_control_dbstate() to do the decision-making. In short I would just go with the attached and call it a day. -- Michael diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 4f9de83..0c67833 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -34,6 +34,7 @@ pg_control_system(PG_FUNCTION_ARGS) TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; + bool crc_ok; /* * Construct a tuple descriptor for the result row. This must match this @@ -51,8 +52,8 @@ pg_control_system(PG_FUNCTION_ARGS) tupdesc = BlessTupleDesc(tupdesc); /* read the control file */ - ControlFile = get_controlfile(DataDir, NULL); - if (!ControlFile) + ControlFile = get_controlfile(DataDir, NULL, &crc_ok); + if (!crc_ok) ereport(ERROR, (errmsg("calculated CRC checksum does not match value stored in file"))); @@ -83,6 +84,7 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) ControlFileData *ControlFile; XLogSegNo segno; char xlogfilename[MAXFNAMELEN]; + bool crc_ok; /* * Construct a tuple descriptor for the result row. This must match this @@ -130,8 +132,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) tupdesc = BlessTupleDesc(tupdesc); /* Read the control file. */ - ControlFile = get_controlfile(DataDir, NULL); - if (!ControlFile) + ControlFile = get_controlfile(DataDir, NULL, &crc_ok); + if (!crc_ok) ereport(ERROR, (errmsg("calculated CRC checksum does not match value stored in file"))); @@ -216,6 +218,7 @@ pg_control_recovery(PG_FUNCTION_ARGS) TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; + bool crc_ok; /* * Construct a tuple descriptor for the result row. This must match this @@ -235,8 +238,8 @@ pg_control_recovery(PG_FUNCTION_ARGS) tupdesc = BlessTupleDesc(tupdesc); /* read the control file */ - ControlFile = get_controlfile(DataDir, NULL); - if (!ControlFile) + ControlFile = get_controlfile(DataDir, NULL, &crc_ok); + if (!crc_ok) ereport(ERROR, (errmsg("calculated CRC checksum does not match value stored in file"))); @@ -268,6 +271,7 @@ pg_control_init(PG_FUNCTION_ARGS) TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; + bool crc_ok; /* * Construct a tuple descriptor for the result row. 
This must match this @@ -303,8 +307,8 @@ pg_control_init(PG_FUNCTION_ARGS) tupdesc = BlessTupleDesc(tupdesc); /* read the control file */ - ControlFile = get_controlfile(DataDir, NULL); - if (!ControlFile) + ControlFile = get_controlfile(DataDir, NULL, &crc_ok); + if (!crc_ok) ereport(ERROR, (errmsg("calculated CRC checksum does not match value stored in file"))); diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index e92feab..20077a6 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -86,6 +86,7 @@ int main(int argc, char *argv[]) { ControlFileData *ControlFile; + bool crc_ok; char *DataDir = NULL; time_t time_tmp; char pgctime_str[128]; @@ -155,8 +156,8 @@ main(int argc, char *argv[]) } /* get a copy of the control file */ - ControlFile = get_controlfile(DataDir, progname); - if (!ControlFile) + ControlFile = get_controlfile(DataDir, progname, &crc_ok); + if (!crc_ok) printf(_("WARNING: Calculated CRC checksum does not match value stored in file.\n" "Either the file is corrupt, or it has a different layout than this program\n" "is expecting. The results below are untrustworthy.\n\n")); diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 052b02e..ab10d2f 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -2147,28 +2147,18 @@ static DBState get_control_dbstate(void) { DBState ret; + bool crc_ok; + ControlFileData *control_file_data = get_controlfile(pg_data, progname, &crc_ok); - for (;;) + if (!crc_ok) { - ControlFileData *control_file_data = get_controlfile(pg_data, progname); - - if (control_file_data) - { - ret = control_file_data->state; - pfree(control_file_data); - return ret; - } - - if (wait_seconds > 0) - { - pg_usleep(100); /* 1 sec */ - wait_seconds--; - continue; - } - write_stderr(_("%s: control file appears to be corrupt\n"), progname); exit(1); } + + ret = control_file_data->state; + pfree(control_file_data); + return ret; } diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c index f218d25..822fda7 100644 --- a/src/common/controldata_utils.c +++ b/src/common/controldata_utils.c @@ -29,21 +29,24 @@ #include "port/pg_crc32c.h" /* - * get_controlfile(char *DataDir, const char *progname) + * get_controlfile(char *DataDir, const char
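For reference, the calling convention the attached patch settles on looks roughly like this (a sketch condensed from the hunks above, with error handling elided): get_controlfile() always returns a palloc'd copy of the control file, reports CRC validity through an out parameter, and the caller frees the result.

    bool             crc_ok;
    ControlFileData *ControlFile;

    /* read the control file; caller owns the returned copy */
    ControlFile = get_controlfile(DataDir, progname, &crc_ok);

    if (!crc_ok)
        ereport(ERROR,
                (errmsg("calculated CRC checksum does not match value stored in file")));

    /* ... use ControlFile->state, ControlFile->checkPoint, etc. ... */

    pfree(ControlFile);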
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat wrote: > On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada > wrote: >> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat >> wrote: >>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada >>> wrote: On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat wrote: > My original patch added code to manage the files for 2 phase > transactions opened by the local server on the remote servers. This > code was mostly inspired from the code in twophase.c which manages the > file for prepared transactions. The logic to manage 2PC files has > changed since [1] and has been optimized. One of the things I wanted > to do is see, if those optimizations are applicable here as well. Have > you considered that? > > Yeah, we're considering it. After these changes are committed, we will post the patch incorporated these changes. But what we need to do first is the discussion in order to get consensus. Since current design of this patch is to transparently execute DCL of 2PC on foreign server, this code changes lot of code and is complicated. >>> >>> Can you please elaborate. I am not able to understand what DCL is >>> involved here. According to [1], examples of DCL are GRANT and REVOKE >>> command. >> >> I meant transaction management command such as PREPARE TRANSACTION and >> COMMIT/ABORT PREPARED command. >> The web page I refered might be wrong, sorry. >> Another approach I have is to push down DCL to only foreign servers that support 2PC protocol, which is similar to DML push down. This approach would be more simpler than current idea and is easy to use by distributed transaction manager. >>> >>> Again, can you please elaborate, how that would be different from the >>> current approach and how does it simplify the code. >>> >> >> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK >> PREPARED to foreign servers that support 2PC. >> With this idea, the client need to do following operation when foreign >> server is involved with transaction. >> >> BEGIN; >> UPDATE parent_table SET ...; -- update including foreign server >> PREPARE TRANSACTION 'xact_id'; >> COMMIT PREPARED 'xact_id'; >> >> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed >> down to foreign server. >> That is, the client needs to execute PREPARE TRANSACTION and >> >> In this idea, I think that we don't need to do followings, >> >> * Providing the prepare id of 2PC. >> Current patch adds new API prepare_id_provider() but we can use the >> prepare id of 2PC that is used on parent server. >> >> * Keeping track of status of foreign servers. >> Current patch keeps track of status of foreign servers involved with >> transaction but this idea is just to push down transaction management >> command to foreign server. >> So I think that we no longer need to do that. > >> COMMIT/ROLLBACK PREPARED explicitly. > > The problem with this approach is same as one previously stated. If > the connection between local and foreign server is lost between > PREPARE and COMMIT the prepared transaction on the foreign server > remains dangling, none other than the local server knows what to do > with it and the local server has lost track of the prepared > transaction on the foreign server. So, just pushing down those > commands doesn't work. Yeah, my idea is one of the first step. Mechanism that resolves the dangling foreign transaction and the resolver worker process are necessary. >> >> * Adding max_prepared_foreign_transactions parameter. 
>> It means that the number of transaction involving foreign server is >> the same as max_prepared_transactions. >> > > That isn't true exactly. max_prepared_foreign_transactions indicates > how many transactions can be prepared on the foreign server, which in > the method you propose should have a cap of max_prepared_transactions > * number of foreign servers.

Oh, I understood, thanks.

Consider a sharding solution using postgres_fdw (that is, the parent postgres server has multiple shard postgres servers): we would need to increase max_prepared_foreign_transactions whenever a new shard server is added to the cluster, or allocate enough capacity in advance. But estimating a sufficient max_prepared_foreign_transactions would not be easy; for example, can we estimate it as (max throughput of the system) * (the number of foreign servers)?

One new idea I came up with is to embed the parent server's transaction id in the global transaction id (gid) that is prepared on the shard server. A pg_fdw_resolver worker process then periodically resolves dangling transactions on the foreign server by comparing the lowest active XID on the parent server with the XID in the gid used by PREPARE TRANSACTION. For example, suppose that there are one parent server and one shard server, and the client executes an update transaction (XID = 100) involving the foreign server. In the commit phase, the parent server executes PR
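A rough C sketch of how such a resolver check could look; the gid format "fdw_<parent xid>_<serverid>" and the resolve_foreign_xact() helper are assumptions made only to illustrate the idea, not part of any posted patch:

static void
maybe_resolve_foreign_xact(const char *gid, TransactionId oldest_active_xid)
{
    unsigned int parent_xid;
    unsigned int serverid;

    /* gid layout "fdw_<parent xid>_<serverid>" is assumed for this sketch */
    if (sscanf(gid, "fdw_%u_%u", &parent_xid, &serverid) != 2)
        return;                 /* not a gid this resolver owns */

    /*
     * Once the parent-side XID is older than every active XID, its commit
     * or abort is already decided; look up its status and issue
     * COMMIT PREPARED or ROLLBACK PREPARED on the foreign server.
     * resolve_foreign_xact() is a placeholder, not an existing function.
     */
    if (TransactionIdPrecedes((TransactionId) parent_xid, oldest_active_xid))
        resolve_foreign_xact((TransactionId) parent_xid, serverid, gid);
}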
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 28, 2016 at 9:39 AM, Thomas Munro wrote: > Ok, if they really are independent then shouldn't we take advantage of > that at call sites where we might be idle but we might also be waiting > for the network? For example we could do this: > > /* Sleep until something happens or we time out */ > WaitLatchOrSocket(MyLatch, wakeEvents, > MyProcPort->sock, sleeptime, > pq_is_send_pending() ? WAIT_CLIENT : > WAIT_ACTIVITY, > WE_WAL_SENDER_MAIN); Yes, we can do fancy things with this API. Extensions could do the same, the important thing being to have generic enough event categories (take class). > Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT? Or > perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC". > > Actually, I'm still not sold on "Activity" and "Client". I think > "Idle" and "Network" would be better. Everybody knows intuitively > what "Idle" means. "Network" is better than "Client" because it > avoids confusion about user applications vs replication connections or > clients vs servers. Personally I am buying the argument of Robert instead. The class of an event means what it is waiting for, not in which state in it waiting for something. "Idle" means that the process *is* idle, not that it is waiting for something. > + > + > + Activity: The server process is waiting for some > + activity to happen on a socket. This is mainly used system > processes > + in their main processing loop. wait_event will > identify > + the type of activity waited for. > + > + > > "The server process is waiting for some activity to happen on a > socket." Not true: could be a latch, or both. > > I think what this category is really getting at is that the server > process is idle. Everything in it could otherwise go in the IPC or > Client categories on the basis that it's mainly waiting for a socket > or a latch, but we're deciding to separate the wait points > representing "idleness" and put them here. > > How about: "The server process is idle. This is used by system > processes waiting for activity in their main processing loop. > wait_event will identify the specific wait point." You have a better way to word things than I do. > + > + > + Client: The server process is waiting for some activity > + on a socket from a client process, and that the server expects > + something to happen that is independent from its internal > processes. > + wait_event will identify the type of client activity > + waited for. > + > + > > Is it worth spelling out that "client process" here doesn't just mean > user applications, it's also remote PostgreSQL servers doing > replication? "wait_event" doesn't really identify the type of client > activity waited for, it identifies the code that is waiting. Okay, switched to "user applications", and precised that this is a wait point that wait_event tracks. > + > + > + Extension: The server process is waiting for activity > + in a plugin or an extension. This category is useful for plugin > + and module developers to track custom waiting points. > + > + > > Plugin, extension, module? How about just "extension"? Why developers? Let's say "extension module", this is used in a couple of places in the docs. > + > + > + IPC: The server process is waiting for some activity > + from an auxilliary process. wait_event will identify > + the type of activity waited for. > + > + > > s/auxilliary/auxiliary/, but I wouldn't it be better to say something > more general like "from another process in the cluster"? 
Background > workers are not generally called auxiliary processes, and some of > these wait points are waiting for those. There was the same typo in latch.h, still "from another process" looks better. -- Michael diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 8ca1c1c..9222b73 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -17,6 +17,7 @@ #include "access/xact.h" #include "mb/pg_wchar.h" #include "miscadmin.h" +#include "pgstat.h" #include "storage/latch.h" #include "utils/hsearch.h" #include "utils/memutils.h" @@ -496,7 +497,9 @@ pgfdw_get_result(PGconn *conn, const char *query) wc = WaitLatchOrSocket(MyLatch, WL_LATCH_SET | WL_SOCKET_READABLE, PQsocket(conn), - -1L); + -1L, + WAIT_EXTENSION, + WE_EXTENSION); ResetLatch(MyLatch); CHECK_FOR_INTERRUPTS(); diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index f400785..bb975c1 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgm
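For extension authors, use of the API as it stands in this patch version mirrors the postgres_fdw hunk above; a condensed sketch (the WAIT_EXTENSION / WE_EXTENSION identifiers and the six-argument WaitLatchOrSocket() follow this patch and may change before commit):

    int     wc;

    /* Sleep until the latch is set or the socket becomes readable,
     * reporting the wait as an Extension-class event. */
    wc = WaitLatchOrSocket(MyLatch,
                           WL_LATCH_SET | WL_SOCKET_READABLE,
                           sock,            /* e.g. PQsocket(conn) */
                           -1L,             /* no timeout */
                           WAIT_EXTENSION,  /* wait event class */
                           WE_EXTENSION);   /* wait event */

    if (wc & WL_LATCH_SET)
        ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();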
Re: [HACKERS] Some information about the small portion of source code of postgreSQL
On 2016/09/28 13:30, Ashutosh Bapat wrote: > This mail looks exactly same as that from Shrinivas. Please check answers > there. Here is a link to the discussion thread that Ashutosh is referring to: https://www.postgresql.org/message-id/flat/CAFjFpRfxJJJGNhtQaS2CQ7Boyfo88nu-45JcNKeREUbQUPxOEw%40mail.gmail.com Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should xlog_outdesc modify its argument?
On 09/28/2016 02:35 AM, Mark Dilger wrote: The function static void xlog_outdesc(StringInfo buf, XLogReaderState *record); in src/backend/access/transam/xlog.c is called by XLogInsertRecord, and after returning a string describing an XLogRecord, it clears the state data in its XLogReaderState argument. That mixes the read-only semantics of "give me a string that describes this argument" and the read-write semantics of "clear out the value in this argument". I don't see where the "clears the state data" is happening. Can you elaborate? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Wed, Sep 28, 2016 at 10:43 AM, Masahiko Sawada wrote: > On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat > wrote: >> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada >> wrote: >>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat >>> wrote: On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada wrote: > On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat > wrote: >> My original patch added code to manage the files for 2 phase >> transactions opened by the local server on the remote servers. This >> code was mostly inspired from the code in twophase.c which manages the >> file for prepared transactions. The logic to manage 2PC files has >> changed since [1] and has been optimized. One of the things I wanted >> to do is see, if those optimizations are applicable here as well. Have >> you considered that? >> >> > > Yeah, we're considering it. > After these changes are committed, we will post the patch incorporated > these changes. > > But what we need to do first is the discussion in order to get consensus. > Since current design of this patch is to transparently execute DCL of > 2PC on foreign server, this code changes lot of code and is > complicated. Can you please elaborate. I am not able to understand what DCL is involved here. According to [1], examples of DCL are GRANT and REVOKE command. >>> >>> I meant transaction management command such as PREPARE TRANSACTION and >>> COMMIT/ABORT PREPARED command. >>> The web page I refered might be wrong, sorry. >>> > Another approach I have is to push down DCL to only foreign servers > that support 2PC protocol, which is similar to DML push down. > This approach would be more simpler than current idea and is easy to > use by distributed transaction manager. Again, can you please elaborate, how that would be different from the current approach and how does it simplify the code. >>> >>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK >>> PREPARED to foreign servers that support 2PC. >>> With this idea, the client need to do following operation when foreign >>> server is involved with transaction. >>> >>> BEGIN; >>> UPDATE parent_table SET ...; -- update including foreign server >>> PREPARE TRANSACTION 'xact_id'; >>> COMMIT PREPARED 'xact_id'; >>> >>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed >>> down to foreign server. >>> That is, the client needs to execute PREPARE TRANSACTION and >>> >>> In this idea, I think that we don't need to do followings, >>> >>> * Providing the prepare id of 2PC. >>> Current patch adds new API prepare_id_provider() but we can use the >>> prepare id of 2PC that is used on parent server. >>> >>> * Keeping track of status of foreign servers. >>> Current patch keeps track of status of foreign servers involved with >>> transaction but this idea is just to push down transaction management >>> command to foreign server. >>> So I think that we no longer need to do that. >> >>> COMMIT/ROLLBACK PREPARED explicitly. >> >> The problem with this approach is same as one previously stated. If >> the connection between local and foreign server is lost between >> PREPARE and COMMIT the prepared transaction on the foreign server >> remains dangling, none other than the local server knows what to do >> with it and the local server has lost track of the prepared >> transaction on the foreign server. So, just pushing down those >> commands doesn't work. > > Yeah, my idea is one of the first step. > Mechanism that resolves the dangling foreign transaction and the > resolver worker process are necessary. 
> >>> >>> * Adding max_prepared_foreign_transactions parameter. >>> It means that the number of transaction involving foreign server is >>> the same as max_prepared_transactions. >>> >> >> That isn't true exactly. max_prepared_foreign_transactions indicates >> how many transactions can be prepared on the foreign server, which in >> the method you propose should have a cap of max_prepared_transactions >> * number of foreign servers. > > Oh, I understood, thanks. > > Consider sharding solution using postgres_fdw (that is, the parent > postgres server has multiple shard postgres servers), we need to > increase max_prepared_foreign_transactions whenever new shard server > is added to cluster, or to allocate enough size in advance. But the > estimation of enough max_prepared_foreign_transactions would not be > easy, for example can we estimate it by (max throughput of the system) > * (the number of foreign servers)? > > One new idea I came up with is that we set transaction id on parent > server to global transaction id (gid) that is prepared on shard > server. > And pg_fdw_resolver worker process periodically resolves the dangling > transaction on foreign server by comparing active lowest XID on parent > server with the XID in gid used by PREPARE TRANSACTION. > > For exa
Re: [HACKERS] Supporting huge pages on Windows
From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com] > > huge_pages=off: 70412 tps > > huge_pages=on : 72100 tps > > Hmm. I guess it could be noise or random code rearrangement effects.

I don't think the difference was random noise, because running multiple sets of three pgbench runs (huge_pages = on, off, on, off, on...) produced similar results. But I expected a bit greater improvement, say, +10%. There may be a better benchmark model where large pages stand out more, but I think pgbench is not so bad, because its random data access would cause TLB cache misses.

> I saw your recent post[2] proposing to remove the sentence about the 512MB > effective limit and I wondered why you didn't go to larger sizes with a > larger database and more run time. But I will let others with more > benchmarking experience comment on the best approach to investigate Windows > shared_buffers performance.

Yes, I could have gone to 8GB of shared_buffers because my PC has 16GB of RAM, but I felt the number of variations was sufficient. Anyway, positive comments on benchmarking would be appreciated.

Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 28, 2016 at 6:25 PM, Michael Paquier wrote: > wait-event-set-v8.patch Ok, I'm just about ready to mark this as 'Ready for Committer'. Just a couple of things: + pgstat_report_wait_start((uint8) classId, (uint16) eventId); Unnecessary casts. + + Client + SecureRead + Waiting to read data from a secure connection. + + + SecureWrite + Waiting to write data to a secure connection. + I think we want to drop the word 'secure' from the description lines in this patch. Then I think we plan to make a separate patch that will rename the functions themselves and the corresponding wait points to something more generic? I'm assuming that my suggestions for making WE_WAL_SENDER_WAIT_WAL and WE_WAL_SENDER_MAIN have a dynamically chosen class ID would also be material for another patch, but it doesn't matter much because those processes won't show up yet anyway. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
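With the casts dropped, the call Thomas points at would simply read as below; a sketch of the suggested change, not a quote from any later patch version:

    /* classId and eventId already fit the parameter types; no casts needed */
    pgstat_report_wait_start(classId, eventId);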