[HACKERS] make default TABLESPACE belong to target table.
Dear pgsql community,

I've been using Postgres for a long time. Recently I've been sharding tables over a bunch of pgsql instances, using multiple tablespaces (one per disk) to utilize all the I/O bandwidth. Things have gone pretty well; however, there is a troublesome problem when adding auxiliary structures, such as indexes, to sharded tables. These objects default to the database's tablespace, and it's difficult to shard them by hand.

I'd like to implement this small feature --- making a table's auxiliary structures store their data in the target table's tablespace by default. I've done a thorough search over the mailing list and found nothing relevant, though I may have missed some corners :-)

What do you think?

Regards,
Amos

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
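A minimal sketch of the problem being described (table, index, and tablespace names are invented for illustration):

```sql
-- The table lands on the intended tablespace...
CREATE TABLE shard_17 (id bigint, payload text) TABLESPACE disk3;

-- ...but without the proposed change, this index silently goes to the
-- database's default tablespace, not disk3:
CREATE INDEX ON shard_17 (id);

-- Today the workaround is to repeat the tablespace by hand on every
-- auxiliary object:
CREATE INDEX ON shard_17 (id) TABLESPACE disk3;
```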
Re: [HACKERS] Parallel execution and prepared statements
There has been a previous discussion about this topic, including an attempted fix by Amit Kapila: http://www.postgresql.org/message-id/flat/CAA4eK1L=tHmmHDK_KW_ja1_dusJxJF+SGQHi=aps4mdnpk7...@mail.gmail.com/
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
On Fri, Nov 25, 2016 at 4:02 PM, Ideriha, Takeshi wrote:
> I applied your fixed patch and the new one, and confirmed that the patched
> source passed the tests successfully. I also checked manually that the error
> messages were emitted correctly when CR/LF are included in the dbname,
> rolename, or data_directory.
>
> AFAICT, this patch satisfies the concept discussed before, so I’ve switched
> this patch to “Ready for Committer”.

Thanks for the review, Ideriha-san. (See you next week perhaps?)

--
Michael
[HACKERS] confusing checkpoint_flush_after / bgwriter_flush_after
Hi,

while doing some benchmarking, I've once again been confused by the default settings for checkpoint_flush_after and bgwriter_flush_after. The sample config says this:

#checkpoint_flush_after = 0     # 0 disables,
                                # default is 256kB on linux, 0 otherwise
#bgwriter_flush_after = 0       # 0 disables,
                                # default is 512kB on linux, 0 otherwise

I find this pretty confusing, because for all other GUCs in the file, the commented-out value is the default one. In this case that would mean "0", disabling the flushing. But in practice we use platform-dependent defaults - 256/512kB on Linux, 0 otherwise. There are other GUCs whose default is platform-specific, but none of them suggests "disabled" is the default state.

While the 9.6 cat is out of the bag, I think we can fix this quite easily - use "-1" to specify that the default value should be used, and use that in the sample file. This won't break any user configuration. If that's considered unacceptable, perhaps we should at least improve the comments so as to make this clearer.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
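To illustrate the proposal, here is a sketch of how the sample-file entry could look with the suggested "-1" sentinel (this is the suggestion above, not committed behaviour):

```
# Current sample (commented-out value disagrees with the real default):
#checkpoint_flush_after = 0     # 0 disables,
                                # default is 256kB on linux, 0 otherwise

# Proposed shape: -1 means "use the platform default", so the
# commented-out value is once again the effective default:
#checkpoint_flush_after = -1    # -1 uses the platform default
                                # (256kB on Linux, 0 elsewhere); 0 disables
```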
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
> > [Summary]
> > 1. apply patch and make world
> >    -> failed because was mistakenly coded.
> >
> > 2. correct this mistake and make check-world
> >    -> got 1 failed test: "'pg_dumpall with \n\r in database name'"
> >    because the test script cannot createdb "foo\n\rbar"
>
> The attached version addresses those problems. I have replaced the test in
> src/bin/pg_dump/t/ by tests in src/bin/scripts/t/ to check that role names
> and database names with CR or LF fail to be created. I have as well added a
> test for initdb when the data directory has an incorrect character in 0002.

Thanks for your modification. I applied your fixed patch and the new one, and confirmed that the patched source passed the tests successfully. I also checked manually that the error messages were emitted correctly when CR/LF are included in the dbname, rolename, or data_directory.

AFAICT, this patch satisfies the concept discussed before, so I’ve switched this patch to “Ready for Committer”.

Regards,
Ideriha, Takeshi
Re: [HACKERS] patch: function xmltable
2016-11-25 7:44 GMT+01:00 Pavel Stehule :
> 2016-11-25 3:31 GMT+01:00 Alvaro Herrera :
>> Michael Paquier wrote:
>>> Nit: I did not look at the patch in detail,
>>> but I find the size of the latest version sent, 167kB, scary, as it
>>> complicates review and increases the likelihood of bugs.
>>
>> Here's the stat. Note that removing the functionality as discussed
>> would remove all of xpath_parser.c, but I think the rest of it remains
>> pretty much unchanged. So it's clearly a large patch, but there are
>> large docs and tests too, not just code.
>
> Yes, a lot of it is regression tests (the expected output exists in three
> variants) - and the XMLTABLE function is not trivial. Regression tests are
> about 50% of the patch.
>
> A lot of the code is mechanical - node-related. The really complex part is
> only in xml.c. There is only one longer function - the complexity comes from
> mapping the libxml2 result to a PostgreSQL result (while catching exceptions
> so that libxml2 resources are released).
>
> All the changes are well isolated - there is less risk of breaking
> something else.
Re: [HACKERS] patch: function xmltable
2016-11-25 3:31 GMT+01:00 Alvaro Herrera :
> Michael Paquier wrote:
>> Nit: I did not look at the patch in detail,
>> but I find the size of the latest version sent, 167kB, scary, as it
>> complicates review and increases the likelihood of bugs.
>
> Here's the stat. Note that removing the functionality as discussed
> would remove all of xpath_parser.c, but I think the rest of it remains
> pretty much unchanged. So it's clearly a large patch, but there are
> large docs and tests too, not just code.

Yes, a lot of it is regression tests (the expected output exists in three variants) - and the XMLTABLE function is not trivial.

A lot of the code is mechanical - node-related. The really complex part is only in xml.c. There is only one longer function - the complexity comes from mapping the libxml2 result to a PostgreSQL result (while catching exceptions so that libxml2 resources are released).

All the changes are well isolated - there is less risk of breaking something else.
Re: [HACKERS] Broken SSL tests in master
On 11/25/2016 07:19 AM, Tsunakawa, Takayuki wrote:
> Specifying multiple hosts is a new feature to be introduced in v10, so
> that's here: https://www.postgresql.org/docs/devel/static/libpq-connect.html

Thanks, I had missed that patch. If we add support for multiple hosts I think we should also allow specifying the corresponding multiple hostaddrs.

Another thought about this code: should we not check whether it is a Unix socket first, before splitting the host? While I doubt that it is common to have a Unix socket in a directory with a comma in the path, it is a bit annoying that we no longer support this.

Andreas
Re: [HACKERS] Broken SSL tests in master
From: pgsql-hackers-ow...@postgresql.org
> sense to add support for multiple hostaddrs. For consistency's sake if
> nothing else.

Yes, consistency and performance. The purpose of hostaddr is to speed up connection establishment by eliminating the DNS lookup, isn't it? Then some users will want to specify multiple IP addresses for hostaddr and omit host.

> By the way, are comma-separated hosts documented somewhere? It is not
> included in
> https://www.postgresql.org/docs/9.6/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS.

Specifying multiple hosts is a new feature to be introduced in v10, so that's here: https://www.postgresql.org/docs/devel/static/libpq-connect.html

Regards
Takayuki Tsunakawa
Re: [HACKERS] Random PGDLLIMPORTing
Craig Ringer writes:
> PGDLLIMPORT is free, so the question should be "is there a reason not
> to add it here?".

TBH, my basic complaint about it is that I do not like Microsoft's tool chain assuming that it's entitled to demand that people sprinkle Microsoft-specific droppings throughout what would otherwise be platform-independent source code.

However, Victor Wagner's observation upthread is quite troubling:
>> It worth checking actual variable definitions, not just declarations.
>> I've found recently, that at least in MSVC build system, only
>> initialized variables are included into postgres.def file, and so are
>> actually exported from the backend binary.

If this is correct (don't ask me, I don't do Windows) then the issue is not whether "PGDLLIMPORT is free". This puts two separate source-code demands on variables that we want to make available to extensions, neither of which is practically checkable on non-Windows platforms.

I think that basically it's going to be on the heads of people who want to work on Windows to make sure that things work on that platform. That is the contract that every other platform under the sun understands, but it seems like Windows people think it's on the rest of us to make their platform work. I'm done with that.

regards, tom lane
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
Hello,

At Fri, 25 Nov 2016 06:51:43 +0100, Pavel Stehule wrote:
> I am sure about the benefit of all the patches - but it is a lot of changes
> at one time, and it is not necessary at this moment.
>
> Patches 0004 and 0005 make some bigger conceptual changes, and the work can
> be separated.

The patches are collections of sporadic changes on the same basis. I agree that the result is too big to look at. (And the code itself is confusing.) Please wait a while for separated patches.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Broken SSL tests in master
On Fri, Nov 25, 2016 at 10:41 AM, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote:
> I agree that pg_conn_host should have hostaddr in addition to host, and
> PQhost() should return host when host is specified, with or without
> hostaddr specified.

typedef struct pg_conn_host
{
    char       *host;           /* host name or address, or socket path */
    pg_conn_host_type type;     /* type of host */
    char       *port;           /* port number for this host; if not NULL,
                                 * overrides the PGconn's pgport */
    char       *password;       /* password for this host, read from the
                                 * password file.  only set if the PGconn's
                                 * pgpass field is NULL. */
    struct addrinfo *addrlist;  /* list of possible backend addresses */
} pg_conn_host;

typedef enum pg_conn_host_type
{
    CHT_HOST_NAME,
    CHT_HOST_ADDRESS,
    CHT_UNIX_SOCKET
} pg_conn_host_type;

The host parameter stores both hostname and hostaddr, and we already have the "type" member to distinguish them. I think we should not be using PQhost() directly in verify_peer_name_matches_certificate_name (the same holds for GSS and SSPI). Instead, proceed only if "conn->connhost[conn->whichhost]" is a CHT_HOST_NAME. Also, the old PQhost() did not produce CHT_HOST_ADDRESS as its output, so we might need to revert to the old behaviour.

> However, I wonder whether the hostaddr parameter should also accept
> multiple IP addresses. Currently, it accepts only one address as follows.
> I asked Robert and Mithun about this, but I forgot about that.

As far as I know, only pghost is allowed to hold multiple hosts; pghostaddr takes only one numeric address.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] UNDO and in-place update
2016-11-25 1:44 GMT+01:00 Robert Haas :
> On Thu, Nov 24, 2016 at 6:20 PM, Pavel Stehule wrote:
>>> I think that the whole emphasis on whether and to what degree this is
>>> like Oracle is somewhat misplaced. I would look at it a different
>>> way. We've talked many times over the years about how PostgreSQL is
>>> optimized for aborts. Everybody that I've heard comment on that issue
>>> thinks that is a bad thing.
>>
>> Again, this depends on usage - when you have the possibility to run
>> VACUUM, then this strategy is better.
>>
>> Fast aborts are one pretty good feature for stable usage.
>>
>> Isn't it possible to use an UNDO log (or something similar) for VACUUM?
>> ROLLBACK should be fast, but VACUUM can do more work?
>
> I think that in this design we wouldn't use VACUUM at all. However,
> if what you are saying is that we should try to make aborts
> near-instantaneous by pushing UNDO actions into the background, I
> agree entirely. InnoDB already does that, IIUC.

OK, it can be another process - one that can be more aggressive and less limited than vacuum.

Regards
Pavel
Re: [HACKERS] Broken SSL tests in master
On 11/25/2016 06:11 AM, Tsunakawa, Takayuki wrote:
> However, I wonder whether the hostaddr parameter should also accept
> multiple IP addresses.

Yeah, I too thought about whether we should fix that. I feel like it would make sense to add support for multiple hostaddrs, for consistency's sake if nothing else.

By the way, are comma-separated hosts documented somewhere? It is not included in https://www.postgresql.org/docs/9.6/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS.

Andreas
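For reference, the comma-separated host syntax under discussion could look like this (an illustrative sketch of the v10 feature with invented host names; as noted in the thread, hostaddr still accepts only a single numeric address):

```
psql "host=db1.example.com,db2.example.com dbname=app"
```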
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
2016-11-25 2:24 GMT+01:00 Kyotaro HORIGUCHI :
> Hello,
>
> Thank you for looking at this long-and-bothersome patch.
>
> At Wed, 23 Nov 2016 07:12:00 +0100, Pavel Stehule wrote:
>> Hi
>>
>> 2016-11-15 12:26 GMT+01:00 Kyotaro HORIGUCHI:
>>> Hello, I rebased this patch on the current master.
>>>
>>> At Mon, 31 Oct 2016 10:15:48 +0900 (Tokyo Standard Time), Kyotaro
>>> HORIGUCHI wrote:
>>>> Anyway, I fixed the space issues and addressed the
>>>> COMPLETE_WITH_QUERY()'s almost-unused-parameter problem. And
>>>> tried to write README files.
>>>
>>> tab-complete.c has gotten some improvements since then:
>>>
>>> 577f0bdd2b8904cbdfde6c98f4bda6fd93a05ffc psql: Tab completion for
>>> renaming enum values.
>>> 927d7bb6b120a2ca09a164898f887eb850b7a329 Improve tab completion for
>>> CREATE TRIGGER.
>>> 1d15d0db50a5f39ab69c1fe60f2d5dcc7e2ddb9c psql: Tab-complete LOCK [TABLE]
>>> ... IN {ACCESS|ROW|SHARE}.
>>>
>>> The attached patchset is rebased on the master including these
>>> patches.
>>
>> I checked patches 0001, 0002, and 0003.
>>
>> There are no problems with patching or compiling; it works as expected.
>>
>> These patches can be committed separately - they are long, but the code
>> is almost mechanical.
>
> Thanks.
>
> You're right. I hadn't considered the relations among them.

I am sure about the benefit of all the patches - but it is a lot of changes at one time, and it is not necessary at this moment.

Patches 0004 and 0005 make some bigger conceptual changes, and that work can be separated.

Regards
Pavel

> 0001 (if-else refactoring) does not do anything functionally. It is
> required by 0004 (word-shift-and-removal) and 0005 (if-not-exists).
>
> 0002 (keywords case improvement) is almost independent of all
> other patches in this patch set.
> And it brings an obvious improvement.
>
> 0003 (addition to 0002) moves embedded keywords out of defined
> queries. Functionally it could be united with 0002, but it is separated
> for understandability.
>
> 0004 (word-shift-and-removal) is the quite arguable one. This
> introduces an ability to modify (or destroy) the previous_words
> array. This reduces almost-redundant matching predicates such as
>
>> if (TailMatches3("CREATE|UNIQUE", "INDEX", MatchAny) ||
>>     TailMatches4("CREATE|UNIQUE", "INDEX", "CONCURRENTLY", MatchAny))
>
> into
>
>> if (Matches3("CREATE", "INDEX", MatchAnyExcept("ON")))
>
> by removing "CONCURRENTLY". This obviously simplifies the predicates
> literally, but the code then implies a history of modification. The
> implied history might be worse than the previous shape, especially for
> simple cases like this one. For a complex case like CREATE TRIGGER, it
> seems worse than the original shape... I'll consider this a bit more.
> Maybe the match-and-collapse template should be written as a predicate.
>
> 0005 (if-not-exists). I have to admit that this is an arguable
> feature...
>
> 0006 is the tender point :( but.
>
>> The README is perfect
>
> Thank you, I'm relieved to hear that.
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
Re: [HACKERS] Broken SSL tests in master
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andreas Karlsson
> On 11/24/2016 10:38 PM, Andreas Karlsson wrote:
>> To me it feels like the proper fix would be to make PQhost() return
>> the value of the host parameter rather than the hostaddr (maybe add a
>> new field in the pg_conn_host struct). But that would be a behaviour
>> change which might break someone's application. Thoughts?
>
> I have attached a proof of concept patch for this. Remaining work is
> investigating all the callers of PQhost() and seeing if any of them are
> negatively affected by this patch, and cleaning up the code some.

I agree that pg_conn_host should have hostaddr in addition to host, and that PQhost() should return host when host is specified, with or without hostaddr specified. However, I wonder whether the hostaddr parameter should also accept multiple IP addresses. Currently, it accepts only one address, as follows. I asked Robert and Mithun about this, but I forgot about that.

static bool
connectOptions2(PGconn *conn)
{
    /*
     * Allocate memory for details about each host to which we might possibly
     * try to connect.  If pghostaddr is set, we're only going to try to
     * connect to that one particular address.  If it's not, we'll use pghost,
     * which may contain multiple, comma-separated names.
     */

Regards
Takayuki Tsunakawa
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Amit Kapila writes:
> On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich wrote:
>> just caught another InitPlan below Gather with the recent patches in
>> (master as of 4cc6a3f). Recipe below.
> I think this problem exists since commit
> 110a6dbdebebac9401b43a8fc223e6ec43cd4d10 where we have allowed
> subqueries to be pushed to parallel workers.

The impression I got in poking at this for a few minutes, before going off to stuff myself with turkey, was that we were allowing a SubqueryScanPath to mark itself as parallel-safe even though the resulting plan node would have an InitPlan attached. So my thought about fixing it was along the lines of if-subroot-contains-initplans-then-dont-let-SubqueryScanPath-be-parallel-safe. What you propose here seems like it would shut off parallel query in more cases than that. But I'm still full of turkey and may be missing something.

There's another issue here, which is that the InitPlan is derived from an outer join qual whose outer join seems to have been deleted entirely by analyzejoins.c. Up to now it was possible to avert our eyes from the fact that join removal is lazy about getting rid of unused InitPlans, but if they're going to disable parallelization of the surrounding query, maybe we need to work harder on that.

Another question worth asking is whether it was okay for the subquery to decide to use a Gather. Are we OK with multiple Gathers per plan tree, independently of the points above?

regards, tom lane
Re: [HACKERS] Declarative partitioning - another take
> I assume you meant "...right after the column name"?
>
> I will modify the grammar to allow that way then, so that the following
> will work:
>
> create table p1 partition of p (
>     a primary key
> ) for values in (1);

That seems non-intuitive as well. The way it's written, it looks like "a primary key" is associated with p rather than p1.

Is there any column constraint that cannot be expressed as a table constraint? If not, then we can think of dropping the column-constraint syntax altogether and letting the user specify column constraints through the table-constraint syntax. Or we could drop constraints altogether from the "CREATE TABLE .. PARTITION OF" syntax and let the user handle them through ALTER TABLE commands; in a later version, we can introduce constraint syntax in that DDL.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
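The two spellings discussed in this thread could be sketched as follows (using the thread's own example names; neither form was settled at this point):

```sql
-- Column-constraint form, where "a primary key" visually reads as if
-- it attached to the parent p rather than to the partition p1:
CREATE TABLE p1 PARTITION OF p (
    a PRIMARY KEY
) FOR VALUES IN (1);

-- Table-constraint form, which avoids that ambiguity:
CREATE TABLE p1 PARTITION OF p (
    PRIMARY KEY (a)
) FOR VALUES IN (1);
```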
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Fri, Nov 25, 2016 at 3:26 AM, Andreas Seltenreich wrote:
> Hi,
>
> just caught another InitPlan below Gather with the recent patches in
> (master as of 4cc6a3f). Recipe below.

I think this problem exists since commit 110a6dbdebebac9401b43a8fc223e6ec43cd4d10, where we allowed subqueries to be pushed to parallel workers. I think we should consider a rel (where rtekind is RTE_SUBQUERY) to be parallel-safe if the subquery is also parallel-safe. The attached patch does that and fixes the reported problem for me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

allow_safe_subquery_parallel_worker_v1.patch
Re: [HACKERS] Broken SSL tests in master
On Fri, Nov 25, 2016 at 8:15 AM, Andreas Karlsson wrote:
> On 11/24/2016 10:38 PM, Andreas Karlsson wrote:
>> To me it feels like the proper fix would be to make PQhost() return the
>> value of the host parameter rather than the hostaddr (maybe add a new
>> field in the pg_conn_host struct). But that would be a behaviour change
>> which might break someone's application. Thoughts?

Thanks for digging up the root cause. I can see the problem with HEAD as well.

> I have attached a proof of concept patch for this. Remaining work is
> investigating all the callers of PQhost() and seeing if any of them are
> negatively affected by this patch, and cleaning up the code some.

This needs some thought, but first I need to understand the whereabouts of the refactoring work in 9a1d0af4 as well as the structures that 274bb2b3 has introduced. From what I can see you are duplicating some of the logic parsing the pghost string when pghostaddr is set, so my first guess is that you are breaking some of the intentions behind this code by patching it this way... I am adding in CC Robert, Mithun and Tsunakawa-san, who worked on that.

On my side, I'll need some time to study and understand this new code; that's necessary before looking at your patch in detail, and there could be a more elegant solution. And we had better address this issue before looking more into the SSL reload patch.

--
Michael
Re: [HACKERS] Declarative partitioning - another take
Amit Langote wrote:
> On 2016/11/25 4:36, Alvaro Herrera wrote:
>> I think CREATE TABLE OF is pretty much a corner case. I agree that
>> allowing the constraint right after the constraint name is more
>> intuitive.
>
> I assume you meant "...right after the column name"?

Eh, right.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Typo in comment
Hi,

Here is a tiny patch to fix a typo in execParallel.c.

--
Thomas Munro
http://www.enterprisedb.com

typo.patch
Re: [HACKERS] Declarative partitioning - another take
On Thu, Nov 24, 2016 at 6:13 AM, Amit Langote wrote:
> On 2016/11/23 4:50, Robert Haas wrote:
>> On Tue, Nov 22, 2016 at 4:15 AM, Amit Langote wrote:
>>>> The easiest thing to do might be to just enforce that all of the
>>>> partition key columns have to be not-null when the range-partitioned
>>>> table is defined, and reject any attempt to DROP NOT NULL on them
>>>> later. That's probably better than shoehorning it into the table
>>>> constraint.
>>>
>>> Agreed that forcing range partitioning columns to be NOT NULL during
>>> table creation would be a better approach. But then we would have to
>>> reject using expressions in the range partition key, right?
>>
>> Why?
>
> I was thinking of it like how primary key columns cannot contain
> expressions; the reason for which, I assume, is because while we can
> ensure that a column contains only non-null values by defining a
> constraint on the column, there is no way to force expressions to be
> non-null.
>
> In any case, I have implemented in the latest patch that when creating a
> range partitioned table, its partition key columns are automatically set
> to be NOT NULL. Although, it leaves out the columns that are referenced
> in expressions. So even after doing so, we need to check, after computing
> the range partition key of an input row, that none of the partition keys
> is null, because expressions can still return null.

Right. And ensuring that those columns were NOT NULL would be wrong, as it wouldn't guarantee a non-null result anyway.

> Also, it does nothing to help the undesirable situation that one can
> insert a row with a null partition key (expression) into any of the range
> partitions if targeted directly, because of how ExecQual() handles
> nullable constraint expressions (treats a null value as satisfying the
> partition constraint).

That's going to have to be fixed somehow. How bad would it be if we passed ExecQual's third argument as false for partition constraints?
Or else you could generate the actual constraint as expr IS NOT NULL AND expr >= lb AND expr < ub.

> An alternative possibly worth considering might be to somehow handle the
> null range partition keys within the logic to compare against range bound
> datums. It looks like other databases will map the rows containing nulls
> to the unbounded partition. One database allows specifying NULLS
> FIRST/LAST and maps a row containing a null key to the partition with
> -infinity as the lower bound or +infinity as the upper bound, respectively,
> with NULLS LAST the default behavior.

It seems more future-proof not to allow NULLs at all for now, and figure out what if anything we want to do about that later. I mean, with the syntax we've got here, anything else is basically deciding whether NULL is the lowest value or the highest value. It would be convenient for my employer if we made the same decision that Oracle did here, but it doesn't really seem like the PostgreSQL way - or to put that another way, it's really ugly and unprincipled. So I recommend we decide for now that a partitioning column can't be null and a partitioning expression can't evaluate to NULL. If it does, ERROR.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
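The "spell out the null check" suggestion above could look roughly like this for a range partition over an expression (a sketch with illustrative bounds; the actual generated constraint is up to the patch):

```sql
-- For a hypothetical partition FOR VALUES FROM (10) TO (20) on (expr),
-- the implied constraint with an explicit null check would be:
--
--     CHECK (expr IS NOT NULL AND expr >= 10 AND expr < 20)
--
-- so a NULL key makes the whole expression false rather than NULL,
-- and ExecQual() no longer treats it as satisfying the constraint.
```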
Re: [HACKERS] patch: function xmltable
Michael Paquier wrote:
> Nit: I did not look at the patch in detail,
> but I find the size of the latest version sent, 167kB, scary, as it
> complicates review and increases the likelihood of bugs.

Here's the stat. Note that removing the functionality as discussed would remove all of xpath_parser.c, but I think the rest of it remains pretty much unchanged. So it's clearly a large patch, but there are large docs and tests too, not just code.

 doc/src/sgml/func.sgml               | 376 ++---
 src/backend/executor/execQual.c      | 335 +++
 src/backend/executor/execTuples.c    |  42 +++
 src/backend/nodes/copyfuncs.c        |  66 ++++
 src/backend/nodes/equalfuncs.c       |  51 +++
 src/backend/nodes/nodeFuncs.c        | 100 ++
 src/backend/nodes/outfuncs.c         |  51 +++
 src/backend/nodes/readfuncs.c        |  42 +++
 src/backend/optimizer/util/clauses.c |  33 ++
 src/backend/parser/gram.y            | 181 ++-
 src/backend/parser/parse_coerce.c    |  33 +-
 src/backend/parser/parse_expr.c      | 182 +++
 src/backend/parser/parse_target.c    |   7 +
 src/backend/utils/adt/Makefile       |   2 +-
 src/backend/utils/adt/ruleutils.c    | 100 ++
 src/backend/utils/adt/xml.c          | 610 +++
 src/backend/utils/adt/xpath_parser.c | 337 +++
 src/backend/utils/fmgr/funcapi.c     |  13 +
 src/include/executor/executor.h      |   1 +
 src/include/executor/tableexpr.h     |  69 ++++
 src/include/funcapi.h                |   1 -
 src/include/nodes/execnodes.h        |  31 ++
 src/include/nodes/nodes.h            |   4 +
 src/include/nodes/parsenodes.h       |  21 ++
 src/include/nodes/primnodes.h        |  40 +++
 src/include/parser/kwlist.h          |   3 +
 src/include/parser/parse_coerce.h    |   4 +
 src/include/utils/xml.h              |   2 +
 src/include/utils/xpath_parser.h     |  24 ++
 src/test/regress/expected/xml.out    | 415 ++++
 src/test/regress/expected/xml_1.out  | 323 +++
 src/test/regress/expected/xml_2.out  | 414 ++++
 src/test/regress/sql/xml.sql         | 170 ++
 33 files changed, 4019 insertions(+), 64 deletions(-)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Wed, Nov 23, 2016 at 12:53 PM, Tom Lane wrote: > Haribabu Kommi writes: > > On Wed, Nov 23, 2016 at 1:42 AM, Tom Lane wrote: > >> The precedent of int4/int8/float4/float8 is that SQL data types should > >> be named after their length in bytes. So I'd be inclined to call this > >> "macaddr8" not "macaddr64". That would suggest taking the simple > >> approach of always storing values in the 8-byte format, rather than > >> dealing with the complexities of having two formats internally, two > >> display formats, etc. > > > Do you prefer the automatic conversion from 6 byte to 8 byte MAC address, > > This way it takes extra space with new datatype. Is it fine to with new > > datatype? > > Well, I think the space savings would be pretty illusory. If we use a > varlena datatype, then old-style MAC addresses would take 7 bytes and > new-style would take 9. That's not much of an improvement over always > taking 8 bytes. What's worse, if the next field has an alignment > requirement more than char, is that it's really 8 bytes and 12 bytes (or > 16!), making this strictly worse than a fixed-length-8-bytes approach. > > As against that, if we use a varlena type then we'd have some protection > against the day when the MAC people realize they were still being > short-sighted and go up to 10 or 12 or 16 bytes. But even if that happens > while Postgres is still in use, I'm not sure that we'd choose to make use > of the varlena aspect rather than invent a third datatype to go with that > new version of the standard. Per the discussion in this thread, varlena > storage in itself doesn't do very much for the client-side compatibility > issues. Making a new datatype with a new, well-defined I/O behavior > ensures that applications don't get blindsided by a new behavior they're > not ready for. > > In short, I'm leaning to having just a fixed-length-8-byte implementation. 
> This may seem like learning nothing from our last go-round, but the
> advantages of varlena are very far in the hypothetical future, and the
> disadvantages are immediate.
>
> Also, if we define macaddr as "always 6 bytes" and macaddr8 as "always 8
> bytes", then there's a very simple way for users to widen an old-style
> address to 8 bytes or convert it back to the 6-byte format: just cast
> to the other datatype.  If the new macaddr type can store both widths
> then you need to invent at least one additional function to provide
> those behaviors.

Thanks for your feedback. Attached is an updated patch with the new datatype "macaddr8", with a fixed length of 8 bytes. If you input a 6-byte MAC address, it is automatically converted into an 8-byte MAC address by inserting the reserved bytes, and is stored as an 8-byte address. When sent to the client it is always an 8-byte MAC address. Currently, casting is supported from macaddr to macaddr8, but not the other way, because not all 8-byte MAC addresses can be converted into 6-byte addresses. Test and doc changes are also added. Comments?

Regards,
Hari Babu
Fujitsu Australia

mac_eui64_support_2.patch
Description: Binary data
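The 6-byte-to-8-byte widening described above can be sketched outside the server as well. This is an illustrative C function (not the patch's actual code): the two reserved bytes 0xFF and 0xFE are inserted between the 3-byte OUI and the 3-byte NIC-specific part, which is how an EUI-48 address is embedded in EUI-64, and why the reverse conversion is only possible for addresses that carry those two bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Widen a 6-byte MAC address (EUI-48) to the 8-byte EUI-64 form by
 * inserting the reserved bytes 0xFF and 0xFE between the 3-byte OUI
 * and the 3-byte NIC-specific part.
 */
static void
mac6_to_mac8(const uint8_t in[6], uint8_t out[8])
{
	memcpy(out, in, 3);			/* OUI */
	out[3] = 0xFF;				/* reserved */
	out[4] = 0xFE;				/* reserved */
	memcpy(out + 5, in + 3, 3);	/* NIC-specific part */
}
```

Going the other way, a sketch would have to check that bytes 3 and 4 are exactly FF:FE before stripping them, mirroring why the cast from macaddr8 back to macaddr cannot be unconditional.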
Re: [HACKERS] DISTINCT with btree skip scan
On Wed, Nov 23, 2016 at 4:19 PM, Thomas Munro wrote: > On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro > wrote: >> Here it is, now split into two parts: one patch >> to add an amskip operation, and another to consider using it to >> implement DISTINCT when it already has an index only scan path on >> an index that supports skipping. > > Those patches add an amskip operation and then teach the planner and > executor how to do this, given a table foo (a, b) with an index on (a) > or (a, b): > > SELECT DISTINCT a FROM foo > > Index Only Scan for distinct values of (a) using foo_pkey on foo Cool! > In just the right circumstances that is vastly faster than scanning > every tuple and uniquifying. I was thinking that the next step in > this experiment might be to introduce a kind of plan that can handle > queries where the index's leading column is not in the WHERE clause, > building on the above plan, and behaving conceptually a little bit > like a Nested Loop, like so: > > SELECT * FROM foo WHERE b = 42 > > Index Skip Scan > -> Index Only Scan for distinct values of (a) using foo_pkey on foo > -> Index Only Scan using foo_pkey on foo > Index Cond: ((a = outer.a) AND (b = 42)) Blech, that seems really ugly and probably inefficient. > I suppose you could instead have a single node that knows how to > perform the whole scan itself so that you don't finish up searching > for each distinct value in both the inner and outer subplan, but it > occurred to me that building the plan out of Lego bricks like this > might generalise better to more complicated queries. I suppose it > might also parallelise nicely with a Parallel Index Only Scan as the > outer plan. I think putting all the smarts in a single node is likely to be better, faster, and cleaner. This patch has gotten favorable comments from several people, but somebody really needs to REVIEW it. Oh, well, since I'm here... 
+if (ScanDirectionIsForward(dir))
+{
+    so->currPos.moreLeft = false;
+    so->currPos.moreRight = true;
+}
+else
+{
+    so->currPos.moreLeft = true;
+    so->currPos.moreRight = false;
+}

The lack of comments makes it hard for me to understand what the motivation for this is, but I bet it's wrong. Suppose there's a cursor involved here and the user tries to back up. Instead of having a separate amskip operation, maybe there should be a flag attached to a scan indicating whether it should return only distinct results. Otherwise, you're allowing for the possibility that the same scan might sometimes skip and other times not skip, but then it seems hard for the scan to survive cursor operations. Anyway, why would that be useful?

+if (ScanDirectionIsForward(dir))
+    offnum = OffsetNumberPrev(offnum);
+if (!_bt_readpage(scan, dir, offnum))
+{
+    if (!_bt_steppage(scan, dir))
+    {
+        return false;
+    }
+}

As the TODO:TM comment at the top of the function sort of implies without directly coming out and saying so, this is ugly and suggests that you've picked the wrong API. If the idea I proposed above isn't appealing, I still think we need to get rid of this by some other means.

+/*
+ * TODO:TM For now, we use _bt_search to search from the root; in theory
+ * we should be able to do a local traversal ie from the current page, but
+ * I don't know if it would actually be better in general.
+ */

I think that this depends a lot on the number of duplicates. If we end up merely fast-forwarding within the same page to find the next unique value, re-traversing from the root sucks. However, if the next distinct key *isn't* on the same page, then traversing from the root is a good bet. A typical btree is only a few levels deep (like 3) so once you don't find what you're looking for on the same page it's probably fairly sound to re-traverse from the root. 
Of course you lose at least a little bit in the case where you would have found the next distinct key within a page or two but in other cases you win big. I wonder if that would be a suitable heuristic: first check the current page, if all keys there are equal then retraverse from the root. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
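The heuristic proposed here — fast-forward within the current page first, and only re-descend from the root when every remaining key on the page is equal — can be sketched against a sorted array standing in for the btree leaf level. This is purely illustrative (made-up names, a fixed "page" size, not nbtree code):

```c
#include <assert.h>
#include <stddef.h>

/* Toy "page" size: keys are grouped into fixed chunks of this many. */
#define PAGE_SIZE 4

/*
 * Return the index of the first key greater than keys[cur] in a sorted
 * array of nkeys elements, or nkeys if none exists.  First scan the
 * remainder of the current page; if every key there equals keys[cur],
 * fall back to a binary search over the whole array, mimicking a
 * re-traversal from the btree root.
 */
static size_t
skip_to_next_distinct(const int *keys, size_t nkeys, size_t cur)
{
	size_t		page_end = (cur / PAGE_SIZE + 1) * PAGE_SIZE;

	if (page_end > nkeys)
		page_end = nkeys;

	/* Cheap case: the next distinct key lives on the same page. */
	for (size_t i = cur + 1; i < page_end; i++)
		if (keys[i] != keys[cur])
			return i;

	/* Expensive case: "re-descend from the root" via binary search. */
	size_t		lo = page_end;
	size_t		hi = nkeys;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (keys[mid] <= keys[cur])
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

With few duplicates the loop finds the answer on the current page in a step or two; with long runs of duplicates the binary search pays off, which is exactly the trade-off discussed above.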
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
Hello, Thank you for looking at this long and bothersome patch. At Wed, 23 Nov 2016 07:12:00 +0100, Pavel Stehule wrote in > Hi > > 2016-11-15 12:26 GMT+01:00 Kyotaro HORIGUCHI jp>: > > > Hello, I rebased this patch on the current master. > > > > At Mon, 31 Oct 2016 10:15:48 +0900 (Tokyo Standard Time), Kyotaro > > HORIGUCHI wrote in < > > 20161031.101548.162143279.horiguchi.kyot...@lab.ntt.co.jp> > > > Anyway, I fixed space issues and addressed the > > > COMPLETE_WITH_QUERY()'s almost unused parameter problem. And > > > tried to write a README files. > > > > tab-complete.c has gotten some improvements after this time. > > > > 577f0bdd2b8904cbdfde6c98f4bda6fd93a05ffc psql: Tab completion for > > renaming enum values. > > 927d7bb6b120a2ca09a164898f887eb850b7a329 Improve tab completion for > > CREATE TRIGGER. > > 1d15d0db50a5f39ab69c1fe60f2d5dcc7e2ddb9c psql: Tab-complete LOCK [TABLE] > > ... IN {ACCESS|ROW|SHARE}. > > > > The attached patchset is rebased on the master including these > > patches. > > I checked the 0001, 0002, 0003 patches > > There are no problems with patching or compiling; it works as > expected > > These patches can be committed separately - they are long, but the code is > almost mechanical Thanks. You're right. I hadn't considered the relations among them. 0001 (if-else refactoring) does not change anything functionally. It is required by 0004 (word-shift-and-removal) and 0005 (if-not-exists). 0002 (keywords case improvement) is almost independent of all the other patches in this patch set, and it brings an obvious improvement. 0003 (an addition to 0002) moves embedded keywords out of the defined queries. Functionally it could be united with 0002, but it is separated for understandability. 0004 (word-shift-and-removal) is quite an arguable one. It introduces the ability to modify (or destroy) the previous_words array. 
This reduces almost redundant matching predicates such as

> if (TailMatches3("CREATE|UNIQUE", "INDEX", MatchAny) ||
>     TailMatches4("CREATE|UNIQUE", "INDEX", "CONCURRENTLY", MatchAny))

into

> if (Matches3("CREATE", "INDEX", MatchAnyExcept("ON")))

by removing "CONCURRENTLY". This obviously simplifies the predicates literally, but then the code implies a history of modification. The implied history might be worse than the previous shape, especially for simple cases like this. For a complex case like CREATE TRIGGER, it seems worse than the original shape... I'll consider this a bit more. Maybe the match-and-collapse template should be written as a predicate.

0005 (if-not-exists): I have to admit that this is an arguable feature...

0006 is the tender point :( but...

> The README is perfect

Thank you, I'm relieved to hear that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
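The effect of the word removal in 0004 can be illustrated with a toy matcher. All names here are hypothetical (this is not tab-complete.c): dropping an optional word such as CONCURRENTLY from the collected words lets a single tail pattern cover both spellings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Match the last three collected words against a pattern; a toy
 * stand-in for psql's TailMatches macros (no alternation or MatchAny).
 */
static bool
tail_matches3(const char **words, int nwords,
			  const char *w1, const char *w2, const char *w3)
{
	if (nwords < 3)
		return false;
	return strcmp(words[nwords - 3], w1) == 0 &&
		strcmp(words[nwords - 2], w2) == 0 &&
		strcmp(words[nwords - 1], w3) == 0;
}

/*
 * Remove one occurrence of an optional word, shifting the rest left;
 * this is the "word shift and removal" idea applied to previous_words.
 * Returns the new word count.
 */
static int
remove_word(const char **words, int nwords, const char *victim)
{
	for (int i = 0; i < nwords; i++)
	{
		if (strcmp(words[i], victim) == 0)
		{
			memmove(&words[i], &words[i + 1],
					(nwords - i - 1) * sizeof(*words));
			return nwords - 1;
		}
	}
	return nwords;
}
```

As the review notes, the price of this simplification is that later predicates see the modified word list, not what the user actually typed.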
Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId
On 25 November 2016 at 02:44, Alvaro Herrera wrote: > Craig Ringer wrote: > >> Updated to correct the other expected file, since there's an alternate. > > FWIW I don't know what you did here, but you did not patch the > alternate expected file. Damn. Attached the first patch a second time is what I did. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.6 TAP tests and extensions
On 25 November 2016 at 02:47, Alvaro Herrera wrote: > Craig Ringer wrote: >> On 27 October 2016 at 00:42, Robert Haas wrote: >> > On Wed, Oct 26, 2016 at 7:17 AM, Andres Freund wrote: >> >> On 2016-09-23 16:04:32 -0400, Tom Lane wrote: >> >>> Looking back over the thread, I see that you also proposed installing >> >>> isolationtester and pg_isolation_regress for the benefit of extensions. >> >>> I'm very much less excited about that idea. It'd be substantially more >> >>> dead weight in typical installations, and I'm not sure that it'd be >> >>> useful >> >>> to common extensions, and I'm not eager to treat isolationtester's API >> >>> and behavior as something we need to hold stable for extension use. >> >> >> >> FWIW, I'd be quite happy if it were installed. Running isolationtester >> >> when compiling extensions against distribution postgres packages would >> >> be quite useful. >> > >> > +1. >> >> Patch attached. > > Hmm but this only installs isolationtester itself ... don't you need > pg_isolation_regress too? Yeah, as Andres pointed out offlist after I posted this. Meant to follow up but got side-tracked. It needs PGXS support to be properly useful. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/25 4:36, Alvaro Herrera wrote: > Amit Langote wrote: >> On 2016/11/24 15:10, Ashutosh Bapat wrote: >>> On Thu, Nov 24, 2016 at 11:34 AM, Amit Langote wrote: > You have to specify column constraints using the keywords WITH OPTIONS, like below: create table p1 partition of p ( a with options primary key ) for values in (1); >>> >>> Oh, sorry for not noticing it. You are right. Why do we need "with >>> option" there? Shouldn't user be able to specify just "a primary key"; >>> it's not really an "option", it's a constraint. >> >> I just adopted the existing syntax for specifying column/table constraints >> of a table created with CREATE TABLE OF type_name. > > I think CREATE TABLE OF is pretty much a corner case. I agree that > allowing the constraint right after the constraint name is more > intuitive. I assume you meant "...right after the column name"? I will modify the grammar to allow that way then, so that the following will work: create table p1 partition of p ( a primary key ) for values in (1); Thanks, Amit
Re: [HACKERS] Random PGDLLIMPORTing
On 25 November 2016 at 07:36, Michael Paquier wrote: > On Thu, Nov 24, 2016 at 11:01 PM, Magnus Hagander wrote: >> On Thu, Nov 24, 2016 at 2:58 PM, Craig Ringer wrote: >> My guess is that PGDLLIMPORT has been added explicitly when somebody needed >> it for something, without any actual thought. I can't say I see any reason >> not to export the other ones as well -- more that maybe there are even more >> that are needed? > > Yes, more or less. The reasoning behind at least the PostmasterPID bit is > that: > https://www.postgresql.org/message-id/CAB7nPqS_=14krcdch6nyrsw8c58j1cwxv6qcvs7yp-6tthq...@mail.gmail.com > That resulted in cac83219. > > The other ones could perhaps be marked with PGDLLIMPORT. Now usually > we need a use-case behind this change, and there are not that many > hackers on Windows doing much plugin development. Exactly, that's why nobody's been shouting yet. The use case is exactly the same as the use case for these vars existing. IsBackgroundWorker in particular makes no sense to have at all if it isn't exported, and the only reason nobody's complaining is that nobody's building cassert binaries under Windows or has tried to write anything that has behaviour that cares if it's in a bgworker or not. PGDLLIMPORT is free, so the question should be "is there a reason not to add it here?". -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] patch: function xmltable
On Fri, Nov 25, 2016 at 3:31 AM, Pavel Stehule wrote: > 2016-11-24 18:51 GMT+01:00 Tom Lane : >> contrib/xml2 has always relied on libxslt for xpath functionality. >> Can we do that here instead of writing, debugging, and documenting >> a pile of new code? > > I am sorry - I don't see it. There is nothing complex manipulation with > XPath expressions. You are missing the point here, which is to make the implementation footprint as light as possible, especially if the added functionality is already present in a dependency that Postgres can be linked to. OK, libxslt can only be linked with contrib/xml2/ now, but it would be at least worth looking at how much the current patch can be simplified for things like transformTableExpr or XmlTableGetValue by relying on some existing routines. Nit: I did not look at the patch in details, but I find the size of the latest version sent, 167kB, scary as it complicates review and increases the likeliness of bugs. -- Michael
Re: [HACKERS] UNDO and in-place update
On Thu, Nov 24, 2016 at 6:20 PM, Pavel Stehule wrote: >> I think that the whole emphasis on whether and to what degree this is >> like Oracle is somewhat misplaced. I would look at it a different >> way. We've talked many times over the years about how PostgreSQL is >> optimized for aborts. Everybody that I've heard comment on that issue >> thinks that is a bad thing. > > again, this depends on usage - when you have the possibility to run VACUUM, > this strategy is better. > > Fast aborts are one pretty good feature for stable usage. > > Isn't it possible to use an UNDO log (or something similar) for VACUUM? > ROLLBACK should be fast, but VACUUM can do more work? I think that in this design we wouldn't use VACUUM at all. However, if what you are saying is that we should try to make aborts near-instantaneous by pushing UNDO actions into the background, I agree entirely. InnoDB already does that, IIUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Random PGDLLIMPORTing
On Thu, Nov 24, 2016 at 11:01 PM, Magnus Hagander wrote: > On Thu, Nov 24, 2016 at 2:58 PM, Craig Ringer wrote: > My guess is that PGDLLIMPORT has been added explicitly when somebody needed > it for something, without any actual thought. I can't say I see any reason > not to export the other ones as well -- more that maybe there are even more > that are needed? Yes, more or less. The reasoning behind at least the PostmasterPID bit is that: https://www.postgresql.org/message-id/CAB7nPqS_=14krcdch6nyrsw8c58j1cwxv6qcvs7yp-6tthq...@mail.gmail.com That resulted in cac83219. The other ones could perhaps be marked with PGDLLIMPORT. Now usually we need a use-case behind this change, and there are not that many hackers on Windows doing much plugin development. -- Michael
Re: [HACKERS] UNDO and in-place update
> I think that the whole emphasis on whether and to what degree this is > like Oracle is somewhat misplaced. I would look at it a different > way. We've talked many times over the years about how PostgreSQL is > optimized for aborts. Everybody that I've heard comment on that issue > thinks that is a bad thing. again, this depends on usage - when you have the possibility to run VACUUM, this strategy is better. Fast aborts are one pretty good feature for stable usage. Isn't it possible to use an UNDO log (or something similar) for VACUUM? ROLLBACK should be fast, but VACUUM can do more work? > I am proposing a design that is optimized > for commits; that is, if the transaction commits, none of the pages it > modified need to be dirtied again afterwards at any point. I think > that's an extremely important property and it's one that we are very > far from having today. It necessarily implies that you cannot store > the old row versions in the heap, because if you do, then you are > going to have to dirty the pages again to get rid of them (unless you > prefer to just leak the space forever). Now there is plenty of room > for argument about whether the specific design I proposed is going to > be any good, and I think that would be quite an interesting discussion > to have. But I think if you say, well, you know, the fact that we may > rewrite the same page 5 or 6 times after a modification to set hint > bits (a few at a time) and HOT prune and set all-visible and freeze > isn't really any big deal, you must live in a world where the write > bandwidth of the I/O channel is a lot less of a problem than it is in > mine. And we've been around and around on all of that stuff and > people have come up with various ideas to improve the situation - some > of which have been implemented - but many of those ideas involve > unpleasant trade-offs and so the core problems remain. 
If we don't do > something more fundamental, we're still going to be up against the > same basic issues ten years from now. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: [HACKERS] Broken SSL tests in master
On 11/24/2016 10:38 PM, Andreas Karlsson wrote:
> To me it feels like the proper fix would be to make PQhost() return the
> value of the host parameter rather than the hostaddr (maybe add a new
> field in the pg_conn_host struct). But that would be a behaviour change
> which might break someone's application. Thoughts?

I have attached a proof of concept patch for this. Remaining work is investigating all the callers of PQhost() to see if any of them are negatively affected by this patch, and cleaning up the code some.

Andreas

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 3e9c45b..39c11eb 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -798,8 +798,34 @@ connectOptions2(PGconn *conn)
 	 */
 	if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
 	{
-		conn->connhost[0].host = strdup(conn->pghostaddr);
-		if (conn->connhost[0].host == NULL)
+		if (conn->pghost != NULL && conn->pghost[0] != '\0')
+		{
+			char	   *e = conn->pghost;
+
+			/*
+			 * Search for the end of the first hostname; a comma or
+			 * end-of-string acts as a terminator.
+			 */
+			while (*e != '\0' && *e != ',')
+				++e;
+
+			/* Copy the hostname whose bounds we just identified. */
+			conn->connhost[0].host =
+				(char *) malloc(sizeof(char) * (e - conn->pghost + 1));
+			if (conn->connhost[0].host == NULL)
+				goto oom_error;
+			memcpy(conn->connhost[0].host, conn->pghost, e - conn->pghost);
+			conn->connhost[0].host[e - conn->pghost] = '\0';
+		}
+		else
+		{
+			conn->connhost[0].host = strdup(conn->pghostaddr);
+			if (conn->connhost[0].host == NULL)
+				goto oom_error;
+		}
+
+		conn->connhost[0].hostaddr = strdup(conn->pghostaddr);
+		if (conn->connhost[0].hostaddr == NULL)
 			goto oom_error;
 		conn->connhost[0].type = CHT_HOST_ADDRESS;
 	}
@@ -827,6 +853,10 @@ connectOptions2(PGconn *conn)
 		memcpy(conn->connhost[i].host, s, e - s);
 		conn->connhost[i].host[e - s] = '\0';
 
+		conn->connhost[i].hostaddr = strdup(conn->connhost[i].host);
+		if (conn->connhost[i].hostaddr == NULL)
+			goto oom_error;
+
 		/* Identify the type of host. */
 		conn->connhost[i].type = CHT_HOST_NAME;
 #ifdef HAVE_UNIX_SOCKETS
@@ -845,12 +875,14 @@ connectOptions2(PGconn *conn)
 	{
 #ifdef HAVE_UNIX_SOCKETS
 		conn->connhost[0].host = strdup(DEFAULT_PGSOCKET_DIR);
+		conn->connhost[0].hostaddr = strdup(DEFAULT_PGSOCKET_DIR);
 		conn->connhost[0].type = CHT_UNIX_SOCKET;
 #else
 		conn->connhost[0].host = strdup(DefaultHost);
+		conn->connhost[0].hostaddr = strdup(DefaultHost);
 		conn->connhost[0].type = CHT_HOST_NAME;
 #endif
-		if (conn->connhost[0].host == NULL)
+		if (conn->connhost[0].host == NULL || conn->connhost[0].hostaddr == NULL)
 			goto oom_error;
 	}
@@ -1547,7 +1579,7 @@ connectDBStart(PGconn *conn)
 	for (i = 0; i < conn->nconnhost; ++i)
 	{
 		pg_conn_host *ch = &conn->connhost[i];
-		char	   *node = ch->host;
+		char	   *node = ch->hostaddr;
 		struct addrinfo hint;
 		int			thisport;
@@ -3034,6 +3066,8 @@ freePGconn(PGconn *conn)
 	{
 		if (conn->connhost[i].host != NULL)
 			free(conn->connhost[i].host);
+		if (conn->connhost[i].hostaddr != NULL)
+			free(conn->connhost[i].hostaddr);
 		if (conn->connhost[i].port != NULL)
 			free(conn->connhost[i].port);
 		if (conn->connhost[i].password != NULL)
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 854ec89..19e3a5e 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -306,7 +306,10 @@ typedef enum pg_conn_host_type
  */
 typedef struct pg_conn_host
 {
-	char	   *host;			/* host name or address, or socket path */
+	char	   *host;			/* host name or address, or socket path,
+								 * used for validating the hostname */
+	char	   *hostaddr;		/* host name or address, or socket path,
+								 * used when actually connecting */
 	pg_conn_host_type type;		/* type of host */
 	char	   *port;			/* port number for this host; if not NULL,
 								 * overrrides the PGConn's pgport */
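The division of labor the patch introduces — the symbolic host name for certificate validation, hostaddr for the actual connection — can be summarized with a small illustrative helper. This is not libpq code; the struct and function names are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the patch's rule: if hostaddr is given, connect to it,
 * but validate the server certificate against the host name when that
 * is also given; otherwise fall back to the single value supplied.
 */
struct conn_host
{
	const char *host;		/* name used for certificate validation */
	const char *hostaddr;	/* address used for the actual connection */
};

static struct conn_host
resolve_conn_host(const char *host, const char *hostaddr)
{
	struct conn_host ch;

	if (hostaddr != NULL && hostaddr[0] != '\0')
	{
		ch.hostaddr = hostaddr;
		/* Prefer the symbolic name for validation when available. */
		ch.host = (host != NULL && host[0] != '\0') ? host : hostaddr;
	}
	else
	{
		ch.host = host;
		ch.hostaddr = host;		/* connect by resolving the name */
	}
	return ch;
}
```

With host=common-name.pg-ssltest.test and hostaddr=127.0.0.1 (the failing test case above), this sketch connects to 127.0.0.1 while still checking the certificate against the common name, which is the behaviour the test suite expects.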
Re: [HACKERS] Functions Immutable but not parallel safe?
On Thu, Nov 24, 2016 at 5:29 AM, David Rowley wrote:
> There's 11 functions which are marked immutable, but are marked as
> parallel unsafe.
>
> postgres=# select proname from pg_proc where provolatile = 'i' and
> proparallel = 'u';
>            proname
> -----------------------------
>  _pg_expandarray
>  _pg_keysequal
>  _pg_truetypid
>  _pg_truetypmod
>  _pg_char_max_length
>  _pg_char_octet_length
>  _pg_numeric_precision
>  _pg_numeric_precision_radix
>  _pg_numeric_scale
>  _pg_datetime_precision
>  _pg_interval_type
> (11 rows)
>
> I'm finding it hard to imagine a reason why these might be unsafe, but
> failed. I do notice they're all only used in information_schema.
>
> Could it just perhaps be that these just missed the verification
> process the other functions went through to determine their parallel
> safety?

Yes, I think that's it.  I went through pg_proc.h, but never looked at information_schema.sql.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] UNDO and in-place update
On Thu, Nov 24, 2016 at 11:06 AM, Greg Stark wrote: > Fwiw, Oracle does not use the undo log for snapshot fetches. It's used > only for transaction rollback and recovery. > > For snapshot isolation Oracle has yet a *third* copy of the data in a > space called the "rollback segment(s)". When you update a row in a > block you save the whole block in the rollback segment. When you try > to access a block you check if the CSN -- which is basically > equivalent to our LSN -- is newer than your snapshot and if it is you > fetch the old version of the block from the rollback. My understanding is that this isn't correct. I think the rollback segments are what they call the thing that stores UNDO. See e.g. http://ss64.com/ora/syntax-redo.html Having said that, I'm by no means an Oracle expert. > Essentially their MVCC is done on a per-block level rather than a > per-row level and they keep only the newest version of the block in > the table, the rest are in the rollback segment. For what it's worth > I think our approach is cleaner and more flexible. They had a lot of > trouble with their approach over the years and it works well only > because they invested an enormous amount of development in it and also > because people throw a lot of hardware at it too. People do throw a lot of hardware at Oracle, but I don't think I'd accept the contention that Oracle generally gets less performance out of the same amount of hardware than PostgreSQL. There are certainly some things we often do faster -- including inserts, deletes, and transaction aborts -- but their implementation of updates seems to be very fast. Obviously, a lot depends on configuration and workload, so it's hard to make general statements, but I think it's more correct to imagine that people throw a lot of hardware at Oracle because that's where their big, critical databases are than to imagine that it's because Oracle is a resource hog. 
> I think the main use case we have trouble with is actually the "update > every row in the table" type of update which requires we write to > every block, plus a second copy of every block, plus write full pages > of both copies, then later set hint bits dirtying pages again and > generating more full pages writes, then later come along and vacuum > which requires two more writes of every block, etc. If we had a > solution for the special case of an update that replaces every row in > a page that I think would complement HOT nicely and go a long way > towards fixing our issues. This case is certainly a problem, but I don't believe it's anywhere close to being our only problem. My experience is that our system works fairly well when there are only a few indexes. However, with heavily indexed tables, all of your updates become non-HOT and then bad things happen. So you can either leave out indexes that you need for good query performance and then you're hosed, or you can add those indexes and then you're hosed anyway because of the bloat and write amplification associated with non-HOT updates. Also, write-heavy workloads that perform OK as long as there are no long-lived snapshots often enter a death spiral when you run reporting queries on the same system. Finer-grained snapshot tracking would help with that problem, but it doesn't solve it: now a single long-lived snapshot can "only" cause the database to double in size instead of increasing without bound, a scant consolation. Nor does it change the fundamental calculus that once a table gets bloated, it's very painful to de-bloat it. The appeal of a design that supports in-place update is that you don't bloat the table in the first place. You still have the bloat, of course, but it's off in a separate data structure that is engineered for efficient deletion. I think that the whole emphasis on whether and to what degree this is like Oracle is somewhat misplaced. I would look at it a different way. 
We've talked many times over the years about how PostgreSQL is optimized for aborts. Everybody that I've heard comment on that issue thinks that is a bad thing. I am proposing a design that is optimized for commits; that is, if the transaction commits, none of the pages it modified need to be dirtied again afterwards at any point. I think that's an extremely important property and it's one that we are very far from having today. It necessarily implies that you cannot store the old row versions in the heap, because if you do, then you are going to have to dirty the pages again to get rid of them (unless you prefer to just leak the space forever). Now there is plenty of room for argument about whether the specific design I proposed is going to be any good, and I think that would be quite an interesting discussion to have. But I think if you say, well, you know, the fact that we may rewrite the same page 5 or 6 times after a modification to set hint bits (a few at a time) and HOT prune and set all-visible and freeze isn't really any big deal, you must live in a world where the write bandwidth of the I/O channel is a lot less of a problem than it is in mine. And we've been around and around on all of that stuff and people have come up with various ideas to improve the situation - some of which have been implemented - but many of those ideas involve unpleasant trade-offs and so the core problems remain. If we don't do something more fundamental, we're still going to be up against the same basic issues ten years from now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
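A commit-optimized design of the sort described here can be caricatured in a few lines: updates happen in place, the old value goes to an undo log, commit simply discards the log, and abort replays it in reverse. This is a toy in-memory sketch under invented names, not the proposed implementation (no WAL, no concurrency, no snapshots):

```c
#include <assert.h>
#include <stddef.h>

#define UNDO_MAX 64

/* One undo record: enough to restore a single in-place update. */
struct undo_rec
{
	int		   *where;		/* heap location that was overwritten */
	int			old_value;	/* value to restore on abort */
};

static struct undo_rec undo_log[UNDO_MAX];
static size_t undo_len = 0;

/* Update in place, remembering the old value in the undo log. */
static void
update_in_place(int *where, int new_value)
{
	undo_log[undo_len].where = where;
	undo_log[undo_len].old_value = *where;
	undo_len++;
	*where = new_value;
}

/* Commit: the modified pages need no further writes; forget the undo. */
static void
commit(void)
{
	undo_len = 0;
}

/* Abort: replay undo records newest-first to restore old versions. */
static void
rollback(void)
{
	while (undo_len > 0)
	{
		undo_len--;
		*undo_log[undo_len].where = undo_log[undo_len].old_value;
	}
}
```

The point of the sketch is the asymmetry: commit is free with respect to the data pages (no hint bits, no pruning, no later vacuum pass), while abort pays by replaying the log — the inverse of the "optimized for aborts" behaviour criticized above.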
Re: [HACKERS] pg_dump / copy bugs with "big lines" ?
I propose to rename allow_long to huge_ok. "Huge" is the terminology used by palloc anyway. I'd keep makeLongStringInfo() and initLongStringInfo() though as interface, because using Huge instead of Long there looks strange. Not wedded to that, though (particularly as it's a bit inconsistent). I'm not terribly sure about enlargeStringInfo(). I think I would propose that it takes Size instead of int. That has rather more fallout than I'd like, but if we don't do that, then I'm not sure that appendStringInfo continues to make sense -- considering that it uses the return value from appendStringInfoVA (which I'm redefining as returning Size) to pass to enlargeStringInfo. I'm not really sure how to ensure that the values passed fit both in an int and Size ... which they would, given that all the callers use not-huge stringinfos anyway. But it'd be better if the compiler knew. -- Álvaro Herrera                https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
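The semantics under discussion — a buffer whose capacity stays below INT_MAX unless it was explicitly created as "huge" — can be sketched as a growth policy. Illustrative names only; this is not the stringinfo.c code:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy growth policy: double the allocation until 'needed' fits, but
 * cap ordinary buffers at INT_MAX; only buffers created with huge_ok
 * may grow past that.  Returns the new capacity, or 0 on refusal.
 */
static size_t
enlarge_capacity(size_t current, size_t needed, bool huge_ok)
{
	size_t		limit = huge_ok ? SIZE_MAX : (size_t) INT_MAX;

	if (needed > limit)
		return 0;				/* request exceeds the allowed range */

	if (current == 0)
		current = 1;

	while (current < needed)
	{
		if (current > limit / 2)
			return limit;		/* doubling again would pass the cap */
		current *= 2;
	}
	return current;
}
```

Keeping the cap in one place like this is also where the int-vs-Size concern above shows up: callers that sized requests in int can never trip the huge path, but the compiler cannot see that unless the parameter types say so.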
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Hi, just caught another InitPlan below Gather with the recent patches in (master as of 4cc6a3f). Recipe below. regards, andreas

set max_parallel_workers_per_gather = 2;
set min_parallel_relation_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

explain select 1 from public.quad_point_tbl as ref_0,
  lateral (select ref_0.p as c3, sample_0.d as c5
             from public.nv_child_2010 as sample_0
             left join public.mvtest_tvv as ref_1
               on ('x' < (select contype from pg_catalog.pg_constraint limit 1))
           limit 82) as subq_0;

                                    QUERY PLAN
----------------------------------------------------------------------------------
 Gather  (cost=0.19..13727.52 rows=902246 width=4)
   Workers Planned: 2
   ->  Nested Loop  (cost=0.19..13727.52 rows=902246 width=4)
         ->  Parallel Seq Scan on quad_point_tbl ref_0  (cost=0.00..105.85 rows=4585 width=16)
         ->  Limit  (cost=0.19..1.33 rows=82 width=20)
               InitPlan 1 (returns $0)
                 ->  Limit  (cost=0.00..0.19 rows=1 width=1)
                       ->  Gather  (cost=0.00..10.22 rows=54 width=1)
                             Workers Planned: 2
                             ->  Parallel Seq Scan on pg_constraint  (cost=0.00..10.22 rows=22 width=1)
               ->  Seq Scan on nv_child_2010 sample_0  (cost=0.00..35.50 rows=2550 width=20)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Broken SSL tests in master
Hi, The SSL test suite (src/test/ssl) is broken in master since commit 9a1d0af4ad2cbd419115b453d811c141b80d872b, which is Robert's refactoring of getting the server hostname for GSS, SSPI, and SSL in libpq. The error we get in the test suite: # Running: psql -X -A -t -c SELECT 'connected with user=ssltestuser dbname=trustdb sslcert=invalid hostaddr=127.0.0.1 host=common-name.pg-ssltest.test sslrootcert=ssl/root+server_ca.crt sslmode=verify-full' -d user=ssltestuser dbname=trustdb sslcert=invalid hostaddr=127.0.0.1 host=common-name.pg-ssltest.test sslrootcert=ssl/root+server_ca.crt sslmode=verify-full psql: server certificate for "common-name.pg-ssltest.test" does not match host name "127.0.0.1" As you can see, after the patch libpq now looks at hostaddr rather than host when validating the server certificate, because that is what is stored in the first (and only) entry of conn->connhost, and therefore what PQhost() returns. To me it feels like the proper fix would be to make PQhost() return the value of the host parameter rather than the hostaddr (maybe by adding a new field to the pg_conn_host struct). But that would be a behaviour change which might break someone's application. Thoughts? Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump / copy bugs with "big lines" ?
Daniel Verite wrote: > Here's an updated patch. Compared to the previous version: > > - removed CopyStartSend (per comment #1 in review) > > - renamed flag to allow_long (comment #2) > > - resetStringInfo no longer resets the flag (comment #3). > > - allowLongStringInfo() is removed (comment #3 and #5), > makeLongStringInfo() and initLongStringInfo() are provided > instead, as alternatives to makeStringInfo() and initStringInfo(). > > - StringInfoData.len is back to int type, 2GB max. > (addresses comment #4 incidentally). > This is safer because many routines peeking > into StringInfoData use int for offsets into the buffer, > for instance most of the stuff in backend/libpq/pqformat.c > Although these routines are not supposed to be called on long > buffers, this assertion was not enforced in the code, and > with a 64-bit length effectively over 2GB, they would misbehave > pretty badly. Hmm, this looks a lot better, I think. One change we overlooked, and I just noticed, is that we need to change appendStringInfoVA() to return size_t rather than int. This comment at the bottom of it gives that away: /* * Return pvsnprintf's estimate of the space needed. (Although this is * given as a size_t, we know it will fit in int because it's not more * than MaxAllocSize.) */ return (int) nprinted; The parenthetical comment is no longer true. This means we need to update all its callers to match. I suppose it won't be a problem for most of them, since they are not using long stringinfos anyway, but it seems better to keep everything consistent. I hope that callers that do not adjust the type of the declared variable will get a compiler warning. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Amit Langote wrote: > On 2016/11/24 15:10, Ashutosh Bapat wrote: > > On Thu, Nov 24, 2016 at 11:34 AM, Amit Langote wrote: > >> You have to specify column constraints using the keywords WITH OPTIONS, > >> like below: > >> > >> create table p1 partition of p ( > >> a with options primary key > >> ) for values in (1); > > > > Oh, sorry for not noticing it. You are right. Why do we need "with > > option" there? Shouldn't user be able to specify just "a primary key"; > > it's not really an "option", it's a constraint. > > I just adopted the existing syntax for specifying column/table constraints > of a table created with CREATE TABLE OF type_name. I think CREATE TABLE OF is pretty much a corner case. I agree that allowing the constraint right after the column name is more intuitive. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
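To put the two syntax variants side by side (table names are made up; the shorthand form is what is being argued for in this thread, not accepted syntax at the time of writing):

```sql
create table p (a int) partition by list (a);

-- Syntax under the patch as posted: the column constraint
-- requires the WITH OPTIONS keywords.
create table p1 partition of p (
    a with options primary key
) for values in (1);

-- Proposed shorthand: constraint directly after the column name,
-- as in ordinary CREATE TABLE.
-- create table p2 partition of p (
--     a primary key
-- ) for values in (2);
```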
Re: [HACKERS] 9.6 TAP tests and extensions
Craig Ringer wrote: > On 27 October 2016 at 00:42, Robert Haas wrote: > > On Wed, Oct 26, 2016 at 7:17 AM, Andres Freund wrote: > >> On 2016-09-23 16:04:32 -0400, Tom Lane wrote: > >>> Looking back over the thread, I see that you also proposed installing > >>> isolationtester and pg_isolation_regress for the benefit of extensions. > >>> I'm very much less excited about that idea. It'd be substantially more > >>> dead weight in typical installations, and I'm not sure that it'd be useful > >>> to common extensions, and I'm not eager to treat isolationtester's API > >>> and behavior as something we need to hold stable for extension use. > >> > >> FWIW, I'd be quite happy if it were installed. Running isolationtester > >> when compiling extensions against distribution postgres packages would > >> be quite useful. > > > > +1. > > Patch attached. Hmm but this only installs isolationtester itself ... don't you need pg_isolation_regress too? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId
Craig Ringer wrote: > Updated to correct the other expected file, since there's an alternate. FWIW I don't know what you did here, but you did not patch the alternate expected file. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId
Tom Lane wrote: > Alvaro Herrera writes: > > I considered the argument here for a bit and I think Craig is right -- > > FWIW, I agree. We shouldn't require every call site to special-case this, > and we definitely don't want it to require special cases in SQL code. > > (And I'm for back-patching, too.) Pushed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
2016-11-24 18:51 GMT+01:00 Tom Lane : > Alvaro Herrera writes: > > Pavel Stehule wrote: > >> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera : > >>> Oh my, I just noticed we have a new xpath preprocessor in this patch > >>> too. Where did this code come from -- did you write it all from > >>> scratch? > > >> I wrote it from scratch - libxml2 has not any API for iteration over XPath > >> expression (different than iteration over XPath expression result), and > >> what I have info, there will not be any new API in libxml2. > > > Okay, I agree that the default namespace stuff looks worthwhile in the > > long run. But I don't have enough time to review the xpath parser stuff > > in the current commitfest, and I think it needs at the very least a lot > > of additional code commentary. > > contrib/xml2 has always relied on libxslt for xpath functionality. > Can we do that here instead of writing, debugging, and documenting > a pile of new code? > I am sorry - I don't see it. There is no complex manipulation of XPath expressions there. Regards Pavel > > regards, tom lane >
Re: [HACKERS] patch: function xmltable
Alvaro Herrera writes: > Pavel Stehule wrote: >> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera : >>> Oh my, I just noticed we have a new xpath preprocessor in this patch >>> too. Where did this code come from -- did you write it all from >>> scratch? >> I wrote it from scratch - libxml2 has not any API for iteration over XPath >> expression (different than iteration over XPath expression result), and >> what I have info, there will not be any new API in libxml2. > Okay, I agree that the default namespace stuff looks worthwhile in the > long run. But I don't have enough time to review the xpath parser stuff > in the current commitfest, and I think it needs at the very least a lot > of additional code commentary. contrib/xml2 has always relied on libxslt for xpath functionality. Can we do that here instead of writing, debugging, and documenting a pile of new code? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Pavel Stehule wrote: > can me send your last work? Sure, it's in the archives -- https://www.postgresql.org/message-id/20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
2016-11-24 18:29 GMT+01:00 Alvaro Herrera : > Pavel Stehule wrote: > > Hi > > > > 2016-11-24 0:13 GMT+01:00 Alvaro Herrera : > > > > > Oh my, I just noticed we have a new xpath preprocessor in this patch > > > too. Where did this code come from -- did you write it all from > > > scratch? > > > > I wrote it from scratch - libxml2 has not any API for iteration over XPath > > expression (different than iteration over XPath expression result), and > > what I have info, there will not be any new API in libxml2. > > Okay, I agree that the default namespace stuff looks worthwhile in the > long run. But I don't have enough time to review the xpath parser stuff > in the current commitfest, and I think it needs at the very least a lot > of additional code commentary. > > However I think the rest of it can reasonably go in -- I mean the SQL > parse of it, analysis, executor. Let me propose this: you split the > patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT, > and we introduce just the TableExpr stuff plus the XMLTABLE function. I > can commit that part in the current commitfest, and we leave the > xpath_parser plus associated features for the upcoming commitfest. > Deal? > OK. Can you send me your last work? Regards Pavel > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] patch: function xmltable
Pavel Stehule wrote: > Hi > > 2016-11-24 0:13 GMT+01:00 Alvaro Herrera : > > > Oh my, I just noticed we have a new xpath preprocessor in this patch > > too. Where did this code come from -- did you write it all from > > scratch? > > I wrote it from scratch - libxml2 has not any API for iteration over XPath > expression (different than iteration over XPath expression result), and > what I have info, there will not be any new API in libxml2. Okay, I agree that the default namespace stuff looks worthwhile in the long run. But I don't have enough time to review the xpath parser stuff in the current commitfest, and I think it needs at the very least a lot of additional code commentary. However I think the rest of it can reasonably go in -- I mean the SQL parse of it, analysis, executor. Let me propose this: you split the patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT, and we introduce just the TableExpr stuff plus the XMLTABLE function. I can commit that part in the current commitfest, and we leave the xpath_parser plus associated features for the upcoming commitfest. Deal? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On Thu, Nov 24, 2016 at 2:32 AM, Tsunakawa, Takayuki wrote: > IMHO, overall, there should be pros and cons of the current approach and the > new UNDO one (like Oracle?), depending on the workload. Under update-heavy > workload, the UNDO method may be better. OTOH, under the mostly-INSERT > workload (like data warehouse?), the current method will be better because it > writes no log for UNDO. The foreground operation will complete more quickly, because it won't have to write UNDO. On the other hand, you'll have to set hint bits later, as well as freeze, which may be more expensive than writing UNDO by the time all is said and done. Whether it's better to pay a foreground tax immediately or to defer the work until later depends on things like whether you have quiet times during which you can catch up on the deferred work ... but the number of users who have gotten unpleasant surprises due to autovacuum kicking in during a busy period is not small. > Furthermore, it may be best to be able to switch the method for each table > and/or tablespace. For example, in pgbench, the history table uses the current > method, > and other tables use the UNDO method. Is it time to introduce a pluggable > storage system? IMHO, it's past time for that. > Because PostgreSQL is a follower in the UNDO approach, I think it will be > better to study other DBMSs well (Oracle and MySQL?). That includes not only > their manuals, but also whitepapers and books. Especially, I expect good > books to give deep knowledge on performance tuning and troubleshooting, from > which we will be able to know the cons that Oracle's materials don't state. I agree up to a point. I think we need to design our own system as well as we can, not just copy what others have done. For example, the design I sketched will work with all of PostgreSQL's existing index types. 
You need to modify each AM in order to support in-place updates when a column indexed by that AM has been modified, and that's probably highly desirable, but it's not a hard requirement. I believe that's a better approach for us than insisting that we have to do it in exactly the same way as some other system. Now, that doesn't mean we shouldn't learn from what works well and poorly in other systems, but I think our goal here should be to chart the best way forward given PostgreSQL's existing architecture and its existing strengths and weaknesses, rather than to make it exactly like Oracle or MySQL or anything else. Few people on this mailing list would say that either of those systems is categorically better than PostgreSQL; most, I suspect, would disagree somewhat vigorously. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
> FWIW, while this is basically true, the idea of repurposing UNDO to be > usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says > nothing about MVCC. > For snapshot isolation Oracle has yet a *third* copy of the data in a > space called the "rollback segment(s)". UNDO and rollback segment are the same thing in Oracle. In older versions it was just called "rollback segment" I think it started with Oracle 10g that they called it UNDO. > Fwiw, Oracle does not use the undo log for snapshot fetches. > It's used only for transaction rollback and recovery. As UNDO and "rollback" are the same they do use the UNDO information for MVCC: http://docs.oracle.com/database/121/CNCPT/consist.htm#GUID-00A3688F-1219-423C-A5ED-4B8F25BEEAFB__BABFDBAJ -- View this message in context: http://postgresql.nabble.com/UNDO-and-in-place-update-tp5931575p5931844.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId
Alvaro Herrera writes: > I considered the argument here for a bit and I think Craig is right -- FWIW, I agree. We shouldn't require every call site to special-case this, and we definitely don't want it to require special cases in SQL code. (And I'm for back-patching, too.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On Thu, Nov 24, 2016 at 04:06:14PM +, Greg Stark wrote: > For snapshot isolation Oracle has yet a *third* copy of the data in a > space called the "rollback segment(s)". When you update a row in a > block you save the whole block in the rollback segment. When you try > to access a block you check if the CSN -- which is basically > equivalent to our LSN -- is newer than your snapshot and if it is you > fetch the old version of the block from the rollback. I assume they can do this with good performance because, unlike UNDO, they don't need to fsync the pages kept for snapshot visibility --- if the server crashes, you don't need them. > Essentially their MVCC is done on a per-block level rather than a > per-row level and they keep only the newest version of the block in > the table, the rest are in the rollback segment. For what it's worth > I think our approach is cleaner and more flexible. They had a lot of > trouble with their approach over the years and it works well only > because they invested an enormous amount of development in it and also > because people throw a lot of hardware at it too. > > I think the main use case we have trouble with is actually the "update > every row in the table" type of update which requires we write to > every block, plus a second copy of every block, plus write full pages > of both copies, then later set hint bits dirtying pages again and > generating more full pages writes, then later come along and vacuum > which requires two more writes of every block, etc. If we had a > solution for the special case of an update that replaces every row in > a page that I think would complement HOT nicely and go a long way > towards fixing our issues. Any ideas how we could optimize the update-all workload? Create a new heap file and indexes? Our previous worst-case workload was a single row that was updated repeatedly, but HOT fixed that for non-indexed rows, and WARM will improve it for indexed rows. 
One of the big takeaways when writing my MVCC talk is that no matter how much we optimize UPDATE, we still need cleanup for deletes and aborted transactions. I am hopeful we can continue reducing the number of times we write a page for maintenance purposes. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On Wed, Nov 23, 2016 at 5:18 PM, Thomas Munro wrote: > On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan wrote: >> * Our behavior with many duplicates in secondary indexes is pretty bad >> in general, I suspect. > > From the pie-in-the-sky department: I believe there are > snapshot-based systems that don't ever have more than one entry for a > logical row in a btree. Instead, they do UNDO-log based snapshot > reconstruction for btrees too, generalising what is being discussed > for heap tables in this thread. That means that whenever you descend > a btree, at each page you have to be prepared to go and look in the > UNDO log if necessary on a page-by-page basis. Alternatively, some > systems use a shadow table rather than an UNDO log for the old > entries, but still privilege queries that need the latest version by > keeping only those in the btree. I don't know the details but suspect > that both of those approaches might require CSN (= LSN?) based > snapshots, so that you can make page-level visibility determinations > while traversing the 'current' btree based on a page header. That > would reduce both kinds of btree bloat: the primary kind of being > entries for dead tuples that still exist in the heap, and the > secondary kind being the resultant permanent btree fragmentation that > remains even after vacuum removes them. Presumably once you have all > that, you can allow index-only-scans without a visibility map. Also, > I suspect that such a "3D time travelling" btree would be a > requirement for index ordered tables in a snapshot-based RDBMS. I don't really understand how a system of this type copes with page splits. Suppose there are entries 1000, 2000, ..., 1e6 in the btree. I now start two overlapping transactions, one inserting all the positive odd values < 1e6 and the other inserting all the positive even values < 1e6 that are not already present. The insertions are interleaved with each other. 
After they get done inserting, what does the 3D time traveling btree look like at this point? The problem I'm getting at here is that the heap, unlike an index, is unordered. So when I've got all values 1..1e6, they've got to be in a certain order. Necessarily, the values from the in-flight transactions are interleaved with each other and with the old data. If they're not, I can no longer support efficient searches by the in-flight transactions, plus I have an unbounded amount of cleanup work to do at commit time. But if I *do* have all of those values interleaved in the proper order, then an abort leaves fragmented free space. I can go and kill all of the inserted tuples during UNDO of an aborted transactions, but because page splits have happened, the index tuples I need to kill may not be on the same pages where I inserted them, so I might need to just search from the top of the tree, so it's expensive. And even if I do it anyway, the pages are left half-full. The point is that the splits are the joint result of the two concurrent transactions taken together - you can't blame individual splits on individual transactions. Another way to put this is - if you could make all of the index operations appear to happen instantaneously at commit time, then I see how we could make what you're talking about work. And if you never needed to split pages, then you could do that in the same way that you can do it for the heap. But that's not realistic. Even for an UNDO-based heap, we have a mild form of this problem. Suppose we have a full page and delete a tuple, creating free space. Now another transaction inserts a tuple, using up that free space. Well, if the first transaction aborts and the second one commits, we've got trouble. So we can't allow that. We have to keep track of the amount of space that could potentially be gobbled up by UNDO actions and leave enough space on the page to guarantee that such actions will succeed. 
If that means that some new tuple has to be placed on a different page instead, so be it. (This could create some minor bloat, but I think it's not really that bad.) For an index, we can't manage so easily, because we don't have the same kind of flexibility about where to put inserted tuples. Do you see some trick here that I'm missing? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On 23 November 2016 at 04:28, Peter Geoghegan wrote: > On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote: >> This basic DO-UNDO-REDO protocol has been well-understood for >> decades. > > FWIW, while this is basically true, the idea of repurposing UNDO to be > usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says > nothing about MVCC. Fwiw, Oracle does not use the undo log for snapshot fetches. It's used only for transaction rollback and recovery. For snapshot isolation Oracle has yet a *third* copy of the data in a space called the "rollback segment(s)". When you update a row in a block you save the whole block in the rollback segment. When you try to access a block you check if the CSN -- which is basically equivalent to our LSN -- is newer than your snapshot and if it is you fetch the old version of the block from the rollback. Essentially their MVCC is done on a per-block level rather than a per-row level and they keep only the newest version of the block in the table, the rest are in the rollback segment. For what it's worth I think our approach is cleaner and more flexible. They had a lot of trouble with their approach over the years and it works well only because they invested an enormous amount of development in it and also because people throw a lot of hardware at it too. I think the main use case we have trouble with is actually the "update every row in the table" type of update which requires we write to every block, plus a second copy of every block, plus write full pages of both copies, then later set hint bits dirtying pages again and generating more full pages writes, then later come along and vacuum which requires two more writes of every block, etc. If we had a solution for the special case of an update that replaces every row in a page that I think would complement HOT nicely and go a long way towards fixing our issues. 
Incidentally the "Interested transaction list" is for locking rows for updates and it's basically similar to what we've discussed before of having a "most frequent xmin" in the header and then a bit indicating the xmin is missing from the row header. Except in their case they don't need it for the actual xmin/xmax because their visibility is done per-block, only the transient lock state -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creating a DSA area to provide work space for parallel execution
I have one more question: in V1 we were calling dsa_detach in ExecParallelCleanup and in ParallelQueryMain, but it's removed in v2. Any specific reason? Does this need to be used differently? ExecParallelCleanup(ParallelExecutorInfo *pei) { + if (pei->area != NULL) + { + dsa_detach(pei->area); + pei->area = NULL; + } After this change, I am getting a DSM segment leak warning. I am calling dsa_allocate and dsa_free. On Thu, Nov 24, 2016 at 8:09 PM, Dilip Kumar wrote: > On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro > wrote: >> ... or we could allow DSA areas to be constructed inside existing >> shmem, as in the attached patch which requires dsa_create_in_place, >> from the patch at >> https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com >> . > Seems like there is a problem in this patch. > > In below code, pei->area is not yet allocated at this point, so > estate->es_query_area will always be NULL? > > + estate->es_query_area = pei->area; > + > /* Create a parallel context. */ > pcxt = CreateParallelContext(ParallelQueryMain, nworkers); > pei->pcxt = pcxt; > @@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState > *estate, int nworkers) > shm_toc_estimate_keys(&pcxt->estimator, 1); > } > > + /* Estimate space for DSA area. */ > + shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE); > + shm_toc_estimate_keys(&pcxt->estimator, 1); > + > /* Everyone's had a chance to ask for space, so now create the DSM. */ > InitializeParallelDSM(pcxt); > > @@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState > *estate, int nworkers) > pei->instrumentation = instrumentation; > } > > + /* Create a DSA area that can be used by the leader and all workers. 
*/ > + area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE); > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space); > + pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE, > + LWTRANCHE_PARALLEL_EXEC_AREA, > + "parallel query memory area"); > > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creating a DSA area to provide work space for parallel execution
On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro wrote: > ... or we could allow DSA areas to be constructed inside existing > shmem, as in the attached patch which requires dsa_create_in_place, > from the patch at > https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com > . Seems like there is a problem in this patch. In below code, pei->area is not yet allocated at this point, so estate->es_query_area will always be NULL? + estate->es_query_area = pei->area; + /* Create a parallel context. */ pcxt = CreateParallelContext(ParallelQueryMain, nworkers); pei->pcxt = pcxt; @@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers) shm_toc_estimate_keys(&pcxt->estimator, 1); } + /* Estimate space for DSA area. */ + shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE); + shm_toc_estimate_keys(&pcxt->estimator, 1); + /* Everyone's had a chance to ask for space, so now create the DSM. */ InitializeParallelDSM(pcxt); @@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers) pei->instrumentation = instrumentation; } + /* Create a DSA area that can be used by the leader and all workers. */ + area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE); + shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space); + pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE, + LWTRANCHE_PARALLEL_EXEC_AREA, + "parallel query memory area"); -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Random PGDLLIMPORTing
On Thu, 24 Nov 2016 15:01:33 +0100 Magnus Hagander wrote:
> On Thu, Nov 24, 2016 at 2:58 PM, Craig Ringer wrote:
> > Hi all
> >
> > Noticed this while reading something unrelated:
> >
> > extern PGDLLIMPORT pid_t PostmasterPid;
> > extern bool IsPostmasterEnvironment;
> > extern PGDLLIMPORT bool IsUnderPostmaster;
> > extern bool IsBackgroundWorker;
> > extern PGDLLIMPORT bool IsBinaryUpgrade;
> >
> > I don't see any sane reason for some of those to be PGDLLIMPORT but
> > not others. In particular it's pretty silly for IsBackgroundWorker
> > not to be PGDLLIMPORT.
> >
> > Too trivial to be worth making an actual patch, it'd be more work to
> > apply it than edit directly.
>
> My guess is that PGDLLIMPORT has been added explicitly when somebody
> needed it for something, without any actual thought. I can't say I
> see any reason not to export the other ones as well -- more that
> maybe there are even more that are needed?

It is worth checking the actual variable definitions, not just the declarations. I found recently that, at least in the MSVC build system, only initialized variables are included in the postgres.def file and so are actually exported from the backend binary. But in this case both IsPostmasterEnvironment and IsBackgroundWorker are initialized, so they are probably left unmarked only for historical reasons.
Re: [HACKERS] Physical append-only tables
On Thu, Nov 24, 2016 at 10:13:30AM +0100, Magnus Hagander wrote:
> On Thu, Nov 24, 2016 at 2:26 AM, Bruce Momjian wrote:
> > On Mon, Nov 14, 2016 at 08:43:12PM +, Greg Stark wrote:
> > That said, I don't think the "maintain clustering a bit better using
> > BRIN" is a bad idea. It's just the bit about turning a table
> > append-only to deal with update-once data that I think is overreach.
>
> > What if we used BRIN to find heap pages where the new row was in the
> > page BRIN min/max range, and the heap page had free space? Only if that
> > fails do we put it somewhere else in the heap.
>
> That would certainly be useful. You'd have to figure out what to do in the case
> of multiple conflicting BRIN indexes (which you shouldn't have in the first
> place, but that won't keep people from having them), but other than that it
> would be quite good, I think.

This idea is only possible because the BRIN index is so small and easy to scan, i.e. this wouldn't work for a btree index.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
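Bruce's heuristic can be sketched outside the server. In this toy model (the PageRange struct is a hypothetical stand-in for a BRIN page-range summary plus free-space data, not real PostgreSQL code), an insert first looks for a range whose min/max already covers the new key and which still has room, and only falls back to "anywhere else" when that fails:

```c
#include <assert.h>

typedef struct {
    int min, max;       /* BRIN-style summary of keys in this page range */
    int free_space;     /* free space remaining on the range's pages */
} PageRange;

/*
 * Pick a target page range for a new key: prefer a range whose summary
 * already covers the key and that has room, preserving clustering.
 * Returns the range index, or -1 meaning "place it anywhere else"
 * (i.e. fall back to the normal free-space-map path).
 */
int choose_range(const PageRange *ranges, int nranges, int key, int row_size)
{
    for (int i = 0; i < nranges; i++)
        if (key >= ranges[i].min && key <= ranges[i].max &&
            ranges[i].free_space >= row_size)
            return i;
    return -1;
}
```

A real implementation would scan the BRIN summary (cheap, as Bruce notes, because it is tiny) rather than a linear array, and would still need to re-check free space on the actual heap pages.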
Re: [HACKERS] UNDO and in-place update
On Wed, Nov 23, 2016 at 11:35:38PM -0800, Peter Geoghegan wrote:
> On Wed, Nov 23, 2016 at 11:32 PM, Tsunakawa, Takayuki wrote:
> > IMHO, overall, there should be pros and cons of the current approach and
> > the new UNDO one (like Oracle?), depending on the workload. Under an
> > update-heavy workload, the UNDO method may be better. OTOH, under a
> > mostly-INSERT workload (like data warehouse?), the current method will be
> > better because it writes no log for UNDO.
>
> I believe that you are correct about that.

This is probably similar to our use of double-buffering, using shared_buffers and the kernel cache --- in some cases the double-buffering is better (variable workloads), while in other cases a single cache is best. We have had trouble figuring out if we need to support both single and double caching methods, and when to recommend one over the other. Seems UNDO would have the same complexity.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP
On 11/24/2016 02:49 PM, Andreas Karlsson wrote:
> Thanks for finding this. I will look at this more once I get home, but
> the tests do not fail on my computer. I wonder what I do differently.
> What versions of Perl and OpenSSL do you run, and how did you run the
> tests when they failed? I ran the tests by running "make check" inside
> "src/test/ssl".

Never mind, I had not fetched the latest master. Once I had done so, these tests were broken both in my rebased branch and in the master branch without any modifications. I guess I will have to bisect this once I get home.

Info for myself or anyone else who feels like bisecting this: the last known good commit is 67dc4ccbb2e1c27da823eced66d9217a5652cbb0.

Andreas
Re: [HACKERS] Random PGDLLIMPORTing
On Thu, Nov 24, 2016 at 2:58 PM, Craig Ringer wrote:
> Hi all
>
> Noticed this while reading something unrelated:
>
> extern PGDLLIMPORT pid_t PostmasterPid;
> extern bool IsPostmasterEnvironment;
> extern PGDLLIMPORT bool IsUnderPostmaster;
> extern bool IsBackgroundWorker;
> extern PGDLLIMPORT bool IsBinaryUpgrade;
>
> I don't see any sane reason for some of those to be PGDLLIMPORT but
> not others. In particular it's pretty silly for IsBackgroundWorker not
> to be PGDLLIMPORT.
>
> Too trivial to be worth making an actual patch, it'd be more work to
> apply it than edit directly.

My guess is that PGDLLIMPORT has been added explicitly when somebody needed it for something, without any actual thought. I can't say I see any reason not to export the other ones as well -- more likely the question is whether there are even more that are needed?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Random PGDLLIMPORTing
Hi all

Noticed this while reading something unrelated:

extern PGDLLIMPORT pid_t PostmasterPid;
extern bool IsPostmasterEnvironment;
extern PGDLLIMPORT bool IsUnderPostmaster;
extern bool IsBackgroundWorker;
extern PGDLLIMPORT bool IsBinaryUpgrade;

I don't see any sane reason for some of those to be PGDLLIMPORT but not others. In particular it's pretty silly for IsBackgroundWorker not to be PGDLLIMPORT.

Too trivial to be worth making an actual patch, it'd be more work to apply it than edit directly.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP
On 11/24/2016 08:46 AM, Michael Paquier wrote:
> On Sat, Nov 12, 2016 at 3:42 AM, Andreas Karlsson wrote:
> > On 11/11/2016 07:40 PM, Andreas Karlsson wrote:
> > Here is a new version of the patch with the only differences:
> > 1) The SSL tests have been changed to use reload rather than restart
>
> Did you check if the tests pass? I am getting a couple of failures like this one:
>
> psql: server certificate for "common-name.pg-ssltest.test" does not match host name "127.0.0.1"
> not ok 11 - sslrootcert=ssl/root+server_ca.crt sslmode=verify-full
>
> Attached are the logs of the run I did, and the same behavior shows for
> macOS and Linux. The shape of the tests looks correct to me after review.
> Still, seeing failing tests with sslmode=verify-full is a problem that
> needs to be addressed. This may be pointing to incorrect CA load
> handling, though I could not spot a problem when going through the code.

Thanks for finding this. I will look at this more once I get home, but the tests do not fail on my computer. I wonder what I do differently. What versions of Perl and OpenSSL do you run, and how did you run the tests when they failed? I ran the tests by running "make check" inside "src/test/ssl".

Andreas
Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId
I considered the argument here for a bit and I think Craig is right -- FrozenXid eventually makes it to a tuple's xmin, where it becomes a burden to the caller, making our interface bug-prone -- sure, you can special-case it, but you don't until it first happens ... and it may not until you're deep into production.

Even the code comment is confused: "error if the given Xid doesn't normally commit". But surely FrozenXid *does* commit, in the sense that it appears in committed tuples' xmin.

We already have a good mechanism for replying to the query with "this value is too old for us to have its commit TS", which is a false return value. We should use that.

I think not backpatching is worse, because then users have to be aware that they need to handle the FrozenXid case specially, but only on 9.5/9.6 ...

I think the reason it took this long to pop up is that it has taken this long to get to replication systems for which this issue matters.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
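The "false means too old" convention Álvaro proposes can be sketched as follows. This is a simplified stand-in for the real commit-timestamp lookup (the SLRU read is faked with a constant), though the XID constants do mirror PostgreSQL's BootstrapTransactionId = 1, FrozenTransactionId = 2, and FirstNormalTransactionId = 3:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define BootstrapTransactionId   ((TransactionId) 1)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Sketch of the proposed behavior: for permanent XIDs (bootstrap/frozen),
 * which can appear in a tuple's xmin after freezing, report "no timestamp
 * available" (false) instead of raising an error -- the same answer callers
 * already get for XIDs older than the retained commit-timestamp range.
 */
bool get_commit_ts(TransactionId xid, TransactionId oldest_retained,
                   int64_t *ts_out)
{
    if (xid < FirstNormalTransactionId)
        return false;       /* frozen/bootstrap: too old to have one */
    if (xid < oldest_retained)
        return false;       /* truncated away: same "too old" answer */
    *ts_out = 42;           /* stand-in for the real SLRU lookup */
    return true;
}
```

The point of the design is that callers need only one code path: a false return uniformly means "no timestamp for this XID", whether it was frozen or simply aged out.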
Re: [HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer
The local server log has the line and the remote table log is empty (it is configured for minimum warning, and when I produce one it appears in the log OK).

And I have new details - it happens on some additional environments, but not constantly. Some hours it happens every time, then it just stops appearing:

postgres@hostname:~$ grep user 9.3/main/pg_log/postgresql-2016-11-22_00.log | cat -n
 1 2016-11-22 06:00:02 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 2 2016-11-22 06:00:03 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 3 2016-11-22 06:00:03 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 4 2016-11-22 10:06:08 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 5 2016-11-22 10:06:08 UTC 127.0.0.1 user ::LOG: could not receive data from client: Connection reset by peer
 6 2016-11-22 10:06:08 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 7 2016-11-22 11:25:27 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 8 2016-11-22 11:25:27 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer
 9 2016-11-22 11:25:27 UTC 127.0.0.1 user::LOG: could not receive data from client: Connection reset by peer

This is the log from the local server, where the user runs the same bunch of three statements every minute. But when it starts happening, if I psql to the db, the same statement does NOT produce this string in the log.

Here is the reason you need not follow this post any more:

postgres=# select version();
version
--
PostgreSQL 9.3.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
(1 row)

I will just upgrade it. And this is the only machine I see this weird behavior on. Sorry guys, and thank you for your time.

2016-11-23 18:16 GMT+00:00 Jeff Janes:
> On Mon, Nov 21, 2016 at 6:32 AM, Vladimir Svedov wrote:
>> Hi, I have this question. I looked for help on http://dba.stackexchange.com/ with no success. Maybe you can answer?.. Thank you in advance.
>>
>> "FOREIGN_TABLE" created with postgres_fdw. LOCAL_TABLE is just a local table...
>>
>> Symptoms:
>> 1. I run in psql the query SELECT * from FOREIGN_TABLE. No log generated.
>> 2. I run in bash psql -c "SELECT * from LOCAL_TABLE". No log generated.
>> 3. I run in bash psql -c "SELECT * from FOREIGN_TABLE". ::LOG: could not receive data from client: Connection reset by peer is generated in the postgres log.
>
> Which server log file is this generated in, the local or the foreign? Whichever it is, is there an entry in the logfile for the other server which seems to match up to this one? That may have more useful details.
>
> Cheers,
> Jeff
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Wed, 23 Nov 2016 23:08:18 -0600 "Karl O. Pinc" wrote:
> On Wed, 23 Nov 2016 03:21:05 -0600 "Karl O. Pinc" wrote:
> > On Sat, 19 Nov 2016 12:58:47 +0100 Gilles Darold wrote:
> > > ... attached v14 of the patch.
>
> > patch_pg_current_logfile-v14.diff.retry_current_logfiles-part3
> > Re-try the write of current_logfiles should it fail because the
> > system is too busy.
>
> patch_pg_current_logfile-v14.diff.doc_linux_default
>
> Applies on top of patch_pg_current_logfile-v14.diff.pg_ctl_and_more_docs
>
> Mentions, in the body of the docs, defaults and their impact on
> current_logfiles and pg_current_logfiles. It seems appropriate
> to mention this in the main documentation and in the overall context
> of logging.

Attached is version 2 of this patch. Adds an index entry.

patch_pg_current_logfile-v14.diff.doc_linux_default-v2

Karl

Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

patch_pg_current_logfile-v14.diff.doc_linux_default-v2 Description: Binary data
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/11/22 18:28, Ashutosh Bapat wrote:
> The comments should explain why the assertion is true.
> +/* Shouldn't be NIL */
> +Assert(tlist != NIL);

I noticed that I was wrong; in the assertion, the tlist can be empty. An example of such a case is:

SELECT 1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t2 ON (TRUE);

In this example no columns of the relations ft4 and ft5 are needed for the join or the final output, so both of the tlists for the relations created in deparseRangeTblRef would be empty. Attached is an updated version fixing this bug. Changes are:

* I removed make_outerrel_subquery/make_innerrel_subquery from fpinfo and added a new member "is_subquery_rel" to fpinfo, as a flag on whether to deparse the relation as a subquery.
* I modified deparseRangeTblRef, deparseSelectSql, and is_subquery_var to look at is_subquery_rel, not make_outerrel_subquery/make_innerrel_subquery.
* I modified appendSubselectAlias to handle the case where ncols = 0 properly.
* I renamed subquery_rels in fpinfo to lower_subquery_rels, to distinguish it clearly from is_subquery_rel.
* I added regression tests for that.

The rest of the patch is the same as the previous version, but for:

+/* Should be same length */
+Assert(list_length(tlist) == list_length(foreignrel->reltarget->exprs));

I added comments explaining the reason.

> The name get_subselect_alias_id() seems to suggest that the function
> returns an identifier for a subselect alias, which isn't correct. It
> returns the alias identifiers for deparsing a Var node. But I guess we
> have left this to the committer's judgement.

I agree that the name isn't good, so I changed it to get_relation_column_alias_ids(). Let me know if that seems better. I also revised code and comments a bit, just for consistency.

I will update another patch for PHVs on top of the attached one.
Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/deparse.c --- b/contrib/postgres_fdw/deparse.c *** *** 109,114 typedef struct deparse_expr_cxt --- 109,116 /* Handy macro to add relation name qualification */ #define ADD_REL_QUALIFIER(buf, varno) \ appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno)) + #define SS_REL_ALIAS_PREFIX "s" + #define SS_COL_ALIAS_PREFIX "c" /* * Functions to determine whether an expression can be evaluated safely on *** *** 167,172 static void appendConditions(List *exprs, deparse_expr_cxt *context); --- 169,181 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *joinrel, bool use_alias, List **params_list); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); + static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root, + RelOptInfo *foreignrel, List **params_list); + static void appendSubselectAlias(StringInfo buf, int relno, int ncols); + static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, + int *relno, int *colno); + static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, + int *colno); static void deparseAggref(Aggref *node, deparse_expr_cxt *context); static void appendGroupByClause(List *tlist, deparse_expr_cxt *context); static void appendAggOrderBy(List *orderList, List *targetList, *** *** 990,1001 deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context) */ appendStringInfoString(buf, "SELECT "); if (foreignrel->reloptkind == RELOPT_JOINREL || ! foreignrel->reloptkind == RELOPT_UPPER_REL) ! { ! /* For a join relation use the input tlist */ deparseExplicitTargetList(tlist, retrieved_attrs, context); - } else { /* --- 999,1013 */ appendStringInfoString(buf, "SELECT "); + /* + * For a join relation or an upper relation, use deparseExplicitTargetList. + * Likewise, for a relation that is being deparsed as a subquery, use that + * function. Otherwise, use deparseTargetList. 
+ */ if (foreignrel->reloptkind == RELOPT_JOINREL || ! foreignrel->reloptkind == RELOPT_UPPER_REL || ! fpinfo->is_subquery_rel) deparseExplicitTargetList(tlist, retrieved_attrs, context); else { /* *** *** 1155,1165 deparseLockingClause(deparse_expr_cxt *context) --- 1167,1185 StringInfo buf = context->buf; PlannerInfo *root = context->root; RelOptInfo *rel = context->scanrel; + PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) rel->fdw_private; int relid = -1; while ((relid = bms_next_member(rel->relids, relid)) >= 0) { /* + * Ignore relation if it appears in a lower subquery. Locking clause + * for such a relation, if needed, is included in the subquery. + */ + if (bms_is_member(relid, fpinfo->lower_subquery_rels)) + continue; + + /* * Add FOR UPDATE/SHARE if appropr
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Wed, Nov 23, 2016 at 10:19 PM, Catalin Iacob wrote:
> On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki < tsunakawa.ta...@jp.fujitsu.com> wrote:
> > If you want to connect to a server where the transaction is read-only, then shouldn't the connection parameter be something like "target_session_attrs=readonly"? That represents exactly what the code does.
>
> FWIW I find this to be a reasonable compromise. To keep the analogy with the current patch it would be more something like "target_session_attrs=read_write|any".

I have taken this suggestion and renamed target_server_type to target_session_attrs, with two possible values: "read-write" and "any". Maybe we could expand it to "readonly" and "prefer-readonly" in a next patch proposal. Attaching the patch for the same.

-- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com

failover-to-new-writable-session-05.patch Description: Binary data
Re: [HACKERS] Gather Merge
On Wed, Nov 16, 2016 at 3:10 PM, Rushabh Lathia wrote: > > > On Mon, Nov 14, 2016 at 3:51 PM, Thomas Munro < > thomas.mu...@enterprisedb.com> wrote: > >> On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia >> wrote: >> > On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro < >> thomas.mu...@enterprisedb.com> >> > wrote: >> >> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development >> Group >> >> + * Portions Copyright (c) 1994, Regents of the University of >> California >> >> >> >> Shouldn't this say just "(c) 2016, PostgreSQL Global Development >> >> Group"? >> > >> > Fixed. >> >> The year also needs updating to 2016 in nodeGatherMerge.h. >> > > Oops sorry, fixed now. > > >> >> >> + /* Per-tuple heap maintenance cost */ >> >> + run_cost += path->path.rows * comparison_cost * 2.0 * logN; >> >> >> >> Why multiply by two? The comment above this code says "about log2(N) >> >> comparisons to delete the top heap entry and another log2(N) >> >> comparisons to insert its successor". In fact gather_merge_getnext >> >> calls binaryheap_replace_first, which replaces the top element without >> >> any comparisons at all and then performs a sift-down in log2(N) >> >> comparisons to find its new position. There is no per-tuple "delete" >> >> involved. We "replace" the top element with the value it already had, >> >> just to trigger the sift-down, because we know that our comparator >> >> function might have a new opinion of the sort order of this element. >> >> Very clever! The comment and the 2.0 factor in cost_gather_merge seem >> >> to be wrong though -- or am I misreading the code? >> >> >> > See cost_merge_append. >> >> That just got tweaked in commit 34ca0905. >> > > Fixed. > > >> >> > Looking at the plan I realize that this is happening because wrong >> costing >> > for Gather Merge. Here in the plan we can see the row estimated by >> > Gather Merge is wrong. 
This is because earlier patch GM was considering >> > rows = subpath->rows, which is not true as subpath is partial path. So >> > we need to multiple it with number of worker. Attached patch also fixed >> > this issues. I also run the TPC-H benchmark with the patch and results >> > are same as earlier. >> >> In create_grouping_paths: >> + double total_groups = gmpath->rows * >> gmpath->parallel_workers; >> >> This hides a variable of the same name in the enclosing scope. Maybe >> confusing? >> >> In some other places like create_ordered_paths: >> + double total_groups = path->rows * path->parallel_workers; >> >> Though it probably made sense to use this variable name in >> create_grouping_paths, wouldn't total_rows be better here? >> >> > Initially I just copied from the other places. I agree with you that > create_ordered_paths - total_rows make more sense. > > >> It feels weird to be working back to a total row count estimate from >> the partial one by simply multiplying by path->parallel_workers. >> Gather Merge will underestimate the total rows when parallel_workers < >> 4, if using partial row estimates ultimately from cost_seqscan which >> assume some leader contribution. I don't have a better idea though. >> Reversing cost_seqscan's logic certainly doesn't seem right. I don't >> know how to make them agree on the leader's contribution AND give >> principled answers, since there seems to be some kind of cyclic >> dependency in the costing logic (cost_seqscan really needs to be given >> a leader contribution estimate from its superpath which knows whether >> it will allow the leader to pull tuples greedily/fairly or not, but >> that superpath hasn't been created yet; cost_gather_merge needs the >> row count from its subpath). Or maybe I'm just confused. >> >> > Yes, I agree with you. But we can't really do changes into cost_seqscan. 
> Another option I can think of is to just calculate the rows for gather merge
> by using the reverse of the formula used in cost_seqscan. So
> we can completely remove the rows argument from
> create_gather_merge_path(),
> and then inside create_gather_merge_path() calculate the total_rows
> using the same formula used in cost_seqscan. This is working
> fine - but I'm not quite sure about the approach, so I attached that part
> of the changes as a separate patch. Any suggestions?

In an offline discussion with Thomas, I realized that this won't work. It will work only if the subplan is a seqscan, so this logic won't be enough to estimate the rows. I guess, as Thomas said earlier, this is not a problem with the GatherMerge implementation as such - so we will keep this separate.

Apart from this, my colleague Neha Sharma reported a server crash with the patch. It was hitting the Assert below in create_gather_merge_path():

Assert(pathkeys);

Basically, when the query is something like "select * from foo where a = 1 order by a;", the query has a sort clause but the planner won't generate a sort key, because the equality clause is on the same column. Fix is about making sure of pathkeys
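Thomas's costing point upthread — that gather merge replaces the heap top in place rather than deleting and reinserting — can be demonstrated with a toy min-heap. This is a sketch, not the PostgreSQL binaryheap code: replace_first overwrites the root and restores the heap with a single top-to-bottom sift-down pass, so there is no separate "delete the top entry" pass to charge for.

```c
#include <assert.h>

/* Minimal min-heap over ints, counting comparisons during sift-down. */
typedef struct { int *a; int n; int cmps; } Heap;

static void sift_down(Heap *h, int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = l + 1, s = i;
        if (l < h->n) { h->cmps++; if (h->a[l] < h->a[s]) s = l; }
        if (r < h->n) { h->cmps++; if (h->a[r] < h->a[s]) s = r; }
        if (s == i) return;
        int tmp = h->a[i]; h->a[i] = h->a[s]; h->a[s] = tmp;
        i = s;
    }
}

/*
 * Replace the top element in place and restore heap order with one
 * sift-down: a single pass over at most log2(n) levels, with no
 * separate "delete" followed by "insert".
 */
void heap_replace_first(Heap *h, int v)
{
    h->a[0] = v;
    sift_down(h, 0);
}
```

This is the pattern binaryheap_replace_first enables: the leader pulls the next tuple from the same worker's stream, overwrites the root slot, and sifts once, which is why the 2.0 factor in the cost model looked suspect.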
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/23 4:50, Robert Haas wrote: > On Tue, Nov 22, 2016 at 4:15 AM, Amit Langote > wrote: >>> The easiest thing to do might be to just enforce that all of the >>> partition key columns have to be not-null when the range-partitioned >>> table is defined, and reject any attempt to DROP NOT NULL on them >>> later. That's probably better that shoehorning it into the table >>> constraint. >> >> Agreed that forcing range partitioning columns to be NOT NULL during table >> creation would be a better approach. But then we would have to reject >> using expressions in the range partition key, right? > > Why? I was thinking of it like how primary key columns cannot contain expressions; the reason for which, I assume, is because while we can ensure that a column contains only non-null values by defining a constraint on the column, there is no way to force expressions to be non-null. In any case, I have implemented in the latest patch that when creating a range partitioned table, its partition key columns are automatically set to be NOT NULL. Although, it leaves out the columns that are referenced in expressions. So even after doing so, we need to check after computing the range partition key of a input row, that none of the partitions keys is null, because expressions can still return null. Also, it does nothing to help the undesirable situation that one can insert a row with a null partition key (expression) into any of the range partitions if targeted directly, because of how ExecQual() handles nullable constraint expressions (treats null value as satisfying the partition constraint). An alternative possibly worth considering might be to somehow handle the null range partition keys within the logic to compare against range bound datums. It looks like other databases will map the rows containing nulls to the unbounded partition. 
One database allows specifying NULLS FIRST/LAST and maps a row containing null key to the partition with -infinity as the lower bound or +infinity as the upper bound, respectively with NULLS LAST the default behavior. In our case, if we allowed a similar way of defining a range partitioned table: create table p (a int, b int) partition by range nulls first (a); create table p0 partition of p for values from (unbounded) to (1); create table p1 partition of p for values from (1) to (10); create table p2 partition of p for values from (10) to (unbounded); Row (null, 1) will be mapped to p0. If we didn't have p0, we would report the "partition not found" error. In case of a multi-column key: create table p (a int, b int) partition by range (a, b); create table p0 partition of p for values from (1, unbounded) to (1, 1); create table p1 partition of p for values from (1, 1) to (1, 10); create table p2 partition of p for values from (1, 10) to (1, unbounded); Row (1, null) will be mapped to p2 (default nulls last behavior). But I guess we still end up without a solution for the problem that a row with null partition key (expression) could be inserted into any of the range partitions if targeted directly. Thoughts? >> As a result, one change became necessary: to how we flag individual range >> bound datum as infinite or not. Previously, it was a regular Boolean >> value (either infinite or not) and to distinguish +infinity from >> -infinity, we looked at whether the bound is lower or upper (the lower >> flag). Now, instead, the variable holding the status of individual range >> bound datum is set to a ternary value: RANGE_DATUM_FINITE (0), >> RANGE_DATUM_NEG_INF (1), and RANGE_DATUM_POS_INF (2), which still fits in >> a bool. > > You better not be using a bool to represent a ternary value! Use an > enum for that -- or if in the system catalogs, a char. OK, created an enum called RangeDatumContent. 
In the system catalog, we still store the boolean value; it is only after we read it into the relcache structure that we use one of these enum values. I'm worried though that using enum would consume more memory (we need to store nparts * partnattrs instances of the enum). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
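The NULLS FIRST/LAST mapping described above boils down to how a null partition key compares against range bounds: treat it as sorting just above -infinity (NULLS FIRST) or just below +infinity (NULLS LAST). A toy comparator (hypothetical types, not the patch's code) reproduces the p0/p1/p2 example:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BOUND_NEG_INF, BOUND_FINITE, BOUND_POS_INF } BoundKind;
typedef struct { BoundKind kind; int value; } Bound;
typedef struct { bool isnull; int value; } Key;

/* Compare a (possibly null) key with a range bound; returns <0, 0, or >0.
 * NULL sorts just above -infinity (NULLS FIRST) or just below +infinity
 * (NULLS LAST), so it lands in the unbounded partition on that side. */
int key_cmp_bound(Key k, Bound b, bool nulls_first)
{
    if (k.isnull)
    {
        if (nulls_first)
            return (b.kind == BOUND_NEG_INF) ? 1 : -1;
        return (b.kind == BOUND_POS_INF) ? -1 : 1;
    }
    if (b.kind == BOUND_NEG_INF) return 1;
    if (b.kind == BOUND_POS_INF) return -1;
    return (k.value > b.value) - (k.value < b.value);
}

/* A partition [lo, hi) accepts the key if lo <= key < hi. */
bool in_partition(Key k, Bound lo, Bound hi, bool nulls_first)
{
    return key_cmp_bound(k, lo, nulls_first) >= 0 &&
           key_cmp_bound(k, hi, nulls_first) < 0;
}
```

Under this scheme a null key always has a home partition whenever an unbounded partition exists on the relevant side; without one, the comparator yields "no match", which corresponds to the "partition not found" error described above.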
Re: [HACKERS] DISTINCT with btree skip scan
On 23 November 2016 at 21:19, Thomas Munro wrote:
> Worth pursuing? Does amskip suck? Does anyone have better ideas,
> either for how to do the low level skip or the higher level Index Skip
> Scan, or perhaps a completely different way of looking at this?

I have no helpful suggestions for how to achieve it, but can I add a voice of encouragement: there have been a good few occasions in the past year (we moved from another db to PG) where the lack of skip scans has bitten us. In our case it was using MIN() and GROUP BY, where the grouping column was the first element in the compound index and the aggregate column was the second: in the Other DB that sort of query was extremely quick, not so much here.

I was also idly pondering (in one of those moments of conceited self-delusion where I thought I might actually have enough spare time to try to work on it myself) whether it would be possible to implement that sort of skip with two indexes (so MIN(a) GROUP BY b with separate indexes on (a) and (b) rather than a single index (a,b)); I never got much further than idle musings though.

Geoff
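For what it's worth, the skipping trick itself is easy to model outside an index. Over a sorted sequence standing in for a btree's leading key column, each distinct value can be found with a binary-search jump past the current run of duplicates — the essence of the proposed amskip/Index Skip Scan. This is only a sketch; a real implementation would re-descend the btree from the root rather than binary-search an array:

```c
#include <assert.h>

/* First index in sorted a[0..n) with a[i] > key (an upper_bound search). */
static int upper_bound(const int *a, int n, int key)
{
    int lo = 0, hi = n;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/*
 * "Index skip scan" over a sorted column: emit each distinct value,
 * skipping runs of duplicates with O(log n) jumps instead of visiting
 * every entry. Returns the number of distinct values written to out[].
 */
int distinct_skip_scan(const int *a, int n, int *out)
{
    int count = 0, pos = 0;
    while (pos < n)
    {
        out[count++] = a[pos];
        pos = upper_bound(a, n, a[pos]);  /* jump past the duplicate run */
    }
    return count;
}
```

With d distinct values among n entries this does O(d log n) work instead of O(n), which is exactly why DISTINCT and MIN()/GROUP BY on a leading index column benefit so much.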
[HACKERS] Functions Immutable but not parallel safe?
There are 11 functions which are marked immutable but also marked parallel unsafe.

postgres=# select proname from pg_proc where provolatile = 'i' and proparallel = 'u';
proname
-
_pg_expandarray
_pg_keysequal
_pg_truetypid
_pg_truetypmod
_pg_char_max_length
_pg_char_octet_length
_pg_numeric_precision
_pg_numeric_precision_radix
_pg_numeric_scale
_pg_datetime_precision
_pg_interval_type
(11 rows)

I'm finding it hard to imagine a reason why these might be unsafe, but have failed. I do notice they're all only used in information_schema. Could it just be that these missed the verification process the other functions went through to determine their parallel safety?

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]
On Mon, Nov 21, 2016 at 1:59 PM, Kouhei Kaigai wrote:
> Hello,
>
> The attached patch is a revised version of pass-down LIMIT to FDW/CSP.
>
> Below are the updates from the last version.
>
> 'ps_numTuples' of PlanState was declared as uint64, instead of long,
> to avoid problems on 32-bit machines when a large LIMIT clause is
> supplied.
>
> 'ps_numTuples' is re-interpreted; 0 means that its upper node wants
> to fetch all the tuples. It allows us to eliminate a boring initialization
> in the ExecInit handler for each executor node.
>
> Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
> adjusts the number of rows if the foreign-path is located on top of
> the base relations and the LIMIT clause takes a constant value.
> It will produce a more adequate plan, as follows:
>
> * WITHOUT this patch
>
> postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
> and t_a.x < t_b.x LIMIT 100;
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Limit  (cost=261.17..274.43 rows=100 width=88)
>    Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
>    ->  Hash Join  (cost=261.17..581.50 rows=2416 width=88)
>          Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
>          Hash Cond: (t_a.id = t_b.id)
>          Join Filter: (t_a.x < t_b.x)
>          ->  Foreign Scan on public.t_a  (cost=100.00..146.12 rows=1204 width=44)
>                Output: t_a.id, t_a.x, t_a.y
>                Remote SQL: SELECT id, x, y FROM public.t
>          ->  Hash  (cost=146.12..146.12 rows=1204 width=44)
>                Output: t_b.id, t_b.x, t_b.y
>                ->  Foreign Scan on public.t_b  (cost=100.00..146.12 rows=1204 width=44)
>                      Output: t_b.id, t_b.x, t_b.y
>                      Remote SQL: SELECT id, x, y FROM public.t
> (14 rows)
>
> * WITH this patch
>
> postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
> and t_a.x < t_b.x LIMIT 100;
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Limit  (cost=100.00..146.58 rows=100 width=88)
>    Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
>    ->  Foreign Scan  (cost=100.00..146.58 rows=100 width=88)
>          Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
>          Relations: (public.t_a) INNER JOIN (public.t_b)
>          Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM
> (public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id =
> r2.id
> (6 rows)

That's nice.

> On the other hand, I noticed it is not safe to attach the LIMIT clause at
> the planner stage because root->limit_tuples is declared as double.
> Even if the LIMIT clause takes a constant value, it is potentially larger
> than 2^53, which is the limit we can represent accurately with the
> float64 data type, but a LIMIT clause allows up to 2^63-1.
> So, postgres_fdw now attaches the LIMIT clause to the remote query at
> execution time only.

I think it's OK. Here are a few comments on the latest patch:

1. make/make check is fine; however, I am getting a regression failure in
the postgres_fdw contrib module (attached regression.diff). Please
investigate and fix.

2.
+ *
+ * MEMO: root->limit_tuples is not attached when query contains
+ * grouping-clause or aggregate functions. So, we don's adjust
+ * rows even if LIMIT is supplied.

Can you please explain why you are not doing this for grouping clauses or
aggregate functions?

3. Typo: don's => don't

Rest of the changes look good to me. Thanks

> Thanks,
>
> PG-Strom Project / NEC OSS Promotion Center
> KaiGai Kohei
>
> > -----Original Message-----
> > From: Robert Haas [mailto:robertmh...@gmail.com]
> > Sent: Thursday, November 10, 2016 3:08 AM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: pgsql-hackers@postgresql.org; Jeevan Chalke
> > ; Etsuro Fujita
> > ; Andres Freund
> > Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
> > [take-2]
> >
> > On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai
> > wrote:
> > > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > > if it is set. If and when remote ORDER BY is pushed down, the latest
> > > code tries to sort the entire remote table because it does not know
> > > how many rows to be returned. Thus, it took larger execution time.
> > > On the other hand, the patched one runs the remote query with a LIMIT
> > > clause according to the ps_numTuples, which is informed by the Limit
> > > node on top of the ForeignScan node.
> >
> > So there are two cases here. If the user says LIMIT 12, we could in
> > theory know that at planner time and optimize accordingly. If the user
> > says LIMIT twelve(), however, we will need to wait until execution time
> > unless twelve() happens to be capable of being simplified to a
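The distinction being drawn, a constant versus a computed LIMIT, looks like this (twelve() is the hypothetical user function from Robert's example):

```sql
-- A constant bound is visible at plan time, so the planner could adjust
-- the foreign path's cost and row estimate accordingly
SELECT * FROM t_a JOIN t_b USING (id) LIMIT 100;

-- A non-constant bound is only known at execution time, so the remote
-- LIMIT can only be attached then, via ps_numTuples
SELECT * FROM t_a JOIN t_b USING (id) LIMIT twelve();
```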
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/11/24 18:20, Ashutosh Bapat wrote: I wrote: You missed the point; the foreignrel->reltarget->exprs doesn't contain any PHVs, so the tlist created by build_tlist_to_deparse will be guaranteed to be one-to-one with the foreignrel->reltarget->exprs. You wrote: It's guaranteed now, but cannot be forever. This means anybody who supports PHVs tomorrow will have to "remember" to update this code as well. If s/he misses it, a bug will be introduced. That's the avenue for bugs I am talking about. I wrote: It *can* be guaranteed. See another patch for supporting evaluating PHVs remotely. We can't base assumptions in this patch on something in the other patch, esp. when that's not even reviewed once. PHV is just one case; subqueries involving aggregates is another, and there are others. Let's discuss this together with the patch for PHVs. That was one of the reasons why I had merged the two patches into one. I'd like to leave enhancements such as subqueries involving aggregates for future work, though. But that's not really the point. The point is add_to_flat_tlist(pull_var_clause(rel->reltarget->exprs)) cannot be proved to be the same as rel->reltarget->exprs in general. Really? As mentioned in the comments for RelOptInfo in relation.h shown below, the rel->reltarget->exprs for a base/join relation only contains Vars and PHVs, so the two things would be proved to be one-to-one. (Note: we don't need to care about the appendrel-child-relation case because appendrel child relations wouldn't appear in the main join tree.)

 * reltarget - Default Path output tlist for this rel; normally contains
 *             Var and PlaceHolderVar nodes for the values we need to
 *             output from this relation.
 *             List is in no particular order, but all rels of an
 *             appendrel set must use corresponding orders.
 *             NOTE: in an appendrel child relation, may contain
 *             arbitrary expressions pulled up from a subquery!
Best regards, Etsuro Fujita
Re: [HACKERS] Push down more full joins in postgres_fdw
> > build_tlist_to_deparse() calls pull_var_clause() before creating the tlist, whereas the code that searches does not do that. Code-wise those are not the same things. > > >>> You missed the point; the foreignrel->reltarget->exprs doesn't contain >>> any >>> PHVs, so the tlist created by build_tlist_to_deparse will be guaranteed to >>> be >>> one-to-one with the foreignrel->reltarget->exprs. > > >> It's guaranteed now, but cannot be forever. This means anybody who >> supports PHVs tomorrow will have to "remember" to update this code as >> well. If s/he misses it, a bug will be introduced. That's the avenue >> for bugs I am talking about. > > > It *can* be guaranteed. See another patch for supporting evaluating PHVs > remotely. We can't base assumptions in this patch on something in the other patch, esp. when that's not even reviewed once. PHV is just one case; subqueries involving aggregates is another, and there are others. But that's not really the point. The point is add_to_flat_tlist(pull_var_clause(rel->reltarget->exprs)) cannot be proved to be the same as rel->reltarget->exprs in general. So, we should not base our code on an assumption that cannot be proved. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Physical append-only tables
On Thu, Nov 24, 2016 at 2:26 AM, Bruce Momjian wrote: > On Mon, Nov 14, 2016 at 08:43:12PM +0000, Greg Stark wrote: > > That said, I don't think the "maintain clustering a bit better using > > BRIN" is a bad idea. It's just the bit about turning a table > > append-only to deal with update-once data that I think is overreach. > > What if we used BRIN to find heap pages where the new row was in the > page BRIN min/max range, and the heap page had free space. Only if that > fails do we put it somewhere else in the heap. > That would certainly be useful. You'd have to figure out what to do in the case of multiple conflicting BRIN indexes (which you shouldn't have in the first place, but that won't keep people from having them), but other than that it would be quite good, I think. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
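For concreteness, a minimal sketch of the setup this idea targets (table and index names are illustrative): a BRIN index keeps only a per-block-range min/max summary, and the proposal is to consult those summaries at insert/update time to prefer a heap page whose range already covers the new value:

```sql
-- An append-mostly table whose physical order correlates with the key
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    text
);

-- BRIN stores one min/max summary per range of 32 heap pages, so the
-- index stays tiny even on very large tables
CREATE INDEX events_created_brin ON events
    USING brin (created_at) WITH (pages_per_range = 32);

-- Today, an UPDATE may relocate a row to any page with free space,
-- widening the min/max ranges and degrading the index's selectivity;
-- Bruce's suggestion would steer the new tuple to a page whose BRIN
-- summary already covers its created_at value
UPDATE events SET payload = 'x' WHERE created_at = '2016-11-24';
```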
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/11/24 17:39, Ashutosh Bapat wrote: On Thu, Nov 24, 2016 at 1:27 PM, Etsuro Fujita wrote: On 2016/11/24 16:46, Ashutosh Bapat wrote: table will be misleading as subquery can represent a join and the corresponding alias would represent the join. Relation is a better term. But the documentation uses the word "table" for a join relation. See 7.2.1.2. Table and Column Aliases: https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES The definition there says "A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.", please note the usage "complex table references". Within the code, we do not refer to a relation as a table. So, the comment in this code should not refer to a relation as a table either. OK, will keep using the term "relation". I wrote: I don't think so; in the current version, we essentially deparse from and search into the same object, i.e., foreignrel->reltarget->exprs, since the tlist created by build_tlist_to_deparse is built from the foreignrel->reltarget->exprs. Also, since it is just created for deparsing the relation as a subquery in deparseRangeTblRef and isn't stored into fpinfo or anywhere else, we would need no concern about creating such avenues. IMO I think adding comments would be enough for this. build_tlist_to_deparse() calls pull_var_clause() before creating the tlist, whereas the code that searches does not do that. Code-wise those are not the same things. You missed the point; the foreignrel->reltarget->exprs doesn't contain any PHVs, so the tlist created by build_tlist_to_deparse will be guaranteed to be one-to-one with the foreignrel->reltarget->exprs. It's guaranteed now, but cannot be forever. This means anybody who supports PHVs tomorrow will have to "remember" to update this code as well. If s/he misses it, a bug will be introduced. That's the avenue for bugs I am talking about.
It *can* be guaranteed. See another patch for supporting evaluating PHVs remotely. Best regards, Etsuro Fujita
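For readers following the thread, the deparsing under discussion wraps each side of a pushed-down join in a subquery with generated aliases. Roughly (an illustrative shape only, not actual postgres_fdw output; ft1/ft2 and the remote table names are hypothetical):

```sql
-- Local query over two foreign tables
SELECT ft1.a, ft2.b
FROM ft1 FULL JOIN ft2 ON (ft1.id = ft2.id);

-- Remote SQL of roughly this shape: each input relation becomes a
-- subquery with a generated relation alias (sN) and column aliases (cN);
-- build_tlist_to_deparse decides which columns each subquery must emit,
-- which is why its one-to-one correspondence with reltarget->exprs matters
SELECT s1.c1, s2.c2
FROM (SELECT a c1, id c2 FROM public.t1) s1
  FULL JOIN (SELECT b c1, id c2 FROM public.t2) s2
    ON (s1.c2 = s2.c2);
```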
Re: [HACKERS] Push down more full joins in postgres_fdw
On Thu, Nov 24, 2016 at 1:27 PM, Etsuro Fujita wrote: > On 2016/11/24 16:46, Ashutosh Bapat wrote: Sorry. I think the current version is better than the previous one. The term "subselect alias" is confusing in the previous version. In the current version, "Get the relation and column alias for a given Var node," we need to add the word "identifiers", like "Get the relation and column identifiers for a given Var node". > > > I wrote: >>> >>> OK, but one thing I'm concerned about is the term "relation alias" seems >>> a >>> bit confusing because we already used the term for the alias of the form >>> "rN". To avoid that, how about saying "table alias", not "relation >>> alias"? >>> (in which case, the comment would be something like "Get the table and >>> column identifiers for a given Var node".) > > >> table will be misleading as subquery can represent a join and >> the corresponding alias would represent the join. Relation is a better term. > > > But the documentation uses the word "table" for a join relation. See > 7.2.1.2. Table and Column Aliases: > https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES The definition there says "A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.", please note the usage "complex table references". Within the code, we do not refer to a relation as a table. So, the comment in this code should not refer to a relation as a table either. > >>> I don't think so; in the current version, we essentially deparse from and >>> search into the same object, i.e., foreignrel->reltarget->exprs, since the >>> tlist created by build_tlist_to_deparse is built from the >>> foreignrel->reltarget->exprs. Also, since it is just created for >>> deparsing >>> the relation as a subquery in deparseRangeTblRef and isn't stored into >>> fpinfo or anywhere else, we would need no concern about creating such >>> avenues.
IMO I think adding comments would be enough for this. > > >> build_tlist_to_deparse() calls pull_var_clause() before creating the >> tlist, whereas the code that searches does not do that. Code-wise >> those are not the same things. > > > You missed the point; the foreignrel->reltarget->exprs doesn't contain any > PHVs, so the tlist created by build_tlist_to_deparse will be guaranteed to be > one-to-one with the foreignrel->reltarget->exprs. It's guaranteed now, but cannot be forever. This means anybody who supports PHVs tomorrow will have to "remember" to update this code as well. If s/he misses it, a bug will be introduced. That's the avenue for bugs I am talking about. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/11/24 16:46, Ashutosh Bapat wrote: Sorry. I think the current version is better than the previous one. The term "subselect alias" is confusing in the previous version. In the current version, "Get the relation and column alias for a given Var node," we need to add the word "identifiers", like "Get the relation and column identifiers for a given Var node". I wrote: OK, but one thing I'm concerned about is the term "relation alias" seems a bit confusing because we already used the term for the alias of the form "rN". To avoid that, how about saying "table alias", not "relation alias"? (in which case, the comment would be something like "Get the table and column identifiers for a given Var node".) table will be misleading as subquery can represent a join and the corresponding alias would represent the join. Relation is a better term. But the documentation uses the word "table" for a join relation. See 7.2.1.2. Table and Column Aliases: https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES I don't think so; in the current version, we essentially deparse from and search into the same object, i.e., foreignrel->reltarget->exprs, since the tlist created by build_tlist_to_deparse is built from the foreignrel->reltarget->exprs. Also, since it is just created for deparsing the relation as a subquery in deparseRangeTblRef and isn't stored into fpinfo or anywhere else, we would need no concern about creating such avenues. IMO I think adding comments would be enough for this. build_tlist_to_deparse() calls pull_var_clause() before creating the tlist, whereas the code that searches does not do that. Code-wise those are not the same things. You missed the point; the foreignrel->reltarget->exprs doesn't contain any PHVs, so the tlist created by build_tlist_to_deparse will be guaranteed to be one-to-one with the foreignrel->reltarget->exprs.
Best regards, Etsuro Fujita