Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 2017/12/01 11:01, Amit Langote wrote:
> On 2017/12/01 1:02, Robert Haas wrote:
>> Second, this would be the first place where the second argument to
>> ExecOpenIndices() is passed simply as true.  The only other caller
>> that doesn't pass constant false is in nodeModifyTable.c and looks
>> like this:
>>
>> ExecOpenIndices(resultRelInfo, mtstate->mt_onconflict != ONCONFLICT_NONE);
>>
>> The intention here appears to be to avoid the overhead of doing
>> BuildSpeculativeIndexInfo() when it isn't needed.  I think we should
>> try to do the same thing here.  As written, the patch will do that
>> work even when the query has no ON CONFLICT clause, which doesn't seem
>> like a good idea.
>
> Agreed.  One thing though is that ExecSetupPartitionTupleRouting doesn't
> receive the mtstate today.  I've a patch pending that will re-design that
> interface (for another project) that I will post in the near future, but
> meanwhile let's just add a new parameter mtstate so that this patch can
> move forward with its business.  The future patch I'm talking about will
> keep the parameter intact, so it should be OK now to add the same.
>
> Attached two patches:
>
> 0001: refactoring to add the mtstate parameter
> 0002: the original patch updated to address the above comment

I forgot to consider the fact that mtstate could be NULL in
ExecSetupPartitionTupleRouting(), which would result in dereferencing a
NULL pointer when called from CopyFrom(); that is fixed in the attached
updated patch.

Thanks,
Amit

From 986676c8614d2d5954ea5d505cf729dc6dc487bd Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 1 Dec 2017 10:44:37 +0900
Subject: [PATCH 1/2] Pass mtstate to ExecSetupPartitionTupleRouting

---
 src/backend/commands/copy.c            | 3 ++-
 src/backend/executor/execPartition.c   | 3 ++-
 src/backend/executor/nodeModifyTable.c | 3 ++-
 src/include/executor/execPartition.h   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 13eb9e34ba..bace390470 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2478,7 +2478,8 @@ CopyFrom(CopyState cstate)
 		int			num_parted,
 					num_partitions;
 
-		ExecSetupPartitionTupleRouting(cstate->rel,
+		ExecSetupPartitionTupleRouting(NULL,
+									   cstate->rel,
 									   1,
 									   estate,
 									   _dispatch_info,
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 59a0ca4597..f6108a156b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -63,7 +63,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  * RowExclusiveLock mode upon return from this function.
  */
 void
-ExecSetupPartitionTupleRouting(Relation rel,
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
+							   Relation rel,
 							   Index resultRTindex,
 							   EState *estate,
 							   PartitionDispatch **pd,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1e3ece9b34..afb83ed3ae 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1953,7 +1953,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		int			num_parted,
 					num_partitions;
 
-		ExecSetupPartitionTupleRouting(rel,
+		ExecSetupPartitionTupleRouting(mtstate,
+									   rel,
 									   node->nominalRelation,
 									   estate,
 									   _dispatch_info,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 43ca9908aa..86a199d169 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -49,7 +49,8 @@ typedef struct PartitionDispatchData
 typedef struct PartitionDispatchData *PartitionDispatch;
 
-extern void ExecSetupPartitionTupleRouting(Relation rel,
+extern void ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
+										   Relation rel,
Re: Commit fest 2017-11
Hello Michaël,

> And the last 21 patches have been classified as well. Here is the
> final score for this time:
> Committed: 55.
> Moved to next CF: 103.
> Rejected: 1.
> Returned with Feedback: 47.
> Total: 206.
> Thanks to all the contributors for this session! The CF is now closed.

Thanks for the CF management.

Attached is a small graph of the status of patches at the end of each CF.

There is a significant "moved to next CF" set of patches, which includes:
 - patches that were ready but did not get a committer (about one per
   committer...),
 - patches that were still waiting for a review (a lot...),
 - possibly a few that were in "waiting on author" state, although they
   tend to be switched to "returned with feedback".

There is a rolling chunk of 50% of patches which moves from CF to CF.
Not enough review(er)s, obviously. Also, not enough committers: as a
reviewer, if the patches I review do not get committed once ready,
spending time on reviews becomes quite unattractive.

-- 
Fabien.
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Michaël,

>> Here is a v14, after yet another rebase, and some comments added to
>> answer your new comments.
>
> The last patch sent still applies, but its status of "ready for
> committer" does not look adapted as this version got no reviews. So I
> am switching back the patch as "needs review". Feel free to change if
> that's not adapted.

Indeed, I think that is not needed, as the edit was beyond mundane
(comments lightly edited, one const added after a wind of const got
committed, patch context diff changes probably after a pgindent run that
triggered the need for a rebase), which is why I did not update the
status for this version. So I put it back to "ready".

> The patch is moved as well to next CF.

-- 
Fabien.
Re: Commit fest 2017-11
On Thu, Nov 30, 2017 at 1:51 PM, Michael Paquier wrote:
> Remains 22 patches as of now, exactly *one* for each committer.

And the last 21 patches have been classified as well. Here is the final
score for this time:
Committed: 55.
Moved to next CF: 103.
Rejected: 1.
Returned with Feedback: 47.
Total: 206.
Thanks to all the contributors for this session! The CF is now closed.

(I know that I am a couple of hours ahead, doing things now fits better
with my schedule.)
-- 
Michael
Re: [HACKERS] pg_basebackup --progress output for batch execution
On Tue, Nov 21, 2017 at 8:24 PM, Martín Marqués wrote:
> Thank you very much for reviewing the patch and for your valuable
> input (you made me read the Microsoft Visual C specs ;))

+   if (!isatty(fileno(stderr)))
+       fprintf(stderr, "\n");
+   else
+       fprintf(stderr, "\r");

Er, why is that not "\r\n"?

I have moved this patch to next CF. Congrats for being the last one.
-- 
Michael
Re: [HACKERS] [PATCH] Improve geometric types
On Thu, Nov 30, 2017 at 3:28 AM, Emre Hasegeli wrote:
> It would at least be dump-and-restore hazard if we don't let them in.
> The new version allows NaNs.
>
>> This gives a wrong result for NaN-containing objects.
>
> I removed the NaN aware comparisons from FP macros, and carefully
> reviewed the places that needs to be NaN aware.
>
> I am sorry that it took so long for me to post the new versions. The
> more I get into this the more problems I find. The new versions
> include non-trivial changes. I would be glad if you can look into
> them.

I have moved this patch to next CF, with "needs review" as status as a
new version has been posted.
-- 
Michael
Re: [HACKERS] postgres_fdw super user checks
On Thu, Nov 30, 2017 at 12:15 PM, Ashutosh Bapat wrote:
> On Wed, Nov 29, 2017 at 7:42 PM, Stephen Frost wrote:
>> Ashutosh,
>>
>> * Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote:
>>> On Wed, Nov 29, 2017 at 4:56 AM, Stephen Frost wrote:
>>> > The "global rethink" being contemplated seems to be more about
>>> > authentication forwarding than it is about this specific change. If
>>> > there's some 'global rethink' which is actually applicable to this
>>> > specific deviation from the usual "use the view's owner for privilege
>>> > checks", then it's unclear to me what that is.
>>>
>>> Global rethink may constitute other authentication methods like
>>> certificate based authentication. But I am not clear about global
>>> rethink in the context of owner privileges problem being discussed
>>> here.
>>
>> Right, I'm all for an independent discussion about how we can have
>> same-cluster or cross-cluster trust relationships set up to make it
>> easier for users in one cluster/database to access tables in another
>> that they should be allowed to, but that's a different topic from this.
>>
>> In other words, I think we're agreeing here. :)
>
> Yes.

I am moving this patch to next CF 2018-01.
-- 
Michael
Re: Transform for pl/perl
On Tue, Nov 28, 2017 at 5:14 PM, Aleksander Alekseev wrote:
> The new status of this patch is: Ready for Committer

Patch moved to CF 2018-01. Perhaps a committer will look at it at some
point.
-- 
Michael
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On Tue, Oct 3, 2017 at 1:20 AM, chenhj wrote:
> I had filled the authors field of this patch in commitfest, and will rebase
> this patch if needed. Thank you for your help!

The documentation of the patch needs a rebase, so I am moving it to next
CF with "waiting on author" as status.

$ git diff master --check
src/bin/pg_rewind/pg_rewind.c:292: trailing whitespace.
+*
There are whitespace complaints.
-- 
Michael
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
On Tue, Nov 28, 2017 at 4:10 AM, Jesper Pedersen wrote:
> On 11/27/2017 07:41 AM, Юрий Соколов wrote:
>> Oh... there was a stupid error in the previous file.
>> Attached fixed version.
>
> I can reconfirm my performance findings with this patch; system same as
> up-thread.

I have moved this patch to next CF, with "needs review" as status as a
new patch has been sent after addressing Andres' comments.
-- 
Michael
Re: [HACKERS] GUC for cleanup indexes threshold.
On Thu, Nov 30, 2017 at 12:06 AM, Masahiko Sawada wrote:
> On Wed, Nov 29, 2017 at 11:05 PM, Simon Riggs wrote:
>> Unless there is disagreement on the above, it seems we should apply
>> Yura's patch (an edited version, perhaps).
>
> IIRC the patches that make the cleanup scan skip have a problem
> pointed out by Peter [1], namely that we stash an XID when a btree page
> is deleted, which is used to determine when it's finally safe to recycle
> the page. Yura's patch doesn't have that problem?
>
> [1] https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com

The latest patch present on this thread does not apply
(https://www.postgresql.org/message-id/c4a47caef28024ab7528946bed264...@postgrespro.ru).
Please send a rebase. For now I am moving the patch to next CF, with
"waiting on author".
-- 
Michael
Re: Use of uninitialized variables in ExecFindPartition() for parent partition without leaves (HEAD only)
On 2017/12/01 11:48, Michael Paquier wrote:
> On Thu, Nov 30, 2017 at 10:17 AM, Amit Langote wrote:
>> Oops, I messed up taking the diff and mistakenly added noise to the patch.
>
> Which is that bit:
> - * BuildSlotPartitionKeyDescription
> + * ExecBuildSlotPartitionKeyDescription

Yep.

>> Fixed in the attached.
>
> For information, it is easy enough to run into rather nasty behaviors
> here. Taking for example the test case in the last patch, but without
> the actual fix:
> -- no partitions, so fail
> insert into range_parted values ('a', 11);
> ! ERROR: invalid memory alloc request size 2139062147

Yikes.

> So the test case you are adding is a good thing to have, and I am fine
> with the comment.

Thanks.

> I have added a CF entry for this thread by the way
> (https://commitfest.postgresql.org/16/1392/), and marked the thing as
> ready for committer as we agree about the fix. Let's track properly
> this issue until it gets committed.

Yeah, thanks.

Regards,
Amit
Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 2017/12/01 11:27, Simon Riggs wrote:
> On 24 November 2017 at 13:45, Amit Langote wrote:
>
>>> Why? There is no caller that needs information.
>>
>> It is to be used if and when ExecInsert() calls
>> ExecCheckIndexConstraints() in the code path to handle ON CONFLICT DO
>> NOTHING that we're intending to support in some cases. Note that it will
>> only check conflicts for the individual leaf partitions using whatever
>> constraint-enforcing indexes they might have.
>
> So we should have 2 patches. One for now that does DO NOTHING and
> another that adds the change that depends upon Alvaro's work.

Yes, I'd think so.

It won't be until the patch at [1] to add support for defining UNIQUE
indexes on partitioned tables is committed that we could support
specifying conflict_target on partitioned tables. Even then, it would
take at least some executor changes to actually make it work, but at
least the planner won't complain that there are no indexes.

If I try Alvaro's patch today and see what happens when conflict_target
is specified, I still get an error, but now it's from the executor:

create table p (a int, b char) partition by list (a);
create table p1 partition of p (b unique) for values in (1);

-- on HEAD (before applying the patch on this thread)
insert into p values (1) on conflict (a) do nothing;
ERROR:  ON CONFLICT clause is not supported with partitioned tables

-- after applying the patch on this thread (no indexes yet)
insert into p values (1) on conflict (a) do nothing;
ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification

-- after applying Alvaro's patch at [1]
create unique index on p (a);

-- but, the executor support is missing, so...
insert into p values (1) on conflict (a) do nothing;
ERROR:  unexpected failure to find arbiter index

I will report this on that thread and we can discuss the executor
changes that would be needed to make it work there.

Thanks,
Amit

[1] https://commitfest.postgresql.org/16/1365/
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 1 December 2017 at 01:02, Alvaro Herrera wrote:
> we currently (I mean after my
> patch) don't mark partitioned-table-level indexes as valid or not valid
> depending on whether all its children exist, so trying to use that in
> the planner without having a flag could cause invalid plans to be
> generated (i.e. ones that would cause nonexistent indexes to be
> referenced).

So, then, this patch is only really intended as a syntax shortcut for
mass adding of indexes to each partition?

I feel like we could do better here with little extra effort. The
DETACH index feature does not really seem required for this patch. I
think just ensuring a matching index exists on each leaf partition, and
creating any which don't exist before creating the index on the target
partitioned table, is the correct solution. That way we can make the
index's indisvalid flag have a valid meaning all the time.

Some later patch can invent some way to replace a bloated index.
Perhaps later we can invent some generic way to replace a physical leaf
index for a given partitioned index, perhaps with the same patch that
might allow us to replace an index which is used by a constraint, which
to me seems like a feature we should have had years ago.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: Doc tweak for huge_pages?
On Fri, Dec 1, 2017 at 5:04 PM, Justin Pryzby wrote:
> On Fri, Dec 01, 2017 at 04:01:24PM +1300, Thomas Munro wrote:
>> Hi hackers,
>>
>> The manual implies that only Linux can use huge pages. That is not
>> true: FreeBSD, Illumos and probably others support larger page sizes
>> using transparent page coalescing algorithms. On my FreeBSD box
>> procstat -v often shows PostgreSQL shared buffers in "S"-flagged
>> memory. I think we should adjust the manual to make clear that it's
>> the *explicit request for huge pages* that is supported only on Linux
>> (and hopefully soon Windows). Am I being too pedantic?
>
> I suggest to remove "other" and include Linux in the enumeration, since it
> also supports "transparent" hugepages.

Hmm. Yeah, it does, but apparently it's not so transparent. So if we
mention that, we'd better indicate in the same paragraph that you
probably don't actually want to use it. How about the attached?

-- 
Thomas Munro
http://www.enterprisedb.com

huge-pages-doc-tweak-v2.patch
Description: Binary data
Re: Protect syscache from bloating with negative cache entries
Robert Haas writes:
> On Wed, Nov 29, 2017 at 11:17 PM, Tom Lane wrote:
>> The thing that makes me uncomfortable about this is that we used to have a
>> catcache size limitation mechanism, and ripped it out because it had too
>> much overhead (see commit 8b9bc234a). I'm not sure how we can avoid that
>> problem within a fresh implementation.

> At the risk of beating a dead horse, I still think that the amount of
> wall clock time that has elapsed since an entry was last accessed is
> very relevant.

While I don't object to that statement, I'm not sure how it helps us
here. If we couldn't afford DLMoveToFront(), doing a gettimeofday()
during each syscache access is surely right out.

			regards, tom lane
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
On Wed, Nov 8, 2017 at 12:54 AM, Fabrízio de Royes Mello wrote:
> Great. Changed status to ready for commiter.

The patch still applies, but no committer seems to be interested in the
topic, so moved to next CF.
-- 
Michael
Re: [HACKERS] Supporting huge pages on Windows
On Fri, Nov 24, 2017 at 9:28 AM, Tsunakawa, Takayuki wrote:
> From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
>> I hope Tsunakawa-san doesn't mind me posting another rebased version of
>> his patch. The last version conflicted with the change from SGML to XML
>> that just landed in commit 3c49c6fa.
>
> Thank you very much for keeping it fresh. I hope the committer could have
> some time.

I have moved this patch to next CF. Magnus, you are registered as a
reviewer of this patch. Are you planning to look at it and potentially
commit it?
-- 
Michael
Doc tweak for huge_pages?
Hi hackers,

The manual implies that only Linux can use huge pages. That is not
true: FreeBSD, Illumos and probably others support larger page sizes
using transparent page coalescing algorithms. On my FreeBSD box
procstat -v often shows PostgreSQL shared buffers in "S"-flagged
memory. I think we should adjust the manual to make clear that it's
the *explicit request for huge pages* that is supported only on Linux
(and hopefully soon Windows). Am I being too pedantic?

-- 
Thomas Munro
http://www.enterprisedb.com

huge-pages-doc-tweak.patch
Description: Binary data
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On Wed, Nov 29, 2017 at 5:50 AM, Robert Haas wrote:
> On Mon, Aug 7, 2017 at 11:14 AM, Fabrízio de Royes Mello wrote:
>> I also test against all current supported versions (9.2 ... 9.6) and didn't
>> find any issue.
>>
>> Changed status to "ready for commiter".
>
> On a very fast read this patch looks OK to me, but I'm a bit concerned
> about whether we have consensus for it. By my count the vote is 6-3
> in favor of proceeding.
>
> +1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
> Michael Paquier, Robert Haas
> -1: David G. Johnston, Tom Lane, Simon Riggs
>
> I guess that's probably sufficient to go forward, but does anyone wish
> to dispute my characterization of their vote or the vote tally in
> general? Would anyone else like to cast a vote?

As this is not settled yet, I have moved the patch to next CF.
-- 
Michael
Re: [HACKERS] pgbench more operators & functions
On Wed, Oct 25, 2017 at 5:57 PM, Pavel Stehule wrote:
> 2017-10-20 18:36 GMT+02:00 Fabien COELHO:
>> Here is a v13. No code changes, but TAP tests added to maintain pgbench
>> coverage to green.
>>
>> Here is a v14, which is just a rebase after the documentation xml-ization.
>
> all tests passed
> no problems with doc building

Moved to next CF. This is still parked as ready for committer.
-- 
Michael
Re: [HACKERS] pgbench - allow to store select results into variables
On Sat, Nov 4, 2017 at 8:05 PM, Fabien COELHO wrote:
>>> Here is a v13, which is just a rebase after the documentation
>>> xml-ization.
>
> Here is a v14, after yet another rebase, and some comments added to answer
> your new comments.

The last patch sent still applies, but its status of "ready for
committer" does not look adapted as this version got no reviews. So I am
switching back the patch as "needs review". Feel free to change if
that's not adapted. The patch is moved as well to next CF.
-- 
Michael
Re: [HACKERS] pow support for pgbench
On Tue, Nov 7, 2017 at 1:34 AM, Fabien COELHO wrote:
> Let's now hope that a committer gets around to considering these patches
> some day.

Which is not the case yet, so moved to CF 2018-01. Please note that the
patch proposed does not apply anymore, so its status is changed to
"waiting on author" for a rebase.
-- 
Michael
Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
On Wed, Nov 29, 2017 at 7:36 AM, Michael Paquier wrote:
> On Wed, Nov 29, 2017 at 6:44 AM, Robert Haas wrote:
>> On Tue, Nov 21, 2017 at 4:32 AM, Masahiko Sawada wrote:
>>> Thank you for comments. Attached updated patch.
>>
>> I see that Michael has marked this Ready for Committer, but also that
>> he didn't update the thread, so perhaps some interested committer
>> (Fujii Masao?) might have missed the fact that Michael thinks it's
>> ready to go.
>
> Sorry for not mentioning that directly on the thread.
>
>> Anyone interested in taking a look at this one for commit?
>
> I would assume that Fujii-san would be the chosen one here as he has
> worked already on the thread.

No replies yet. So I am moving this patch to next CF.
-- 
Michael
Re: [HACKERS] Commits don't block for synchronous replication
On Fri, Nov 24, 2017 at 11:38 PM, Simon Riggs wrote:
> On 23 November 2017 at 11:11, Michael Paquier wrote:
>
>> This is older than the bug report of this thread. All those
>> indications point out that the patch has *not* been committed. So it
>> seems to me that you perhaps committed it to your local repository,
>> but forgot to push it to the remote. I am switching back the patch
>> status to what looks correct to me "Ready for committer". Thanks.
>
> Yes, that looks like it's my mistake. Thanks for rechecking.
>
> Will commit and backpatch when I get home.

Ping. Simon, are you planning to look at this patch, and potentially
commit it? I am moving this item to next CF for now.
-- 
Michael
Re: Use of uninitialized variables in ExecFindPartition() for parent partition without leaves (HEAD only)
On Thu, Nov 30, 2017 at 10:17 AM, Amit Langote wrote:
> Oops, I messed up taking the diff and mistakenly added noise to the patch.

Which is that bit:
- * BuildSlotPartitionKeyDescription
+ * ExecBuildSlotPartitionKeyDescription

> Fixed in the attached.

For information, it is easy enough to run into rather nasty behaviors
here. Taking for example the test case in the last patch, but without
the actual fix:
-- no partitions, so fail
insert into range_parted values ('a', 11);
! ERROR: invalid memory alloc request size 2139062147

So the test case you are adding is a good thing to have, and I am fine
with the comment. I have added a CF entry for this thread by the way
(https://commitfest.postgresql.org/16/1392/), and marked the thing as
ready for committer as we agree about the fix. Let's track properly
this issue until it gets committed.
-- 
Michael
Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 24 November 2017 at 13:45, Amit Langote wrote:
>> Why? There is no caller that needs information.
>
> It is to be used if and when ExecInsert() calls
> ExecCheckIndexConstraints() in the code path to handle ON CONFLICT DO
> NOTHING that we're intending to support in some cases. Note that it will
> only check conflicts for the individual leaf partitions using whatever
> constraint-enforcing indexes they might have.

So we should have 2 patches. One for now that does DO NOTHING and
another that adds the change that depends upon Alvaro's work.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Wed, Nov 29, 2017 at 7:42 AM, Peter Eisentraut wrote:
> On 11/28/17 17:33, Michael Paquier wrote:
>> 1) Have a special value in the parameter saslchannelbinding proposed
>> in patch 0001. For example by specifying "none" then no channel
>> binding is used.
>
> I was thinking if it's empty then don't use channel binding. Right now,
> empty means the same thing as tls-unique. In any case, some variant of
> that should be fine. I don't think we need a separate server option
> at this point.

OK, here is a reworked version with the following changes:
- renamed saslchannelbinding to scramchannelbinding, with a default set
  to tls-unique.
- An empty value of scramchannelbinding allows the client to not use
  channel binding, or in short to use SCRAM-SHA-256 with cbind-flag set
  to 'n'.

While reviewing the code, I have found something a bit disturbing with
the header definitions: the libpq frontend code includes scram.h, which
references backend-side routines. So I think that the definition of the
SCRAM mechanisms as well as the channel binding types should be moved to
scram-common.h. This cleanup is included in 0001.
-- 
Michael

0001-Move-SCRAM-related-name-definitions-to-scram-common..patch
Description: Binary data

0002-Add-connection-parameter-scramchannelbinding.patch
Description: Binary data

0003-Implement-channel-binding-tls-server-end-point-for-S.patch
Description: Binary data
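For illustration, here is a minimal sketch of how the proposed connection
parameter would be used from libpq (scramchannelbinding is the parameter
name proposed by the patch above, not a released libpq option; the host
and other connection details are made up):

/* hypothetical usage of the patch's scramchannelbinding parameter */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* default is tls-unique; an empty value would disable channel binding */
	PGconn *conn = PQconnectdb("host=example.org dbname=postgres "
							   "sslmode=require scramchannelbinding=tls-unique");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}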
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Dec 1, 2017 at 3:04 AM, Robert Haas wrote:
> On Thu, Nov 30, 2017 at 6:20 AM, Masahiko Sawada wrote:
>>> This code ignores the existence of multiple databases; RELEXTLOCK
>>> contains a relid, but no database OID. That's easy enough to fix, but
>>> it actually causes no problem unless, by bad luck, you have two
>>> relations with the same OID in different databases that are both being
>>> rapidly extended at the same time -- and even then, it's only a
>>> performance problem, not a correctness problem. In fact, I wonder if
>>> we shouldn't go further: instead of creating these RELEXTLOCK
>>> structures dynamically, let's just have a fixed number of them, say
>>> 1024. When we get a request to take a lock, hash the (dboid, reloid)
>>> pair and take the result modulo 1024; lock the RELEXTLOCK at that
>>> offset in the array.
>>
>> Attached the latest patch incorporated comments except for the fix of
>> the memory size for relext lock.
>
> It doesn't do anything about the comment of mine quoted above.

Sorry, I'd missed the comment.

> Since it's only possible to hold one relation extension lock at a time, we
> don't really need the hash table here at all. We can just have an
> array of 1024 or so locks and map every (database, relation) pair onto one
> of them by hashing. The worst thing we'll get is some false contention,
> but that doesn't seem awful, and it would permit considerable further
> simplification of this code -- and maybe make it faster in the
> process, because we'd no longer need the hash table, or the pin count,
> or the extra LWLocks that protect the hash table. All we would have
> is atomic operations manipulating the lock state, which seems like it
> would be quite a lot faster and simpler.

Agreed. With this change, we will have an array of structs that each
hold the lock state and a condition variable. The lock state includes
the wait count as well as the status of the lock.

> BTW, I think RelExtLockReleaseAll is broken because it shouldn't
> HOLD_INTERRUPTS(); I also think it's kind of silly to loop here when
> we know we can only hold one lock. Maybe RelExtLockRelease can take
> bool force and do if (force) held_relextlock.nLocks = 0; else
> held_relextlock.nLocks--. Or, better yet, have the caller adjust that
> value and then only call RelExtLockRelease() if we needed to release
> the lock in shared memory. That avoids needless branching.

Agreed. I'd vote for the latter.

> On a
> related note, is there any point in having both held_relextlock.nLocks
> and num_held_relextlocks?

num_held_relextlocks is actually unnecessary; it will be removed.

> I think RelationExtensionLock should be a new type of IPC wait event,
> rather than a whole new category.

Hmm, I thought the IPC wait event types are for events where a process
communicates with other processes for the same purpose, for example
parallel query, synchronous replication etc. On the other hand, the
relation extension locks are one kind of lock mechanism. That's why I
added a new category. But maybe it could fit into the IPC wait event.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
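For illustration, here is a minimal standalone sketch (not PostgreSQL
code; all names are hypothetical) of the fixed-array idea described
above: hash the (database OID, relation OID) pair onto one of a fixed
number of extension-lock slots, accepting occasional false sharing
instead of managing dynamic hash-table entries.

#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024          /* fixed number of lock slots */

typedef struct RelExtLockTag
{
	uint32_t	dbOid;                  /* database OID */
	uint32_t	relOid;                 /* relation OID */
} RelExtLockTag;

/* Map a lock tag onto a slot index; collisions only cause false contention. */
static uint32_t
relextlock_slot(RelExtLockTag tag)
{
	uint64_t	h = ((uint64_t) tag.dbOid << 32) | tag.relOid;

	/* simple 64-bit mixing hash; any decent hash of the pair works */
	h ^= h >> 33;
	h *= UINT64_C(0xff51afd7ed558ccd);
	h ^= h >> 33;

	return (uint32_t) (h % N_RELEXTLOCK_ENTS);
}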
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
On Fri, Dec 1, 2017 at 4:14 AM, Robert Haas wrote:
> I think the changes in DefineView and ATExecSetRelOptions are wrong,
> because transformRelOptions() is still using pg_strcasecmp. With the
> patch:
>
> rhaas=# create view v(x) with ("Check_option"="local") as select 1;
> CREATE VIEW
> rhaas=# create view v(x) with ("check_option"="local") as select 1;
> ERROR: WITH CHECK OPTION is supported only on automatically updatable views
> HINT: Views that do not select from a single table or view are not
> automatically updatable.
>
> Oops.

I am getting the feeling that there could be other issues than this
one... If this patch ever gets integrated, I think that it should
actually be shaped as two patches:
- One patch with the set of DDL queries taking advantage of the current
  grammar with pg_strcasecmp.
- A second patch that does the switch from pg_strcasecmp to strcmp, and
  allows checking which paths of patch 1 are getting changed.
Patch 1 is the hard part: figuring out all the possible patterns that
could be changed.

> I am actually pretty dubious that we want to do this. I found one bug
> already (see above), and I don't think there's any guarantee that it's
> the only one, because it's pretty hard to be sure that none of the
> remaining calls to pg_strcasecmp are being applied to any of these
> values in some other part of the code. I'm not sure that the backward
> compatibility issue is a huge deal, but from my point of view this
> carries a significant risk of introducing new bugs, might annoy users
> who spell any of these keywords in all caps with surrounding quotation
> marks, and really has no clear benefit that I can see. The
> originally-offered justification is that making this consistent would
> help us avoid subtle bugs, but it seems at least as likely to CREATE
> subtle bugs, and the status quo is AFAICT harming nobody.

Changing opinion here ;) Yes, I agree that the risk of bugs may not be
worth the compatibility effort at the end. I still see value in this
patch for long-term purposes by making the code more consistent though.
-- 
Michael
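For illustration, a small standalone C sketch of the behavior difference
behind the bug above (this is not PostgreSQL code; it uses POSIX
strcasecmp, where the backend's portable equivalent is pg_strcasecmp): a
quoted, mixed-case option name matches under case-insensitive comparison
but not under strcmp.

#include <stdio.h>
#include <string.h>
#include <strings.h>        /* strcasecmp() */

int
main(void)
{
	const char *given = "Check_option";     /* as written: WITH ("Check_option"=...) */
	const char *option = "check_option";    /* option name the code looks for */

	printf("strcasecmp: %s\n", strcasecmp(given, option) == 0 ? "match" : "no match");
	printf("strcmp:     %s\n", strcmp(given, option) == 0 ? "match" : "no match");
	return 0;
}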
Re: Combine function returning NULL unhandled?
On 24 November 2017 at 14:20, Andres Freund wrote:
> Pushed a fix to the relevant branches, including tests of the
> trans/combine functions returns NULL cases.

Apologies for my silence here. I've been on leave and out of internet
range for two weeks.

Thank you for making the fix.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Alexey Kondratov wrote:

> I have rebased my branch and squashed all commits into one, since the
> patch is quite small. Everything seems to be working fine now, patch
> passes all regression tests.

On a *very* quick look, please use an enum to return from NextCopyFrom
rather than 'int'. The chunks that change bool to int are very
odd-looking. This would move the comment that explains the value from
copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
in the descriptions of those values; please don't. But those are really
just minor nitpicks while paging through.

After looking at the way you're handling errors, it seems that you've
chosen to go precisely the way people were saying was not admissible:
use a PG_TRY block, and when things go south, simply log a WARNING and
move on. This is unsafe. For example: what happens if the error being
raised is out-of-memory? Failing to re-throw means you don't release the
current transaction memory context. What if the error is that WAL cannot
be written because the WAL disk ran out of space? This would just be
reported as a warning over and over until the server log disk is also
full. What if your insertion fired a trigger that caused an update which
got a serializability exception? You cannot just move on, because at
that point session state (serializability session state) might be busted
until you abort the current transaction.

Or maybe I misunderstood the patch completely.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
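For readers following along, here is a rough sketch (an assumed shape,
not the actual patch) of the per-row error-trapping pattern being
criticized above, as it would sit inside the COPY insert loop: trap the
error, turn it into a WARNING, and keep going with the next row. The
comments note why doing this without re-throwing is unsafe.

	/* oldcontext is the per-row context saved before PG_TRY() */
	PG_TRY();
	{
		/* ... form the tuple and insert it ... */
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* switch back so CopyErrorData() doesn't allocate in ErrorContext */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		ereport(WARNING,
				(errmsg("skipping row: %s", edata->message)));
		FreeErrorData(edata);

		/*
		 * ...and move on.  As explained above, this is unsafe when the
		 * error was out-of-memory, a WAL write failure, a serialization
		 * failure, etc., because transaction/session state is not
		 * cleaned up as it would be by aborting the transaction.
		 */
	}
	PG_END_TRY();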
Re: Commit fest 2017-11
On Fri, Dec 1, 2017 at 12:44 AM, Andreas Karlsson wrote:
> On 11/30/2017 05:51 AM, Michael Paquier wrote:
>> One thing that could be improved in my
>> opinion is that patch authors should try more to move a patch to a
>> following commit fest once the end gets close... This would leverage
>> slightly the load of work for the CFM.
>
> Thanks for the suggestion. I had no idea that I could do that.

As a patch author, you have the right to decide what you are yourself
planning to do with your own baby :) When the commit fest comes close to
the end, I always try to deal with my own patches first. If the
discussion has stalled, or if based on the feedback a given thing is
going to require more effort than I initially planned so that it is not
possible to spend more time on it in the near future, I mark my own
entries as returned with feedback. If I am still planning to work on
something in the near future, then I move them to the next CF. This
means less work for the CFM, who has fewer patches to deal with at the
end.

Mentioning what you do on the patch thread with the so-said CF entry is
also worth doing in my opinion, to keep a track of what you did. When
multiple authors and/or reviewers are involved, it is sometimes good to
post your intention some time before doing it.
-- 
Michael
Re: [HACKERS] Custom compression methods
Tomas Vondra wrote:
> On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote:

> > CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER
> > tsvector_compression_handler;

> Understood. Good to know you've considered it, and I agree it doesn't
> need to be there from the start (which makes the patch simpler).

Just passing by, but wouldn't this fit in the ACCESS METHOD group of
commands? So this could be simplified down to

CREATE ACCESS METHOD ts1 TYPE COMPRESSION

We have that for indexes, and there are patches flying around for heap
storage, sequences, etc. I think that's simpler than trying to invent
all-new commands here. Then (in a future patch) you can use ALTER TYPE
to define compression for that type, or even add a column-level option
to reference a specific compression method.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: using index or check in ALTER TABLE SET NOT NULL
Thank you for the review! My apologies for my errors; it seems I did not
read the developer wiki pages carefully enough. I will reread the wiki
and the code style guidelines, and then update the patch with all your
remarks.

> The comment /* T if we added new NOT NULL constraints */ should
> probably be changed to /* T if we should recheck NOT NULL constraints
> */ or similar.

Correct. Hm, I will probably also rename this property to
verify_new_notnull for clarity.

> I'd suggest registering this with the next CommitFest.

Already done in https://commitfest.postgresql.org/16/1389/ ; I noticed
this step in the wiki.
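For context, a minimal SQL sketch of the optimization this patch aims
for (a hypothetical example, assuming the patch is applied; table and
constraint names are made up): a validated CHECK constraint proving the
column contains no NULLs lets SET NOT NULL skip the full-table
verification scan.

create table items (id int);
-- an already-validated constraint proves that no NULLs exist...
alter table items add constraint items_id_not_null check (id is not null);
-- ...so, with the patch, this no longer needs to scan the whole table:
alter table items alter column id set not null;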
Re: IndexTupleDSize macro seems redundant
On Thu, Nov 30, 2017 at 1:48 PM, Robert Haas wrote:
> One difference between those two macros is that IndexTupleSize
> forcibly casts the argument to IndexTuple, which means that you don't
> get any type-checking when you use that one. I suggest that in
> addition to removing IndexTupleDSize as proposed, we also remove that
> cast. It seems that the only place which relies on that cast
> currently is btree_xlog_split.
>
> Therefore I propose the attached patch instead.

+1 to both points. I specifically recall being annoyed by both issues,
more than once.

-- 
Peter Geoghegan
Re: [HACKERS] <> join selectivity estimate question
On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas wrote:
> On Wed, Nov 29, 2017 at 11:55 PM, Thomas Munro wrote:
>> Thank you for the original pointer and the commit. Everything here
>> seems to make intuitive sense and the accompanying throw-away tests
>> that I posted above seem to produce sensible results except in some
>> cases that we discussed, so I think this is progress. There is still
>> something pretty funny about the cardinality estimates for TPCH Q21
>> which I haven't grokked though. I suspect it is crafted to look for a
>> technique we don't know (an ancient challenge set by some long retired
>> database gurus back in 1992 that their RDBMSs know how to solve,
>> hopefully not in the manner of a certain car manufacturer's air
>> pollution tests), but I haven't yet obtained enough round tuits to dig
>> further. I will, though.
>
> Hmm, do you have an example of the better but still-funky estimates
> handy? Like an EXPLAIN plan?

Sure. Here's some EXPLAIN ANALYZE output from scale 3 TPCH + a few
indexes[1]. There's a version from HEAD with and without commit
7ca25b7d.

[1] https://github.com/macdice/pg_sisyphus/blob/master/cluster-recipes/make-tpch-cluster.sh

-- 
Thomas Munro
http://www.enterprisedb.com

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.235..5070.253 rows=100 loops=1)
   ->  Sort  (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.233..5070.241 rows=100 loops=1)
         Sort Key: (count(*)) DESC, supplier.s_name
         Sort Method: top-N heapsort  Memory: 39kB
         ->  GroupAggregate  (cost=544020.65..544020.67 rows=1 width=34) (actual time=5061.289..5068.050 rows=1194 loops=1)
               Group Key: supplier.s_name
               ->  Sort  (cost=544020.65..544020.66 rows=1 width=26) (actual time=5061.275..5063.360 rows=11887 loops=1)
                     Sort Key: supplier.s_name
                     Sort Method: quicksort  Memory: 1406kB
                     ->  Nested Loop  (cost=1752.62..544020.64 rows=1 width=26) (actual time=3.134..4926.365 rows=11887 loops=1)
                           ->  Nested Loop Semi Join  (cost=1752.19..544015.98 rows=1 width=34) (actual time=3.122..4598.302 rows=24138 loops=1)
                                 ->  Gather  (cost=1751.75..544010.27 rows=1 width=34) (actual time=2.909..4354.282 rows=40387 loops=1)
                                       Workers Planned: 2
                                       Workers Launched: 2
                                       ->  Nested Loop Anti Join  (cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283 rows=13462 loops=3)
                                             ->  Hash Join  (cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800 rows=151103 loops=3)
                                                   Hash Cond: (l1.l_suppkey = supplier.s_suppkey)
                                                   ->  Parallel Seq Scan on lineitem l1  (cost=0.00..431288.00 rows=2499520 width=8) (actual time=0.046..2742.912 rows=3792542 loops=3)
                                                         Filter: (l_receiptdate > l_commitdate)
                                                         Rows Removed by Filter: 2206328
                                                   ->  Hash  (cost=736.32..736.32 rows=1200 width=30) (actual time=3.690..3.690 rows=1194 loops=3)
                                                         Buckets: 2048  Batches: 1  Memory Usage: 91kB
                                                         ->  Nested Loop  (cost=25.59..736.32 rows=1200 width=30) (actual time=0.486..3.206 rows=1194 loops=3)
                                                               ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4) (actual time=0.018..0.038 rows=1 loops=3)
                                                                     Filter: (n_name = 'ALGERIA'::bpchar)
                                                                     Rows Removed by Filter: 24
                                                               ->  Bitmap Heap Scan on supplier  (cost=25.59..723.00 rows=1200 width=34) (actual time=0.460..2.809 rows=1194 loops=3)
                                                                     Recheck Cond: (s_nationkey = nation.n_nationkey)
                                                                     Heap Blocks: exact=564
                                                                     ->  Bitmap Index Scan
Re: [HACKERS] Bug in to_timestamp().
On Thu, Nov 30, 2017 at 10:39:56AM -0500, Robert Haas wrote:
> So is it now the case that all of the regression test cases in this
> patch produce the same results as they would on Oracle?

Yes, exactly. All new cases give the same result as in Oracle, except
for the text of error messages.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] postgres_fdw bug in 9.6
On Wed, Nov 29, 2017 at 5:32 PM, Tom Lane wrote:
> AFAICT, [1] just demonstrates that nobody had thought about the scenario
> up to that point, not that the subsequently chosen solution was a good
> one.

But have a look at this:

http://postgr.es/m/561e12d4.7040...@lab.ntt.co.jp

That shows a case where, before 5fc4c26db5120bd90348b6ee3101fcddfdf54800,
a query that required the foreign table to do an EPQ recheck produced an
unambiguously wrong answer; the query stipulates that inttab.a = ft1.a
but the row returned lacks that property. I think this clearly shows
that EPQ rechecks are needed at least for FDW paths that are
parameterized.

However, I guess that doesn't actually refute your point, which IIUC is
that maybe we don't need these checks for FDW join paths, because we
don't build parameterized paths for those yet. Now I think it would
still be a good idea to have a way of handling that case, because
somebody's likely going to want to fix that /* XXX Consider parameterized
paths for the join relation */ eventually, but I guess for the
back-branches just setting epq_path = NULL instead of calling
GetExistingLocalJoinPath() might be the way to go... assuming that your
contention that this won't break anything is indeed correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Custom compression methods
On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote:
> On Thu, 30 Nov 2017 00:30:37 +0100
> Tomas Vondra wrote:
>
> ...
>
>> I can imagine other interesting use cases - for example values in
>> JSONB columns often use the same "schema" (keys, nesting, ...), so
>> can I imagine building a "dictionary" of JSON keys for the whole
>> column ...
>>
>> Ildus, is this a use case you've been aiming for, or were you aiming
>> to use the new API in a different way?
>
> Thank you for such good overview. I agree that pglz is pretty good as
> general compression method, and there's no point to change it, at
> least now.
>
> I see few useful use cases for compression methods, it's special
> compression methods for int[], timestamp[] for time series and yes,
> dictionaries for jsonb, for which I have even already created an
> extension (https://github.com/postgrespro/jsonbd). It's working and
> giving promising results.

I understand the reluctance to put everything into core, particularly
for complex patches that evolve quickly. Also, not having to put
everything into core is kinda why we have extensions. But perhaps some
of the simpler cases would be good candidates for core, making it
possible to test the feature?

>>
>> I wonder if the patch can be improved to handle this use case better.
>> For example, it requires knowledge of the actual data type, instead of
>> treating it as opaque varlena / byte array. I see tsvector compression
>> does that by checking typeid in the handler.
>>
>> But that fails for example with this example
>>
>> db=# create domain x as tsvector;
>> CREATE DOMAIN
>> db=# create table t (a x compressed ts1);
>> ERROR: unexpected type 28198672 for tsvector compression handler
>>
>> which means it's a few bricks shy of properly supporting domains. But I
>> wonder if this should be instead specified in CREATE COMPRESSION
>> METHOD instead. I mean, something like
>>
>> CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler
>> TYPE tsvector;
>>
>> When no type is specified, it applies to all varlena values. Otherwise
>> only to that type. Also, why not to allow setting the compression as
>> the default method for a data type, e.g.
>>
>> CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler
>> TYPE tsvector DEFAULT;
>>
>> would automatically add 'COMPRESSED ts1' to all tsvector columns in
>> new CREATE TABLE commands.
>
> Initial version of the patch contains ALTER syntax that change
> compression method for whole types, but I have decided to remove
> that functionality for now because the patch is already quite complex
> and it could be added later as separate patch.
>
> Syntax was:
> ALTER TYPE SET COMPRESSION ;
>
> Specifying the supported type for the compression method is a good idea.
> Maybe the following syntax would be better?
>
> CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER
> tsvector_compression_handler;

Understood. Good to know you've considered it, and I agree it doesn't
need to be there from the start (which makes the patch simpler).

>>
>> BTW do you expect the tsvector compression to be generally useful, or
>> is it meant to be used only by the tests? If generally useful,
>> perhaps it should be created in pg_compression by default. If only
>> for tests, maybe it should be implemented in an extension in contrib
>> (thus also serving as example how to implement new methods).
>>
>> I haven't thought about the JSONB use case very much, but I suppose
>> that could be done using the configure/drop methods. I mean,
>> allocating the dictionary somewhere (e.g. in a table created by an
>> extension?). The configure method gets the Form_pg_attribute record,
>> so that should be enough I guess.
>>
>> But the patch is not testing those two methods at all, which seems
>> like something that needs to be addressed before commit. I don't
>> expect a full-fledged JSONB compression extension, but something
>> simple that actually exercises those methods in a meaningful way.
>
> I will move tsvector_compression_handler to a separate extension in
> the next version. I added it more as an example, but also it could be
> used to achieve a better compression for tsvectors. Tests on maillists
> database ('archie' tables):
>
> usual compression:
>
> maillists=# select body_tsvector, subject_tsvector into t1 from messages;
> SELECT 1114213
> maillists=# select pg_size_pretty(pg_total_relation_size('t1'));
>  pg_size_pretty
> ----------------
>  1637 MB
> (1 row)
>
> tsvector_compression_handler:
> maillists=# select pg_size_pretty(pg_total_relation_size('t2'));
>  pg_size_pretty
> ----------------
>  1521 MB
> (1 row)
>
> lz4:
> maillists=# select pg_size_pretty(pg_total_relation_size('t3'));
>  pg_size_pretty
> ----------------
>  1487 MB
> (1 row)
>
> I don't stick to tsvector_compression_handler, I think if there
> will some example that can use all the features then
>
Re: [HACKERS] SQL procedures
On 11/29/17 14:17, Andrew Dunstan wrote:
> On 11/28/2017 10:03 AM, Peter Eisentraut wrote:
>> Here is a new patch that addresses the previous review comments.
>>
>> If there are no new comments, I think this might be ready to go.
>
> Looks good to me. Marking ready for committer.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP: Covering + unique indexes.
Hi, Peter!

> On 29 Nov 2017, at 8:45, Peter Geoghegan wrote:
>
> It looks like amcheck needs to be patched -- a simple oversight.
> amcheck is probably calling _bt_compare() without realizing that
> internal pages don't have the extra attributes (just leaf pages,
> although they should also not participate in comparisons in respect of
> included/extra columns). There were changes to amcheck at one point in
> the past. That must have slipped through again. I don't think it's
> that complicated.
>
> BTW, it would probably be a good idea to use the new Github version's
> "heapallindexed" verification [1] for testing this patch. Anastasia
> will need to patch the externally maintained amcheck to do this, but
> it's probably no extra work, because this is already needed for
> contrib/amcheck, and because the heapallindexed check doesn't actually
> care about index structure at all.

It was not a big deal to patch, so I've fixed those bits (see
attachment). I've done only simple tests for now, but I'm planning to do
better testing before the next CF. Thanks for mentioning
"heapallindexed", I'll use it too.

Best regards, Andrey Borodin.

amcheck_include.diff
Description: Binary data
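For readers unfamiliar with it, here is roughly how the "heapallindexed"
verification mentioned above is invoked (per the externally maintained
amcheck version referenced in the quote; the table and index names below
are made up):

create extension amcheck;

-- verify B-Tree invariants and also check that every heap tuple has a
-- corresponding index tuple
select bt_index_check('my_covering_index'::regclass, heapallindexed => true);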
Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE
On Wed, Nov 29, 2017 at 11:35 PM, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 11:41 AM, Jing Wang wrote:
>> This is a patch for current_database working on ALTER ROLE/GRANT/REVOKE
>> statements which should be applied after the previous patch
>> "comment_on_current_database_no_pgdump_v4.4.patch".
>>
>> By using the patch, CURRENT_DATABASE can work in the following SQL
>> statements:
>>
>> ALTER ROLE ... IN DATABASE CURRENT_DATABASE SET/RESET configuration_parameter
>> GRANT ... ON DATABASE CURRENT_DATABASE TO role_specification ...
>> REVOKE ... ON DATABASE CURRENT_DATABASE FROM ...
>
> Moved to next CF with same status, "needs review".

Patch no longer applies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Tue, Nov 21, 2017 at 4:36 AM, Peter Moser wrote:
> Hi hackers,
> we would like to rethink our approach...
>
> For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:
>
> SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;
>
> Our normalization executor node needs the following input (for now
> expressed in plain SQL):
>
> SELECT R.*, p1
> FROM (SELECT *, row_id() OVER () rn FROM R) R
> LEFT OUTER JOIN (
>   SELECT y, LOWER(time) p1 FROM S
>   UNION
>   SELECT y, UPPER(time) p1 FROM S
> ) S
> ON R.x = S.y AND p1 <@ R.time
> ORDER BY rn, p1;
>
> In other words:
> 1) The left subquery adds a unique ID to each tuple (i.e., rn).
> 2) The right subquery creates two results for each input tuple: one for
>    the upper and one for the lower bound of each input tuple's valid time
>    column. The boundaries get put into a single (scalar) column, namely p1.
> 3) We join both subqueries if the normalization predicates hold (R.x = S.y)
>    and p1 is inside the time of the current outer tuple.
> 4) Finally, we sort the result by the unique ID (rn) and p1, and give all
>    columns of the outer relation, rn and p1 back.

So, if I understand correctly, there are three possible outcomes. If
S.time and R.time are non-overlapping, then a null-extended row is
produced with the original data from R. If S.time overlaps R.time but is
not contained within it, then this produces one result row, not
null-extended -- either the upper or lower bound will be found to be
contained within R.time, but the other will not. If S.time is contained
within R.time, then this produces 2 result rows, one with p1 containing
S.time's lower bound and one with p1 containing S.time's upper bound. Is
that right? Then, after all this happens, the temporal normalizer code
runs over the results and uses the p1 values as split points for the
ranges from R.

I wonder if we could instead think about R NORMALIZE S ON R.x = S.y WITH
(R.time, S.time) as an ordinary join on R.x = S.y with some extra
processing afterwards. After finding all the join partners for a row in
R, extract all the lower and upper bounds from the S.time fields of the
join partners and use that to emit as many rows from R as may be
necessary.

The main problem that I see with that approach is that it seems
desirable to separate the extra-processing-afterwards step into a
separate executor node, and there's no way for a separate type of plan
node to know when the join advances the outer side of the join. That's
the purpose that row_id() is serving as you have it; we'd have to invent
some other kind of signalling mechanism, which might not be entirely
trivial. :-(

If we could do it, though, it might be quite a bit more efficient,
because it would avoid scanning S twice and performing a UNION on the
results of those scans. Also, we wouldn't necessarily need to sort the
whole set of rows from R, which I suspect is unavoidable in your
implementation. We'd just need to sort the individual groups of rows
from S, and my guess is that in many practical cases those groups are
fairly small.

I wonder what identities hold for NORMALIZE. It does not seem to be
commutative, but it looks like it might be associative... i.e. (R
NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
NORMALIZE T), perhaps?

> Our first attempt to understand the new approach would be as follows: The
> left base rel of the inner left-outer-join can be expressed as a WindowAgg
> node. However, the right query of the join is much more difficult to build
> (maybe through hash aggregates). Both queries could be put together with a
> MergeJoin for instance. However, if we create the plan tree by hand and
> choose algorithms for it manually, how is it possible to have it optimized
> later? Or, if that is not possible, how do we choose the best algorithms
> for it?

You don't want to generate plan nodes directly like this, I think.
Generally, you create paths, and the cheapest path wins at the end of
planning. The thing that makes this tricky is that the temporal
normalization phase can be separated from the inner left-outer-join by
other joins, and the planner doesn't really have a good way of
representing that concept.

More broadly, the very idea of creating plan nodes suggests that you are
approaching this from the point of view of sticking the logic in near
the end of the planning process, which I think is not going to work.
Whatever is going to happen here needs to happen at path generation time
at the latest, or maybe even earlier, before the optimizer even starts
processing the join tree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] create_unique_path and GEQO
On Fri, Mar 24, 2017 at 9:50 AM, Tom Lane wrote:
> Yeah. I think the code in mark_dummy_rel is newer and better-thought-out
> than what's in create_unique_path. It probably makes sense to change over.

I did a bit of archaeology here. create_unique_path() first appears in
commit bdfbfde1b168b3332c4cdac34ac86a80aaf4d442 (vintage 2003), where it
used GetMemoryChunkContext(rel). Commit
f41803bb39bc2949db200116a609fd242d0ec221 (vintage 2007) changed it to
use root->planner_cxt, but neither the comment changes in that patch nor
the commit message give any hint as to the motivation for the change.
The comments do mention that it should be kept in sync with
best_inner_indexscan(), which was also switched to use root->planner_cxt
by that commit, again without any real explanation as to why, and
best_inner_indexscan() continued to use root->planner_cxt until its
demise in commit e2fa76d80ba571d4de8992de6386536867250474 (vintage
2012). Meanwhile, mark_dummy_rel() didn't switch contexts at all until
commit eca75a12a27d28b972fc269c1c8813cd8eb15441 (vintage 2011), at which
point it began using GetMemoryChunkContext(rel).

So, the coding that Ashutosh is proposing here is just changing things
back to the way they were before 2007, but with the better comment that
Tom wrote in 2011 for mark_dummy_rel(). Since the 2007 change lacks any
justification and the new code has one, I'm inclined to agree with Tom
that changing this makes sense. If it turns out to be wrong, then
mark_dummy_rel() also needs fixing, and the comments need to explain
things better. My guess, though, is that it's right, because the
rationale offered in mark_dummy_rel() seems to make a lot of sense.

One reason why we might want to care about this, other than a laudable
consistency, is that at one point Ashutosh had some patches to reduce
the memory usage of partition-wise join that involved doing more
incremental discarding of RelOptInfos akin to what GEQO does today. If
we ever do anything like that, it will be good if we're consistent (and
correct) about which contexts end up containing which data structures.

All of which, I think, is a long-winded way of saying that I'm going to
go commit this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
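For reference, an abridged sketch of the mark_dummy_rel()-style coding
being adopted in create_unique_path() (an illustration of the pattern,
not the exact committed hunk): allocate the path in the same memory
context the RelOptInfo lives in, so it survives a GEQO cycle when the
rel does, and is discarded with the rel when it doesn't.

	MemoryContext oldcontext;
	UniquePath *pathnode;

	/* allocate in the context the given RelOptInfo is in */
	oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));

	pathnode = makeNode(UniquePath);
	/* ... fill in the path fields ... */

	MemoryContextSwitchTo(oldcontext);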
Re: ASCII Null control character validation
Alexey Chernyshov writes:
> I found in src/backend/utils/mb/wchar.c: pg_verify_mbstr_len() that it
> reports ASCII Null character (\000) as invalid. As for me, it should
> pass validation.

This is intentional and we're not going to change it. There is too much
code in the backend that relies on NUL-terminated strings ...

> However, ASCII Null character breaks a line and the
> end of the line is missed, try:
> INSERT INTO mytable VALUES (E'a\001b\000c and rest of line MIA');

... like that for instance. See (many) past discussions of this issue.

			regards, tom lane
Re: [HACKERS] postgres_fdw bug in 9.6
Etsuro Fujita writes:
> (2017/11/30 7:32), Tom Lane wrote:
>> the output of the foreign join cannot change during EPQ, since the remote
>> server already locked the rows before returning them. The only thing that
>> can change is the output of the local scan on public.tab. Yes, we then
>> need to re-verify that foo.a = tab.a ... but in this example, that's the
>> responsibility of the NestLoop plan node, not the foreign join.

> That's right, but is that true when the FDW uses late row locking?

An FDW using late row locking would need to work harder, yes. But that's
true at the scan level as well as the join level. We have already
committed to using early locking in postgres_fdw, for the
network-round-trip-cost reasons I mentioned before, and I can't see why
we'd change that decision at the join level.

Right now we've got the worst of both worlds, in that we're actually
doing early row locking on the remote, but we're paying (some of) the
code complexity costs required for late locking.

			regards, tom lane
Re: [HACKERS] UPDATE of partition key
While addressing Thomas's point about test scenarios not yet covered, I observed the following ... Suppose an UPDATE RLS policy with a WITH CHECK clause is defined on the target table. Now in ExecUpdate(), the corresponding WCO qual gets executed *before* the partition constraint check, as per existing behaviour, and the qual succeeds. Then, because the partition key was updated, the row is moved to another partition. Suppose there is a BR INSERT trigger on that partition which modifies the row, and the resultant row actually would *not* pass the UPDATE RLS policy. But for this partition, since it is an INSERT, only INSERT RLS WCO quals are executed. So effectively, from the user's perspective, an RLS WITH CHECK policy that was defined to reject an updated row lets the update succeed. This is because the policy is not checked *after* a row trigger in the new partition is executed. Attached is a test case that reproduces this issue. I think, in case of row movement, we should defer calling ExecWithCheckOptions() until the row is being inserted using ExecInsert(). And then in ExecInsert(), ExecWithCheckOptions() should be called using WCO_RLS_UPDATE_CHECK rather than WCO_RLS_INSERT_CHECK (I recall Amit Langote was of this opinion), as below:

--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -510,7 +510,9 @@ ExecInsert(ModifyTableState *mtstate,
          * we are looking for at this point.
          */
         if (resultRelInfo->ri_WithCheckOptions != NIL)
-            ExecWithCheckOptions(WCO_RLS_INSERT_CHECK,
+            ExecWithCheckOptions((mtstate->operation == CMD_UPDATE ?
+                                  WCO_RLS_UPDATE_CHECK : WCO_RLS_INSERT_CHECK),
                                  resultRelInfo, slot, estate);

It can be argued that since in the case of triggers we always execute INSERT row triggers for rows inserted as part of update row movement, we should be consistent and execute INSERT WCOs rather than UPDATE WCOs for such rows. But note that the row triggers we execute are defined on the leaf partitions, whereas the RLS policies being executed are defined on the target partitioned table, not the leaf partition. Hence it makes sense to execute them as per the original operation on the target table. This is similar to why we execute UPDATE statement triggers even when the row is eventually inserted into another partition: the UPDATE statement trigger was defined on the target table, not the leaf partition. Barring any objections, I am going to send a revised patch that fixes the above issue as described. Thanks -Amit Khandekar

wco_rls_issue.sql
Re: [HACKERS] postgres_fdw bug in 9.6
(2017/11/30 7:32), Tom Lane wrote: AFAICT, [1] just demonstrates that nobody had thought about the scenario up to that point, not that the subsequently chosen solution was a good one. In that example,

 LockRows  (cost=100.00..101.18 rows=4 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
         Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
         Join Filter: (foo.a = tab.a)
         ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
               Output: tab.a, tab.b, tab.ctid
         ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
               Output: foo.*, foo.a, bar.*, bar.a
               Relations: (public.foo) INNER JOIN (public.bar)
               Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))

the output of the foreign join cannot change during EPQ, since the remote server already locked the rows before returning them. The only thing that can change is the output of the local scan on public.tab. Yes, we then need to re-verify that foo.a = tab.a ... but in this example, that's the responsibility of the NestLoop plan node, not the foreign join. That's right, but is that true when the FDW uses late row locking? Best regards, Etsuro Fujita
Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]
On Thu, Nov 30, 2017 at 6:39 AM, Peter Geoghegan wrote: > On Wed, Nov 29, 2017 at 8:34 PM, Michael Paquier wrote: > > The patch does not currently apply. I am noticing as well that Peter > > Geoghegan has registered himself as a reviewer a couple of hours back, > > so moved to next CF with waiting on author as status. > Marko unregistered me, so I promptly reregistered. You'll have to ask him > why. Oops. I didn't mean to do what I did, and I apologize. I had my months mixed up and I thought this commit fest had ended without you having looked at the patch, so I assumed you would register in the next CF if you were still interested, or someone else could pick it up. In any case, *I* don't have any interest in pursuing this patch at this point. I think this is a really good feature, and as far as I know the patch is technically solid, or at least was when it still applied. Can someone else take it from here? .m
Re: [HACKERS] Proposal: Local indexes for partitioned table
Michael Paquier wrote: > On Wed, Nov 15, 2017 at 2:49 AM, Alvaro Herrera wrote: > > Here's the remaining bits, rebased. > At least patch 3 has conflicts with HEAD. I am moving this patch to > next CF per a lack of reviews, switching status to waiting on author. Thanks, will submit a new version before the start of the next commitfest. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: Local indexes for partitioned table
David Rowley wrote: > I wonder if it should be this patch's job to alter the code in > get_relation_info() which causes the indexes not to be loaded for > partitioned tables:
>
> /*
>  * Make list of indexes. Ignore indexes on system catalogs if told to.
>  * Don't bother with indexes for an inheritance parent, either.
>  */
> if (inhparent ||
>     (IgnoreSystemIndexes && IsSystemRelation(relation)))
>     hasindex = false;
> else
>     hasindex = relation->rd_rel->relhasindex;
>
> A partitioned table will always go into the hasindex = false code path. > I'm kind of thinking this patch should change that; even if the patch is > not making use of the indexes, you could argue that something > using set_rel_pathlist_hook might want to do something there, although > there's likely a bunch of counter-arguments too. > What do you think? Great question. So you're thinking that the planner might have an interest in knowing what indexes are defined at the parent table level for planning purposes; but for that to actually have any effect we would need to change the planner and executor also. And one more point, also related to something you said before: we currently (I mean after my patch) don't mark a partitioned-table-level index as valid or not valid depending on whether all of its child indexes exist, so trying to use it in the planner without such a flag could cause invalid plans to be generated (i.e. ones that reference nonexistent indexes). -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Issues with logical replication
On 30 November 2017 at 11:30, Petr Jelinek wrote: > On 30/11/17 00:47, Andres Freund wrote: >> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote: >>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live >>> with it, but the problematic logic happens inside the >>> XactLockTableInsert() and SnapBuildWaitSnapshot() has no way of >>> detecting the situation short of reimplementing the >>> XactLockTableInsert() instead of calling it. >> >> Right. But we fairly trivially can change that. I'm remarking on it >> because other people's, not yours, suggestions aimed at making this >> bulletproof. I just wanted to make clear that I don't think that's >> necessary at all. >> > > Okay, then I guess we are in agreement. I can confirm that the attached > fixes the issue in my tests. Using SubTransGetTopmostTransaction() > instead of SubTransGetParent() means 3 more ifs in terms of extra CPU > cost for other callers. I don't think it's worth worrying about given we > are waiting for heavyweight lock, but if we did we can just inline the > code directly into SnapBuildWaitSnapshot(). This will still fail an Assert in TransactionIdIsInProgress() when snapshots are overflowed. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
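For readers not deep in this code, the functional difference being discussed can be sketched like this (an illustration only, not the attached patch; the wrapper name is invented):

#include "postgres.h"
#include "access/subtrans.h"
#include "storage/lmgr.h"

/*
 * SubTransGetParent() climbs a single level of the subtransaction tree,
 * while SubTransGetTopmostTransaction() resolves all the way to the
 * top-level XID -- the transaction whose commit or abort the waiter
 * actually cares about.
 */
static void
wait_for_toplevel_xact(TransactionId xid)
{
    TransactionId topxid = SubTransGetTopmostTransaction(xid);

    XactLockTableWait(topxid, NULL, NULL, XLTW_None);
}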
Re: [HACKERS] logical decoding of two-phase transactions
Hi, >> So perhaps better approach would be to not return >> HEAPTUPLE_DEAD if the transaction id is newer than the OldestXmin (same >> logic we use for deleted tuples of committed transactions) in the >> HeapTupleSatisfiesVacuum() even for aborted transactions. I also briefly >> checked HOT pruning and AFAICS the normal HOT pruning (the one not >> called by vacuum) also uses the xmin as authoritative even for aborted >> txes so nothing needs to be done there probably. >> >> In case we are worried that this affects cleanups of for example large >> aborted COPY transactions and we think it's worth worrying about then we >> could limit the new OldestXmin based logic only to catalog tuples as >> those are the only ones we need available in decoding. > > Yeah, if it's limited to catalog tuples only then that sounds good. I was > quite concerned about how it'd impact vacuuming otherwise, but if limited to > catalogs about the only impact should be on workloads that create lots of > TEMPORARY tables then ROLLBACK - and not much on those. Based on these discussions, I think there are two separate issues here: 1) Make HeapTupleSatisfiesVacuum() behave differently for recently aborted catalog tuples. 2) Invent a mechanism to stop a specific logical decoding activity in the middle. The reason to stop could be a concurrent abort, a global transaction manager deciding to roll back, or anything else. ISTM that for (2), if (1) is able to leave the recently aborted tuples around for a little while (we only really need them while the decode of the current change record is ongoing), then we could accomplish it via a callback. This callback should be called before commencing decode and network send of each change record. In the case of in-core logical decoding, the callback for pgoutput could check whether the transaction has aborted (a call to TransactionIdDidAbort() or a similar function); additional logic can be added as needed for various scenarios. If it has aborted, we abandon decoding and send an ABORT to the subscribers before returning. Regards, Nikhils
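A minimal sketch of such a callback, with the function name invented for illustration and relying only on the TransactionIdDidAbort() check described above:

#include "postgres.h"
#include "access/transam.h"

/*
 * Hypothetical per-change-record hook: return false if the transaction
 * being decoded has meanwhile rolled back, so the caller can abandon
 * decoding and emit an ABORT downstream instead of the remaining changes.
 */
static bool
continue_decoding_xact(TransactionId xid)
{
    if (TransactionIdDidAbort(xid))
        return false;

    return true;
}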
Re: es_query_dsa is broken
On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas wrote: >> Better ideas? > > How about this: > > 1. Remove es_query_dsa altogether. > 2. Add a dsa_area * to ExecParallelInitializeDSMContext. > 3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate argument to > the per-node-type function. > 4. If the per-node-type function cares about the dsa_area *, it is > responsible for saving a pointer to it in the PlanState node. > 5. Also pass it down via the ParallelWorkerContext. > > In v10 we might need to go with a solution like what you've sketched > here, because Tom will complain about breaking binary compatibility > with EState (and maybe other PlanState nodes) in a released branch. I will post both versions. I've been stuck for a while now trying to come up with a query that actually breaks, so I can show that it's fixed... -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] create_unique_path and GEQO
On Thu, Nov 30, 2017 at 9:00 AM, Ashutosh Bapat wrote: > On Thu, Nov 30, 2017 at 8:50 AM, Michael Paquier wrote: >> On Fri, Mar 24, 2017 at 10:50 PM, Tom Lane wrote: >>> Ashutosh Bapat writes: > Do you have test case, which can reproduce the issue you > explained above? >>> No. It would require some surgery in standard_planner() to measure the memory consumed in the planner context, OR build the code with SHOW_MEMORY_STATS defined, dump memory context statistics, and check planner context memory usage. I don't think I can produce a testcase quickly right now. But then, I think the problem is quite apparent from code inspection alone, esp. comparing what mark_dummy_rel() does with what create_unique_path() is doing. >>> >>> Yeah. I think the code in mark_dummy_rel is newer and better-thought-out >>> than what's in create_unique_path. It probably makes sense to change over. >> >> This has remained unanswered for more than 8 months, so I am marking >> this patch as returned with feedback. > > I am not sure what's there to answer in Tom's reply. He seems to be > agreeing with my analysis. Correct me if I am wrong. If that's the > case, I am expecting somebody to review the patch. If there are no > review comments, some committer may commit it. > For now I have marked it as needing a reviewer and moved it to the next commitfest. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
ASCII Null control character validation
Hello, hackers! I found in src/backend/utils/mb/wchar.c: pg_verify_mbstr_len() that it reports the ASCII Null character (\000) as invalid. In my opinion, it should pass validation. However, an ASCII Null character breaks the line and the rest of the line goes missing; try: INSERT INTO mytable VALUES (E'a\001b\000c and rest of line MIA'); Please find the patch attached. Am I wrong? -- Alexey Chernyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

From ee7b505792a4d7630dd5e20aca475825d11301a7 Mon Sep 17 00:00:00 2001
From: Alexey Chernyshov
Date: Wed, 29 Nov 2017 15:35:10 +0300
Subject: [PATCH] Fix 0x00 symbol validation. Earlier it was considered as
 illegal by fast path for ASCII.

---
 src/backend/utils/mb/wchar.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
index a5fdda4..62b1a7d 100644
--- a/src/backend/utils/mb/wchar.c
+++ b/src/backend/utils/mb/wchar.c
@@ -1922,18 +1922,12 @@ pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
 		int			l;
 
 		/* fast path for ASCII-subset characters */
-		if (!IS_HIGHBIT_SET(*mbstr))
+		if ((*mbstr >= 0x00) && (*mbstr <= 0x7F))
 		{
-			if (*mbstr != '\0')
-			{
-				mb_len++;
-				mbstr++;
-				len--;
-				continue;
-			}
-			if (noError)
-				return -1;
-			report_invalid_encoding(encoding, mbstr, len);
+			mb_len++;
+			mbstr++;
+			len--;
+			continue;
 		}
 
 		l = (*mbverify) ((const unsigned char *) mbstr, len);
-- 
2.7.4