Re: [HACKERS] transition table behavior with inheritance appears broken
On Thu, Jun 29, 2017 at 6:21 AM, Andrew Gierth wrote: > Commits pushed. Great news. Thanks for stepping up to get this committed. Thanks a lot also to Marko, Amit L, Kevin, Robert, Noah and Peter G for the problem reports, reviews and issue chasing. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
Commits pushed. Unless I broke the buildfarm again (which I'll check up on later), or some new issue arises with the fixes, this should close all 3 related items for transition tables. -- Andrew. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
> "Noah" == Noah Misch writes: Noah> IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is Noah> long past due for your status update. Please reacquaint yourself Noah> with the policy on open item ownership[1] and then reply Noah> immediately. If I do not hear from you by 2017-06-28 06:00 UTC, Noah> I will transfer this item to release management team ownership Noah> without further notice. Sorry for the lack of updates. I need to sleep now, but I will send a proper status update by 1800 UTC (1900 BST) today (28th). -- Andrew. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
On Sat, Jun 24, 2017 at 10:57:49PM -0700, Noah Misch wrote: > On Fri, Jun 23, 2017 at 02:39:48PM +0100, Andrew Gierth wrote: > > > "Noah" == Noah Misch writes: > > > > Noah> This PostgreSQL 10 open item is past due for your status update. > > Noah> Kindly send a status update within 24 hours, > > > > oops, sorry! I forgot to include a date in the last one, and in fact a > > personal matter delayed things anyway. I expect to have this wrapped up > > by 23:59 BST on the 24th. > > This PostgreSQL 10 open item is again past due for your status update. Kindly > send a status update within 24 hours, and include a date for your subsequent > status update. IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due for your status update. Please reacquaint yourself with the policy on open item ownership[1] and then reply immediately. If I do not hear from you by 2017-06-28 06:00 UTC, I will transfer this item to release management team ownership without further notice. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
On Fri, Jun 23, 2017 at 02:39:48PM +0100, Andrew Gierth wrote: > > "Noah" == Noah Misch writes: > > Noah> This PostgreSQL 10 open item is past due for your status update. > Noah> Kindly send a status update within 24 hours, > > oops, sorry! I forgot to include a date in the last one, and in fact a > personal matter delayed things anyway. I expect to have this wrapped up > by 23:59 BST on the 24th. This PostgreSQL 10 open item is again past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
> "Noah" == Noah Misch writes: Noah> This PostgreSQL 10 open item is past due for your status update. Noah> Kindly send a status update within 24 hours, oops, sorry! I forgot to include a date in the last one, and in fact a personal matter delayed things anyway. I expect to have this wrapped up by 23:59 BST on the 24th. -- Andrew. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
On Sun, Jun 18, 2017 at 11:59:43PM +0100, Andrew Gierth wrote: > > "Andrew" == Andrew Gierth writes: > > Andrew> Unfortunately I've been delayed over the past couple of days, > Andrew> but I have Thomas' latest patchset in hand and will be working > Andrew> on it over the rest of the week. Status update by 23:59 BST on > Andrew> Sun 18th, by which time I hope to have everything finalized > Andrew> (all three issues, not just the inheritance one). > > I have, I believe, completed my review of the patchset. My conclusion is > that the fix appears to be sound and I haven't been able to find any > further issues with it; so I think Thomas's patches should be committed > as-is. Unless anyone objects I will do this within the next few days. > > (Any preferences for whether it should be one commit or 3 separate ones?) This PostgreSQL 10 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. Refer to the policy on open item ownership: https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, Jun 12, 2017 at 2:03 PM, Thomas Munro wrote: > Thanks for the review. New version of patch #1 attached. Here's a version rebased on top of the recently reindented master branch. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v12.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
On Sun, Jun 18, 2017 at 6:59 PM, Andrew Gierth wrote: > (Any preferences for whether it should be one commit or 3 separate ones?) If I were doing it, I would commit them separately. But I'm not doing it, so I won't complain about what you decide to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
On 2017/06/19 7:59, Andrew Gierth wrote: >> "Andrew" == Andrew Gierth writes: > > (Any preferences for whether it should be one commit or 3 separate ones?) For my 2c, it would be nice to keep at least the inheritance one (or all of them actually) separate. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
> "Andrew" == Andrew Gierth writes: Andrew> Unfortunately I've been delayed over the past couple of days, Andrew> but I have Thomas' latest patchset in hand and will be working Andrew> on it over the rest of the week. Status update by 23:59 BST on Andrew> Sun 18th, by which time I hope to have everything finalized Andrew> (all three issues, not just the inheritance one). I have, I believe, completed my review of the patchset. My conclusion is that the fix appears to be sound and I haven't been able to find any further issues with it; so I think Thomas's patches should be committed as-is. Unless anyone objects I will do this within the next few days. (Any preferences for whether it should be one commit or 3 separate ones?) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
> "Andrew" == Andrew Gierth writes: Andrew> I will post a further status update before 23:59 BST on 14th Andrew> Jun. Unfortunately I've been delayed over the past couple of days, but I have Thomas' latest patchset in hand and will be working on it over the rest of the week. Status update by 23:59 BST on Sun 18th, by which time I hope to have everything finalized (all three issues, not just the inheritance one). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Sat, Jun 10, 2017 at 6:08 AM, Robert Haas wrote: > I have spent some time now studying this patch. I might be missing > something, but to me this appears to be in great shape. A few minor > nitpicks: > > -if ((event == TRIGGER_EVENT_DELETE && > !trigdesc->trig_delete_after_row) || > -(event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) > || > - (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row)) > +if (trigdesc == NULL || > +(event == TRIGGER_EVENT_DELETE && > !trigdesc->trig_delete_after_row) || > +(event == TRIGGER_EVENT_INSERT && > !trigdesc->trig_insert_after_row) || > +(event == TRIGGER_EVENT_UPDATE && > !trigdesc->trig_update_after_row)) > > I suspect the whitespace changes will get reverted by pgindent, making > them pointless. But that's a trivial issue. Not just a whitespace change: added "trigdesc == NULL" and put the existing stuff into parens, necessarily changing indentation level. > +if (mtstate->mt_transition_capture != NULL) > +{ > +if (resultRelInfo->ri_TrigDesc && > +(resultRelInfo->ri_TrigDesc->trig_insert_before_row || > + resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) > +{ > +/* > + * If there are any BEFORE or INSTEAD triggers on the > + * partition, we'll have to be ready to convert their result > + * back to tuplestore format. > + */ > + > mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL; > +mtstate->mt_transition_capture->tcs_map = > +mtstate->mt_transition_tupconv_maps[leaf_part_index]; > +} > +else > +{ > +/* > + * Otherwise, just remember the original unconverted tuple, > to > + * avoid a needless round trip conversion. > + */ > + > mtstate->mt_transition_capture->tcs_original_insert_tuple = tuple; > +mtstate->mt_transition_capture->tcs_map = NULL; > +} > +} > > This chunk of code gets inserted between a comment and the code to > which that comment refers. Maybe that's not the best idea. Fixed. Similar code existed in copy.c, so fixed there too. > * Does has_superclass have concurrency issues? After some > consideration, I decided that it's probably fine as long as you hold a > lock on the target relation sufficient to prevent it from concurrently > becoming an inheritance child - i.e. any lock at all. The caller is > CREATE TRIGGER, which certainly does. I added a comment to make that clear. > * In AfterTriggerSaveEvent, is it a problem that the large new hunk of > code ignores trigdesc if transition_capture != NULL? If I understand > correctly, the trigdesc will be coming from the leaf relation actually > being updated, while the transition_capture will be coming from the > relation named in the query. Is the transition_capture object > guaranteed to have all the flags set, or do we also need to include > the ones from the trigdesc? This also seems to be fine, because of > the restriction that row-level triggers with tuplestores can't > participate in inheritance hierarchies. We can only need to capture > the tuples for the relation named in the query, not the leaf > partitions. Yeah. I think that's right. Note that in this patch I was trying to cater to execReplication.c so that it wouldn't have to construct a TransitionCaptureState. It doesn't actually support hierarchies (it doesn't operate on partitioned tables, and doesn't have the smarts to direct updates to traditional inheritance children), and doesn't fire statement triggers. In the #2 patch in the other thread about wCTEs, I changed this to make a TransitionCaptureState object the *only* way to cause transition tuple capture, because in that patch it owns the tuplestores without which you can't capture anything. So in that patch trigdesc is no longer relevant at all for the tuple capture. Someone extending execReplication.c to support partitions and statement triggers etc will need to think about creating a TransitionCaptureState but I decided that was out of scope for the transition table rescue project. In the new version of the #2 patch that I'm about to post on the other there there is now a comment in execReplication.c to explain. > So, in short, +1 from me. Thanks for the review. New version of patch #1 attached. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v11.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
> "Andrew" == Andrew Gierth writes: Andrew> I have it; I will post a status update before 23:59 BST on 11 Andrew> Jun. This is that status update. I am still studying Thomas' latest patch set; as I mentioned in another message, I've confirmed a memory leak, and I expect further work may be needed in some other areas as well, but I think we're still making progress towards fixing it and I will work with Thomas on it. I will post a further status update before 23:59 BST on 14th Jun. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken
> "Robert" == Robert Haas writes: Robert> So, Andrew, are you running with this, or should I keep looking Robert> into it? I have it; I will post a status update before 23:59 BST on 11 Jun. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Fri, Jun 9, 2017 at 12:19 PM, Robert Haas wrote: > On Thu, Jun 8, 2017 at 11:56 PM, Thomas Munro > wrote: >> [Adding Andrew Gierth] >> >> Here is a rebased version of the patch to fix transition tables with >> inheritance. Fixes a typo in an error message ("not support on >> partitions" -> "... supported ..."), and changes regression test >> triggers to be single-event (only one of INSERT, UPDATE or DELETE), >> because a later patch will not allow multi-event triggers with TTs. >> >> This is patch 1 of a stack of 3 patches addressing currently known >> problems with transition tables. > > So, Andrew, are you running with this, or should I keep looking into it? I have spent some time now studying this patch. I might be missing something, but to me this appears to be in great shape. A few minor nitpicks: -if ((event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) || -(event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) || - (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row)) +if (trigdesc == NULL || +(event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) || +(event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) || +(event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row)) I suspect the whitespace changes will get reverted by pgindent, making them pointless. But that's a trivial issue. +if (mtstate->mt_transition_capture != NULL) +{ +if (resultRelInfo->ri_TrigDesc && +(resultRelInfo->ri_TrigDesc->trig_insert_before_row || + resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) +{ +/* + * If there are any BEFORE or INSTEAD triggers on the + * partition, we'll have to be ready to convert their result + * back to tuplestore format. + */ + mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL; +mtstate->mt_transition_capture->tcs_map = +mtstate->mt_transition_tupconv_maps[leaf_part_index]; +} +else +{ +/* + * Otherwise, just remember the original unconverted tuple, to + * avoid a needless round trip conversion. + */ + mtstate->mt_transition_capture->tcs_original_insert_tuple = tuple; +mtstate->mt_transition_capture->tcs_map = NULL; +} +} This chunk of code gets inserted between a comment and the code to which that comment refers. Maybe that's not the best idea. Some other things I thought about: * Does has_superclass have concurrency issues? After some consideration, I decided that it's probably fine as long as you hold a lock on the target relation sufficient to prevent it from concurrently becoming an inheritance child - i.e. any lock at all. The caller is CREATE TRIGGER, which certainly does. * In AfterTriggerSaveEvent, is it a problem that the large new hunk of code ignores trigdesc if transition_capture != NULL? If I understand correctly, the trigdesc will be coming from the leaf relation actually being updated, while the transition_capture will be coming from the relation named in the query. Is the transition_capture object guaranteed to have all the flags set, or do we also need to include the ones from the trigdesc? This also seems to be fine, because of the restriction that row-level triggers with tuplestores can't participate in inheritance hierarchies. We can only need to capture the tuples for the relation named in the query, not the leaf partitions. * The regression tests for this function are fairly lengthy. Given the complexity of the behavior being tested, though, it seems like a really good idea to have these. Otherwise, it's easy to imagine some future patch breaking this again. I also like the documentation update. So, in short, +1 from me. Regards, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Thu, Jun 8, 2017 at 11:56 PM, Thomas Munro wrote: > [Adding Andrew Gierth] > > Here is a rebased version of the patch to fix transition tables with > inheritance. Fixes a typo in an error message ("not support on > partitions" -> "... supported ..."), and changes regression test > triggers to be single-event (only one of INSERT, UPDATE or DELETE), > because a later patch will not allow multi-event triggers with TTs. > > This is patch 1 of a stack of 3 patches addressing currently known > problems with transition tables. So, Andrew, are you running with this, or should I keep looking into it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 22, 2017 at 5:51 PM, Amit Langote wrote: > On 2017/05/20 9:01, Thomas Munro wrote: >> Sent too soon. Several variables should also be renamed to make clear >> they refer to the transition capture state in effect, instead of vague >> names like 'transitions'. Sorry for the version churn. > > Ah, I was kind of getting distracted by it earlier too; thanks. > > Anyway, the patch looks good to me. [Adding Andrew Gierth] Here is a rebased version of the patch to fix transition tables with inheritance. Fixes a typo in an error message ("not support on partitions" -> "... supported ..."), and changes regression test triggers to be single-event (only one of INSERT, UPDATE or DELETE), because a later patch will not allow multi-event triggers with TTs. This is patch 1 of a stack of 3 patches addressing currently known problems with transition tables. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v10.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On 2017/05/20 9:01, Thomas Munro wrote: > On Sat, May 20, 2017 at 10:43 AM, Thomas Munro > wrote: >> On Fri, May 19, 2017 at 6:35 PM, Amit Langote >> wrote: >>> On 2017/05/19 15:16, Thomas Munro wrote: Would TransitionCaptureState be a better name for this struct? >>> >>> Yes. Although, losing the Trigger prefix might make it sound a bit >>> ambiguous though. Right above its definition, we have TriggerData. So, >>> maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or >>> TriggerTransitionData may be worth considering. >> >> Ok, here's a version using TransitionCaptureState. Those other names >> seem too long, and "TriggerTransition" is already in use so >> "TriggerTransitionData" seems off the table. Having the word >> "capture" in there seems good, since this is an object that controls >> what we capture when we process a modify a set of tables. I hope >> that's clear. I agree. TransitionCaptureState sounds good. > Sent too soon. Several variables should also be renamed to make clear > they refer to the transition capture state in effect, instead of vague > names like 'transitions'. Sorry for the version churn. Ah, I was kind of getting distracted by it earlier too; thanks. Anyway, the patch looks good to me. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Sat, May 20, 2017 at 10:43 AM, Thomas Munro wrote: > On Fri, May 19, 2017 at 6:35 PM, Amit Langote > wrote: >> On 2017/05/19 15:16, Thomas Munro wrote: >>> Would TransitionCaptureState be a better name for this struct? >> >> Yes. Although, losing the Trigger prefix might make it sound a bit >> ambiguous though. Right above its definition, we have TriggerData. So, >> maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or >> TriggerTransitionData may be worth considering. > > Ok, here's a version using TransitionCaptureState. Those other names > seem too long, and "TriggerTransition" is already in use so > "TriggerTransitionData" seems off the table. Having the word > "capture" in there seems good, since this is an object that controls > what we capture when we process a modify a set of tables. I hope > that's clear. Sent too soon. Several variables should also be renamed to make clear they refer to the transition capture state in effect, instead of vague names like 'transitions'. Sorry for the version churn. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Fri, May 19, 2017 at 6:35 PM, Amit Langote wrote: > On 2017/05/19 15:16, Thomas Munro wrote: >> Would TransitionCaptureState be a better name for this struct? > > Yes. Although, losing the Trigger prefix might make it sound a bit > ambiguous though. Right above its definition, we have TriggerData. So, > maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or > TriggerTransitionData may be worth considering. Ok, here's a version using TransitionCaptureState. Those other names seem too long, and "TriggerTransition" is already in use so "TriggerTransitionData" seems off the table. Having the word "capture" in there seems good, since this is an object that controls what we capture when we process a modify a set of tables. I hope that's clear. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v8.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On 2017/05/19 15:16, Thomas Munro wrote: > On Fri, May 19, 2017 at 5:51 PM, Amit Langote > wrote: >> I saw in the latest patch that now ExecSetupTriggerTransitionState() looks >> at mtstate->mt_partition_dispatch_info when setting up the transition >> conversion map. In the case where it's non-NULL, you may have realized >> that mt_transition_tupconv_map will be an exact copy of >> mt_partition_tupconv_maps that's already built. Would it perhaps be a >> good idea to either share the same or make a copy using memcpy() instead >> of doing the convert_tuples_by_name() calls again? > > Isn't it the opposite? mt_partition_tupconv_maps holds maps that > convert the parent format to the partition format. > mt_transition_tupconv_maps holds maps that convert the partition > format to the parent format (= transition tuplestore format). You're right, never mind. >>> On Thu, May 18, 2017 at 10:16 PM, Amit Langote >>> wrote: +typedef struct TriggerTransitionState +{ ... +boolttf_delete_old_table; Just curious: why ttf_? TriggerTransition field? >>> >>> Oops. Changed to "tts_". I had renamed this struct but not the members. >> >> Ah. BTW, maybe it's not a problem, but the existing TupleTableSlot's >> member names are prefixed with tts_, too. :) > > Would TransitionCaptureState be a better name for this struct? Yes. Although, losing the Trigger prefix might make it sound a bit ambiguous though. Right above its definition, we have TriggerData. So, maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or TriggerTransitionData may be worth considering. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Fri, May 19, 2017 at 5:51 PM, Amit Langote wrote: > I saw in the latest patch that now ExecSetupTriggerTransitionState() looks > at mtstate->mt_partition_dispatch_info when setting up the transition > conversion map. In the case where it's non-NULL, you may have realized > that mt_transition_tupconv_map will be an exact copy of > mt_partition_tupconv_maps that's already built. Would it perhaps be a > good idea to either share the same or make a copy using memcpy() instead > of doing the convert_tuples_by_name() calls again? Isn't it the opposite? mt_partition_tupconv_maps holds maps that convert the parent format to the partition format. mt_transition_tupconv_maps holds maps that convert the partition format to the parent format (= transition tuplestore format). >> On Thu, May 18, 2017 at 10:16 PM, Amit Langote >> wrote: >>> +typedef struct TriggerTransitionState >>> +{ >>> ... >>> +boolttf_delete_old_table; >>> >>> Just curious: why ttf_? TriggerTransition field? >> >> Oops. Changed to "tts_". I had renamed this struct but not the members. > > Ah. BTW, maybe it's not a problem, but the existing TupleTableSlot's > member names are prefixed with tts_, too. :) Would TransitionCaptureState be a better name for this struct? -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On 2017/05/19 14:01, Thomas Munro wrote: > On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner wrote: >> On Thu, May 18, 2017 at 5:16 AM, Amit Langote >> wrote: >> >>> Do we need to update documentation? Perhaps, some clarification on the >>> inheritance/partitioning behavior somewhere. >> >> Yeah, I think so. > > Here is an attempt at documenting the situation in the CREATE TRIGGER > notes section. Looks good, thanks. >>> I'm not sure if it's significant for transition tables, but what if a >>> partition's BR trigger modified the tuple? Would we want to include the >>> modified version of the tuple in the transition table or the original as >>> the patch does? Same for the code in CopyFrom(). >> >> Good spot! If the BR trigger on the child table modifies or >> suppresses the action, I strongly feel that must be reflected in the >> transition table. This needs to be fixed. > > Gah. Right. In the attached version, there is a still an 'original > tuple' optimisation for insertions (avoiding parent -> child -> parent > conversion), but it's disabled if there are any BEFORE INSERT or > INSTEAD OF INSERT row-level triggers. > > That's demonstrated by this part of the regression test, which > modifies the value inserted into the 'CCC' partition (and similar case > for COPY): > > insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66); > NOTICE: trigger = parent_stmt_trig, old table = , new table = > (AAA,42), (BBB,42), (CCC,1066) Seems to work correctly. I saw in the latest patch that now ExecSetupTriggerTransitionState() looks at mtstate->mt_partition_dispatch_info when setting up the transition conversion map. In the case where it's non-NULL, you may have realized that mt_transition_tupconv_map will be an exact copy of mt_partition_tupconv_maps that's already built. Would it perhaps be a good idea to either share the same or make a copy using memcpy() instead of doing the convert_tuples_by_name() calls again? > On Thu, May 18, 2017 at 10:16 PM, Amit Langote > wrote: >> +typedef struct TriggerTransitionState >> +{ >> ... >> +boolttf_delete_old_table; >> >> Just curious: why ttf_? TriggerTransition field? > > Oops. Changed to "tts_". I had renamed this struct but not the members. Ah. BTW, maybe it's not a problem, but the existing TupleTableSlot's member names are prefixed with tts_, too. :) Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner wrote: > On Thu, May 18, 2017 at 5:16 AM, Amit Langote > wrote: > >> Do we need to update documentation? Perhaps, some clarification on the >> inheritance/partitioning behavior somewhere. > > Yeah, I think so. Here is an attempt at documenting the situation in the CREATE TRIGGER notes section. >> -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL)); >> +Assert((enrmd->reliddesc == InvalidOid) != >> + (enrmd->tupdesc == NULL)); >> >> Perhaps, unintentional change? > > Agreed; line is not long enough to need to wrap. Fixed. >> I'm not sure if it's significant for transition tables, but what if a >> partition's BR trigger modified the tuple? Would we want to include the >> modified version of the tuple in the transition table or the original as >> the patch does? Same for the code in CopyFrom(). > > Good spot! If the BR trigger on the child table modifies or > suppresses the action, I strongly feel that must be reflected in the > transition table. This needs to be fixed. Gah. Right. In the attached version, there is a still an 'original tuple' optimisation for insertions (avoiding parent -> child -> parent conversion), but it's disabled if there are any BEFORE INSERT or INSTEAD OF INSERT row-level triggers. That's demonstrated by this part of the regression test, which modifies the value inserted into the 'CCC' partition (and similar case for COPY): insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66); NOTICE: trigger = parent_stmt_trig, old table = , new table = (AAA,42), (BBB,42), (CCC,1066) On Thu, May 18, 2017 at 10:16 PM, Amit Langote wrote: > +typedef struct TriggerTransitionState > +{ > ... > +boolttf_delete_old_table; > > Just curious: why ttf_? TriggerTransition field? Oops. Changed to "tts_". I had renamed this struct but not the members. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v7.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Thu, May 18, 2017 at 5:16 AM, Amit Langote wrote: > Do we need to update documentation? Perhaps, some clarification on the > inheritance/partitioning behavior somewhere. Yeah, I think so. > -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL)); > +Assert((enrmd->reliddesc == InvalidOid) != > + (enrmd->tupdesc == NULL)); > > Perhaps, unintentional change? Agreed; line is not long enough to need to wrap. > I'm not sure if it's significant for transition tables, but what if a > partition's BR trigger modified the tuple? Would we want to include the > modified version of the tuple in the transition table or the original as > the patch does? Same for the code in CopyFrom(). Good spot! If the BR trigger on the child table modifies or suppresses the action, I strongly feel that must be reflected in the transition table. This needs to be fixed. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On 2017/05/18 7:13, Thomas Munro wrote: > On Wed, May 17, 2017 at 7:42 PM, Thomas Munro > wrote: >> On Wed, May 17, 2017 at 6:04 PM, Amit Langote >> wrote: >>> targetRelInfo should instead be set to mtstate->rootResultRelInfo that was >>> set in ExecInitModifyTable() as described above, IOW, as follows: >>> >>> /* Partitioned table. */ >>> if (mtstate->rootResultRelInfo != NULL) >>> targetRelInfo = mtstate->rootResultRelInfo; >> >> Ah, I see. Thank you. Fixed in the attached. > > Here's a post-pgindent rebase. I read through the latest patch. Some comments: Do we need to update documentation? Perhaps, some clarification on the inheritance/partitioning behavior somewhere. +typedef struct TriggerTransitionState +{ ... +boolttf_delete_old_table; Just curious: why ttf_? TriggerTransition field? -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL)); +Assert((enrmd->reliddesc == InvalidOid) != + (enrmd->tupdesc == NULL)); Perhaps, unintentional change? +original_tuple = tuple; map = mtstate->mt_partition_tupconv_maps[leaf_part_index]; if (map) { @@ -570,8 +572,17 @@ ExecInsert(ModifyTableState *mtstate, setLastTid(&(tuple->t_self)); } +/* + * If we inserted into a partitioned table, then insert routing logic may + * have converted the tuple to a partition's format. Make the original + * unconverted tuple available for transition tables. + */ +if (mtstate->mt_transition_state != NULL) +mtstate->mt_transition_state->original_insert_tuple = original_tuple; I'm not sure if it's significant for transition tables, but what if a partition's BR trigger modified the tuple? Would we want to include the modified version of the tuple in the transition table or the original as the patch does? Same for the code in CopyFrom(). * 'tup_conv_maps' receives an array of TupleConversionMap objects with one * entry for every leaf partition (required to convert input tuple based * on the root table's rowtype to a leaf partition's rowtype after tuple - * routing is done + * routing is done) Oh, thanks! :) Other than the above minor nitpicks, patch looks good to me. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 17, 2017 at 7:42 PM, Thomas Munro wrote: > On Wed, May 17, 2017 at 6:04 PM, Amit Langote > wrote: >> targetRelInfo should instead be set to mtstate->rootResultRelInfo that was >> set in ExecInitModifyTable() as described above, IOW, as follows: >> >> /* Partitioned table. */ >> if (mtstate->rootResultRelInfo != NULL) >> targetRelInfo = mtstate->rootResultRelInfo; > > Ah, I see. Thank you. Fixed in the attached. Here's a post-pgindent rebase. Also, I discovered a preexisting bug that is independent of all this inheritance stuff. COPY in the batch optimisation case was failing to capture transition tuples. I thought about sending a separate patch but this patch already has a regression test that covers it so I've included it here. It's this hunk: @@ -2872,7 +2872,8 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid, * anyway. */ else if (resultRelInfo->ri_TrigDesc != NULL && -resultRelInfo->ri_TrigDesc->trig_insert_after_row) +(resultRelInfo->ri_TrigDesc->trig_insert_after_row || + resultRelInfo->ri_TrigDesc->trig_insert_new_table)) { for (i = 0; i < nBufferedTuples; i++) { -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v6.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 17, 2017 at 6:04 PM, Amit Langote wrote: > On 2017/05/17 11:22, Thomas Munro wrote: >> Here is that patch. Thoughts? > > I looked at the patch and noticed that there might be some confusion about > what's in the EState.es_root_result_relations array. Thanks for looking at this! > ... > > targetRelInfo should instead be set to mtstate->rootResultRelInfo that was > set in ExecInitModifyTable() as described above, IOW, as follows: > > /* Partitioned table. */ > if (mtstate->rootResultRelInfo != NULL) > targetRelInfo = mtstate->rootResultRelInfo; Ah, I see. Thank you. Fixed in the attached. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On 2017/05/17 11:22, Thomas Munro wrote: > On Wed, May 17, 2017 at 10:13 AM, Kevin Grittner wrote: >> On Tue, May 16, 2017 at 4:50 PM, Thomas Munro >> wrote: >>> >>> I'm about to post a much simpler patch that collects uniform tuples >>> from children, addressing the reported bug, and simply rejects >>> transition tables on row-triggers on tables that are in either kind of >>> inheritance hierarchy. More soon... >> >> I agree that we can safely go that far, but not farther. Thanks! > > Here is that patch. Thoughts? I looked at the patch and noticed that there might be some confusion about what's in the EState.es_root_result_relations array. If you see the following block of code in ExecInitModifyTable(): /* If modifying a partitioned table, initialize the root table info */ if (node->rootResultRelIndex >= 0) mtstate->rootResultRelInfo = estate->es_root_result_relations + node->rootResultRelIndex; You might be able to see that node->rootResultRelIndex is used as offset into es_root_result_relations, which means the partitioned table root being modified by this node. The entries in es_root_result_relations correspond to the partitioned table roots referenced as targets in different parts of the query (for example, a WITH query might have its own target partitioned tables). So, in ExecSetupTriggerTransitionState() added by the patch, the following code needs to be changed: /* * Find the ResultRelInfo corresponding to the relation that was * explicitly named in the statement. */ if (estate->es_num_root_result_relations > 0) { /* Partitioned table. The named relation is the first root. */ targetRelInfo = &estate->es_root_result_relations[0]; } targetRelInfo should instead be set to mtstate->rootResultRelInfo that was set in ExecInitModifyTable() as described above, IOW, as follows: /* Partitioned table. */ if (mtstate->rootResultRelInfo != NULL) targetRelInfo = mtstate->rootResultRelInfo; I guess that's what you intend to do here. Sorry if the comments that I wrote about es_root_result_relations in what got committed as e180c8aa8c were not clear enough. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 17, 2017 at 10:13 AM, Kevin Grittner wrote: > On Tue, May 16, 2017 at 4:50 PM, Thomas Munro > wrote: >> >> I'm about to post a much simpler patch that collects uniform tuples >> from children, addressing the reported bug, and simply rejects >> transition tables on row-triggers on tables that are in either kind of >> inheritance hierarchy. More soon... > > I agree that we can safely go that far, but not farther. Thanks! Here is that patch. Thoughts? On Wed, May 10, 2017 at 3:41 PM, Robert Haas wrote: > Hmm. What if the partitioning hierarchy contains foreign tables? No tuples are collected from partitions that are foreign tables. See the attached demonstration. I wasn't sure whether and if so where to include that in the regression tests because it needs a contrib module. Robert and I discussed this off-list and came up with some options: (1) document that as the current behaviour (where?), (2) figure out how to prevent that situation from arising, (3) raise some kind of runtime error if foreign transition tuples need to be collected. Option 2 would seem to require us to lock the whole chain of ancestors to check for statement-level triggers with transition tables, which seems unpleasant, and option 3 is conceptually similar to the execution time insertion failure. It's debatable wither 3 or 1 is more surprising or inconvenient to users. I vote for option 1 as a stop-gap measure (and I hope that someone will soon fix transition tuple capture for foreign tables generally). However, it's a bit inconsistent that we explicitly reject triggers with transition tables on foreign tables directly, but let them silently fail to capture anything when they're indirectly accessed via a parent relation. Thoughts? -- Thomas Munro http://www.enterprisedb.com transition-tables-vs-foreign-partitions.sql Description: Binary data transition-tuples-from-child-tables-v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 16, 2017 at 4:50 PM, Thomas Munro wrote: > On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner wrote: >> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro >> wrote: >>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro >>> wrote: 2. If you attach a row-level trigger with transition tables to any inheritance child, it will see transition tuples from all tables in the inheritance hierarchy at or below the directly named table that were modified by the same statement, sliced so that they appear as tuples from the directly named table. >>> >>> Of course that's a bit crazy, not only for trigger authors to >>> understand and deal with, but also for plan caching: it just doesn't >>> really make sense to have a database object, even an ephemeral one, >>> whose type changes depending on how the trigger was invoked, because >>> the plans stick around. >> >> The patch to add transition tables changed caching of a trigger >> function to key on the combination of the function and the target >> relation, rather than having one cache entry regardless of the >> target table. > > Right. That works as long as triggers always see tuples in the format > of the relation that they're attached to, and I think that's the only > sensible choice. The problem I was thinking about was this: We have > only one pair of tuplestores and in the current design it holds tuples > of a uniform format, yet it may (eventually) need to be scanned by a > statement trigger attached to a the named relation AND any number of > row triggers attached to children with potentially different formats. > That implies some extra conversions are required at scan time. I had > a more ambitious patch that would deal with sone of that by tracking > storage format and scan format separately (next time your row trigger > is invoked the scan format will be the same but the storage format may > be different depending on which relation you named in a query), but > I'm putting that to one side for this release. That was a bit of a > rabbit hole, and there are some tricky design questions about tuple > conversions (to behave like DB2 with subtables may even require > tuplestore with per-tuple type information) and also the subset of > rows that each row trigger should see (which may require extra tuple > origin metadata or separate tuplestores). Yeah, I wish this had surfaced far earlier, but I missed it and none of the reviews prior to commit caught it, either. :-( It seems too big to squeeze into v10 now. I just want to have that general direction in mind and not paint ourselves into any corner that makes it hard to get there in the next release. > I'm about to post a much simpler patch that collects uniform tuples > from children, addressing the reported bug, and simply rejects > transition tables on row-triggers on tables that are in either kind of > inheritance hierarchy. More soon... I agree that we can safely go that far, but not farther. Thanks! -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner wrote: > On Wed, May 10, 2017 at 7:02 AM, Thomas Munro > wrote: >> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro >> wrote: >>> 2. If you attach a row-level trigger with transition tables to any >>> inheritance child, it will see transition tuples from all tables in >>> the inheritance hierarchy at or below the directly named table that >>> were modified by the same statement, sliced so that they appear as >>> tuples from the directly named table. >> >> Of course that's a bit crazy, not only for trigger authors to >> understand and deal with, but also for plan caching: it just doesn't >> really make sense to have a database object, even an ephemeral one, >> whose type changes depending on how the trigger was invoked, because >> the plans stick around. > > The patch to add transition tables changed caching of a trigger > function to key on the combination of the function and the target > relation, rather than having one cache entry regardless of the > target table. Right. That works as long as triggers always see tuples in the format of the relation that they're attached to, and I think that's the only sensible choice. The problem I was thinking about was this: We have only one pair of tuplestores and in the current design it holds tuples of a uniform format, yet it may (eventually) need to be scanned by a statement trigger attached to a the named relation AND any number of row triggers attached to children with potentially different formats. That implies some extra conversions are required at scan time. I had a more ambitious patch that would deal with sone of that by tracking storage format and scan format separately (next time your row trigger is invoked the scan format will be the same but the storage format may be different depending on which relation you named in a query), but I'm putting that to one side for this release. That was a bit of a rabbit hole, and there are some tricky design questions about tuple conversions (to behave like DB2 with subtables may even require tuplestore with per-tuple type information) and also the subset of rows that each row trigger should see (which may require extra tuple origin metadata or separate tuplestores). I'm about to post a much simpler patch that collects uniform tuples from children, addressing the reported bug, and simply rejects transition tables on row-triggers on tables that are in either kind of inheritance hierarchy. More soon... -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 7:02 AM, Thomas Munro wrote: > On Wed, May 10, 2017 at 11:10 PM, Thomas Munro > wrote: >> 2. If you attach a row-level trigger with transition tables to any >> inheritance child, it will see transition tuples from all tables in >> the inheritance hierarchy at or below the directly named table that >> were modified by the same statement, sliced so that they appear as >> tuples from the directly named table. > > Of course that's a bit crazy, not only for trigger authors to > understand and deal with, but also for plan caching: it just doesn't > really make sense to have a database object, even an ephemeral one, > whose type changes depending on how the trigger was invoked, because > the plans stick around. The patch to add transition tables changed caching of a trigger function to key on the combination of the function and the target relation, rather than having one cache entry regardless of the target table. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
[Apologies to all for my recent absence from community lists, and special thanks to Thomas and Robert for picking up the slack.] On Tue, May 9, 2017 at 4:51 PM, Thomas Munro wrote: > On Tue, May 9, 2017 at 10:29 PM, Thomas Munro > wrote: > Recall that transition tables can be specified for statement-level > triggers AND row-level triggers. If you specify them for row-level > triggers, then they can see all rows changed so far each time they > fire. No, they see all rows from the statement, each time. test=# create table t (c int not null); CREATE TABLE test=# create function t_func() test-# returns trigger test-# language plpgsql test-# as $$ test$# begin test$# raise notice '% / % = %', test$#new.c, test$#(select sum(c) from n), test$#(select new.c::float / sum(n.c) from n); test$# return null; test$# end; test$# $$; CREATE FUNCTION test=# create trigger t_trig test-# after insert or update on t test-# referencing new table as n test-# for each row test-# execute procedure t_func(); CREATE TRIGGER test=# insert into t select generate_series(1,5); NOTICE: 1 / 15 = 0.0667 NOTICE: 2 / 15 = 0.133 NOTICE: 3 / 15 = 0.2 NOTICE: 4 / 15 = 0.267 NOTICE: 5 / 15 = 0.333 INSERT 0 5 This behavior is required for this feature by the SQL standard. > Now our policy of firing the statement level triggers only for > the named relation but firing row-level triggers for all modified > relations leads to a tricky problem for the inheritance case: what > type of transition tuples should the child table's row-level triggers > see? The record format for the object on which the trigger was declared, IMO. > Suppose you have an inheritance hierarchy like this: > > animal > -> mammal > -> cat > > You define a statement-level trigger on "animal" and another > statement-level trigger on "mammal". You define a row-level trigger > on "cat". When you update either "animal" or "mammal", the row > triggers on "cat" might run. Row-level triggers on "cat" see OLD and > NEW as "cat" tuples, of course, but if they are configured to see > transition tables, should they see "cat", "mammal" or "animal" tuples > in the transition tables? With my patch as it is, that depends on > which level of the hierarchy you explicitly updated! I think that the ideal behavior would be that if you define a trigger on "cat", you see rows in the "cat" format; if you define a trigger on rows for "mammal", you see rows in the "mammal" format; if you define a trigger on rows for "animal", you see rows in the "animal" format. Also, the ideal would be that we support an ONLY option for trigger declaration. If your statement is ONLY on the a given level in the hierarchy, the row triggers for only that level would fire. If you don't use ONLY, a row trigger at that level would fire for operations at that level or any child level, but with a record format matching the level of the trigger. Now, that may be too ambitious for this release. If so, I suggest we not implement anything that would be broken by the above, and throw a "not implemented" error when necessary. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 8:02 AM, Thomas Munro wrote: > On Wed, May 10, 2017 at 11:10 PM, Thomas Munro > wrote: >> 2. If you attach a row-level trigger with transition tables to any >> inheritance child, it will see transition tuples from all tables in >> the inheritance hierarchy at or below the directly named table that >> were modified by the same statement, sliced so that they appear as >> tuples from the directly named table. > > Of course that's a bit crazy, not only for trigger authors to > understand and deal with, but also for plan caching: it just doesn't > really make sense to have a database object, even an ephemeral one, > whose type changes depending on how the trigger was invoked, because > the plans stick around. Perhaps you could modify NamedTuplestorescan > to convert on the fly to the TupleDesc of the table that the row-level > trigger is attached to, using NULL for missing columns, but that'd be > a slightly strange too, depending on how you did it. I don't think it's crazy from a user perspective, but the plan caching thing sounds like a problem. > Perhaps we should reject row-level triggers with transition tables on > tables that are part of an inheritance hierarchy, but allow them for > partitions. Sounds like a sensible solution. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 11:10 PM, Thomas Munro wrote: > 2. If you attach a row-level trigger with transition tables to any > inheritance child, it will see transition tuples from all tables in > the inheritance hierarchy at or below the directly named table that > were modified by the same statement, sliced so that they appear as > tuples from the directly named table. Of course that's a bit crazy, not only for trigger authors to understand and deal with, but also for plan caching: it just doesn't really make sense to have a database object, even an ephemeral one, whose type changes depending on how the trigger was invoked, because the plans stick around. Perhaps you could modify NamedTuplestorescan to convert on the fly to the TupleDesc of the table that the row-level trigger is attached to, using NULL for missing columns, but that'd be a slightly strange too, depending on how you did it. Perhaps we should reject row-level triggers with transition tables on tables that are part of an inheritance hierarchy, but allow them for partitions. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 3:55 PM, Robert Haas wrote: > On Tue, May 9, 2017 at 11:48 PM, Thomas Munro > wrote: >> Hmm. DB2 has transition tables (invented them maybe?) and it allows >> OLD/NEW TABLE on row-level triggers: >> >> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0020236.html > > Yeah, my impression is that Kevin was pretty keen on supporting that > case. I couldn't say exactly why, though. Ok, here's a new version that handles row-level triggers with transition tables on any child table. The regression tests show partition and inheritance examples of that. To be clear about what this does: 1. If you attach a row-level trigger with transition tables to any partition, it will see transition tuples from all partitions that were modified by the same statement. 2. If you attach a row-level trigger with transition tables to any inheritance child, it will see transition tuples from all tables in the inheritance hierarchy at or below the directly named table that were modified by the same statement, sliced so that they appear as tuples from the directly named table. On Wed, May 10, 2017 at 3:41 PM, Robert Haas wrote: > Hmm. What if the partitioning hierarchy contains foreign tables? Arghalalkjhsdflg. Looking into that... -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 9, 2017 at 11:48 PM, Thomas Munro wrote: > Hmm. DB2 has transition tables (invented them maybe?) and it allows > OLD/NEW TABLE on row-level triggers: > > https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0020236.html Yeah, my impression is that Kevin was pretty keen on supporting that case. I couldn't say exactly why, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 11:22 AM, Thomas Munro wrote: > On Wed, May 10, 2017 at 9:57 AM, Alvaro Herrera > wrote: >> Thomas Munro wrote: >> >>> Recall that transition tables can be specified for statement-level >>> triggers AND row-level triggers. If you specify them for row-level >>> triggers, then they can see all rows changed so far each time they >>> fire. >> >> Uhmm ... why do we do this? It seems like a great way to cause much >> confusion. Shouldn't we see the transition table containing the whole >> set for statement-level triggers only, and give row-level triggers just >> the individual affected row each time? > > I assumed that had come from the standard. I don't have a published > standard, but I have just taken a quick look at one of the publicly > available drafts dated 2006. I think its model is that the transition > tables are always conceptually there, and NEW and OLD are just range > variables over those tables. That may explain why transition tables > are mentioned in the context of row-level triggers, and it may be that > the spec's authors never intended row-level triggers to be able to see > the (partial) transition table other than through the range variables > that access exactly one row, but I don't see any wording that > explicitly says so in the spec. Do you? Thoughts, Kevin? Hmm. DB2 has transition tables (invented them maybe?) and it allows OLD/NEW TABLE on row-level triggers: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0020236.html -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 9, 2017 at 11:40 PM, Thomas Munro wrote: > On Wed, May 10, 2017 at 2:31 PM, Amit Langote > wrote: >> On 2017/05/10 6:51, Thomas Munro wrote: >>> No such problem exists for partition hierarchies since the tables all >>> appear as the same type to user code (though conversions may be >>> happening for technical reasons). >> >> To clarify a bit, there may exist differences in the ordering of columns, >> either between the parent and its partitions or between different >> partitions. For example, while parent's rowtype is (a int, b char, c >> float), a partition's may be (b char, a int, c float), and yet another >> partition may have (c float, a int, b char). If some user code happens to >> depend on the ordering of columns, selecting from the parent and selecting >> from a partition directly may return the same result but in different >> formats. > > Right. And the patch I posted converts all transition tuples it > collects from child tables to match the TupleDescriptor of the > relation you named, which it gets from > estate->es_root_result_relations[0]. Is that right? I suppose it > will be very common for partitions to have matching TupleDescriptors, > so the TupleConversionMap will usually be NULL meaning no conversion > is ever done. But in the inheritance case they might be different on > purpose, and in both inheritance and partitioning cases they might be > different in physical ways that aren't logically important as you said > (column order, dropped columns). Hmm. What if the partitioning hierarchy contains foreign tables? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 2:31 PM, Amit Langote wrote: > On 2017/05/10 6:51, Thomas Munro wrote: >> No such problem exists for partition hierarchies since the tables all >> appear as the same type to user code (though conversions may be >> happening for technical reasons). > > To clarify a bit, there may exist differences in the ordering of columns, > either between the parent and its partitions or between different > partitions. For example, while parent's rowtype is (a int, b char, c > float), a partition's may be (b char, a int, c float), and yet another > partition may have (c float, a int, b char). If some user code happens to > depend on the ordering of columns, selecting from the parent and selecting > from a partition directly may return the same result but in different formats. Right. And the patch I posted converts all transition tuples it collects from child tables to match the TupleDescriptor of the relation you named, which it gets from estate->es_root_result_relations[0]. Is that right? I suppose it will be very common for partitions to have matching TupleDescriptors, so the TupleConversionMap will usually be NULL meaning no conversion is ever done. But in the inheritance case they might be different on purpose, and in both inheritance and partitioning cases they might be different in physical ways that aren't logically important as you said (column order, dropped columns). -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On 2017/05/10 6:51, Thomas Munro wrote: > No such problem exists for partition hierarchies since the tables all > appear as the same type to user code (though conversions may be > happening for technical reasons). To clarify a bit, there may exist differences in the ordering of columns, either between the parent and its partitions or between different partitions. For example, while parent's rowtype is (a int, b char, c float), a partition's may be (b char, a int, c float), and yet another partition may have (c float, a int, b char). If some user code happens to depend on the ordering of columns, selecting from the parent and selecting from a partition directly may return the same result but in different formats. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 9:57 AM, Alvaro Herrera wrote: > Thomas Munro wrote: > >> Recall that transition tables can be specified for statement-level >> triggers AND row-level triggers. If you specify them for row-level >> triggers, then they can see all rows changed so far each time they >> fire. > > Uhmm ... why do we do this? It seems like a great way to cause much > confusion. Shouldn't we see the transition table containing the whole > set for statement-level triggers only, and give row-level triggers just > the individual affected row each time? I assumed that had come from the standard. I don't have a published standard, but I have just taken a quick look at one of the publicly available drafts dated 2006. I think its model is that the transition tables are always conceptually there, and NEW and OLD are just range variables over those tables. That may explain why transition tables are mentioned in the context of row-level triggers, and it may be that the spec's authors never intended row-level triggers to be able to see the (partial) transition table other than through the range variables that access exactly one row, but I don't see any wording that explicitly says so in the spec. Do you? Thoughts, Kevin? After thinking about this some more, it's not only the conversion to some arbitrary parent tuple type that would be surprising to a user of inheritance + triggers + transition tables, it's the fact that a row-level trigger on a given child table will also see tuples collected from other tables in the hierarchy. I think I didn't quite get that right in -v2: it should probably build TriggerTransitionFilter from union of all child tables' transition table flags, not just the named table, so that if any child table has a row trigger we start collecting transition tuples from all others for it to see... That sounds pretty crazy to me. So, assuming we want to proceed with this plan of collecting transition tuples from children, I see approximately 3 choices: 1. Reject transition tables on row-level triggers. 2. Reject transition tables on row-level triggers on tables that inherit from other tables. 3. Continue to allow transition tables on row-level triggers, but document that they must be prepared to see transition tuples as they would when querying arbitrary parent tables, and see tuples from other tables in the hierarchy (!) Another possibility which I haven't seriously considered is per-table transition tables so you'd collect each child's tuples separately. I take Alvaro's comment as a vote for 1. I vote for 1 too. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
Thomas Munro wrote: > Recall that transition tables can be specified for statement-level > triggers AND row-level triggers. If you specify them for row-level > triggers, then they can see all rows changed so far each time they > fire. Uhmm ... why do we do this? It seems like a great way to cause much confusion. Shouldn't we see the transition table containing the whole set for statement-level triggers only, and give row-level triggers just the individual affected row each time? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 9, 2017 at 10:29 PM, Thomas Munro wrote: > In master, the decision to populate transition tables happens in > AfterTriggerSaveEvent (called by the ExecAR* functions) in trigger.c. > It does that if it sees that there are triggers on the > relation-being-modified that have transition tables. > > With this patch, nodeModifyTuple.c makes a 'TriggerTransitionFilter' > object to override that behaviour, if there are child tables of either > kind. That is passed to the ExecAR* functions and thence > AfterTriggerSaveEvent, overriding its usual behaviour, so that it can > know that it needs collect tuples from child nodes and how to convert > them to the layout needed for the tuplestores if necessary. > > Thoughts? I'm not yet sure about the locking situation. Generally > needs some more testing. Here is a new version with tidied up tests and a couple of small bug fixes. However, I've realised that there is a surprising behaviour with this approach, and I'm not sure what to do about it. Recall that transition tables can be specified for statement-level triggers AND row-level triggers. If you specify them for row-level triggers, then they can see all rows changed so far each time they fire. Now our policy of firing the statement level triggers only for the named relation but firing row-level triggers for all modified relations leads to a tricky problem for the inheritance case: what type of transition tuples should the child table's row-level triggers see? Suppose you have an inheritance hierarchy like this: animal -> mammal -> cat You define a statement-level trigger on "animal" and another statement-level trigger on "mammal". You define a row-level trigger on "cat". When you update either "animal" or "mammal", the row triggers on "cat" might run. Row-level triggers on "cat" see OLD and NEW as "cat" tuples, of course, but if they are configured to see transition tables, should they see "cat", "mammal" or "animal" tuples in the transition tables? With my patch as it is, that depends on which level of the hierarchy you explicitly updated! No such problem exists for partition hierarchies since the tables all appear as the same type to user code (though conversions may be happening for technical reasons). -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 8, 2017 at 7:09 PM, Thomas Munro wrote: > On Fri, May 5, 2017 at 8:29 AM, Thomas Munro > wrote: >> On Fri, May 5, 2017 at 2:40 AM, Robert Haas wrote: >>> On Thu, May 4, 2017 at 4:46 AM, Thomas Munro >>> wrote: On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera wrote: > Robert Haas wrote: >> I suspect that most users would find it more useful to capture all of >> the rows that the statement actually touched, regardless of whether >> they hit the named table or an inheritance child. > > Yes, agreed. For the plain inheritance cases each row would need to > have an indicator of which relation it comes from (tableoid); I'm not > sure if such a thing would be useful in the partitioning case. On Thu, May 4, 2017 at 4:26 AM, David Fetter wrote: > +1 on the not-duct-tape view of partitioned tables. Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach makes sense? >>> >>> I was thinking PG10 if it can be done straightforwardly. >> >> Ok, I will draft a patch to do it the way I described and see what people >> think. > > FYI I am still working on this and will post a draft patch to do this > (that is: make transition tables capture changes from children with > appropriate tuple conversion) in the next 24 hours. Ok, here is a first swing at it, for discussion. In master, the decision to populate transition tables happens in AfterTriggerSaveEvent (called by the ExecAR* functions) in trigger.c. It does that if it sees that there are triggers on the relation-being-modified that have transition tables. With this patch, nodeModifyTuple.c makes a 'TriggerTransitionFilter' object to override that behaviour, if there are child tables of either kind. That is passed to the ExecAR* functions and thence AfterTriggerSaveEvent, overriding its usual behaviour, so that it can know that it needs collect tuples from child nodes and how to convert them to the layout needed for the tuplestores if necessary. Thoughts? I'm not yet sure about the locking situation. Generally needs some more testing. -- Thomas Munro http://www.enterprisedb.com transition-tuples-from-child-tables.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Fri, May 5, 2017 at 8:29 AM, Thomas Munro wrote: > On Fri, May 5, 2017 at 2:40 AM, Robert Haas wrote: >> On Thu, May 4, 2017 at 4:46 AM, Thomas Munro >> wrote: >>> On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera >>> wrote: Robert Haas wrote: > I suspect that most users would find it more useful to capture all of > the rows that the statement actually touched, regardless of whether > they hit the named table or an inheritance child. Yes, agreed. For the plain inheritance cases each row would need to have an indicator of which relation it comes from (tableoid); I'm not sure if such a thing would be useful in the partitioning case. >>> >>> On Thu, May 4, 2017 at 4:26 AM, David Fetter wrote: +1 on the not-duct-tape view of partitioned tables. >>> >>> Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach >>> makes sense? >> >> I was thinking PG10 if it can be done straightforwardly. > > Ok, I will draft a patch to do it the way I described and see what people > think. FYI I am still working on this and will post a draft patch to do this (that is: make transition tables capture changes from children with appropriate tuple conversion) in the next 24 hours. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Fri, May 5, 2017 at 2:40 AM, Robert Haas wrote: > On Thu, May 4, 2017 at 4:46 AM, Thomas Munro > wrote: >> On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera >> wrote: >>> Robert Haas wrote: I suspect that most users would find it more useful to capture all of the rows that the statement actually touched, regardless of whether they hit the named table or an inheritance child. >>> >>> Yes, agreed. For the plain inheritance cases each row would need to >>> have an indicator of which relation it comes from (tableoid); I'm not >>> sure if such a thing would be useful in the partitioning case. >> >> On Thu, May 4, 2017 at 4:26 AM, David Fetter wrote: >>> +1 on the not-duct-tape view of partitioned tables. >> >> Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach >> makes sense? > > I was thinking PG10 if it can be done straightforwardly. Ok, I will draft a patch to do it the way I described and see what people think. >> To avoid the whiff of duct tape, we'd probably also want to make ROW >> triggers created on the partitioned table(s) containing partition to >> fire too, with appropriate TypeConversionMap treatment. Not sure what >> exactly is involved there. >> >> On the other hand, doing that with inheritance hierarchies would be an >> incompatible behavioural change, which I guess people don't want -- am >> I right? > > Incompatible with what? Transition tables haven't been released yet. > If we're going to fix the definition of what they do, now's the time. The last two paragraphs of my email were about ROW triggers created on partitioned tables and tables with inheritance children, not the new transition table stuff. I will forget that for now and look only at making the transition tables duct-tape-free. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Thu, May 4, 2017 at 4:46 AM, Thomas Munro wrote: > On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera > wrote: >> Robert Haas wrote: >>> I suspect that most users would find it more useful to capture all of >>> the rows that the statement actually touched, regardless of whether >>> they hit the named table or an inheritance child. >> >> Yes, agreed. For the plain inheritance cases each row would need to >> have an indicator of which relation it comes from (tableoid); I'm not >> sure if such a thing would be useful in the partitioning case. > > On Thu, May 4, 2017 at 4:26 AM, David Fetter wrote: >> +1 on the not-duct-tape view of partitioned tables. > > Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach > makes sense? I was thinking PG10 if it can be done straightforwardly. > 1. Remove the prohibition on creating transition-capturing triggers > on a partitioned table. > > 2. Make sure that the ExecAR* functions call AfterTriggerSaveEvent > when modifying partition tables if the explicitly named parent > partitioned table has after triggers with transition tables. Not sure > how exactly how but doesn't seem difficult. > > 3. Convert tuples to the TupleDesc of the relation that owns the > statement trigger (ie the partitioned table) when inserting them into > the tuplestore. One way to do that might be to build an array of > TupleConversionMap objects that does the opposite of the conversions > done by tup_conv_maps. While tup_conv_maps is for converting tuples > to the layout needed for a partition, tup_unconv_maps (or better name) > would be for converting the old and new tuples to the TupleDesc of the > partitioned table. Then the appropriate TupleConversionMap could be > passed into the ExecAR* functions as a new argument 'transition_map'. > AfterTriggerSaveEvent would use 'oldtup' and 'newtup' directly for ROW > triggers, but convert using the passed in map if it needs to insert > them into the transition tuplestores. > > The same thing could work for inheritance, if tupconvert.c had a new > kind of conversion that allows slicing of tuples (converting a wider > child table's tuples to the parent's subset of columns) rather the > just conversion between logically equivalent TupleDescs. > > To avoid the whiff of duct tape, we'd probably also want to make ROW > triggers created on the partitioned table(s) containing partition to > fire too, with appropriate TypeConversionMap treatment. Not sure what > exactly is involved there. > > On the other hand, doing that with inheritance hierarchies would be an > incompatible behavioural change, which I guess people don't want -- am > I right? Incompatible with what? Transition tables haven't been released yet. If we're going to fix the definition of what they do, now's the time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera wrote: > Robert Haas wrote: >> I suspect that most users would find it more useful to capture all of >> the rows that the statement actually touched, regardless of whether >> they hit the named table or an inheritance child. > > Yes, agreed. For the plain inheritance cases each row would need to > have an indicator of which relation it comes from (tableoid); I'm not > sure if such a thing would be useful in the partitioning case. On Thu, May 4, 2017 at 4:26 AM, David Fetter wrote: > +1 on the not-duct-tape view of partitioned tables. Hmm. Ok. Are we talking about PG10 or PG11 here? Does this approach makes sense? 1. Remove the prohibition on creating transition-capturing triggers on a partitioned table. 2. Make sure that the ExecAR* functions call AfterTriggerSaveEvent when modifying partition tables if the explicitly named parent partitioned table has after triggers with transition tables. Not sure how exactly how but doesn't seem difficult. 3. Convert tuples to the TupleDesc of the relation that owns the statement trigger (ie the partitioned table) when inserting them into the tuplestore. One way to do that might be to build an array of TupleConversionMap objects that does the opposite of the conversions done by tup_conv_maps. While tup_conv_maps is for converting tuples to the layout needed for a partition, tup_unconv_maps (or better name) would be for converting the old and new tuples to the TupleDesc of the partitioned table. Then the appropriate TupleConversionMap could be passed into the ExecAR* functions as a new argument 'transition_map'. AfterTriggerSaveEvent would use 'oldtup' and 'newtup' directly for ROW triggers, but convert using the passed in map if it needs to insert them into the transition tuplestores. The same thing could work for inheritance, if tupconvert.c had a new kind of conversion that allows slicing of tuples (converting a wider child table's tuples to the parent's subset of columns) rather the just conversion between logically equivalent TupleDescs. To avoid the whiff of duct tape, we'd probably also want to make ROW triggers created on the partitioned table(s) containing partition to fire too, with appropriate TypeConversionMap treatment. Not sure what exactly is involved there. On the other hand, doing that with inheritance hierarchies would be an incompatible behavioural change, which I guess people don't want -- am I right? -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 03, 2017 at 11:47:04AM -0400, Robert Haas wrote: > On Tue, May 2, 2017 at 9:44 PM, Thomas Munro > wrote: > > I think that we should only capture transition tuples captured from > > the explicitly named relation, since we only fire AFTER STATEMENT > > triggers on that relation. I see no inconsistency with the policy of > > rejecting transition tables on partitioned tables (as I proposed and > > Kevin accepted[1]), because partitioned tables can't have any data so > > there would be no point. In contrast, every parent table in an > > inheritance hierarchy is also a regular table and can hold data, so I > > think we should allow transition tables on them, and capture > > transition tuples from that table only when you modify it directly. > > I suspect that most users would find it more useful to capture all of > the rows that the statement actually touched, regardless of whether > they hit the named table or an inheritance child. I just don't know > if it's practical to make that work. (And, of course, I don't know if > other people agree with my assessment of what is useful ... but > generally there seems to be support for making partitioned tables, at > least, look more like a single table that happens to have partitions > and less like a bunch of separate tables attached to each other with > duct tape.) +1 on the not-duct-tape view of partitioned tables. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 3, 2017 at 12:02 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> I suspect that most users would find it more useful to capture all of >> the rows that the statement actually touched, regardless of whether >> they hit the named table or an inheritance child. > > Yes, agreed. For the plain inheritance cases each row would need to > have an indicator of which relation it comes from (tableoid); I'm not > sure if such a thing would be useful in the partitioning case. I think it would be about equally useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
Robert Haas wrote: > I suspect that most users would find it more useful to capture all of > the rows that the statement actually touched, regardless of whether > they hit the named table or an inheritance child. Yes, agreed. For the plain inheritance cases each row would need to have an indicator of which relation it comes from (tableoid); I'm not sure if such a thing would be useful in the partitioning case. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 2, 2017 at 9:44 PM, Thomas Munro wrote: > I think that we should only capture transition tuples captured from > the explicitly named relation, since we only fire AFTER STATEMENT > triggers on that relation. I see no inconsistency with the policy of > rejecting transition tables on partitioned tables (as I proposed and > Kevin accepted[1]), because partitioned tables can't have any data so > there would be no point. In contrast, every parent table in an > inheritance hierarchy is also a regular table and can hold data, so I > think we should allow transition tables on them, and capture > transition tuples from that table only when you modify it directly. I suspect that most users would find it more useful to capture all of the rows that the statement actually touched, regardless of whether they hit the named table or an inheritance child. I just don't know if it's practical to make that work. (And, of course, I don't know if other people agree with my assessment of what is useful ... but generally there seems to be support for making partitioned tables, at least, look more like a single table that happens to have partitions and less like a bunch of separate tables attached to each other with duct tape.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 2, 2017 at 3:01 AM, Robert Haas wrote: > [...] > Only the rows in the parent show up in the transition table. But now > look what happens if I add an unrelated trigger that also uses > transition tables to the children: > > rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS > $$begin null; end$$; > CREATE FUNCTION > rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS > old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u(); > CREATE TRIGGER > rhaas=# UPDATE p SET b = 'whatever'; > NOTICE: table p > NOTICE: table p got value 0 > NOTICE: table p got value 1 > NOTICE: table p got value 2 > UPDATE 3 > > It seems pretty clear to me that this is busted. The existence of > trigger x1 on p1 shouldn't affect whether trigger x on p sees changes > to p1's rows in its transition tables. Yikes -- agreed. See analysis and draft patch for discussion below. > Either all changes to any > descendants of p should be captured by the transition tables, or only > changes to the root table should be captured. If we do the former, > the restriction against using transition tables in triggers on > partitioned tables should be removed, I would think. If we do the > latter, then what we should document is not that partitioned tables > have a restriction that doesn't apply to inheritance but rather that > the restriction on the partitioned case flows from the fact that only > the parent's changes are captured, and the parent is always empty in > the partitioning case. In deciding between these two cases, we should > consider the case where the inheritance children have extra columns > and/or different column orderings. I think that we should only capture transition tuples captured from the explicitly named relation, since we only fire AFTER STATEMENT triggers on that relation. I see no inconsistency with the policy of rejecting transition tables on partitioned tables (as I proposed and Kevin accepted[1]), because partitioned tables can't have any data so there would be no point. In contrast, every parent table in an inheritance hierarchy is also a regular table and can hold data, so I think we should allow transition tables on them, and capture transition tuples from that table only when you modify it directly. The transition table infrastructure only supports exactly one relation being modified at each query level, and it's a bug that this example captures tuples from p1 into the tuplestore intended for p's tuples even though it is not even going to fire the after statement trigger on p1. It's only a coincidence that the tuples have compatible TupleDescs. The pre-existing behaviour for triggers with inheritance is that STATEMENT triggers fire only for the directly named relation, but ROW triggers fire for all affected relations. The transition table patch didn't change that, but it added code to AfterTriggerSaveEvent, which is called by ExecAR(Insert|Update|Delete)Triggers, to capture transitions. That gets called for every updated relation (ie including partitions and inheritance sub-tables) to implement the ROW policy. It needs to be taught not to capture transition tuples from relations except the one directly named. One solution to this problem is for nodeModifyTable.c to tell the ExecAR* functions explicitly whether to capture transition tuples. It knows when it has modified the explicitly named relation in a hierarchy (mt_whichplan == 0) without rerouting via a partitioned table (mt_partition_dispatch_info == NULL). See attached patch for discussion (it lacks tests and needs better comments). Does this make sense? Do you see a better way? [1] https://www.postgresql.org/message-id/CACjxUsNhdm4ZCgaVreLK5kAwHTZUkqJAVXiywwi-HNVsuTLMnA%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com transition-tables-from-one-relation-only.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 11:53 AM, Robert Haas wrote: > On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner wrote: >> On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: >>> It seems pretty clear to me that this is busted. >> >> I don't think you actually tested anything that is dependent on any >> of my patches there. > > I was testing which rows show up in a transition table, so I assumed > that was related to the transition tables patch. Note that this is > not about which triggers are fired, just about how inheritance > interacts with transition tables. Yeah, I got confused a bit there, comparing to the updateable views case. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner wrote: > On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: >> It seems pretty clear to me that this is busted. > > I don't think you actually tested anything that is dependent on any > of my patches there. I was testing which rows show up in a transition table, so I assumed that was related to the transition tables patch. Note that this is not about which triggers are fired, just about how inheritance interacts with transition tables. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: > It seems pretty clear to me that this is busted. I don't think you actually tested anything that is dependent on any of my patches there. > Adding this as an open item. Kevin? It will take some time to establish what legacy behavior is and how the new transition tables are impacted. My first reaction is that a trigger on the parent should fire for any related action on a child (unless maybe the trigger is defined with an ONLY keyword???) using the TupleDesc of the parent. Note that the SQL spec mandates that even in a AFTER EACH ROW trigger the transition tables must represent all rows affected by the STATEMENT. I think that this should be independent of triggers fired at the row level. I think the rules should be similar for updateable views. This will take some time to investigate, discuss and produce a patch. I think best case is Friday. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 12:51 AM, Amit Langote wrote: > What we could document now is that partitioned tables don't allow > specifying triggers that reference transition tables. Although, I am > wondering where this note really belongs - the partitioning chapter, the > triggers chapter or the CREATE TRIGGER reference page? Maybe, Kevin and > Thomas have something to say about that. If it turns out that the > partitioning chapter is a good place, here is a documentation patch. I think that before we document this behavior, we'd better make sure we understand exactly what the behavior is, and we'd better make sure it's correct. Currently, triggers that involve transition tables are altogether prohibited when the root relation is partitioned, but are allowed in inheritance cases. However, the actual behavior appears to be buggy. Here's what happens when I define a parent and a child and update all the rows: rhaas=# CREATE FUNCTION t() RETURNS trigger rhaas-# LANGUAGE plpgsql rhaas-# AS $$declare q record; begin raise notice 'table %', tg_table_name; for q in select * from old loop raise notice 'table % got value %', tg_table_name, q.a; end loop; return null; end;$$; CREATE FUNCTION rhaas=# CREATE TABLE p (a int, b text); CREATE TABLE rhaas=# CREATE TABLE p1 () INHERITS (p); CREATE TABLE rhaas=# CREATE TRIGGER x AFTER UPDATE ON p REFERENCING OLD TABLE AS old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE t(); CREATE TRIGGER rhaas=# INSERT INTO p VALUES (0, 'zero'); INSERT 0 1 rhaas=# INSERT INTO p1 VALUES (1, 'one'); INSERT 0 1 rhaas=# INSERT INTO p1 VALUES (2, 'two'); INSERT 0 1 rhaas=# UPDATE p SET b = 'whatever'; NOTICE: table p NOTICE: table p got value 0 UPDATE 3 Only the rows in the parent show up in the transition table. But now look what happens if I add an unrelated trigger that also uses transition tables to the children: rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS $$begin null; end$$; CREATE FUNCTION rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u(); CREATE TRIGGER rhaas=# UPDATE p SET b = 'whatever'; NOTICE: table p NOTICE: table p got value 0 NOTICE: table p got value 1 NOTICE: table p got value 2 UPDATE 3 It seems pretty clear to me that this is busted. The existence of trigger x1 on p1 shouldn't affect whether trigger x on p sees changes to p1's rows in its transition tables. Either all changes to any descendants of p should be captured by the transition tables, or only changes to the root table should be captured. If we do the former, the restriction against using transition tables in triggers on partitioned tables should be removed, I would think. If we do the latter, then what we should document is not that partitioned tables have a restriction that doesn't apply to inheritance but rather that the restriction on the partitioned case flows from the fact that only the parent's changes are captured, and the parent is always empty in the partitioning case. In deciding between these two cases, we should consider the case where the inheritance children have extra columns and/or different column orderings. Adding this as an open item. Kevin? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers