Re: a misbehavior of partition row movement (?)

2020-11-20 Thread Amit Langote
On Sat, Oct 3, 2020 at 8:26 PM Amit Langote  wrote:
> On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra  
> wrote
> > On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:
> > >On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
> > > wrote:
> > >> On Friday, October 2, 2020, Amit Langote 
> > >> wrote:
> > >>>
> > >>>
> > >>> Reporter on that thread says that the last update should have failed
> > >>> and I don't quite see a workable alternative to that.
> > >>
> > >>
> > >> To be clear the OP would rather have it just work, the same as the
> > >> non-row-movement version.  Maybe insert the new row first, execute
> > >> the on update trigger chained from the old row, then delete the old
> > >> row?
> > >
> > >I was thinking yesterday about making it just work, but considering the
> > >changes that would need to be made to how the underlying triggers fire,
> > >it does not seem we would be able to back-port the solution.
> > >
> >
> > I think we need to differentiate between master and backbranches. IMO we
> > should try to make it "just work" in master, and the amount of code
> > should not be an issue there I think (no opinion on whether insert and
> > update trigger is the way to go). For backbranches we may need to do
> > something less intrusive, of course.
>
> Sure, that makes sense.  I will try making a patch for HEAD to make it
> just work unless someone beats me to it.

After working on this for a while, here is my proposal for HEAD.

To reiterate, the problem occurs when an UPDATE of a partitioned PK
table is turned into a DELETE + INSERT.  In that case, the UPDATE RI
triggers are not fired at all, but DELETE ones are, so the foreign key
may fail to get enforced correctly.  For example, even though the new
row after the update is still logically present in the PK table, it
would wrongly get deleted because of the DELETE RI trigger firing if
there's a ON DELETE CASCADE clause on the foreign key.

To fix that, I propose that we teach trigger.c to skip queuing the
events that would be dangerous to fire, such as that for the DELETE on
the source leaf partition mentioned above.  Instead, queue an UPDATE
event on the root target table, matching the actual operation being
performed.  Note though that this new arrangement doesn't affect the
firing of any other triggers except those that are relevant to the
reported problem, viz. the PK-side RI triggers.  All other triggers
fire exactly as they did before.

To make that happen, I had to:

1. Make RI triggers available on partitioned tables at all, which they
are not today.  Also, mark the RI triggers in partitions correctly as
*clones* of the RI triggers in their respective parents.

2. Make it possible to allow AfterTriggerSaveEvent() to access the
query's actual target relation, that is, in addition to the target
relation on which an event was fired.  Also, added a bunch of code to
AFTER TRIGGER infrastructure to handle events fired on partitioned
tables.  Because those events cannot contain physical references to
affected tuples, I generalized the code currently used to handle after
triggers on foreign tables by storing the tuples in and retrieving
them from a tuple store.  I read a bunch of caveats of that
implementation (such as its uselessness for deferred triggers), but
for the limited cases for which it will be used for partitioned
tables, it seems safe, because it won't be used for deferred triggers
on partitioned tables.

Attached patches 0001 and 0002 implement 1 and 2, respectively.
Later, I will post an updated version of the patch for the
back-branches, which, as mentioned upthread, is to prevent the
cross-partition updates on foreign key PK tables.

--
Amit Langote
EDB: http://www.enterprisedb.com


v1-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v1-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2020-12-14 Thread Amit Langote
On Fri, Nov 20, 2020 at 8:55 PM Amit Langote  wrote:
> On Sat, Oct 3, 2020 at 8:26 PM Amit Langote  wrote:
> > On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra  
> > wrote
> > > I think we need to differentiate between master and backbranches. IMO we
> > > should try to make it "just work" in master, and the amount of code
> > > should not be an issue there I think (no opinion on whether insert and
> > > update trigger is the way to go). For backbranches we may need to do
> > > something less intrusive, of course.
> >
> > Sure, that makes sense.  I will try making a patch for HEAD to make it
> > just work unless someone beats me to it.
>
> After working on this for a while, here is my proposal for HEAD.
>
> To reiterate, the problem occurs when an UPDATE of a partitioned PK
> table is turned into a DELETE + INSERT.  In that case, the UPDATE RI
> triggers are not fired at all, but DELETE ones are, so the foreign key
> may fail to get enforced correctly.  For example, even though the new
> row after the update is still logically present in the PK table, it
> would wrongly get deleted because of the DELETE RI trigger firing if
> there's a ON DELETE CASCADE clause on the foreign key.
>
> To fix that, I propose that we teach trigger.c to skip queuing the
> events that would be dangerous to fire, such as that for the DELETE on
> the source leaf partition mentioned above.  Instead, queue an UPDATE
> event on the root target table, matching the actual operation being
> performed.  Note though that this new arrangement doesn't affect the
> firing of any other triggers except those that are relevant to the
> reported problem, viz. the PK-side RI triggers.  All other triggers
> fire exactly as they did before.
>
> To make that happen, I had to:
>
> 1. Make RI triggers available on partitioned tables at all, which they
> are not today.  Also, mark the RI triggers in partitions correctly as
> *clones* of the RI triggers in their respective parents.
>
> 2. Make it possible to allow AfterTriggerSaveEvent() to access the
> query's actual target relation, that is, in addition to the target
> relation on which an event was fired.  Also, added a bunch of code to
> AFTER TRIGGER infrastructure to handle events fired on partitioned
> tables.  Because those events cannot contain physical references to
> affected tuples, I generalized the code currently used to handle after
> triggers on foreign tables by storing the tuples in and retrieving
> them from a tuple store.  I read a bunch of caveats of that
> implementation (such as its uselessness for deferred triggers), but
> for the limited cases for which it will be used for partitioned
> tables, it seems safe, because it won't be used for deferred triggers
> on partitioned tables.
>
> Attached patches 0001 and 0002 implement 1 and 2, respectively.
> Later, I will post an updated version of the patch for the
> back-branches, which, as mentioned upthread, is to prevent the
> cross-partition updates on foreign key PK tables.

I have created a CF entry for this:

https://commitfest.postgresql.org/31/2877/

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2020-12-14 Thread Arne Roland
Hi,


thanks for looking into this. I haven't yet looked at your patch in detail, yet 
I think we should address the general issue here. To me this doesn't seem to be 
a RI-trigger issue, but a more general issue like I mentioned in the pg-bugs 
thread 
https://www.postgresql.org/message-id/b1bfc99296e34725900bcd9689be8...@index.de


While I like the idea of making fks work, I'd prefer a solution that fires the 
appropriate row trigger for partitioned tables for non RI-cases as well.


Regards

Arne





Re: a misbehavior of partition row movement (?)

2020-12-14 Thread Amit Langote
Hi,

On Tue, Dec 15, 2020 at 12:01 AM Arne Roland  wrote:
> thanks for looking into this. I haven't yet looked at your patch in detail, 
> yet I think we should address the general issue here. To me this doesn't seem 
> to be a RI-trigger issue, but a more general issue like I mentioned in the 
> pg-bugs thread 
> https://www.postgresql.org/message-id/b1bfc99296e34725900bcd9689be8...@index.de

Hmm, maybe you're reading too much into the implementation details of
the fix, because the patch does fix the issue that you mention in the
linked report:

Quoting your original example:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial,  primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly errors out instead of silently performing the ON DELETE CASCADE
update a set id=2;
ERROR:  update or delete on table "a" violates foreign key constraint
"a_fk" on table "b"
DETAIL:  Key (id)=(1) is still referenced from table "b".

select * from b;
 id

  1
(1 row)

Changing the example to specify ON UPDATE CASCADE:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial,  primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly applies ON UPDATE CASCADE action
update a set id=2;
UPDATE 1

select * from b;
 id

  2
(1 row)

What am I missing about what you think is the more general problem to be solved?

> While I like the idea of making fks work, I'd prefer a solution that fires 
> the appropriate row trigger for partitioned tables for non RI-cases as well.

Maybe we could do that, but I don't know of a known issue where the
root cause is our firing of a row trigger on the leaf partition
instead of on the root partitioned table.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2020-12-14 Thread Amit Langote
On Tue, Dec 15, 2020 at 12:43 PM Amit Langote  wrote:
> Quoting your original example:
>
> drop table a, b;
> create table a (id serial, primary key (id)) partition by range (id);
> create table b (id serial,  primary key (id)) partition by range (id);
> alter table b add constraint a_fk foreign key (id) references a (id)
> on delete cascade;
> create table a1 partition of a for values from (1) to (2);
> create table a2 partition of a for values from (2) to (3);
> create table b1 partition of b for values from (1) to (2);
> create table b2 partition of b for values from (2) to (3);
> insert into a (id) values (1);
> insert into b (id) values (1);
>
> -- correctly errors out instead of silently performing the ON DELETE CASCADE
> update a set id=2;
> ERROR:  update or delete on table "a" violates foreign key constraint
> "a_fk" on table "b"
> DETAIL:  Key (id)=(1) is still referenced from table "b".
>
> select * from b;
>  id
> 
>   1
> (1 row)
>
> Changing the example to specify ON UPDATE CASCADE:
>
> drop table a, b;
> create table a (id serial, primary key (id)) partition by range (id);
> create table b (id serial,  primary key (id)) partition by range (id);
> alter table b add constraint a_fk foreign key (id) references a (id)
> on delete cascade;

Oops, I copy-pasted the wrong block of text from my terminal.  I meant:

alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade on update cascade;

> create table a1 partition of a for values from (1) to (2);
> create table a2 partition of a for values from (2) to (3);
> create table b1 partition of b for values from (1) to (2);
> create table b2 partition of b for values from (2) to (3);
> insert into a (id) values (1);
> insert into b (id) values (1);
>
> -- correctly applies ON UPDATE CASCADE action
> update a set id=2;
> UPDATE 1
>
> select * from b;
>  id
> 
>   2
> (1 row)

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2020-12-21 Thread Arne Roland
Hi Amit,


thanks for the quick reply! Sadly I have been busy and the second part of your 
patch does no longer apply in src/include/nodes/execnodes.h:497.


I'm sorry, I should have been more careful rereading my posts. The case I meant 
is the one below. I checked the thread again. I can barely believe, I didn't 
post such an example along back then. Sorry for the confusion!


create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial,  primary key (id)) partition by range (id);
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

create or replace function del_trig_fkt()
 returns trigger
 language plpgsql
as $$
  begin
raise notice 'Deleted!';
return old;
  end;
$$;
create trigger a_del_trig after delete on a for each row execute function 
del_trig_fkt();
create or replace function public.upd_trig_fkt()
 returns trigger
 language plpgsql
as $function$
begin
  raise notice 'Updated!';
  return new;
end;
$function$;
create trigger a_upd_trig after update on a for each row execute function 
upd_trig_fkt();

update a set id=2;

To me the issue seems to have litte to do with the fact that the trigger is 
executed on the leaf node in itself, but rather we lack the infrastructure to 
track whether the tuple is removed or only rerouted.

Regards
Arne



From: Amit Langote 
Sent: Tuesday, December 15, 2020 4:45:19 AM
To: Arne Roland
Cc: Tomas Vondra; David G. Johnston; PostgreSQL-development
Subject: Re: a misbehavior of partition row movement (?)

On Tue, Dec 15, 2020 at 12:43 PM Amit Langote  wrote:
> Quoting your original example:
>
> drop table a, b;
> create table a (id serial, primary key (id)) partition by range (id);
> create table b (id serial,  primary key (id)) partition by range (id);
> alter table b add constraint a_fk foreign key (id) references a (id)
> on delete cascade;
> create table a1 partition of a for values from (1) to (2);
> create table a2 partition of a for values from (2) to (3);
> create table b1 partition of b for values from (1) to (2);
> create table b2 partition of b for values from (2) to (3);
> insert into a (id) values (1);
> insert into b (id) values (1);
>
> -- correctly errors out instead of silently performing the ON DELETE CASCADE
> update a set id=2;
> ERROR:  update or delete on table "a" violates foreign key constraint
> "a_fk" on table "b"
> DETAIL:  Key (id)=(1) is still referenced from table "b".
>
> select * from b;
>  id
> 
>   1
> (1 row)
>
> Changing the example to specify ON UPDATE CASCADE:
>
> drop table a, b;
> create table a (id serial, primary key (id)) partition by range (id);
> create table b (id serial,  primary key (id)) partition by range (id);
> alter table b add constraint a_fk foreign key (id) references a (id)
> on delete cascade;

Oops, I copy-pasted the wrong block of text from my terminal.  I meant:

alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade on update cascade;

> create table a1 partition of a for values from (1) to (2);
> create table a2 partition of a for values from (2) to (3);
> create table b1 partition of b for values from (1) to (2);
> create table b2 partition of b for values from (2) to (3);
> insert into a (id) values (1);
> insert into b (id) values (1);
>
> -- correctly applies ON UPDATE CASCADE action
> update a set id=2;
> UPDATE 1
>
> select * from b;
>  id
> 
>   2
> (1 row)

--
Amit Langote
EDB: http://www.enterprisedb.com





Re: a misbehavior of partition row movement (?)

2020-12-21 Thread Amit Langote
Hi,

On Mon, Dec 21, 2020 at 11:30 PM Arne Roland  wrote:
> thanks for the quick reply! Sadly I have been busy and the second part of 
> your patch does no longer apply in src/include/nodes/execnodes.h:497.

I don't see any problem with applying the patch.  Are you sure you're
applying the patch to the correct git branch?  The patch is meant to
be applied to the development (master) branch.

> I'm sorry, I should have been more careful rereading my posts. The case I 
> meant is the one below. I checked the thread again. I can barely believe, I 
> didn't post such an example along back then. Sorry for the confusion!

No worries, thanks for the follow up.

> create table a (id serial, primary key (id)) partition by range (id);
> create table b (id serial,  primary key (id)) partition by range (id);
> create table a1 partition of a for values from (1) to (2);
> create table a2 partition of a for values from (2) to (3);
> create table b1 partition of b for values from (1) to (2);
> create table b2 partition of b for values from (2) to (3);
> insert into a (id) values (1);
> insert into b (id) values (1);
>
> create or replace function del_trig_fkt()
>  returns trigger
>  language plpgsql
> as $$
>   begin
> raise notice 'Deleted!';
> return old;
>   end;
> $$;
> create trigger a_del_trig after delete on a for each row execute function 
> del_trig_fkt();
> create or replace function public.upd_trig_fkt()
>  returns trigger
>  language plpgsql
> as $function$
> begin
>   raise notice 'Updated!';
>   return new;
> end;
> $function$;
> create trigger a_upd_trig after update on a for each row execute function 
> upd_trig_fkt();
>
> update a set id=2;

The output for this I get with (or without) the patch is:

NOTICE:  Deleted!
UPDATE 1

> To me the issue seems to have litte to do with the fact that the trigger is 
> executed on the leaf node in itself, but rather we lack the infrastructure to 
> track whether the tuple is removed or only rerouted.

This behavior of partition key updates with regard to *user-defined*
AFTER triggers is documented:

https://www.postgresql.org/docs/current/trigger-definition.html

"As far as AFTER ROW triggers are concerned, AFTER DELETE and AFTER
INSERT triggers are applied; but AFTER UPDATE triggers are not applied
because the UPDATE has been converted to a DELETE and an INSERT."

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

As for the original issue of this thread, it happens to be fixed by
firing the *internal* AFTER UPDATE triggers that are involved in
enforcing the foreign key even if the UPDATE has been turned into
DELETE+INSERT, which it makes sense to do, because what can happen
today with CASCADE triggers does not seem like a useful behavior by
any measure.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2020-12-21 Thread Amit Langote
On Tue, Dec 22, 2020 at 4:16 PM Amit Langote  wrote:
> On Mon, Dec 21, 2020 at 11:30 PM Arne Roland  wrote:
> > thanks for the quick reply! Sadly I have been busy and the second part of 
> > your patch does no longer apply in src/include/nodes/execnodes.h:497.
>
> I don't see any problem with applying the patch.  Are you sure you're
> applying the patch to the correct git branch?  The patch is meant to
> be applied to the development (master) branch.

Sorry, it seems you are right and the 2nd patch indeed fails to apply to master.

Attached updated version.


--
Amit Langote
EDB: http://www.enterprisedb.com


v2-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v2-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2020-12-28 Thread Arne Roland
While I'd agree that simply deleting with "on delete cascade" on tuple 
rerouting is a strong enough violation of the spirit of partitioning changing 
that behavior, I am surprised by the idea to make a differentiation between fks 
and other triggers. The way user defined triggers work, is a violation to the 
same degree I get complaints about that on a monthly (if not weekly) basis. 
It's easy to point towards the docs, but the docs offer no solution to archive 
the behavior, that makes partitioning somewhat transparent. Most people I know 
don't partition because they like to complicate things, but just because they 
have to much data.

It isn't just a thing with after triggers. Imo before triggers are even worse: 
If we move a row between partitions we fire all three triggers at once (at 
different leaf pages obviously). It's easy to point towards the docs. Heart 
bleed was well documented, but I am happy that it was fixed. I don't want to 
imply this totally unrelated security issue has anything to do with our weird 
behavior. I just want to raise the question whether that's a good thing, 
because frankly I haven't met anyone thus far, who thought it is.

To me it seems the RI triggers are just a specific victim of the way we made 
triggers work in general.

What I tried to express, albeit I apparently failed: I care about having 
triggers, which make partitioning transparent, on the long run.

> because what can happen
> today with CASCADE triggers does not seem like a useful behavior by
> any measure.

I wholeheartedly agree. I just want to point out, that you could state the same 
about triggers in general.

Backwards compatibility is usually a pretty compelling argument. I would still 
want to raise the question, whether this should change, because I frankly think 
this is a bad idea.

You might ask: Why am I raising this question in this thread?

In case we can not (instantly) agree on the fact that this behavior should 
change, I'd still prefer to think about making that behavior possible for other 
triggers (possibly later) as well. One possibility would be having an entry in 
the catalog to mark when the trigger should fire.

I don't want to stall your definitely useful RI-Trigger fix, but I sincerely 
believe, that we have to do better with triggers in general.

If we would make the condition when to fire or not dependent something besides 
that fact whether the trigger is internal, we could at a later date choose to 
make both ways available, if anyone makes a good case for this. Even though I 
still think it's not worth it.

>I don't quite recall if the decision to implement it like this was
> based on assuming that this is what users would like to see happen in
> this case or the perceived difficulty of implementing it the other way
> around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you 
have any ideas in which thread that was handled?

> Sorry, it seems you are right and the 2nd patch indeed fails to apply to 
> master.

Thank you! I hope to have a more in depth look later this week.

Regards
Arne




Re: a misbehavior of partition row movement (?)

2021-03-23 Thread Amit Langote
Sawada-san,

On Wed, Mar 10, 2021 at 4:51 PM Masahiko Sawada  wrote:
> I looked at the 0001 patch and here are random comments. Please ignore
> a comment if it is already discussed.

Thanks a lot for the review and sorry for the delay in replying.

> ---
> @@ -9077,7 +9102,8 @@ addFkRecurseReferenced(List **wqueue, Constraint
> *fkconstraint, Relation rel,
>partIndexId, constrOid, numfks,
>mapped_pkattnum, fkattnum,
>pfeqoperators, ppeqoperators, 
> ffeqoperators,
> -  old_check_ok);
> +  old_check_ok,
> +  deleteTriggerOid, updateTriggerOid);
>
> /* Done -- clean up (but keep the lock) */
> table_close(partRel, NoLock);
> @@ -9126,8 +9152,12 @@ addFkRecurseReferencing(List **wqueue,
> Constraint *fkconstraint, Relation rel,
> Relation pkrel, Oid indexOid, Oid parentConstr,
> int numfks, int16 *pkattnum, int16 *fkattnum,
> Oid *pfeqoperators, Oid *ppeqoperators, Oid
> *ffeqoperators,
> -   bool old_check_ok, LOCKMODE lockmode)
> +   bool old_check_ok, LOCKMODE lockmode,
> +   Oid parentInsTrigger, Oid parentUpdTrigger)
>  {
>
> We need to update the function comments as well.

Done.

> ---
> I think it's better to add comments for newly added functions such as
> GetForeignKeyActionTriggers() and GetForeignKeyCheckTriggers() etc.
> Those functions have no comment at all.

I've added comments.

> BTW, those two functions out of newly added four functions:
> AttachForeignKeyCheckTriggers() and DetachForeignKeyCheckTriggers(),
> have only one user. Can we past the functions body at where each
> function is called?

I made those pieces of code into functions because I thought future
patches may have a need for them.  But maybe those future patches
should do the refactoring, so I've incorporated their code into the
respective callers as you suggest.

> ---
> /*
>  * If the referenced table is a plain relation, create the action triggers
>  * that enforce the constraint.
>  */
> -   if (pkrel->rd_rel->relkind == RELKIND_RELATION)
> -   {
> -   createForeignKeyActionTriggers(rel, RelationGetRelid(pkrel),
> -  fkconstraint,
> -  constrOid, indexOid);
> -   }
> +   createForeignKeyActionTriggers(rel, RelationGetRelid(pkrel),
> +  fkconstraint,
> +  constrOid, indexOid,
> +  parentDelTrigger, parentUpdTrigger,
> +  &deleteTriggerOid, &updateTriggerOid);
>
> The comment needs to be updated.
> ---
>  /*
>   * If the referencing relation is a plain table, add the check triggers 
> to
>   * it and, if necessary, schedule it to be checked in Phase 3.
>   *
>   * If the relation is partitioned, drill down to do it to its partitions.
>   */
> +createForeignKeyCheckTriggers(RelationGetRelid(rel),
> +  RelationGetRelid(pkrel),
> +  fkconstraint,
> +  parentConstr,
> +  indexOid,
> +  parentInsTrigger, parentUpdTrigger,
> +  &insertTriggerOid, &updateTriggerOid);
>
> Same as above.

Done and done.

> ---
> I think TriggerSetParentTrigger needs to free the heap tuple copied by
> heap_copytuple().

Oops, done.

> ---
> +   Assert(trigForm->tgparentid == 0);
> +   if (trigForm->tgparentid != InvalidOid)
> +   elog(ERROR, "trigger %u already has a parent trigger",
> +childTrigId);
>
> I think the first condition in Assert() can be
> !OidIsValid(trigForm->tgparentid) and the second condition in 'if'
> statement can be OidIsValid(trigForm->tgparentid). So those are
> redundant?

Ah, indeed.  I've kept the if () elog(...) after applying your suggested change.

> ---
> -   if (fout->remoteVersion >= 9)
> +   if (fout->remoteVersion >= 13)
> +   {
> +   /*
> +* NB: think not to use pretty=true in pg_get_triggerdef.  It
> +* could result in non-forward-compatible dumps of WHEN clauses
> +* due to under-parenthesization.
> +*/
> +   appendPQExpBuffer(query,
> + "SELECT tgname, "
> + "tgfoid::pg_catalog.regproc AS tgfname, "
> + "pg_catalog.pg_get_triggerdef(oid,
> false) AS tgdef, "
> + "tgenabled, tableoid, oid "
> + "FROM pg_catalog.pg_trigger t "
> +  

Re: a misbehavior of partition row movement (?)

2021-03-31 Thread Masahiko Sawada
On Tue, Mar 23, 2021 at 6:27 PM Amit Langote  wrote:
>
> Sawada-san,
>
> On Wed, Mar 10, 2021 at 4:51 PM Masahiko Sawada  wrote:
> > I looked at the 0001 patch and here are random comments. Please ignore
> > a comment if it is already discussed.
>
> Thanks a lot for the review and sorry for the delay in replying.

No problem. Sorry for the late reply too.

> > Or is this change really needed? This change added one condition
> > "tgparentid = 0" but IIUC I think triggers that are NOT tgisinternal
> > are always tgparentid = 0. Also, it seems it is true both before and
> > after this patch.
>
> Actually, as noted in the commit message, I'm intending to change
> tgisnternal to only be true for triggers generated by foreign keys and
> no longer for partitions' user-defined triggers that are inherited.
> So whereas NOT tgisnternal would suffice to exclude partitions'
> inherited triggers before, that would no longer be the case with this
> patch; AND tgparentid = 0 will be needed for that.

Understood.

>
> Actually, I found a big hole in my assumptions around deferrable
> foreign constraints, invalidating the approach I took in 0002 to use a
> query-lifetime tuplestore to record root parent tuples.  I'm trying to
> find a way to make the tuplestore transaction-lifetime so that the
> patch still works.
>
> In the meantime, I'm attaching an updated set with 0001 changed per
> your comments.

0001 patch conflicts with 71f4c8c6f74. Could you please rebase the patchset?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a misbehavior of partition row movement (?)

2021-04-02 Thread Amit Langote
On Thu, Apr 1, 2021 at 10:56 AM Masahiko Sawada  wrote:
> On Tue, Mar 23, 2021 at 6:27 PM Amit Langote  wrote:
> > Actually, I found a big hole in my assumptions around deferrable
> > foreign constraints, invalidating the approach I took in 0002 to use a
> > query-lifetime tuplestore to record root parent tuples.  I'm trying to
> > find a way to make the tuplestore transaction-lifetime so that the
> > patch still works.
> >
> > In the meantime, I'm attaching an updated set with 0001 changed per
> > your comments.
>
> 0001 patch conflicts with 71f4c8c6f74. Could you please rebase the patchset?

Thanks for the heads up.

I still don't have a working patch to address the above mentioned
shortcoming of the previous approach, but here is a rebased version in
the meantime.


--
Amit Langote
EDB: http://www.enterprisedb.com


v7-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v7-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-02-14 Thread Masahiko Sawada
On Wed, Jan 20, 2021 at 7:04 PM Amit Langote  wrote:
>
> On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut
>  wrote:
> > On 2021-01-08 09:54, Amit Langote wrote:
> > >>> I don't quite recall if the decision to implement it like this was
> > >>> based on assuming that this is what users would like to see happen in
> > >>> this case or the perceived difficulty of implementing it the other way
> > >>> around, that is, of firing AFTER UPDATE triggers in this case.
> > >> I tried to look that up, but I couldn't find any discussion about this. 
> > >> Do you have any ideas in which thread that was handled?
> > > It was discussed here:
> > >
> > > https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com
> > >
> > > It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
> > > relevant emails.  You might notice that the developers who
> > > participated in that discussion gave various opinions and what we have
> > > today got there as a result of a majority of them voting for the
> > > current approach.  Someone also said this during the discussion:
> > > "Regarding the trigger issue, I can't claim to have a terribly strong
> > > opinion on this. I think that practically anything we do here might
> > > upset somebody, but probably any halfway-reasonable thing we choose to
> > > do will be OK for most people." So what we've got is that
> > > "halfway-reasonable" thing, YMMV. :)
> >
> > Could you summarize here what you are trying to do with respect to what
> > was decided before?  I'm a bit confused, looking through the patches you
> > have posted.  The first patch you posted hard-coded FK trigger OIDs
> > specifically, other patches talk about foreign key triggers in general
> > or special case internal triggers or talk about all triggers.
>
> The original problem statement is this: the way we generally fire
> row-level triggers of a partitioned table can lead to some unexpected
> behaviors of the foreign keys pointing to that partitioned table
> during its cross-partition updates.
>
> Let's start with an example that shows how triggers are fired during a
> cross-partition update:
>
> create table p (a numeric primary key) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create or replace function report_trig_details() returns trigger as $$
> begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
> 'DELETE' then return old; end if; return new; end; $$ language
> plpgsql;
> create trigger trig1 before update on p for each row execute function
> report_trig_details();
> create trigger trig2 after update on p for each row execute function
> report_trig_details();
> create trigger trig3 before delete on p for each row execute function
> report_trig_details();
> create trigger trig4 after delete on p for each row execute function
> report_trig_details();
> create trigger trig5 before insert on p for each row execute function
> report_trig_details();
> create trigger trig6 after insert on p for each row execute function
> report_trig_details();
>
> insert into p values (1);
> update p set a = 2;
> NOTICE:  BEFORE UPDATE on p1
> NOTICE:  BEFORE DELETE on p1
> NOTICE:  BEFORE INSERT on p2
> NOTICE:  AFTER DELETE on p1
> NOTICE:  AFTER INSERT on p2
> UPDATE 1
>
> (AR update triggers are not fired.)
>
> For contrast, here is an intra-partition update:
>
> update p set a = a;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  AFTER UPDATE on p2
> UPDATE 1
>
> Now, the trigger machinery makes no distinction between user-defined
> and internal triggers, which has implications for the foreign key
> enforcing triggers on partitions.  Consider the following example:
>
> create table q (a bigint references p);
> insert into q values (2);
> update p set a = 1;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  BEFORE DELETE on p2
> NOTICE:  BEFORE INSERT on p1
> ERROR:  update or delete on table "p2" violates foreign key constraint
> "q_a_fkey2" on table "q"
> DETAIL:  Key (a)=(2) is still referenced from table "q".
>
> So the RI delete trigger (NOT update) on p2 prevents the DELETE that
> occurs as part of the row movement.  One might make the updates
> cascade and expect that to prevent the error:
>
> drop table q;
> create table q (a bigint references p on update cascade);
> insert into q values (2);
> update p set a = 1;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  BEFORE DELETE on p2
> NOTICE:  BEFORE INSERT on p1
> ERROR:  update or delete on table "p2" violates foreign key constraint
> "q_a_fkey2" on table "q"
> DETAIL:  Key (a)=(2) is still referenced from table "q".
>
> No luck, because again it's the RI delete trigger on p2 that gets
> fired.  If you make deletes cascade too, an even worse thing happens:
>
> drop table q;
> create table q (a bigint references p on update cascade on delete cascade);
> insert into q values (2);
> update p set a = 1;
> NOTICE:  BEFORE UPDATE on p2
> N

Re: a misbehavior of partition row movement (?)

2021-02-15 Thread Amit Langote
Thank you for looking at this.

On Mon, Feb 15, 2021 at 4:12 PM Masahiko Sawada  wrote:
> On Wed, Jan 20, 2021 at 7:04 PM Amit Langote  wrote:
> > This shows that the way we've made these triggers behave in general
> > can cause some unintended behaviors for foreign keys during
> > cross-partition updates.  I started this thread to do something about
> > that and sent a patch to prevent cross-partition updates at all when
> > there are foreign keys pointing to it.  As others pointed out, that's
> > not a great long-term solution to the problem, but that's what we may
> > have to do in the back-branches if anything at all.
>
> I've started by reviewing the patch for back-patching that the first
> patch you posted[1].
>
> +   for (i = 0; i < trigdesc->numtriggers; i++)
> +   {
> +   Trigger *trigger = &trigdesc->triggers[i];
> +
> +   if (trigger->tgisinternal &&
> +   OidIsValid(trigger->tgconstrrelid) &&
> +   trigger->tgfoid == F_RI_FKEY_CASCADE_DEL)
> +   {
> +   found = true;
> +   break;
> +   }
> +   }
>
> IIUC the above checks if the partition table is referenced by a
> foreign key constraint on another table with ON DELETE CASCADE option.
> I think we should prevent cross-partition update also when ON DELETE
> SET NULL and ON DELETE SET DEFAULT. For example, with the patch, a
> tuple in a partition table is still updated to NULL when
> cross-partition update as follows:
>
> postgres=# create table p (a numeric primary key) partition by list (a);
> CREATE TABLE
> postgres=# create table p1 partition of p for values in (1);
> CREATE TABLE
> postgres=# create table p2 partition of p for values in (2);
> CREATE TABLE
> postgres=# insert into p values (1);
> INSERT 0 1
> postgres=# create table q (a int references p(a) on delete set null);
> CREATE TABLE
> postgres=# insert into q values (1);
> INSERT 0 1
> postgres=# update p set a = 2;
> UPDATE 1
> postgres=# table p;
>  a
> ---
>  2
> (1 row)
>
> postgres=# table q;
>  a
> ---
>
> (1 row)

Yeah, I agree that's not good.

Actually, I had meant to send an updated version of the patch to apply
in back-branches that would prevent such a cross-partition update, but
never did since starting to work on the patch for HEAD.  I have
attached it here.

Regarding the patch, I would have liked if it only prevented the
update when the foreign key that would be violated by the component
delete references the update query's main target relation.  If the
foreign key references the source partition directly, then the delete
would be harmless.  However there's not enough infrastructure in v12,
v13 branches to determine that, which makes back-patching this a bit
hard.  For example, there's no way to tell in the back-branches if the
foreign-key-enforcing triggers of a leaf partition have descended from
the parent table.  IOW, I am not so sure anymore if we should try to
do anything in the back-branches.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


prevent-row-movement-on-pk-table.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-02-18 Thread Rahila Syed
Hi Amit,


>
> Here is an updated version of the patch with some cosmetic changes
> from the previous version.  I moved the code being added to
> AfterTriggerSaveEvent() and ExecUpdate() into separate subroutines to
> improve readability, hopefully.
>
> I tested these patches. It works as expected in case of cross partition
updates, by correctly updating the
referencing table.  It works fine for ON UPDATE SET NULL and SET DEFAULT
options as well.
Also, tested by having a table reference only a partition and not the
parent. In this case, the delete
trigger is correctly called when the row is moved out of referenced
partition.

The partition-key-update-1.spec test fails with the following error message
appearing in the diffs.

 step s1u3pc: UPDATE foo_range_parted SET a=11 WHERE a=7;
+ERROR:  cannot move row being updated to another partition

I think the documentation update is missing from the patches.

>
Thank you,
Rahila Syed


Re: a misbehavior of partition row movement (?)

2021-02-18 Thread Amit Langote
Hi Rahila,

Thanks for the review.

On Thu, Feb 18, 2021 at 7:08 PM Rahila Syed  wrote:
>> Here is an updated version of the patch with some cosmetic changes
>> from the previous version.  I moved the code being added to
>> AfterTriggerSaveEvent() and ExecUpdate() into separate subroutines to
>> improve readability, hopefully.
>>
> I tested these patches.
>
> It works as expected in case of cross partition updates, by correctly 
> updating the
> referencing table.  It works fine for ON UPDATE SET NULL and SET DEFAULT 
> options as well.
> Also, tested by having a table reference only a partition and not the parent. 
> In this case, the delete
> trigger is correctly called when the row is moved out of referenced partition.

I assume these are comments for the v3-0001 & v3-0002 patches...

> The partition-key-update-1.spec test fails with the following error message 
> appearing in the diffs.
>
>  step s1u3pc: UPDATE foo_range_parted SET a=11 WHERE a=7;
> +ERROR:  cannot move row being updated to another partition

...whereas, this error happens with the patch I posted in my last
email (prevent-row-movement-on-pk-table.patch) that is not meant to be
considered for HEAD, but for back-branches (if at all).  I also see
that cfbot got triggered by it and shows the same failure.  I am not
going to try to take care of these failures unless we want to do
something in the back-branches.

To be clear, patches for HEAD do pass make check-world.

> I think the documentation update is missing from the patches.

Hmm, I don't think we document the behavior that is improved by the v3
patches as a limitation of any existing feature, neither of foreign
keys referencing partitioned tables nor of the update row movement
feature.  So maybe there's nothing in the existing documentation that
is to be updated.

However, the patch does add a new error message for a case that the
patch doesn't handle, so maybe we could document that as a limitation.
Not sure if in the Notes section of the UPDATE reference page which
has some notes on row movement or somewhere else.  Do you have
suggestions?

Attaching rebased version of the patches for HEAD to appease the cfbot.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v4-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v4-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-02-19 Thread Masahiko Sawada
On Mon, Feb 15, 2021 at 10:37 PM Amit Langote  wrote:
>
> Thank you for looking at this.
>
> On Mon, Feb 15, 2021 at 4:12 PM Masahiko Sawada  wrote:
> > On Wed, Jan 20, 2021 at 7:04 PM Amit Langote  
> > wrote:
> > > This shows that the way we've made these triggers behave in general
> > > can cause some unintended behaviors for foreign keys during
> > > cross-partition updates.  I started this thread to do something about
> > > that and sent a patch to prevent cross-partition updates at all when
> > > there are foreign keys pointing to it.  As others pointed out, that's
> > > not a great long-term solution to the problem, but that's what we may
> > > have to do in the back-branches if anything at all.
> >
> > I've started by reviewing the patch for back-patching that the first
> > patch you posted[1].
> >
> > +   for (i = 0; i < trigdesc->numtriggers; i++)
> > +   {
> > +   Trigger *trigger = &trigdesc->triggers[i];
> > +
> > +   if (trigger->tgisinternal &&
> > +   OidIsValid(trigger->tgconstrrelid) &&
> > +   trigger->tgfoid == F_RI_FKEY_CASCADE_DEL)
> > +   {
> > +   found = true;
> > +   break;
> > +   }
> > +   }
> >
> > IIUC the above checks if the partition table is referenced by a
> > foreign key constraint on another table with ON DELETE CASCADE option.
> > I think we should prevent cross-partition update also when ON DELETE
> > SET NULL and ON DELETE SET DEFAULT. For example, with the patch, a
> > tuple in a partition table is still updated to NULL when
> > cross-partition update as follows:
> >
> > postgres=# create table p (a numeric primary key) partition by list (a);
> > CREATE TABLE
> > postgres=# create table p1 partition of p for values in (1);
> > CREATE TABLE
> > postgres=# create table p2 partition of p for values in (2);
> > CREATE TABLE
> > postgres=# insert into p values (1);
> > INSERT 0 1
> > postgres=# create table q (a int references p(a) on delete set null);
> > CREATE TABLE
> > postgres=# insert into q values (1);
> > INSERT 0 1
> > postgres=# update p set a = 2;
> > UPDATE 1
> > postgres=# table p;
> >  a
> > ---
> >  2
> > (1 row)
> >
> > postgres=# table q;
> >  a
> > ---
> >
> > (1 row)
>
> Yeah, I agree that's not good.
>
> Actually, I had meant to send an updated version of the patch to apply
> in back-branches that would prevent such a cross-partition update, but
> never did since starting to work on the patch for HEAD.  I have
> attached it here.

Thank you for updating the patch!

>
> Regarding the patch, I would have liked if it only prevented the
> update when the foreign key that would be violated by the component
> delete references the update query's main target relation.  If the
> foreign key references the source partition directly, then the delete
> would be harmless. However there's not enough infrastructure in v12,
> v13 branches to determine that, which makes back-patching this a bit
> hard.  For example, there's no way to tell in the back-branches if the
> foreign-key-enforcing triggers of a leaf partition have descended from
> the parent table.  IOW, I am not so sure anymore if we should try to
> do anything in the back-branches.

Hmm I'm not sure the necessity of figuring out foreign-key-enforcing
triggers of a leaf partition have descended from the parent table. I
might be missing something but even if the foreign key references the
leaf partition directly, the same problem could happen if doing a
cross-partition update, is that right?

Also, the updated patch prevents a cross-partition update even when
the foreign key references another column of itself. This kind of
cross-update works as expected for now so it seems not to need to
disallow that. What do you think? The scenario is:

create table p (a int primary key, b int references p(a) on delete
cascade) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
insert into p values (1, 1);
update p set a = 2, b = 2 where a = 1;
ERROR:  cannot move row being updated to another partition
DETAIL:  Moving the row may cause a foreign key involving the source
partition to be violated.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a misbehavior of partition row movement (?)

2021-02-21 Thread Amit Langote
On Fri, Feb 19, 2021 at 5:04 PM Masahiko Sawada  wrote:
> On Mon, Feb 15, 2021 at 10:37 PM Amit Langote  wrote:
> > Regarding the patch, I would have liked if it only prevented the
> > update when the foreign key that would be violated by the component
> > delete references the update query's main target relation.  If the
> > foreign key references the source partition directly, then the delete
> > would be harmless. However there's not enough infrastructure in v12,
> > v13 branches to determine that, which makes back-patching this a bit
> > hard.  For example, there's no way to tell in the back-branches if the
> > foreign-key-enforcing triggers of a leaf partition have descended from
> > the parent table.  IOW, I am not so sure anymore if we should try to
> > do anything in the back-branches.
>
> Hmm I'm not sure the necessity of figuring out foreign-key-enforcing
> triggers of a leaf partition have descended from the parent table. I
> might be missing something but even if the foreign key references the
> leaf partition directly, the same problem could happen if doing a
> cross-partition update, is that right?

Actually not, because in that case the referencing relations only care
about the contents of that leaf partition, so it's okay that the ON
DELETE actions are performed in that case, or IOW, no need to abort
the update.  Contrast that with when the foreign key references the
parent table being updated, where both the old and the new row
logically belong to the table being referenced, so performing ON
DELETE actions using the source leaf partition's trigger is wrong.

Does that make sense?

> Also, the updated patch prevents a cross-partition update even when
> the foreign key references another column of itself. This kind of
> cross-update works as expected for now so it seems not to need to
> disallow that. What do you think? The scenario is:
>
> create table p (a int primary key, b int references p(a) on delete
> cascade) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> insert into p values (1, 1);
> update p set a = 2, b = 2 where a = 1;
> ERROR:  cannot move row being updated to another partition
> DETAIL:  Moving the row may cause a foreign key involving the source
> partition to be violated.

Hmm yes, it would be nice to not cause an error in this case.  But we
don't have enough details about the actual foreign key in this part of
the code (ExecUpdate).  Given the current ResultRelInfo
infrastructure, this code can only look at the trigger objects to get
some details about the foreign key, such as whether the relation being
operated on is the FK relation or the PK relation.  More detailed
properties of those foreign keys are only checked inside the trigger's
functions (ri_trigger.c) and perhaps also in the dispatch code in
trigger.c: AfterTriggerSaveEvent().  However, it would be hard to
detect in those modules that a delete trigger event indeed comes from
a delete performed as part of a cross-partition update, that is,
without significantly changing their interfaces.

Anyway, I am no longer sure if we should do anything in the back
branches, which the patch you have been looking at is for.  There have
not been many field complaints about this other than the one that
started this thread.  Also, I suspect that aborting the
cross-partition updates for any partitioned table that is referenced
in a foreign key will annoy users, especially those who simply don't
use ON DELETE/UPDATE clauses.

So, it may be better to focus on the patches for master that, instead
of giving an error, make the cross-partition updates just work.






--
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2021-02-23 Thread Rahila Syed
Hi Amit,

Sorry for the late reply.

I assume these are comments for the v3-0001 & v3-0002 patches...
>
> Yes, those were comments for patches on master.


> > The partition-key-update-1.spec test fails with the following error
> message appearing in the diffs.
> >
> >  step s1u3pc: UPDATE foo_range_parted SET a=11 WHERE a=7;
> > +ERROR:  cannot move row being updated to another partition
>
> ...whereas, this error happens with the patch I posted in my last
> email (prevent-row-movement-on-pk-table.patch) that is not meant to be
> considered for HEAD, but for back-branches (if at all).  I also see
> that cfbot got triggered by it and shows the same failure.  I am not
> going to try to take care of these failures unless we want to do
> something in the back-branches.
>
> To be clear, patches for HEAD do pass make check-world.
>
> OK.


> > I think the documentation update is missing from the patches.
>
> Hmm, I don't think we document the behavior that is improved by the v3
> patches as a limitation of any existing feature, neither of foreign
> keys referencing partitioned tables nor of the update row movement
> feature.  So maybe there's nothing in the existing documentation that
> is to be updated.
>
> However, the patch does add a new error message for a case that the
> patch doesn't handle, so maybe we could document that as a limitation.
> Not sure if in the Notes section of the UPDATE reference page which
> has some notes on row movement or somewhere else.  Do you have
> suggestions?
>
> You are right, I could not find any direct explanation of the impact of
row movement during
UPDATE on a referencing table in the PostgreSQL docs.

The two documents that come close are either:

1. https://www.postgresql.org/docs/13/trigger-definition.html .
The para starting with "If an UPDATE on a partitioned table causes a row to
move to another partition"
However, this does not describe the behaviour of  internal triggers which
is the focus of this patch.

2. Another one like you mentioned,
https://www.postgresql.org/docs/11/sql-update.html
This has explanation for row movement behaviour for partitioned table but
does not explain
any impact of such behaviour on a referencing table.
I think it is worth adding some explanation in this document. Thus,
explaining
impact on referencing tables here, as it already describes behaviour of
UPDATE on a partitioned table.


Thank you.
Rahila Syed


Re: a misbehavior of partition row movement (?)

2021-02-25 Thread Masahiko Sawada
On Mon, Feb 22, 2021 at 3:04 PM Amit Langote  wrote:
>
> On Fri, Feb 19, 2021 at 5:04 PM Masahiko Sawada  wrote:
> > On Mon, Feb 15, 2021 at 10:37 PM Amit Langote  
> > wrote:
> > > Regarding the patch, I would have liked if it only prevented the
> > > update when the foreign key that would be violated by the component
> > > delete references the update query's main target relation.  If the
> > > foreign key references the source partition directly, then the delete
> > > would be harmless. However there's not enough infrastructure in v12,
> > > v13 branches to determine that, which makes back-patching this a bit
> > > hard.  For example, there's no way to tell in the back-branches if the
> > > foreign-key-enforcing triggers of a leaf partition have descended from
> > > the parent table.  IOW, I am not so sure anymore if we should try to
> > > do anything in the back-branches.
> >
> > Hmm I'm not sure the necessity of figuring out foreign-key-enforcing
> > triggers of a leaf partition have descended from the parent table. I
> > might be missing something but even if the foreign key references the
> > leaf partition directly, the same problem could happen if doing a
> > cross-partition update, is that right?
>
> Actually not, because in that case the referencing relations only care
> about the contents of that leaf partition, so it's okay that the ON
> DELETE actions are performed in that case, or IOW, no need to abort
> the update.  Contrast that with when the foreign key references the
> parent table being updated, where both the old and the new row
> logically belong to the table being referenced, so performing ON
> DELETE actions using the source leaf partition's trigger is wrong.
>
> Does that make sense?

That makes sense. Thanks for your explanation.

One more thing, users are aware of a cross-partition update is
internally converted to DELETE + INSERT (i.g., documented somewhere)?
If not, users might get confused if a tuple referencing a partition
table through a foreign key constraint with ON DELETE CASCADE is
deleted when UPDATE on the parent table.

>
> > Also, the updated patch prevents a cross-partition update even when
> > the foreign key references another column of itself. This kind of
> > cross-update works as expected for now so it seems not to need to
> > disallow that. What do you think? The scenario is:
> >
> > create table p (a int primary key, b int references p(a) on delete
> > cascade) partition by list (a);
> > create table p1 partition of p for values in (1);
> > create table p2 partition of p for values in (2);
> > insert into p values (1, 1);
> > update p set a = 2, b = 2 where a = 1;
> > ERROR:  cannot move row being updated to another partition
> > DETAIL:  Moving the row may cause a foreign key involving the source
> > partition to be violated.
>
> Hmm yes, it would be nice to not cause an error in this case.  But we
> don't have enough details about the actual foreign key in this part of
> the code (ExecUpdate).  Given the current ResultRelInfo
> infrastructure, this code can only look at the trigger objects to get
> some details about the foreign key, such as whether the relation being
> operated on is the FK relation or the PK relation.  More detailed
> properties of those foreign keys are only checked inside the trigger's
> functions (ri_trigger.c) and perhaps also in the dispatch code in
> trigger.c: AfterTriggerSaveEvent().  However, it would be hard to
> detect in those modules that a delete trigger event indeed comes from
> a delete performed as part of a cross-partition update, that is,
> without significantly changing their interfaces.

Agreed.

>
> Anyway, I am no longer sure if we should do anything in the back
> branches, which the patch you have been looking at is for.  There have
> not been many field complaints about this other than the one that
> started this thread.  Also, I suspect that aborting the
> cross-partition updates for any partitioned table that is referenced
> in a foreign key will annoy users, especially those who simply don't
> use ON DELETE/UPDATE clauses.

I thought to disallow creating foreign key constraint referencing a
partitioned table with ON DELETE/UPDATE actions other than NO ACTION
and RESTRICT but it also seems to annoy users.

>
> So, it may be better to focus on the patches for master that, instead
> of giving an error, make the cross-partition updates just work.

Okay, let's focus on the patches for master. I'll look at that patch next week.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a misbehavior of partition row movement (?)

2021-02-25 Thread Amit Langote
Hi Rahila,

On Wed, Feb 24, 2021 at 3:07 PM Rahila Syed  wrote:
>> > I think the documentation update is missing from the patches.
>>
>> Hmm, I don't think we document the behavior that is improved by the v3
>> patches as a limitation of any existing feature, neither of foreign
>> keys referencing partitioned tables nor of the update row movement
>> feature.  So maybe there's nothing in the existing documentation that
>> is to be updated.
>>
>> However, the patch does add a new error message for a case that the
>> patch doesn't handle, so maybe we could document that as a limitation.
>> Not sure if in the Notes section of the UPDATE reference page which
>> has some notes on row movement or somewhere else.  Do you have
>> suggestions?
>>
> You are right, I could not find any direct explanation of the impact of row 
> movement during
> UPDATE on a referencing table in the PostgreSQL docs.
>
> The two documents that come close are either:

Thanks for looking those up.

> 1. https://www.postgresql.org/docs/13/trigger-definition.html .
> The para starting with "If an UPDATE on a partitioned table causes a row to 
> move to another partition"
> However, this does not describe the behaviour of  internal triggers which is 
> the focus of this patch.

The paragraph does talk about a very related topic, but, like you, I
am not very excited about adding a line here about what we're doing
with internal triggers.

> 2. Another one like you mentioned,  
> https://www.postgresql.org/docs/11/sql-update.html
> This has explanation for row movement behaviour for partitioned table but 
> does not explain
> any impact of such behaviour on a referencing table.
> I think it is worth adding some explanation in this document. Thus, explaining
> impact on referencing tables here, as it already describes behaviour of
> UPDATE on a partitioned table.

ISTM the description of the case that will now be prevented seems too
obscure to make into a documentation line, but I tried.  Please check.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v5-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v5-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-03-09 Thread Masahiko Sawada
On Fri, Feb 26, 2021 at 4:30 PM Amit Langote  wrote:
>
> Hi Rahila,
>
> On Wed, Feb 24, 2021 at 3:07 PM Rahila Syed  wrote:
> >> > I think the documentation update is missing from the patches.
> >>
> >> Hmm, I don't think we document the behavior that is improved by the v3
> >> patches as a limitation of any existing feature, neither of foreign
> >> keys referencing partitioned tables nor of the update row movement
> >> feature.  So maybe there's nothing in the existing documentation that
> >> is to be updated.
> >>
> >> However, the patch does add a new error message for a case that the
> >> patch doesn't handle, so maybe we could document that as a limitation.
> >> Not sure if in the Notes section of the UPDATE reference page which
> >> has some notes on row movement or somewhere else.  Do you have
> >> suggestions?
> >>
> > You are right, I could not find any direct explanation of the impact of row 
> > movement during
> > UPDATE on a referencing table in the PostgreSQL docs.
> >
> > The two documents that come close are either:
>
> Thanks for looking those up.
>
> > 1. https://www.postgresql.org/docs/13/trigger-definition.html .
> > The para starting with "If an UPDATE on a partitioned table causes a row to 
> > move to another partition"
> > However, this does not describe the behaviour of  internal triggers which 
> > is the focus of this patch.
>
> The paragraph does talk about a very related topic, but, like you, I
> am not very excited about adding a line here about what we're doing
> with internal triggers.
>
> > 2. Another one like you mentioned,  
> > https://www.postgresql.org/docs/11/sql-update.html
> > This has explanation for row movement behaviour for partitioned table but 
> > does not explain
> > any impact of such behaviour on a referencing table.
> > I think it is worth adding some explanation in this document. Thus, 
> > explaining
> > impact on referencing tables here, as it already describes behaviour of
> > UPDATE on a partitioned table.
>
> ISTM the description of the case that will now be prevented seems too
> obscure to make into a documentation line, but I tried.  Please check.
>

I looked at the 0001 patch and here are random comments. Please ignore
a comment if it is already discussed.

---
@@ -9077,7 +9102,8 @@ addFkRecurseReferenced(List **wqueue, Constraint
*fkconstraint, Relation rel,
   partIndexId, constrOid, numfks,
   mapped_pkattnum, fkattnum,
   pfeqoperators, ppeqoperators, ffeqoperators,
-  old_check_ok);
+  old_check_ok,
+  deleteTriggerOid, updateTriggerOid);

/* Done -- clean up (but keep the lock) */
table_close(partRel, NoLock);
@@ -9126,8 +9152,12 @@ addFkRecurseReferencing(List **wqueue,
Constraint *fkconstraint, Relation rel,
Relation pkrel, Oid indexOid, Oid parentConstr,
int numfks, int16 *pkattnum, int16 *fkattnum,
Oid *pfeqoperators, Oid *ppeqoperators, Oid
*ffeqoperators,
-   bool old_check_ok, LOCKMODE lockmode)
+   bool old_check_ok, LOCKMODE lockmode,
+   Oid parentInsTrigger, Oid parentUpdTrigger)
 {

We need to update the function comments as well.

---
I think it's better to add comments for newly added functions such as
GetForeignKeyActionTriggers() and GetForeignKeyCheckTriggers() etc.
Those functions have no comment at all.

BTW, those two functions out of newly added four functions:
AttachForeignKeyCheckTriggers() and DetachForeignKeyCheckTriggers(),
have only one user. Can we past the functions body at where each
function is called?

---
/*
 * If the referenced table is a plain relation, create the action triggers
 * that enforce the constraint.
 */
-   if (pkrel->rd_rel->relkind == RELKIND_RELATION)
-   {
-   createForeignKeyActionTriggers(rel, RelationGetRelid(pkrel),
-  fkconstraint,
-  constrOid, indexOid);
-   }
+   createForeignKeyActionTriggers(rel, RelationGetRelid(pkrel),
+  fkconstraint,
+  constrOid, indexOid,
+  parentDelTrigger, parentUpdTrigger,
+  &deleteTriggerOid, &updateTriggerOid);

The comment needs to be updated.

---
 /*
  * If the referencing relation is a plain table, add the check triggers to
  * it and, if necessary, schedule it to be checked in Phase 3.
  *
  * If the relation is partitioned, drill down to do it to its partitions.
  */
+createForeignKeyCheckTriggers(RelationGetRelid(rel),
+  RelationGetRelid(pkrel),
+  fkconstraint,
+ 

Re: a misbehavior of partition row movement (?)

2021-01-08 Thread Amit Langote
Hi,

On Mon, Dec 28, 2020 at 10:01 PM Arne Roland  wrote:
> While I'd agree that simply deleting with "on delete cascade" on tuple 
> rerouting is a strong enough violation of the spirit of partitioning changing 
> that behavior, I am surprised by the idea to make a differentiation between 
> fks and other triggers. The way user defined triggers work, is a violation to 
> the same degree I get complaints about that on a monthly (if not weekly) 
> basis. It's easy to point towards the docs, but the docs offer no solution to 
> archive the behavior, that makes partitioning somewhat transparent. Most 
> people I know don't partition because they like to complicate things, but 
> just because they have to much data.
>
> It isn't just a thing with after triggers. Imo before triggers are even 
> worse: If we move a row between partitions we fire all three triggers at once 
> (at different leaf pages obviously). It's easy to point towards the docs. 
> Heart bleed was well documented, but I am happy that it was fixed. I don't 
> want to imply this totally unrelated security issue has anything to do with 
> our weird behavior. I just want to raise the question whether that's a good 
> thing, because frankly I haven't met anyone thus far, who thought it is.

Just to be clear, are you only dissatisfied with the firing of
triggers during a row-moving UPDATEs on partitioned tables or all
firing behaviors of triggers defined on partitioned tables?  Can you
give a specific example of the odd behavior in that case?

> To me it seems the RI triggers are just a specific victim of the way we made 
> triggers work in general.

That's true.

> What I tried to express, albeit I apparently failed: I care about having 
> triggers, which make partitioning transparent, on the long run.
>
> > because what can happen
> > today with CASCADE triggers does not seem like a useful behavior by
> > any measure.
>
> I wholeheartedly agree. I just want to point out, that you could state the 
> same about triggers in general.
>
> Backwards compatibility is usually a pretty compelling argument. I would 
> still want to raise the question, whether this should change, because I 
> frankly think this is a bad idea.
>
> You might ask: Why am I raising this question in this thread?
>
> In case we can not (instantly) agree on the fact that this behavior should 
> change, I'd still prefer to think about making that behavior possible for 
> other triggers (possibly later) as well. One possibility would be having an 
> entry in the catalog to mark when the trigger should fire.

Sorry, I don't understand what new setting for triggers you are
thinking of here.

> I don't want to stall your definitely useful RI-Trigger fix, but I sincerely 
> believe, that we have to do better with triggers in general.
>
> If we would make the condition when to fire or not dependent something 
> besides that fact whether the trigger is internal, we could at a later date 
> choose to make both ways available, if anyone makes a good case for this. 
> Even though I still think it's not worth it.
>
> >I don't quite recall if the decision to implement it like this was
> > based on assuming that this is what users would like to see happen in
> > this case or the perceived difficulty of implementing it the other way
> > around, that is, of firing AFTER UPDATE triggers in this case.
>
> I tried to look that up, but I couldn't find any discussion about this. Do 
> you have any ideas in which thread that was handled?

It was discussed here:

https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails.  You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach.  Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)


--
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2021-01-19 Thread Peter Eisentraut

On 2021-01-08 09:54, Amit Langote wrote:

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you 
have any ideas in which thread that was handled?

It was discussed here:

https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails.  You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach.  Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)


Could you summarize here what you are trying to do with respect to what 
was decided before?  I'm a bit confused, looking through the patches you 
have posted.  The first patch you posted hard-coded FK trigger OIDs 
specifically, other patches talk about foreign key triggers in general 
or special case internal triggers or talk about all triggers.





Re: a misbehavior of partition row movement (?)

2021-01-20 Thread Amit Langote
On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut
 wrote:
> On 2021-01-08 09:54, Amit Langote wrote:
> >>> I don't quite recall if the decision to implement it like this was
> >>> based on assuming that this is what users would like to see happen in
> >>> this case or the perceived difficulty of implementing it the other way
> >>> around, that is, of firing AFTER UPDATE triggers in this case.
> >> I tried to look that up, but I couldn't find any discussion about this. Do 
> >> you have any ideas in which thread that was handled?
> > It was discussed here:
> >
> > https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com
> >
> > It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
> > relevant emails.  You might notice that the developers who
> > participated in that discussion gave various opinions and what we have
> > today got there as a result of a majority of them voting for the
> > current approach.  Someone also said this during the discussion:
> > "Regarding the trigger issue, I can't claim to have a terribly strong
> > opinion on this. I think that practically anything we do here might
> > upset somebody, but probably any halfway-reasonable thing we choose to
> > do will be OK for most people." So what we've got is that
> > "halfway-reasonable" thing, YMMV. :)
>
> Could you summarize here what you are trying to do with respect to what
> was decided before?  I'm a bit confused, looking through the patches you
> have posted.  The first patch you posted hard-coded FK trigger OIDs
> specifically, other patches talk about foreign key triggers in general
> or special case internal triggers or talk about all triggers.

The original problem statement is this: the way we generally fire
row-level triggers of a partitioned table can lead to some unexpected
behaviors of the foreign keys pointing to that partitioned table
during its cross-partition updates.

Let's start with an example that shows how triggers are fired during a
cross-partition update:

create table p (a numeric primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig1 before update on p for each row execute function
report_trig_details();
create trigger trig2 after update on p for each row execute function
report_trig_details();
create trigger trig3 before delete on p for each row execute function
report_trig_details();
create trigger trig4 after delete on p for each row execute function
report_trig_details();
create trigger trig5 before insert on p for each row execute function
report_trig_details();
create trigger trig6 after insert on p for each row execute function
report_trig_details();

insert into p values (1);
update p set a = 2;
NOTICE:  BEFORE UPDATE on p1
NOTICE:  BEFORE DELETE on p1
NOTICE:  BEFORE INSERT on p2
NOTICE:  AFTER DELETE on p1
NOTICE:  AFTER INSERT on p2
UPDATE 1

(AR update triggers are not fired.)

For contrast, here is an intra-partition update:

update p set a = a;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  AFTER UPDATE on p2
UPDATE 1

Now, the trigger machinery makes no distinction between user-defined
and internal triggers, which has implications for the foreign key
enforcing triggers on partitions.  Consider the following example:

create table q (a bigint references p);
insert into q values (2);
update p set a = 1;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  BEFORE DELETE on p2
NOTICE:  BEFORE INSERT on p1
ERROR:  update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL:  Key (a)=(2) is still referenced from table "q".

So the RI delete trigger (NOT update) on p2 prevents the DELETE that
occurs as part of the row movement.  One might make the updates
cascade and expect that to prevent the error:

drop table q;
create table q (a bigint references p on update cascade);
insert into q values (2);
update p set a = 1;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  BEFORE DELETE on p2
NOTICE:  BEFORE INSERT on p1
ERROR:  update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL:  Key (a)=(2) is still referenced from table "q".

No luck, because again it's the RI delete trigger on p2 that gets
fired.  If you make deletes cascade too, an even worse thing happens:

drop table q;
create table q (a bigint references p on update cascade on delete cascade);
insert into q values (2);
update p set a = 1;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  BEFORE DELETE on p2
NOTICE:  BEFORE INSERT on p1
NOTICE:  AFTER DELETE on p2
NOTICE:  AFTER INSERT on p1
UPDATE 1
select * from q;
 a
---
(0 rows)

The RI delete trigger deleted 2 from q, whereas the expected result
would've been for q to be updated to change 2 

Re: a misbehavior of partition row movement (?)

2021-01-25 Thread Amit Langote
On Wed, Jan 20, 2021 at 7:03 PM Amit Langote  wrote:
> On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut
> > Could you summarize here what you are trying to do with respect to what
> > was decided before?  I'm a bit confused, looking through the patches you
> > have posted.  The first patch you posted hard-coded FK trigger OIDs
> > specifically, other patches talk about foreign key triggers in general
> > or special case internal triggers or talk about all triggers.
>
> The original problem statement is this: the way we generally fire
> row-level triggers of a partitioned table can lead to some unexpected
> behaviors of the foreign keys pointing to that partitioned table
> during its cross-partition updates.
>
> Let's start with an example that shows how triggers are fired during a
> cross-partition update:
>
> create table p (a numeric primary key) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create or replace function report_trig_details() returns trigger as $$
> begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
> 'DELETE' then return old; end if; return new; end; $$ language
> plpgsql;
> create trigger trig1 before update on p for each row execute function
> report_trig_details();
> create trigger trig2 after update on p for each row execute function
> report_trig_details();
> create trigger trig3 before delete on p for each row execute function
> report_trig_details();
> create trigger trig4 after delete on p for each row execute function
> report_trig_details();
> create trigger trig5 before insert on p for each row execute function
> report_trig_details();
> create trigger trig6 after insert on p for each row execute function
> report_trig_details();
>
> insert into p values (1);
> update p set a = 2;
> NOTICE:  BEFORE UPDATE on p1
> NOTICE:  BEFORE DELETE on p1
> NOTICE:  BEFORE INSERT on p2
> NOTICE:  AFTER DELETE on p1
> NOTICE:  AFTER INSERT on p2
> UPDATE 1
>
> (AR update triggers are not fired.)
>
> For contrast, here is an intra-partition update:
>
> update p set a = a;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  AFTER UPDATE on p2
> UPDATE 1
>
> Now, the trigger machinery makes no distinction between user-defined
> and internal triggers, which has implications for the foreign key
> enforcing triggers on partitions.  Consider the following example:
>
> create table q (a bigint references p);
> insert into q values (2);
> update p set a = 1;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  BEFORE DELETE on p2
> NOTICE:  BEFORE INSERT on p1
> ERROR:  update or delete on table "p2" violates foreign key constraint
> "q_a_fkey2" on table "q"
> DETAIL:  Key (a)=(2) is still referenced from table "q".
>
> So the RI delete trigger (NOT update) on p2 prevents the DELETE that
> occurs as part of the row movement.  One might make the updates
> cascade and expect that to prevent the error:
>
> drop table q;
> create table q (a bigint references p on update cascade);
> insert into q values (2);
> update p set a = 1;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  BEFORE DELETE on p2
> NOTICE:  BEFORE INSERT on p1
> ERROR:  update or delete on table "p2" violates foreign key constraint
> "q_a_fkey2" on table "q"
> DETAIL:  Key (a)=(2) is still referenced from table "q".
>
> No luck, because again it's the RI delete trigger on p2 that gets
> fired.  If you make deletes cascade too, an even worse thing happens:
>
> drop table q;
> create table q (a bigint references p on update cascade on delete cascade);
> insert into q values (2);
> update p set a = 1;
> NOTICE:  BEFORE UPDATE on p2
> NOTICE:  BEFORE DELETE on p2
> NOTICE:  BEFORE INSERT on p1
> NOTICE:  AFTER DELETE on p2
> NOTICE:  AFTER INSERT on p1
> UPDATE 1
> select * from q;
>  a
> ---
> (0 rows)
>
> The RI delete trigger deleted 2 from q, whereas the expected result
> would've been for q to be updated to change 2 to 1.
>
> This shows that the way we've made these triggers behave in general
> can cause some unintended behaviors for foreign keys during
> cross-partition updates.  I started this thread to do something about
> that and sent a patch to prevent cross-partition updates at all when
> there are foreign keys pointing to it.  As others pointed out, that's
> not a great long-term solution to the problem, but that's what we may
> have to do in the back-branches if anything at all.
>
> So I wrote another patch targeting the dev branch to make the
> cross-partition updates work while producing a sane foreign key
> behavior.  The idea of the patch is to tweak the firing of AFTER
> triggers such that unintended RI triggers don't get fired, that is,
> those corresponding to DELETE and INSERT occurring internally as part
> of a cross-partition update.  Instead we now fire the AFTER UPDATE
> triggers, passing the root table as the target result relation (not
> the source partition because the new row doesn't belong to it).  This
> results in the same foreign key beh

Re: a misbehavior of partition row movement (?)

2020-10-02 Thread Amit Langote
On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
 wrote:
> On Friday, October 2, 2020, Amit Langote  wrote:
>>
>>
>> Reporter on that thread says that the last update should have failed
>> and I don't quite see a workable alternative to that.
>
>
> To be clear the OP would rather have it just work, the same as the 
> non-row-movement version.  Maybe insert the new row first, execute the on 
> update trigger chained from the old row, then delete the old row?

I was thinking yesterday about making it just work, but considering
the changes that would need to be made to how the underlying triggers
fire, it does not seem we would be able to back-port the solution.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2020-10-03 Thread Tomas Vondra

On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:

On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
 wrote:

On Friday, October 2, 2020, Amit Langote 
wrote:



Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that.



To be clear the OP would rather have it just work, the same as the
non-row-movement version.  Maybe insert the new row first, execute
the on update trigger chained from the old row, then delete the old
row?


I was thinking yesterday about making it just work, but considering the
changes that would need to be made to how the underlying triggers fire,
it does not seem we would be able to back-port the solution.



I think we need to differentiate between master and backbranches. IMO we
should try to make it "just work" in master, and the amount of code
should not be an issue there I think (no opinion on whether insert and
update trigger is the way to go). For backbranches we may need to do
something less intrusive, of course.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: a misbehavior of partition row movement (?)

2020-10-03 Thread Amit Langote
On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra  wrote
> On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:
> >On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
> > wrote:
> >> On Friday, October 2, 2020, Amit Langote 
> >> wrote:
> >>>
> >>>
> >>> Reporter on that thread says that the last update should have failed
> >>> and I don't quite see a workable alternative to that.
> >>
> >>
> >> To be clear the OP would rather have it just work, the same as the
> >> non-row-movement version.  Maybe insert the new row first, execute
> >> the on update trigger chained from the old row, then delete the old
> >> row?
> >
> >I was thinking yesterday about making it just work, but considering the
> >changes that would need to be made to how the underlying triggers fire,
> >it does not seem we would be able to back-port the solution.
> >
>
> I think we need to differentiate between master and backbranches. IMO we
> should try to make it "just work" in master, and the amount of code
> should not be an issue there I think (no opinion on whether insert and
> update trigger is the way to go). For backbranches we may need to do
> something less intrusive, of course.

Sure, that makes sense.  I will try making a patch for HEAD to make it
just work unless someone beats me to it.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2022-03-18 Thread Alvaro Herrera
I rebased this patch; v15 attached.  Other than fixing the (very large)
conflicts due to nodeModifyTable.c rework, the most important change is
moving GetAncestorResultRels into execMain.c and renaming it to have the
"Exec-" prefix.  The reason is that what this code is doing is affect
struct ResultRelInfo, which is owned by execMain.c, so it seemed bogus
to do that in nodeModifyTable.c and then let execMain.c's
ExecCloseResultRelations() do the cleanup.  I added a little commentary
in the latter routine too.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"(Andrew Morton)
>From 345ed49718708d8ebde9e2dcf06bf963190bc5c8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 18 Mar 2022 11:01:24 +0100
Subject: [PATCH v15] Enforce foreign key correctly during cross-partition
 updates

When an update on a partitioned table referenced in foreign keys
constraint causes a row to move from one partition to another, which
is implemented by deleting the old row from the source leaf partition
followed by inserting the new row into the destination leaf partition,
firing the foreign key triggers on that delete event can result in
surprising outcomes for those keys.  For example, a given foreign
key's delete trigger which implements the ON DELETE CASCADE clause of
that key will delete any referencing rows when triggerred for that
internal DELETE, although it should not, because the referenced row
is simply being moved from one partition of the referenced root
partitioned table into another, not being deleted from it.

This commit teaches trigger.c to skip queuing such delete trigger
events on the leaf partitions in favor of an UPDATE event fired on
the root target relation.  Doing so makes sense because both the
old and the new tuple "logically" belong to the root relation.

The after trigger event queuing interface now allows passing the
source and the destination partitions of a particular cross-partition
update when registering the update event for the root partitioned
table.  Along with the 2 ctids of the old and the new tuple, an after
trigger event now also stores the OIDs of those partitions. The tuples
fetched from the source and the destination partitions are converted
into the root table format before they are passed to the trigger
function.

The implementation currently has a limitation that only the foreign
keys pointing into the query's target relation are considered, not
those of its sub-partitioned partitions.  That seems like a
reasonable limitation, because it sounds rare to have distinct
foreign keys pointing into sub-partitioned partitions, but not into
the root table.

Author: Amit Langote
---
 doc/src/sgml/ref/update.sgml  |   7 +
 src/backend/commands/trigger.c| 389 +++---
 src/backend/executor/execMain.c   |  86 -
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeModifyTable.c| 155 -
 src/backend/utils/adt/ri_triggers.c   |   6 +
 src/include/commands/trigger.h|   8 +-
 src/include/executor/executor.h   |   4 +-
 src/include/nodes/execnodes.h |   6 +
 src/test/regress/expected/foreign_key.out | 204 +++-
 src/test/regress/sql/foreign_key.sql  | 135 +++-
 11 files changed, 925 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..3ba13010e7 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -316,6 +316,13 @@ UPDATE count
partition (provided the foreign data wrapper supports tuple routing), they
cannot be moved from a foreign-table partition to another partition.
   
+
+  
+   An attempt of moving a row from one partition to another will fail if a
+   foreign key is found to directly reference a non-root partitioned table
+   in the partition tree, unless that table is also directly mentioned
+   in the UPDATEquery.
+  
  
 
  
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index e08bd9a370..a9aa043981 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -95,10 +95,13 @@ static HeapTuple ExecCallTriggerFunc(TriggerData *trigdata,
 	 Instrumentation *instr,
 	 MemoryContext per_tuple_context);
 static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
+  ResultRelInfo *src_partinfo,
+  ResultRelInfo *dst_partinfo,
   int event, bool row_trigger,
   TupleTableSlot *oldtup, TupleTableSlot *newtup,
   List *recheckIndexes, Bitmapset *modifiedCols,
-  TransitionCaptureState *transition_capture);
+  TransitionCaptureState *transition_capture,
+  bool is_crosspart_update);
 static void AfterTriggerEnlargeQueryState(void);
 static bool before_stmt_triggers_fired(Oid relid, CmdType cmdType);
 
@@ -2458,8 +2461,10 @@ 

Re: a misbehavior of partition row movement (?)

2022-03-18 Thread Zhihong Yu
On Fri, Mar 18, 2022 at 9:38 AM Alvaro Herrera 
wrote:

> I rebased this patch; v15 attached.  Other than fixing the (very large)
> conflicts due to nodeModifyTable.c rework, the most important change is
> moving GetAncestorResultRels into execMain.c and renaming it to have the
> "Exec-" prefix.  The reason is that what this code is doing is affect
> struct ResultRelInfo, which is owned by execMain.c, so it seemed bogus
> to do that in nodeModifyTable.c and then let execMain.c's
> ExecCloseResultRelations() do the cleanup.  I added a little commentary
> in the latter routine too.
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/
> "World domination is proceeding according to plan"(Andrew Morton)
>

Hi,

+#define AFTER_TRIGGER_OFFSET   0x07FF  /* must be low-order
bits */
+#define AFTER_TRIGGER_DONE 0x8000
+#define AFTER_TRIGGER_IN_PROGRESS  0x4000

Is it better if the order of AFTER_TRIGGER_DONE
and AFTER_TRIGGER_IN_PROGRESS is swapped (for the ordinal values to be
sequential) ?

+#define AFTER_TRIGGER_CP_UPDATE0x0800

It would be better to add a comment for this constant, explaining what CP
means (cross partition).

+   if (!partRel->rd_rel->relispartition)
+   elog(ERROR, "cannot find ancestors of a non-partition result
relation");

It would be better to include the relation name in the error message.

+   /* Ignore the root ancestor, because ...?? */

Please fill out the remainder of the comment.

+   if (!trig->tgisclone &&
+   RI_FKey_trigger_type(trig->tgfoid) == RI_TRIGGER_PK)
+   {
+   has_noncloned_fkey = true;

The variable says fkey, but the constant is not RI_TRIGGER_FK. Maybe add a
comment explaining why.

Cheers


Re: a misbehavior of partition row movement (?)

2022-03-19 Thread Alvaro Herrera
On 2022-Mar-18, Zhihong Yu wrote:

> +#define AFTER_TRIGGER_OFFSET   0x07FF  /* must be low-order
> bits */
> +#define AFTER_TRIGGER_DONE 0x8000
> +#define AFTER_TRIGGER_IN_PROGRESS  0x4000
> 
> Is it better if the order of AFTER_TRIGGER_DONE
> and AFTER_TRIGGER_IN_PROGRESS is swapped (for the ordinal values to be
> sequential) ?

They *are* sequential -- See
https://www.postgresql.org/message-id/202201172215.2tse3vjjgi2b%40alvherre.pgsql

> +#define AFTER_TRIGGER_CP_UPDATE0x0800
> 
> It would be better to add a comment for this constant, explaining what CP
> means (cross partition).

Sure.

> +   if (!partRel->rd_rel->relispartition)
> +   elog(ERROR, "cannot find ancestors of a non-partition result
> relation");
> 
> It would be better to include the relation name in the error message.

I don't think it matters.  We don't really expect to hit this.

> +   /* Ignore the root ancestor, because ...?? */
> 
> Please fill out the remainder of the comment.

I actually would like to know what's the rationale for this myself.
Amit?

> +   if (!trig->tgisclone &&
> +   RI_FKey_trigger_type(trig->tgfoid) == RI_TRIGGER_PK)
> +   {
> +   has_noncloned_fkey = true;
> 
> The variable says fkey, but the constant is not RI_TRIGGER_FK. Maybe add a
> comment explaining why.

Well, the constant is about the trigger *function*, not about any
constraint.  This code is testing "is this a noncloned trigger, and does
that trigger use an FK-related function?"  If you have a favorite
comment to include, I'm all ears.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there.  Did
somebody make percentages logarithmic while I wasn't looking?"
http://smylers.hates-software.com/2005/09/08/1995c749.html
>From a71baa9ab81d6f9ed04ce2c37c86d806ef36aa8b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 18 Mar 2022 11:01:24 +0100
Subject: [PATCH v16] Enforce foreign key correctly during cross-partition
 updates

When an update on a partitioned table referenced in foreign keys
constraint causes a row to move from one partition to another, which
is implemented by deleting the old row from the source leaf partition
followed by inserting the new row into the destination leaf partition,
firing the foreign key triggers on that delete event can result in
surprising outcomes for those keys.  For example, a given foreign
key's delete trigger which implements the ON DELETE CASCADE clause of
that key will delete any referencing rows when triggerred for that
internal DELETE, although it should not, because the referenced row
is simply being moved from one partition of the referenced root
partitioned table into another, not being deleted from it.

This commit teaches trigger.c to skip queuing such delete trigger
events on the leaf partitions in favor of an UPDATE event fired on
the root target relation.  Doing so makes sense because both the
old and the new tuple "logically" belong to the root relation.

The after trigger event queuing interface now allows passing the
source and the destination partitions of a particular cross-partition
update when registering the update event for the root partitioned
table.  Along with the 2 ctids of the old and the new tuple, an after
trigger event now also stores the OIDs of those partitions. The tuples
fetched from the source and the destination partitions are converted
into the root table format before they are passed to the trigger
function.

The implementation currently has a limitation that only the foreign
keys pointing into the query's target relation are considered, not
those of its sub-partitioned partitions.  That seems like a
reasonable limitation, because it sounds rare to have distinct
foreign keys pointing into sub-partitioned partitions, but not into
the root table.

Author: Amit Langote
---
 doc/src/sgml/ref/update.sgml  |   7 +
 src/backend/commands/trigger.c| 394 +++---
 src/backend/executor/execMain.c   |  86 -
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeModifyTable.c| 151 -
 src/backend/utils/adt/ri_triggers.c   |   6 +
 src/include/commands/trigger.h|   8 +-
 src/include/executor/executor.h   |   4 +-
 src/include/nodes/execnodes.h |   6 +
 src/test/regress/expected/foreign_key.out | 204 ++-
 src/test/regress/sql/foreign_key.sql  | 135 +++-
 11 files changed, 926 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..3ba13010e7 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -316,6 +316,13 @@ UPDATE count
partition (provided the foreign data wrap

Re: a misbehavior of partition row movement (?)

2022-03-19 Thread Amit Langote
On Sun, Mar 20, 2022 at 5:13 AM Alvaro Herrera  wrote:
> On 2022-Mar-18, Zhihong Yu wrote:
>
> > +#define AFTER_TRIGGER_OFFSET   0x07FF  /* must be low-order
> > bits */
> > +#define AFTER_TRIGGER_DONE 0x8000
> > +#define AFTER_TRIGGER_IN_PROGRESS  0x4000
> >
> > Is it better if the order of AFTER_TRIGGER_DONE
> > and AFTER_TRIGGER_IN_PROGRESS is swapped (for the ordinal values to be
> > sequential) ?
>
> They *are* sequential -- See
> https://www.postgresql.org/message-id/202201172215.2tse3vjjgi2b%40alvherre.pgsql
>
> > +#define AFTER_TRIGGER_CP_UPDATE0x0800
> >
> > It would be better to add a comment for this constant, explaining what CP
> > means (cross partition).
>
> Sure.

Thanks.

> > +   if (!partRel->rd_rel->relispartition)
> > +   elog(ERROR, "cannot find ancestors of a non-partition result
> > relation");
> >
> > It would be better to include the relation name in the error message.
>
> I don't think it matters.  We don't really expect to hit this.

I tend to think maybe showing at least the OID in the error message
doesn't hurt, but maybe we don't need to.

> > +   /* Ignore the root ancestor, because ...?? */
> >
> > Please fill out the remainder of the comment.
>
> I actually would like to know what's the rationale for this myself.
> Amit?

Ah, it's just that the ancstorRels list contains *all* ancestors,
including the root one, whose triggers will actually be fired to
enforce its foreign key. The code below the line of code that this
comment is for is to check *non-root* ancestor's triggers to spot any
that look like they enforce foreign keys to flag them as
unenforceable.

I've fixed the comment as:

-   /* Ignore the root ancestor, because ...?? */
+   /* Root ancestor's triggers will be processed. */

> > +   if (!trig->tgisclone &&
> > +   RI_FKey_trigger_type(trig->tgfoid) == RI_TRIGGER_PK)
> > +   {
> > +   has_noncloned_fkey = true;
> >
> > The variable says fkey, but the constant is not RI_TRIGGER_FK. Maybe add a
> > comment explaining why.
>
> Well, the constant is about the trigger *function*, not about any
> constraint.  This code is testing "is this a noncloned trigger, and does
> that trigger use an FK-related function?"  If you have a favorite
> comment to include, I'm all ears.

A description of what we're looking for with this code is in the
comment above the loop:

/*
 * For any foreign keys that point directly into a non-root ancestors of
 * the source partition,...

So finding triggers in those non-root ancestors whose function is
RI_TRIGGER_PK tells us that those relations have foreign keys pointing
into it or that it is the PK table in that relationship.  Other than
the comment, the code itself seems self-documenting with regard to
what's being done (given the function/macro/variable naming), so maybe
we're better off without additional commentary here.

I've attached a delta patch on v16 for the above comment and a couple
of other changes.

--
Amit Langote
EDB: http://www.enterprisedb.com


v16_delta.diff
Description: Binary data


Re: a misbehavior of partition row movement (?)

2022-03-20 Thread Alvaro Herrera
On 2022-Mar-20, Amit Langote wrote:

> On Sun, Mar 20, 2022 at 5:13 AM Alvaro Herrera  
> wrote:
> > On 2022-Mar-18, Zhihong Yu wrote:

> > > +   if (!partRel->rd_rel->relispartition)
> > > +   elog(ERROR, "cannot find ancestors of a non-partition result
> > > relation");
> > >
> > > It would be better to include the relation name in the error message.
> >
> > I don't think it matters.  We don't really expect to hit this.
> 
> I tend to think maybe showing at least the OID in the error message
> doesn't hurt, but maybe we don't need to.

Since we don't even know of a situation in which this error message
would be raised, I'm hardly bothered by failing to print the OID.  If
any users complain, we can add more detail.

> I've fixed the comment as:
> 
> -   /* Ignore the root ancestor, because ...?? */
> +   /* Root ancestor's triggers will be processed. */

Okay, included that.

> A description of what we're looking for with this code is in the
> comment above the loop:
> 
> /*
>  * For any foreign keys that point directly into a non-root ancestors of
>  * the source partition,...
> 
> So finding triggers in those non-root ancestors whose function is
> RI_TRIGGER_PK tells us that those relations have foreign keys pointing
> into it or that it is the PK table in that relationship.  Other than
> the comment, the code itself seems self-documenting with regard to
> what's being done (given the function/macro/variable naming), so maybe
> we're better off without additional commentary here.

Yeah, WFM.

> I've attached a delta patch on v16 for the above comment and a couple
> of other changes.

Merged that in, and pushed.  I made a couple of wording changes in
comments here and there as well.

I lament the fact that this fix is not going to hit Postgres 12-14, but
ratio of effort to reward seems a bit too high.  I think we could
backpatch the two involved commits if someone is motivated enough to
verify everything and come up with solutions for the necessary ABI
changes.

Thank you, Amit, for your perseverance in getting this bug fixed!  

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"




Re: a misbehavior of partition row movement (?)

2022-03-21 Thread Amit Langote
Hi Alvaro,

On Mon, Mar 21, 2022 at 2:58 AM Alvaro Herrera  wrote:
> On 2022-Mar-20, Amit Langote wrote:
> > On Sun, Mar 20, 2022 at 5:13 AM Alvaro Herrera  
> > wrote:
> > > On 2022-Mar-18, Zhihong Yu wrote:
>
> > > > +   if (!partRel->rd_rel->relispartition)
> > > > +   elog(ERROR, "cannot find ancestors of a non-partition result
> > > > relation");
> > > >
> > > > It would be better to include the relation name in the error message.
> > >
> > > I don't think it matters.  We don't really expect to hit this.
> >
> > I tend to think maybe showing at least the OID in the error message
> > doesn't hurt, but maybe we don't need to.
>
> Since we don't even know of a situation in which this error message
> would be raised, I'm hardly bothered by failing to print the OID.  If
> any users complain, we can add more detail.

Sure.

> I lament the fact that this fix is not going to hit Postgres 12-14, but
> ratio of effort to reward seems a bit too high.  I think we could
> backpatch the two involved commits if someone is motivated enough to
> verify everything and come up with solutions for the necessary ABI
> changes.
>
> Thank you, Amit, for your perseverance in getting this bug fixed!

Thanks a lot for taking the time to review and commit.


--
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2021-12-20 Thread Amit Langote
On Thu, Oct 14, 2021 at 6:00 PM Amit Langote  wrote:
> On Mon, Sep 20, 2021 at 3:32 PM Amit Langote  wrote:
> > The problem was that the tuplestore
> > (afterTriggers.query_stack[query_level].tuplestore) that I decided to
> > use to store the AFTER trigger tuples of a partitioned table that is
> > the target of an cross-partition update lives only for the duration of
> > a given query.  So that approach wouldn't work if the foreign key
> > pointing into that partitioned table is marked INITIALLY DEFERRED.  To
> > fix, I added a List field to AfterTriggersData that stores the
> > tuplestores to store the tuples of partitioned tables that undergo
> > cross-partition updates in a transaction and are pointed to by
> > INITIALLY DEFERRED foreign key constraints.  I couldn't understand one
> > comment about why using a tuplestore for such cases *might not* work,
> > which as follows:
> >
> >  * Note that we need triggers on foreign tables to be fired in exactly the
> >  * order they were queued, so that the tuples come out of the tuplestore in
> >  * the right order.  To ensure that, we forbid deferrable (constraint)
> >  * triggers on foreign tables.  This also ensures that such triggers do not
> >  * get deferred into outer trigger query levels, meaning that it's okay to
> >  * destroy the tuplestore at the end of the query level.
> >
> > I tried to break the approach using various test cases (some can be
> > seen added by the patch to foreign_key.sql), but couldn't see the
> > issue alluded to in the above comment.  So I've marked the comment
> > with an XXX note as follows:
> >
> > - * Note that we need triggers on foreign tables to be fired in exactly the
> > - * order they were queued, so that the tuples come out of the tuplestore in
> > - * the right order.  To ensure that, we forbid deferrable (constraint)
> > - * triggers on foreign tables.  This also ensures that such triggers do not
> > - * get deferred into outer trigger query levels, meaning that it's okay to
> > - * destroy the tuplestore at the end of the query level.
> > + * Note that we need triggers on foreign and partitioned tables to be 
> > fired in
> > + * exactly the order they were queued, so that the tuples come out of the
> > + * tuplestore in the right order.  To ensure that, we forbid deferrable
> > + * (constraint) triggers on foreign tables.  This also ensures that such
> > + * triggers do not get deferred into outer trigger query levels, meaning 
> > that
> > + * it's okay to destroy the tuplestore at the end of the query level.
> > + * XXX - update this paragraph if the new approach, whereby tuplestores in
> > + * afterTriggers.deferred_tuplestores outlive any given query, can be 
> > proven
> > + * to not really break any assumptions mentioned here.
> >
> > If anyone reading this can think of the issue the original comment
> > seems to be talking about, please let me know.
>
> I brought this up in an off-list discussion with Robert and he asked
> why I hadn't considered not using tuplestores to remember the tuples
> in the first place, specifically pointing out that it may lead to
> unnecessarily consuming a lot of memory when such updates move a bunch
> of rows around.  Like, we could simply remember the tuples to be
> passed to the trigger function using their CTIDs as is done for normal
> (single-heap-relation) updates, though in this case also remember the
> OIDs of the source and the destination partitions of a particular
> cross-partition update.
>
> I had considered that idea before but I think I had overestimated the
> complexity of that approach so didn't go with it.  I tried and the
> resulting patch doesn't look all that complicated, with the added
> bonus that the bulk update case shouldn't consume so much memory with
> this approach like the previous tuplestore version would have.
>
> Updated patches attached.

Patch 0001 conflicted with some pg_dump changes that were recently
committed, so rebased.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v11-0002-Enforce-foreign-key-correctly-during-cross-parti.patch
Description: Binary data


v11-0001-Create-foreign-key-triggers-in-partitioned-table.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2022-01-05 Thread Alvaro Herrera
Pushed 0001.

I had to adjust the query used in pg_dump; you changed the attribute
name in the query used for pg15, but left unchanged the one for older
branches, so pg_dump failed for all branches other than 15.  Also,
psql's describe.c required a small tweak to a version number test.
https://github.com/alvherre/postgres/commit/3451612e0fa082d3ec953551c6d25432bd725502

Thanks!  What was 0002 is attached, to keep cfbot happy.  It's identical
to your v11-0002.

I have pushed it thinking that we would not backpatch any of this fix.
However, after running the tests and realizing that I didn't need an
initdb for either patch, I wonder if maybe the whole series *is*
backpatchable.

There is one possible problem, which is that psql and pg_dump would need
testing to verify that they work decently (i.e. no crash, no
misbehavior) with partitioned tables created with the original code.
But there are few ABI changes, maybe we can cope and get all branches
fixes instead of just 15.

What do you think?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)
>From d1b067ad541f80191763e329577e0d3f62d00d82 Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Mon, 11 Oct 2021 14:57:19 +0900
Subject: [PATCH v12] Enforce foreign key correctly during cross-partition
 updates

When an update on a partitioned table referenced in foreign keys
constraint causes a row to move from one partition to another, which
is implemented by deleting the old row from the source leaf partition
followed by inserting the new row into the destination leaf partition,
firing the foreign key triggers on that delete event can result in
surprising outcomes for those keys.  For example, a given foreign
key's delete trigger which implements the ON DELETE CASCADE clause of
that key will delete any referencing rows, although it should not,
because the referenced row is simply being moved into another
partition.

This commit teaches trigger.c to skip queuing such delete trigger
events on the leaf partitions in favor of an UPDATE event fired on
the "root" target relation.  Doing so makes sense because both the
old and the new tuple "logically" belong to the latter.

The after trigger event queuing interface now allows passing the
source and the destination partitions of a particular cross-partition
update when registering the update event for the root partitioned
table.  Along with the 2 ctids of the old and the new tuple, an after
trigger event now also stores the OIDs of those partitions. The tuples
fetched from the source and the destination partitions are converted
into the root table format before they are passed to the trigger
function.

The implementation currently has a limitation that only the foreign
keys pointing into the query's target relation are considered, not
those of its sub-partitioned partitions.  That seems like a
reasonable limitation, it sounds rare to have distinct foreign keys
pointing into sub-partitioned partitions, but not into the root
table.
---
 doc/src/sgml/ref/update.sgml  |   7 +
 src/backend/commands/trigger.c| 322 +++---
 src/backend/executor/execMain.c   |  19 +-
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeModifyTable.c| 187 -
 src/backend/utils/adt/ri_triggers.c   |   6 +
 src/include/commands/trigger.h|   4 +
 src/include/executor/executor.h   |   3 +-
 src/include/nodes/execnodes.h |   3 +
 src/test/regress/expected/foreign_key.out | 204 +-
 src/test/regress/sql/foreign_key.sql  | 135 -
 11 files changed, 840 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..3ba13010e7 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -316,6 +316,13 @@ UPDATE count
partition (provided the foreign data wrapper supports tuple routing), they
cannot be moved from a foreign-table partition to another partition.
   
+
+  
+   An attempt of moving a row from one partition to another will fail if a
+   foreign key is found to directly reference a non-root partitioned table
+   in the partition tree, unless that table is also directly mentioned
+   in the UPDATEquery.
+  
  
 
  
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 452b743f21..1ed6dd1b38 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -94,7 +94,11 @@ static HeapTuple ExecCallTriggerFunc(TriggerData *trigdata,
 	 FmgrInfo *finfo,
 	 Instrumentation *instr,
 	 MemoryContext per_tuple_context);
-static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
+static void AfterTriggerSaveEvent(EState *estate,
+  ModifyTableState *mtstate,
+  ResultRelInfo *re

Re: a misbehavior of partition row movement (?)

2022-01-05 Thread Amit Langote
Hi Alvaro,

On Thu, Jan 6, 2022 at 7:27 AM Alvaro Herrera  wrote:
> Pushed 0001.

Thank you.

> I had to adjust the query used in pg_dump; you changed the attribute
> name in the query used for pg15, but left unchanged the one for older
> branches, so pg_dump failed for all branches other than 15.

Oops, should've tested that.

>  Also,
> psql's describe.c required a small tweak to a version number test.

Ah, yes -- 15, not 14 anymore.

> https://github.com/alvherre/postgres/commit/3451612e0fa082d3ec953551c6d25432bd725502
>
> Thanks!  What was 0002 is attached, to keep cfbot happy.  It's identical
> to your v11-0002.
>
> I have pushed it thinking that we would not backpatch any of this fix.
> However, after running the tests and realizing that I didn't need an
> initdb for either patch, I wonder if maybe the whole series *is*
> backpatchable.

Interesting thought.

We do lack help from trigger.c in the v12 branch in that there's no
Trigger.tgisclone, which is used in a couple of places in the fix.  I
haven't checked how big of a deal it would be to back-port
Trigger.tgisclone to v12, but maybe that's doable.

> There is one possible problem, which is that psql and pg_dump would need
> testing to verify that they work decently (i.e. no crash, no
> misbehavior) with partitioned tables created with the original code.

I suppose you mean checking if the psql and pg_dump after applying
*0001* work sanely with partitioned tables defined without 0001?

Will test that.

> But there are few ABI changes, maybe we can cope and get all branches
> fixes instead of just 15.
>
> What do you think?

Yeah, as long as triggers are configured as required by the fix, and
that would be ensured if we're able to back-patch 0001 down to v12, I
suppose it would be nice to get this fixed down to v12.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2022-01-06 Thread Alvaro Herrera
On 2022-Jan-06, Amit Langote wrote:

> On Thu, Jan 6, 2022 at 7:27 AM Alvaro Herrera  wrote:

> > I have pushed it thinking that we would not backpatch any of this fix.
> > However, after running the tests and realizing that I didn't need an
> > initdb for either patch, I wonder if maybe the whole series *is*
> > backpatchable.
> 
> We do lack help from trigger.c in the v12 branch in that there's no
> Trigger.tgisclone, which is used in a couple of places in the fix.  I
> haven't checked how big of a deal it would be to back-port
> Trigger.tgisclone to v12, but maybe that's doable.

Yeah, I realized afterwards that we added tgparentid in 13 only
(b9b408c48), so we should only backpatch to that.

> > There is one possible problem, which is that psql and pg_dump would need
> > testing to verify that they work decently (i.e. no crash, no
> > misbehavior) with partitioned tables created with the original code.
> 
> I suppose you mean checking if the psql and pg_dump after applying
> *0001* work sanely with partitioned tables defined without 0001?

Yes.

> Will test that.

I looked at the backpatch at the last minute yesterday.  The tablecmds.c
conflict is easy to resolve, but the one in pg_dump.c is a giant
conflict zone that I didn't have time to look closely :-(

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: a misbehavior of partition row movement (?)

2022-01-11 Thread Amit Langote
On Thu, Jan 6, 2022 at 9:36 PM Alvaro Herrera  wrote:
> On 2022-Jan-06, Amit Langote wrote:
> > On Thu, Jan 6, 2022 at 7:27 AM Alvaro Herrera  
> > wrote:
>
> > > I have pushed it thinking that we would not backpatch any of this fix.
> > > However, after running the tests and realizing that I didn't need an
> > > initdb for either patch, I wonder if maybe the whole series *is*
> > > backpatchable.
> >
> > We do lack help from trigger.c in the v12 branch in that there's no
> > Trigger.tgisclone, which is used in a couple of places in the fix.  I
> > haven't checked how big of a deal it would be to back-port
> > Trigger.tgisclone to v12, but maybe that's doable.
>
> Yeah, I realized afterwards that we added tgparentid in 13 only
> (b9b408c48), so we should only backpatch to that.
>
> > > There is one possible problem, which is that psql and pg_dump would need
> > > testing to verify that they work decently (i.e. no crash, no
> > > misbehavior) with partitioned tables created with the original code.
> >
> > I suppose you mean checking if the psql and pg_dump after applying
> > *0001* work sanely with partitioned tables defined without 0001?
>
> Yes.
>
> > Will test that.
>
> I looked at the backpatch at the last minute yesterday.  The tablecmds.c
> conflict is easy to resolve, but the one in pg_dump.c is a giant
> conflict zone that I didn't have time to look closely :-(

I think I've managed to apply f4566345cf40b into v13 and v14.  Patches attached.

Patched pg_dump seems to work with existing partition triggers and
psql too with some changes to the condition to decide whether to show
tgisinternal triggers when describing a partition.

As for the fix to make cross-partition updates work correctly with
foreign keys, I just realized it won't work for the users' existing
foreign keys, because the parent table's triggers that are needed for
the fix to work would not be present.  Were you thinking that we'd ask
users of v13 and v14 to drop and recreate those constraints?

--
Amit Langote
EDB: http://www.enterprisedb.com


14_0001-Create-foreign-key-triggers-in-partitioned-tables-to.patch
Description: Binary data


13_0001-Create-foreign-key-triggers-in-partitioned-tables-to.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2022-01-11 Thread Alvaro Herrera
On 2022-Jan-11, Amit Langote wrote:

> As for the fix to make cross-partition updates work correctly with
> foreign keys, I just realized it won't work for the users' existing
> foreign keys, because the parent table's triggers that are needed for
> the fix to work would not be present.  Were you thinking that we'd ask
> users of v13 and v14 to drop and recreate those constraints?

Yeah, more or less.  Also, any tables created from 13.6 onwards.

I was mainly thinking that we'll still have people creating new clusters
using pg13 for half a decade.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended."  (Gerry Pourwelle)




Re: a misbehavior of partition row movement (?)

2022-01-12 Thread Julien Rouhaud
Hi,

On Tue, Jan 11, 2022 at 05:08:59PM +0900, Amit Langote wrote:
> 
> I think I've managed to apply f4566345cf40b into v13 and v14.  Patches 
> attached.
> 

FTR this doesn't play well with the cfbot unfortunately as it tries to apply
both patches, and obviously on the wrong branches anyway.

It means that the previous-0002-now-0001 patch that Álvaro previously sent
(https://www.postgresql.org/message-id/202201052227.bc4yvvy6lqpb%40alvherre.pgsql)
is not tested anymore, and IIUC it's not pushed yet so it's not ideal.

There's now an official documentation on how to send patches that should be
ignored by the cfbot [1], so sending backpatch versions with a .txt extension
could be useful.  Just in case I'm attaching the pending patch to this mail to
make the cfbot happy again.

[1] 
https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F
>From d1b067ad541f80191763e329577e0d3f62d00d82 Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Mon, 11 Oct 2021 14:57:19 +0900
Subject: [PATCH v12] Enforce foreign key correctly during cross-partition
 updates

When an update on a partitioned table referenced in foreign keys
constraint causes a row to move from one partition to another, which
is implemented by deleting the old row from the source leaf partition
followed by inserting the new row into the destination leaf partition,
firing the foreign key triggers on that delete event can result in
surprising outcomes for those keys.  For example, a given foreign
key's delete trigger which implements the ON DELETE CASCADE clause of
that key will delete any referencing rows, although it should not,
because the referenced row is simply being moved into another
partition.

This commit teaches trigger.c to skip queuing such delete trigger
events on the leaf partitions in favor of an UPDATE event fired on
the "root" target relation.  Doing so makes sense because both the
old and the new tuple "logically" belong to the latter.

The after trigger event queuing interface now allows passing the
source and the destination partitions of a particular cross-partition
update when registering the update event for the root partitioned
table.  Along with the 2 ctids of the old and the new tuple, an after
trigger event now also stores the OIDs of those partitions. The tuples
fetched from the source and the destination partitions are converted
into the root table format before they are passed to the trigger
function.

The implementation currently has a limitation that only the foreign
keys pointing into the query's target relation are considered, not
those of its sub-partitioned partitions.  That seems like a
reasonable limitation, it sounds rare to have distinct foreign keys
pointing into sub-partitioned partitions, but not into the root
table.
---
 doc/src/sgml/ref/update.sgml  |   7 +
 src/backend/commands/trigger.c| 322 +++---
 src/backend/executor/execMain.c   |  19 +-
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeModifyTable.c| 187 -
 src/backend/utils/adt/ri_triggers.c   |   6 +
 src/include/commands/trigger.h|   4 +
 src/include/executor/executor.h   |   3 +-
 src/include/nodes/execnodes.h |   3 +
 src/test/regress/expected/foreign_key.out | 204 +-
 src/test/regress/sql/foreign_key.sql  | 135 -
 11 files changed, 840 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..3ba13010e7 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -316,6 +316,13 @@ UPDATE count
partition (provided the foreign data wrapper supports tuple routing), they
cannot be moved from a foreign-table partition to another partition.
   
+
+  
+   An attempt of moving a row from one partition to another will fail if a
+   foreign key is found to directly reference a non-root partitioned table
+   in the partition tree, unless that table is also directly mentioned
+   in the UPDATEquery.
+  
  
 
  
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 452b743f21..1ed6dd1b38 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -94,7 +94,11 @@ static HeapTuple ExecCallTriggerFunc(TriggerData *trigdata,
 
FmgrInfo *finfo,
 
Instrumentation *instr,
 
MemoryContext per_tuple_context);
-static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
+static void AfterTriggerSaveEvent(EState *estate,
+ 
ModifyTableState *mtstate,
+ ResultRelInfo 
*relinfo,
+

Re: a misbehavior of partition row movement (?)

2022-01-12 Thread Amit Langote
On Thu, Jan 13, 2022 at 12:19 PM Julien Rouhaud  wrote:
> On Tue, Jan 11, 2022 at 05:08:59PM +0900, Amit Langote wrote:
> >
> > I think I've managed to apply f4566345cf40b into v13 and v14.  Patches 
> > attached.
> >
>
> FTR this doesn't play well with the cfbot unfortunately as it tries to apply
> both patches, and obviously on the wrong branches anyway.

Oops, that's right.  Thanks for the heads up.

> It means that the previous-0002-now-0001 patch that Álvaro previously sent
> (https://www.postgresql.org/message-id/202201052227.bc4yvvy6lqpb%40alvherre.pgsql)
> is not tested anymore, and IIUC it's not pushed yet so it's not ideal.

Agreed.

> There's now an official documentation on how to send patches that should be
> ignored by the cfbot [1], so sending backpatch versions with a .txt extension
> could be useful.  Just in case I'm attaching the pending patch to this mail to
> make the cfbot happy again.

Thanks and sorry I wasn't aware of the rule about sending back patch versions.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Alvaro Herrera
On 2022-Jan-17, Amit Langote wrote:

> Note that the fix involves adding fields to ResultRelInfo -- v13 needs
> 2 additional, while v14 and HEAD need 1.  That combined with needing
> new catalog entries for parent FK triggers, back-patching this does
> make me a bit uncomfortable.

Yeah, that's a good point, so I ran abidiff on the binaries in branch 13
to have some data on it.  The report does indeed have a lot of noise
about those three added members in struct ResultRelInfo; but as far as I
can see in the report, there is no ABI affected because of these
changes.

However, the ones that caught my eye next were the ABI changes for
ExecGetTriggerResultRel() and ExecAR{Delete,Update}Triggers().  These seem more
significant, if any external code is calling these.  Now, while I think
we could dodge that (at least part of it) by having a shim for
AfterTriggerSaveEvent that passes a NULL mtstate, and takes the
assumption that there is no row partition migration when that happens
... that seems like treading in dangerous territory: we would have 
code that would behave differently for an extension that was compiled
with an earlier copy of the backend.

So I see two options.  One is to introduce the aforementioned shim, but
instead of making any assumptions, we cause the shim raise an error:
"your extension is outdated, please recompile with the new postgres
version".  However, that seems even more harmful, because production
systems that auto-update to the next Postgres version would start to
fail.

The other is suggested by you:

> Another thing to consider is that we haven't seen many reports of the
> problem (UPDATEs of partitioned PK tables causing DELETEs in
> referencing tables), even though it can be possibly very surprising to
> those who do run into it.

Do nothing in the old branches.


Another thing I saw which surprised me very much is this bit, which I
think must be a bug in abidiff:

type of 'EPQState* EState::es_epq_active' 
changed:
  in pointed to type 'struct EPQState' at 
execnodes.h:1104:1:
type size hasn't changed
1 data member changes (3 filtered):
  type of 'PlanState* 
EPQState::recheckplanstate' changed:
in pointed to type 'struct PlanState' 
at execnodes.h:1056:1:
  entity changed from 'struct 
PlanState' to compatible type 'typedef PlanState' at execnodes.h:1056:1


-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Zhihong Yu
On Mon, Jan 17, 2022 at 6:26 AM Alvaro Herrera 
wrote:

> On 2022-Jan-17, Amit Langote wrote:
>
> > Note that the fix involves adding fields to ResultRelInfo -- v13 needs
> > 2 additional, while v14 and HEAD need 1.  That combined with needing
> > new catalog entries for parent FK triggers, back-patching this does
> > make me a bit uncomfortable.
>
> Yeah, that's a good point, so I ran abidiff on the binaries in branch 13
> to have some data on it.  The report does indeed have a lot of noise
> about those three added members in struct ResultRelInfo; but as far as I
> can see in the report, there is no ABI affected because of these
> changes.
>
> However, the ones that caught my eye next were the ABI changes for
> ExecGetTriggerResultRel() and ExecAR{Delete,Update}Triggers().  These seem
> more
> significant, if any external code is calling these.  Now, while I think
> we could dodge that (at least part of it) by having a shim for
> AfterTriggerSaveEvent that passes a NULL mtstate, and takes the
> assumption that there is no row partition migration when that happens
> ... that seems like treading in dangerous territory: we would have
> code that would behave differently for an extension that was compiled
> with an earlier copy of the backend.
>
> So I see two options.  One is to introduce the aforementioned shim, but
> instead of making any assumptions, we cause the shim raise an error:
> "your extension is outdated, please recompile with the new postgres
> version".  However, that seems even more harmful, because production
> systems that auto-update to the next Postgres version would start to
> fail.
>
> The other is suggested by you:
>
> > Another thing to consider is that we haven't seen many reports of the
> > problem (UPDATEs of partitioned PK tables causing DELETEs in
> > referencing tables), even though it can be possibly very surprising to
> > those who do run into it.
>
> Do nothing in the old branches.
>
>
> Another thing I saw which surprised me very much is this bit, which I
> think must be a bug in abidiff:
>
> type of 'EPQState* EState::es_epq_active'
> changed:
>   in pointed to type 'struct EPQState' at
> execnodes.h:1104:1:
> type size hasn't changed
> 1 data member changes (3 filtered):
>   type of 'PlanState*
> EPQState::recheckplanstate' changed:
> in pointed to type 'struct
> PlanState' at execnodes.h:1056:1:
>   entity changed from 'struct
> PlanState' to compatible type 'typedef PlanState' at execnodes.h:1056:1
>
> Hi,
I think option 2, not backpatching, is more desirable at this stage.

If, complaints from the field arise in the future, we can consider
backpatching.

Cheers


Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Alvaro Herrera
On 2022-Jan-17, Zhihong Yu wrote:

> On Mon, Jan 17, 2022 at 6:26 AM Alvaro Herrera 
> wrote:

> > On 2022-Jan-17, Amit Langote wrote:

> > The other is suggested by you:
> >
> > > Another thing to consider is that we haven't seen many reports of the
> > > problem (UPDATEs of partitioned PK tables causing DELETEs in
> > > referencing tables), even though it can be possibly very surprising to
> > > those who do run into it.
> >
> > Do nothing in the old branches.

> I think option 2, not backpatching, is more desirable at this stage.

Preliminarly, I tend to agree.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Alvaro Herrera


> @@ -3398,7 +3432,7 @@ typedef SetConstraintStateData *SetConstraintState;
>   */
>  typedef uint32 TriggerFlags;
>  
> -#define AFTER_TRIGGER_OFFSET 0x0FFF  /* must be 
> low-order bits */
> +#define AFTER_TRIGGER_OFFSET 0x07FF  /* must be 
> low-order bits */
>  #define AFTER_TRIGGER_DONE   0x1000
>  #define AFTER_TRIGGER_IN_PROGRESS0x2000
>  /* bits describing the size and tuple sources of this event */
> @@ -3406,7 +3440,8 @@ typedef uint32 TriggerFlags;
>  #define AFTER_TRIGGER_FDW_FETCH  0x8000
>  #define AFTER_TRIGGER_1CTID  0x4000
>  #define AFTER_TRIGGER_2CTID  0xC000
> -#define AFTER_TRIGGER_TUP_BITS   0xC000
> +#define AFTER_TRIGGER_CP_UPDATE  0x0800
> +#define AFTER_TRIGGER_TUP_BITS   0xC800

So this patch releases one bit from AFTER_TRIGGER_OFFSET and makes it
become AFTER_TRIGGER_CP_UPDATE.  As far as I can tell there is no harm
in doing so.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> So this patch releases one bit from AFTER_TRIGGER_OFFSET and makes it
> become AFTER_TRIGGER_CP_UPDATE.  As far as I can tell there is no harm
> in doing so.

I agree that taking a bit away from AFTER_TRIGGER_OFFSET is okay
(it could spare even a couple more, if we need them).

But could we please do it in a way that is designed to keep the
code readable, rather than to minimize the number of lines of diff?
It makes zero sense to have the bits in AFTER_TRIGGER_TUP_BITS not
be adjacent.  So what should happen here is to renumber the symbols
in between to move their bits over one place.

(Since this data is only known within trigger.c, I don't even see
an ABI-stability argument for not changing these assignments.)

regards, tom lane




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Alvaro Herrera
On 2022-Jan-17, Tom Lane wrote:

> But could we please do it in a way that is designed to keep the
> code readable, rather than to minimize the number of lines of diff?
> It makes zero sense to have the bits in AFTER_TRIGGER_TUP_BITS not
> be adjacent.  So what should happen here is to renumber the symbols
> in between to move their bits over one place.

Is it typical to enumerate bits starting from the right of each byte,
when doing it from the high bits of the word?  DONE and IN_PROGRESS have
been defined as 0x1 and 0x2 of that byte for a very long time and I
found that very strange.  I am inclined to count from the left, so I'd
pick 8 first, defining the set like this:

#define AFTER_TRIGGER_OFFSET0x07FF  /* must be 
low-order bits */
#define AFTER_TRIGGER_DONE  0x8000
#define AFTER_TRIGGER_IN_PROGRESS   0x4000
/* bits describing the size and tuple sources of this event */
#define AFTER_TRIGGER_FDW_REUSE 0x
#define AFTER_TRIGGER_FDW_FETCH 0x2000
#define AFTER_TRIGGER_1CTID 0x1000
#define AFTER_TRIGGER_2CTID 0x3000
#define AFTER_TRIGGER_CP_UPDATE 0x0800
#define AFTER_TRIGGER_TUP_BITS  0x3800

(The fact that FDW_REUSE bits is actually an empty mask comes from
7cbe57c34dec, specifically [1])

Is this what you were thinking?

[1] https://postgr.es/m/20140306033644.ga3527...@tornado.leadboat.com

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
 (George Orwell's The Lord of the Rings)




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Jan-17, Tom Lane wrote:
>> It makes zero sense to have the bits in AFTER_TRIGGER_TUP_BITS not
>> be adjacent.  So what should happen here is to renumber the symbols
>> in between to move their bits over one place.

> Is it typical to enumerate bits starting from the right of each byte,
> when doing it from the high bits of the word?  DONE and IN_PROGRESS have
> been defined as 0x1 and 0x2 of that byte for a very long time and I
> found that very strange.  I am inclined to count from the left, so I'd
> pick 8 first, defining the set like this:

Doesn't matter to me either way, as long as the values look like they
were all defined by the same person ;-)

> (The fact that FDW_REUSE bits is actually an empty mask comes from
> 7cbe57c34dec, specifically [1])

That seemed a bit odd to me too, but this is not the patch to be
changing it in, I suppose.

regards, tom lane




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Amit Langote
On Tue, Jan 18, 2022 at 7:15 AM Alvaro Herrera  wrote:
> On 2022-Jan-17, Tom Lane wrote:
> > But could we please do it in a way that is designed to keep the
> > code readable, rather than to minimize the number of lines of diff?
> > It makes zero sense to have the bits in AFTER_TRIGGER_TUP_BITS not
> > be adjacent.  So what should happen here is to renumber the symbols
> > in between to move their bits over one place.
>
> Is it typical to enumerate bits starting from the right of each byte,
> when doing it from the high bits of the word?  DONE and IN_PROGRESS have
> been defined as 0x1 and 0x2 of that byte for a very long time and I
> found that very strange.  I am inclined to count from the left, so I'd
> pick 8 first, defining the set like this:
>
> #define AFTER_TRIGGER_OFFSET0x07FF  /* must be 
> low-order bits */
> #define AFTER_TRIGGER_DONE  0x8000
> #define AFTER_TRIGGER_IN_PROGRESS   0x4000
> /* bits describing the size and tuple sources of this event */
> #define AFTER_TRIGGER_FDW_REUSE 0x
> #define AFTER_TRIGGER_FDW_FETCH 0x2000
> #define AFTER_TRIGGER_1CTID 0x1000
> #define AFTER_TRIGGER_2CTID 0x3000
> #define AFTER_TRIGGER_CP_UPDATE 0x0800
> #define AFTER_TRIGGER_TUP_BITS  0x3800

Agree that the ultimate code readability trumps diff minimalism. :-)

Would you like me to update the patch with the above renumbering or
are you working on a new version yourself?

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Julien Rouhaud
Hi,

On Mon, Jan 17, 2022 at 08:40:54PM +0900, Amit Langote wrote:
> 
> Okay, I created versions of the patch series for branches 13 and 14
> (.txt files).  The one for HEAD is also re-attached.

FYI The patch failed today on FreeBSD, while it was previously quite stable on
all platforms (https://cirrus-ci.com/build/4551468081479680), with this error:

https://api.cirrus-ci.com/v1/artifact/task/6360787076775936/regress_diffs/src/test/recovery/tmp_check/regression.diffs
diff -U3 
/tmp/cirrus-ci-build/src/test/recovery/../regress/expected/reloptions.out 
/tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/reloptions.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/reloptions.out   
2022-01-18 00:12:52.533086000 +
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/reloptions.out 
2022-01-18 00:28:00.000524000 +
@@ -133,7 +133,7 @@
 SELECT pg_relation_size('reloptions_test') = 0;
  ?column?
 --
- t
+ f
 (1 row)


I'm not sure why this test failed as it doesn't seem like something impacted by
the patch, but I may have missed something as I only had a quick look at the
patch and discussion.




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Michael Paquier
On Tue, Jan 18, 2022 at 12:16:23PM +0800, Julien Rouhaud wrote:
> I'm not sure why this test failed as it doesn't seem like something impacted 
> by
> the patch, but I may have missed something as I only had a quick look at the
> patch and discussion.

This issue is discussed here:
https://www.postgresql.org/message-id/20220117203746.oj43254j5jurb...@alap3.anarazel.de
--
Michael


signature.asc
Description: PGP signature


Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Julien Rouhaud
Hi,

On Tue, Jan 18, 2022 at 02:33:39PM +0900, Michael Paquier wrote:
> On Tue, Jan 18, 2022 at 12:16:23PM +0800, Julien Rouhaud wrote:
> > I'm not sure why this test failed as it doesn't seem like something 
> > impacted by
> > the patch, but I may have missed something as I only had a quick look at the
> > patch and discussion.
> 
> This issue is discussed here:
> https://www.postgresql.org/message-id/20220117203746.oj43254j5jurb...@alap3.anarazel.de

Oh I missed it, thanks!  Sorry for the noise.




Re: a misbehavior of partition row movement (?)

2022-01-17 Thread Amit Langote
On Tue, Jan 18, 2022 at 2:41 PM Julien Rouhaud  wrote:
> On Tue, Jan 18, 2022 at 02:33:39PM +0900, Michael Paquier wrote:
> > On Tue, Jan 18, 2022 at 12:16:23PM +0800, Julien Rouhaud wrote:
> > > I'm not sure why this test failed as it doesn't seem like something 
> > > impacted by
> > > the patch, but I may have missed something as I only had a quick look at 
> > > the
> > > patch and discussion.
> >
> > This issue is discussed here:
> > https://www.postgresql.org/message-id/20220117203746.oj43254j5jurb...@alap3.anarazel.de
>
> Oh I missed it, thanks!  Sorry for the noise.

Thanks, it had puzzled me too when I first saw it this morning.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-18, Amit Langote wrote:

> Would you like me to update the patch with the above renumbering or
> are you working on a new version yourself?

I have a few very minor changes apart from that.  Let me see if I can
get this pushed soon; if I'm unable to (there are parts I don't fully
grok yet), I'll post what I have.

Thank you!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Tom Lane
Julien Rouhaud  writes:
> @@ -133,7 +133,7 @@
>  SELECT pg_relation_size('reloptions_test') = 0;
>   ?column?
>  --
> - t
> + f
>  (1 row)

Some machines have been showing that on HEAD too, so I doubt
it's the fault of this patch.  That reloptions test isn't stable
yet seemingly.

regards, tom lane




Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-18, Alvaro Herrera wrote:

> On 2022-Jan-18, Amit Langote wrote:
> 
> > Would you like me to update the patch with the above renumbering or
> > are you working on a new version yourself?
> 
> I have a few very minor changes apart from that.  Let me see if I can
> get this pushed soon; if I'm unable to (there are parts I don't fully
> grok yet), I'll post what I have.

Here's v13, a series with your patch as 0001 and a few changes from me;
the bulk is just pgindent, and there are a few stylistic changes and an
unrelated typo fix and I added a couple of comments to your new code.

I don't like the API change to ExecInsert().  You're adding two output
arguments:
- the tuple being inserted.  But surely you have this already, because
it's in the 'slot' argument you passed in. ExecInsert is even careful to
set the ->tts_tableOid argument there.  So ExecInsert's caller doesn't
need to receive the inserted tuple as an argument, it can just read
'slot'.
- the relation to which the tuple was inserted.  Can't this be obtained
by looking at slot->tts_tableOid?  We should be able to use
ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
that this is wasteful, but I think this is not a common case anyway and
it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
of its other calls).

I think the argument definition of ExecCrossPartitionUpdateForeignKey is
a bit messy.  I propose to move mtstate,estate as two first arguments;
IIRC the whole executor does it that way.

AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
mtstate->operation -- why doesn't it look at 'event' instead?) and later
it determines new_event.ate_flags.  Why can't it use
maybe_crosspart_update to simplify part of that?  Or maybe the other way
around, not sure.  It looks like something could be made simpler there.

Overall, the idea embodied in the patch looks sensible to me.

Thank you,

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Nunca confiaré en un traidor.  Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)
>From 3f3dbdcf1228dd78285c4efcb4fc64f732408270 Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Mon, 11 Oct 2021 14:57:19 +0900
Subject: [PATCH v13 1/2] Enforce foreign key correctly during cross-partition
 updates

When an update on a partitioned table referenced in foreign keys
constraint causes a row to move from one partition to another, which
is implemented by deleting the old row from the source leaf partition
followed by inserting the new row into the destination leaf partition,
firing the foreign key triggers on that delete event can result in
surprising outcomes for those keys.  For example, a given foreign
key's delete trigger which implements the ON DELETE CASCADE clause of
that key will delete any referencing rows when triggerred for that
internal DELETE, although it should not, because the referenced row
is simply being moved from one partition of the referenced root
partitioned table into another, not being deleted from it.

This commit teaches trigger.c to skip queuing such delete trigger
events on the leaf partitions in favor of an UPDATE event fired on
the root target relation.  Doing so makes sense because both the
old and the new tuple "logically" belong to the root relation.

The after trigger event queuing interface now allows passing the
source and the destination partitions of a particular cross-partition
update when registering the update event for the root partitioned
table.  Along with the 2 ctids of the old and the new tuple, an after
trigger event now also stores the OIDs of those partitions. The tuples
fetched from the source and the destination partitions are converted
into the root table format before they are passed to the trigger
function.

The implementation currently has a limitation that only the foreign
keys pointing into the query's target relation are considered, not
those of its sub-partitioned partitions.  That seems like a
reasonable limitation, because it sounds rare to have distinct
foreign keys pointing into sub-partitioned partitions, but not into
the root table.
---
 doc/src/sgml/ref/update.sgml  |   7 +
 src/backend/commands/trigger.c| 322 +++---
 src/backend/executor/execMain.c   |  19 +-
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeModifyTable.c| 187 -
 src/backend/utils/adt/ri_triggers.c   |   6 +
 src/include/commands/trigger.h|   4 +
 src/include/executor/executor.h   |   3 +-
 src/include/nodes/execnodes.h |   3 +
 src/test/regress/expected/foreign_key.out | 204 +-
 src/test/regress/sql/foreign_key.sql  | 135 -
 11 files changed, 840 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..3ba13010e7 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/

Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Amit Langote
On Wed, Jan 19, 2022 at 7:29 AM Alvaro Herrera  wrote:
> On 2022-Jan-18, Alvaro Herrera wrote:
> > On 2022-Jan-18, Amit Langote wrote:
> >
> > > Would you like me to update the patch with the above renumbering or
> > > are you working on a new version yourself?
> >
> > I have a few very minor changes apart from that.  Let me see if I can
> > get this pushed soon; if I'm unable to (there are parts I don't fully
> > grok yet), I'll post what I have.
>
> Here's v13, a series with your patch as 0001 and a few changes from me;
> the bulk is just pgindent, and there are a few stylistic changes and an
> unrelated typo fix and I added a couple of comments to your new code.

Thank you.

> I don't like the API change to ExecInsert().  You're adding two output
> arguments:
> - the tuple being inserted.  But surely you have this already, because
> it's in the 'slot' argument you passed in. ExecInsert is even careful to
> set the ->tts_tableOid argument there.  So ExecInsert's caller doesn't
> need to receive the inserted tuple as an argument, it can just read
> 'slot'.

Hmm, ExecInsert()'s input slot belongs to either the source partition
or the "root" target relation, the latter due to the following stanza
in ExecCrossPartitionUpdate():

/*
 * resultRelInfo is one of the per-relation resultRelInfos.  So we should
 * convert the tuple into root's tuple descriptor if needed, since
 * ExecInsert() starts the search from root.
 */
tupconv_map = ExecGetChildToRootMap(resultRelInfo);
if (tupconv_map != NULL)
slot = execute_attr_map_slot(tupconv_map->attrMap,
 slot,
 mtstate->mt_root_tuple_slot);

Though the slot whose tuple ExecInsert() ultimately inserts may be
destination partition's ri_PartitionTupleSlot due to the following
stanza in it:

/*
 * If the input result relation is a partitioned table, find the leaf
 * partition to insert the tuple into.
 */
if (proute)
{
ResultRelInfo *partRelInfo;

slot = ExecPrepareTupleRouting(mtstate, estate, proute,
   resultRelInfo, slot,
   &partRelInfo);
resultRelInfo = partRelInfo;
}

It's not great that ExecInsert()'s existing header comment doesn't
mention that the slot whose tuple is actually inserted may not be the
slot it receives from the caller :-(.

> - the relation to which the tuple was inserted.  Can't this be obtained
> by looking at slot->tts_tableOid?  We should be able to use
> ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
> that this is wasteful, but I think this is not a common case anyway and
> it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
> of its other calls).

ExecLookupResultRelByOid() is only useful when *all* relevant leaf
partitions are present in the ModifyTableState.resultRelInfo array
(due to being present in ModifyTable.resultRelations).  Leaf
partitions that are only initialized by tuple routing (such as
destination partitions of cross-partition updates) are only present in
ModifyTableState.mt_partition_tuple_routing.partitions[] that are not
discoverable by ExecLookupResultRelByOid().

> I think the argument definition of ExecCrossPartitionUpdateForeignKey is
> a bit messy.  I propose to move mtstate,estate as two first arguments;
> IIRC the whole executor does it that way.

Okay, done.

> AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
> mtstate->operation -- why doesn't it look at 'event' instead?) and later
> it determines new_event.ate_flags.  Why can't it use
> maybe_crosspart_update to simplify part of that?  Or maybe the other way
> around, not sure.  It looks like something could be made simpler there.

Actually, I remember disliking maybe_crosspart_update for sounding a
bit fuzzy and also having to add mtstate to a bunch of trigger.c
interface functions only to calculate that.

I now wonder if passing an is_crosspart_update through
ExecAR{Update|Delete}Triggers() would not be better.  Both
ExecDelete() and ExecCrossPartitionUpdateForeignKey() know for sure if
their ExecAR{Update|Delete}Triggers() invocation is for a
cross-partition update, so better to just pass that information down
to AfterTriggerSaveEvent() than pass 'mtstate' and have the latter
reverse-engineer only a fuzzy guess of whether that's the case.

I like that interface better and have implemented it in the updated version.

I've also merged your changes and made some of my own as mentioned
above to end up with the attached v14.  I'm also attaching a delta
patch over v13 (0001+0002) to easily see the changes I made to get
v14.

BTW, your tweaks patch added this:

+ * *inserted_tuple is the tuple that's effectively inserted;
+ * *inserted_destrel is the relation where it was inserted.
+ * These are only set on success.  FIXME -- see what happens on
the "do nothing" cas

Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Amit Langote
On Wed, Jan 19, 2022 at 4:13 PM Amit Langote  wrote:
> On Wed, Jan 19, 2022 at 7:29 AM Alvaro Herrera  
> wrote:
> > On 2022-Jan-18, Alvaro Herrera wrote:
> > > On 2022-Jan-18, Amit Langote wrote:
> > >
> > > > Would you like me to update the patch with the above renumbering or
> > > > are you working on a new version yourself?
> > >
> > > I have a few very minor changes apart from that.  Let me see if I can
> > > get this pushed soon; if I'm unable to (there are parts I don't fully
> > > grok yet), I'll post what I have.
> >
> > Here's v13, a series with your patch as 0001 and a few changes from me;
> > the bulk is just pgindent, and there are a few stylistic changes and an
> > unrelated typo fix and I added a couple of comments to your new code.
>
> Thank you.
>
> > I don't like the API change to ExecInsert().  You're adding two output
> > arguments:
> > - the tuple being inserted.  But surely you have this already, because
> > it's in the 'slot' argument you passed in. ExecInsert is even careful to
> > set the ->tts_tableOid argument there.  So ExecInsert's caller doesn't
> > need to receive the inserted tuple as an argument, it can just read
> > 'slot'.
>
> Hmm, ExecInsert()'s input slot belongs to either the source partition
> or the "root" target relation, the latter due to the following stanza
> in ExecCrossPartitionUpdate():
>
> /*
>  * resultRelInfo is one of the per-relation resultRelInfos.  So we should
>  * convert the tuple into root's tuple descriptor if needed, since
>  * ExecInsert() starts the search from root.
>  */
> tupconv_map = ExecGetChildToRootMap(resultRelInfo);
> if (tupconv_map != NULL)
> slot = execute_attr_map_slot(tupconv_map->attrMap,
>  slot,
>  mtstate->mt_root_tuple_slot);
>
> Though the slot whose tuple ExecInsert() ultimately inserts may be
> destination partition's ri_PartitionTupleSlot due to the following
> stanza in it:
>
> /*
>  * If the input result relation is a partitioned table, find the leaf
>  * partition to insert the tuple into.
>  */
> if (proute)
> {
> ResultRelInfo *partRelInfo;
>
> slot = ExecPrepareTupleRouting(mtstate, estate, proute,
>resultRelInfo, slot,
>&partRelInfo);
> resultRelInfo = partRelInfo;
> }
>
> It's not great that ExecInsert()'s existing header comment doesn't
> mention that the slot whose tuple is actually inserted may not be the
> slot it receives from the caller :-(.
>
> > - the relation to which the tuple was inserted.  Can't this be obtained
> > by looking at slot->tts_tableOid?  We should be able to use
> > ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
> > that this is wasteful, but I think this is not a common case anyway and
> > it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
> > of its other calls).
>
> ExecLookupResultRelByOid() is only useful when *all* relevant leaf
> partitions are present in the ModifyTableState.resultRelInfo array
> (due to being present in ModifyTable.resultRelations).  Leaf
> partitions that are only initialized by tuple routing (such as
> destination partitions of cross-partition updates) are only present in
> ModifyTableState.mt_partition_tuple_routing.partitions[] that are not
> discoverable by ExecLookupResultRelByOid().
>
> > I think the argument definition of ExecCrossPartitionUpdateForeignKey is
> > a bit messy.  I propose to move mtstate,estate as two first arguments;
> > IIRC the whole executor does it that way.
>
> Okay, done.
>
> > AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
> > mtstate->operation -- why doesn't it look at 'event' instead?) and later
> > it determines new_event.ate_flags.  Why can't it use
> > maybe_crosspart_update to simplify part of that?  Or maybe the other way
> > around, not sure.  It looks like something could be made simpler there.
>
> Actually, I remember disliking maybe_crosspart_update for sounding a
> bit fuzzy and also having to add mtstate to a bunch of trigger.c
> interface functions only to calculate that.
>
> I now wonder if passing an is_crosspart_update through
> ExecAR{Update|Delete}Triggers() would not be better.  Both
> ExecDelete() and ExecCrossPartitionUpdateForeignKey() know for sure if
> their ExecAR{Update|Delete}Triggers() invocation is for a
> cross-partition update, so better to just pass that information down
> to AfterTriggerSaveEvent() than pass 'mtstate' and have the latter
> reverse-engineer only a fuzzy guess of whether that's the case.
>
> I like that interface better and have implemented it in the updated version.
>
> I've also merged your changes and made some of my own as mentioned
> above to end up with the attached v14.  I'm also attaching a delta
> patch over v13 (0001+0002) to easily see the changes I made to get
> v14.

Oop

Re: a misbehavior of partition row movement (?)

2022-01-19 Thread Alvaro Herrera
On 2022-Jan-19, Amit Langote wrote:

> BTW, your tweaks patch added this:
> 
> + * *inserted_tuple is the tuple that's effectively inserted;
> + * *inserted_destrel is the relation where it was inserted.
> + * These are only set on success.  FIXME -- see what happens on
> the "do nothing" cases.
> 
> If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING,
> then I don't think it matters, because the caller in that case would
> be ExecModifyTable() which doesn't care about inserted_tuple and
> inserted_destrel.

No, I meant a FOR EACH ROW trigger that does RETURN NULL to "abort" the
insertion.  IIRC in non-partitioned cases it is possibly to break
referential integrity by using those.  What I was wondering is whether
you can make this new code crash.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"I can see support will not be a problem.  10 out of 10."(Simon Wittber)
  (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)




Re: a misbehavior of partition row movement (?)

2022-01-19 Thread Amit Langote
On Wed, Jan 19, 2022 at 6:26 PM Alvaro Herrera  wrote:
> On 2022-Jan-19, Amit Langote wrote:
> > BTW, your tweaks patch added this:
> >
> > + * *inserted_tuple is the tuple that's effectively inserted;
> > + * *inserted_destrel is the relation where it was inserted.
> > + * These are only set on success.  FIXME -- see what happens on
> > the "do nothing" cases.
> >
> > If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING,
> > then I don't think it matters, because the caller in that case would
> > be ExecModifyTable() which doesn't care about inserted_tuple and
> > inserted_destrel.
>
> No, I meant a FOR EACH ROW trigger that does RETURN NULL to "abort" the
> insertion.

Ah, gotcha.

>  IIRC in non-partitioned cases it is possibly to break
> referential integrity by using those.  What I was wondering is whether
> you can make this new code crash.

insert_destrel would be left set to NULL, which means
ExecCrossPartitionUpdateForeignKey() won't get called, because:

 * NULL insert_destrel means that the move failed to occur, that
 * is, the update failed, so no need to anything in that case.
 */
if (insert_destrel &&
resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_update_after_row)
ExecCrossPartitionUpdateForeignKey(mtstate, estate,

Moreover, trigger documentation warns of a "possibility of surprising
outcomes" if BR triggers are present in partitions that are chosen as
destinations of cross-partition updates:

"Then all row-level BEFORE INSERT triggers are fired on the
destination partition. The possibility of surprising outcomes should
be considered when all these triggers affect the row being moved."

I suppose the new code shouldn't need to take special care in such cases either.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2021-09-02 Thread Andrew Dunstan


On 7/13/21 8:09 AM, Amit Langote wrote:
>
>
>
> @Amit patch is not successfully applying, can you please rebase that. 
>
>
> Thanks for the reminder.
>
> Masahiko Sawada, it's been a bit long since you reviewed the
> patch, are you still interested to review that? 
>
>
> Unfortunately, I don’t think I’ll have time in this CF to solve some
> very fundamental issues I found in the patch during the last cycle. 
> I’m fine with either marking this as RwF for now or move to the next CF.


Amit, do you have time now to work on this?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: a misbehavior of partition row movement (?)

2021-09-02 Thread Amit Langote
Hi Andrew,

On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan  wrote:
> On 7/13/21 8:09 AM, Amit Langote wrote:
> > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > very fundamental issues I found in the patch during the last cycle.
> > I’m fine with either marking this as RwF for now or move to the next CF.
>
> Amit, do you have time now to work on this?

I will take some time next week to take a fresh look at this and post an update.

Thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: a misbehavior of partition row movement (?)

2021-09-10 Thread Amit Langote
On Fri, Sep 3, 2021 at 12:23 PM Amit Langote  wrote:
> Hi Andrew,
>
> On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan  wrote:
> > On 7/13/21 8:09 AM, Amit Langote wrote:
> > > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > > very fundamental issues I found in the patch during the last cycle.
> > > I’m fine with either marking this as RwF for now or move to the next CF.
> >
> > Amit, do you have time now to work on this?
>
> I will take some time next week to take a fresh look at this and post an 
> update.

So I started looking at this today.  I didn't make much an inroad into
the stumbling block with 0002 patch that I had mentioned back in [1],
though I decided to at least post a rebased version of the patches
that apply.

I think 0001 is independently committable on its own merits,
irrespective of the yet unresolved problems of 0002, a patch to fix
$subject, which I'll continue to work on.

0003 shows a test that crashes the server due to said problem.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqHMpNZOc2Z-zgdO9hbJ7wMCOC%3DWpJYszVusZ%3DoE2OTf8w%40mail.gmail.com


v8-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v8-0003-This-test-crashes-the-patch.patch
Description: Binary data


v8-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-09-10 Thread Zhihong Yu
On Fri, Sep 10, 2021 at 7:06 AM Amit Langote 
wrote:

> On Fri, Sep 3, 2021 at 12:23 PM Amit Langote 
> wrote:
> > Hi Andrew,
> >
> > On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan 
> wrote:
> > > On 7/13/21 8:09 AM, Amit Langote wrote:
> > > > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > > > very fundamental issues I found in the patch during the last cycle.
> > > > I’m fine with either marking this as RwF for now or move to the next
> CF.
> > >
> > > Amit, do you have time now to work on this?
> >
> > I will take some time next week to take a fresh look at this and post an
> update.
>
> So I started looking at this today.  I didn't make much an inroad into
> the stumbling block with 0002 patch that I had mentioned back in [1],
> though I decided to at least post a rebased version of the patches
> that apply.
>
> I think 0001 is independently committable on its own merits,
> irrespective of the yet unresolved problems of 0002, a patch to fix
> $subject, which I'll continue to work on.
>
> 0003 shows a test that crashes the server due to said problem.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>
> [1]
> https://www.postgresql.org/message-id/CA%2BHiwqHMpNZOc2Z-zgdO9hbJ7wMCOC%3DWpJYszVusZ%3DoE2OTf8w%40mail.gmail.com

Hi,
For patch 0001, GetForeignKeyActionTriggers:

+   if (!OidIsValid(*deleteTriggerOid) || !OidIsValid(*updateTriggerOid))
+   elog(ERROR, "could not find action triggers of foreign key
constraint %u",

I think if the error message includes whether it is the delete or update
trigger that isn't found, it would be helpful.

Similar comment for error message in GetForeignKeyCheckTriggers().

Cheers


Re: a misbehavior of partition row movement (?)

2021-09-19 Thread Amit Langote
On Fri, Sep 10, 2021 at 11:03 PM Amit Langote  wrote:
> On Fri, Sep 3, 2021 at 12:23 PM Amit Langote  wrote:
> > Hi Andrew,
> >
> > On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan  wrote:
> > > On 7/13/21 8:09 AM, Amit Langote wrote:
> > > > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > > > very fundamental issues I found in the patch during the last cycle.
> > > > I’m fine with either marking this as RwF for now or move to the next CF.
> > >
> > > Amit, do you have time now to work on this?
> >
> > I will take some time next week to take a fresh look at this and post an 
> > update.
>
> So I started looking at this today.  I didn't make much an inroad into
> the stumbling block with 0002 patch that I had mentioned back in [1],
> though I decided to at least post a rebased version of the patches
> that apply.
>
> I think 0001 is independently committable on its own merits,
> irrespective of the yet unresolved problems of 0002, a patch to fix
> $subject, which I'll continue to work on.
>
> 0003 shows a test that crashes the server due to said problem.

I think I found a solution to the problem with 0002.

The problem was that the tuplestore
(afterTriggers.query_stack[query_level].tuplestore) that I decided to
use to store the AFTER trigger tuples of a partitioned table that is
the target of an cross-partition update lives only for the duration of
a given query.  So that approach wouldn't work if the foreign key
pointing into that partitioned table is marked INITIALLY DEFERRED.  To
fix, I added a List field to AfterTriggersData that stores the
tuplestores to store the tuples of partitioned tables that undergo
cross-partition updates in a transaction and are pointed to by
INITIALLY DEFERRED foreign key constraints.  I couldn't understand one
comment about why using a tuplestore for such cases *might not* work,
which as follows:

 * Note that we need triggers on foreign tables to be fired in exactly the
 * order they were queued, so that the tuples come out of the tuplestore in
 * the right order.  To ensure that, we forbid deferrable (constraint)
 * triggers on foreign tables.  This also ensures that such triggers do not
 * get deferred into outer trigger query levels, meaning that it's okay to
 * destroy the tuplestore at the end of the query level.

I tried to break the approach using various test cases (some can be
seen added by the patch to foreign_key.sql), but couldn't see the
issue alluded to in the above comment.  So I've marked the comment
with an XXX note as follows:

- * Note that we need triggers on foreign tables to be fired in exactly the
- * order they were queued, so that the tuples come out of the tuplestore in
- * the right order.  To ensure that, we forbid deferrable (constraint)
- * triggers on foreign tables.  This also ensures that such triggers do not
- * get deferred into outer trigger query levels, meaning that it's okay to
- * destroy the tuplestore at the end of the query level.
+ * Note that we need triggers on foreign and partitioned tables to be fired in
+ * exactly the order they were queued, so that the tuples come out of the
+ * tuplestore in the right order.  To ensure that, we forbid deferrable
+ * (constraint) triggers on foreign tables.  This also ensures that such
+ * triggers do not get deferred into outer trigger query levels, meaning that
+ * it's okay to destroy the tuplestore at the end of the query level.
+ * XXX - update this paragraph if the new approach, whereby tuplestores in
+ * afterTriggers.deferred_tuplestores outlive any given query, can be proven
+ * to not really break any assumptions mentioned here.

If anyone reading this can think of the issue the original comment
seems to be talking about, please let me know.

I've attached updated patches.  I've addressed Zhihong Yu's comment too.

Thank you.

--
Amit Langote
EDB: http://www.enterprisedb.com


v9-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v9-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-10-14 Thread Amit Langote
On Mon, Sep 20, 2021 at 3:32 PM Amit Langote  wrote:
> The problem was that the tuplestore
> (afterTriggers.query_stack[query_level].tuplestore) that I decided to
> use to store the AFTER trigger tuples of a partitioned table that is
> the target of an cross-partition update lives only for the duration of
> a given query.  So that approach wouldn't work if the foreign key
> pointing into that partitioned table is marked INITIALLY DEFERRED.  To
> fix, I added a List field to AfterTriggersData that stores the
> tuplestores to store the tuples of partitioned tables that undergo
> cross-partition updates in a transaction and are pointed to by
> INITIALLY DEFERRED foreign key constraints.  I couldn't understand one
> comment about why using a tuplestore for such cases *might not* work,
> which as follows:
>
>  * Note that we need triggers on foreign tables to be fired in exactly the
>  * order they were queued, so that the tuples come out of the tuplestore in
>  * the right order.  To ensure that, we forbid deferrable (constraint)
>  * triggers on foreign tables.  This also ensures that such triggers do not
>  * get deferred into outer trigger query levels, meaning that it's okay to
>  * destroy the tuplestore at the end of the query level.
>
> I tried to break the approach using various test cases (some can be
> seen added by the patch to foreign_key.sql), but couldn't see the
> issue alluded to in the above comment.  So I've marked the comment
> with an XXX note as follows:
>
> - * Note that we need triggers on foreign tables to be fired in exactly the
> - * order they were queued, so that the tuples come out of the tuplestore in
> - * the right order.  To ensure that, we forbid deferrable (constraint)
> - * triggers on foreign tables.  This also ensures that such triggers do not
> - * get deferred into outer trigger query levels, meaning that it's okay to
> - * destroy the tuplestore at the end of the query level.
> + * Note that we need triggers on foreign and partitioned tables to be fired 
> in
> + * exactly the order they were queued, so that the tuples come out of the
> + * tuplestore in the right order.  To ensure that, we forbid deferrable
> + * (constraint) triggers on foreign tables.  This also ensures that such
> + * triggers do not get deferred into outer trigger query levels, meaning that
> + * it's okay to destroy the tuplestore at the end of the query level.
> + * XXX - update this paragraph if the new approach, whereby tuplestores in
> + * afterTriggers.deferred_tuplestores outlive any given query, can be proven
> + * to not really break any assumptions mentioned here.
>
> If anyone reading this can think of the issue the original comment
> seems to be talking about, please let me know.

I brought this up in an off-list discussion with Robert and he asked
why I hadn't considered not using tuplestores to remember the tuples
in the first place, specifically pointing out that it may lead to
unnecessarily consuming a lot of memory when such updates move a bunch
of rows around.  Like, we could simply remember the tuples to be
passed to the trigger function using their CTIDs as is done for normal
(single-heap-relation) updates, though in this case also remember the
OIDs of the source and the destination partitions of a particular
cross-partition update.

I had considered that idea before but I think I had overestimated the
complexity of that approach so didn't go with it.  I tried and the
resulting patch doesn't look all that complicated, with the added
bonus that the bulk update case shouldn't consume so much memory with
this approach like the previous tuplestore version would have.

Updated patches attached.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v10-0001-Create-foreign-key-triggers-in-partitioned-table.patch
Description: Binary data


v10-0002-Enforce-foreign-key-correctly-during-cross-parti.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-07-13 Thread Ibrar Ahmed
On Fri, Apr 2, 2021 at 6:09 PM Amit Langote  wrote:

> On Thu, Apr 1, 2021 at 10:56 AM Masahiko Sawada 
> wrote:
> > On Tue, Mar 23, 2021 at 6:27 PM Amit Langote 
> wrote:
> > > Actually, I found a big hole in my assumptions around deferrable
> > > foreign constraints, invalidating the approach I took in 0002 to use a
> > > query-lifetime tuplestore to record root parent tuples.  I'm trying to
> > > find a way to make the tuplestore transaction-lifetime so that the
> > > patch still works.
> > >
> > > In the meantime, I'm attaching an updated set with 0001 changed per
> > > your comments.
> >
> > 0001 patch conflicts with 71f4c8c6f74. Could you please rebase the
> patchset?
>
> Thanks for the heads up.
>
> I still don't have a working patch to address the above mentioned
> shortcoming of the previous approach, but here is a rebased version in
> the meantime.
>
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>


@Amit patch is not successfully applying, can you please rebase that.

Masahiko Sawada, it's been a bit long since you reviewed the patch, are you
still interested to review that?

-- 
Ibrar Ahmed


Re: a misbehavior of partition row movement (?)

2021-07-13 Thread Amit Langote
Hi Ibrar, Sawada-san,

On Tue, Jul 13, 2021 at 20:25 Ibrar Ahmed  wrote:

>
>
> On Fri, Apr 2, 2021 at 6:09 PM Amit Langote 
> wrote:
>
>> On Thu, Apr 1, 2021 at 10:56 AM Masahiko Sawada 
>> wrote:
>> > On Tue, Mar 23, 2021 at 6:27 PM Amit Langote 
>> wrote:
>> > > Actually, I found a big hole in my assumptions around deferrable
>> > > foreign constraints, invalidating the approach I took in 0002 to use a
>> > > query-lifetime tuplestore to record root parent tuples.  I'm trying to
>> > > find a way to make the tuplestore transaction-lifetime so that the
>> > > patch still works.
>> > >
>> > > In the meantime, I'm attaching an updated set with 0001 changed per
>> > > your comments.
>> >
>> > 0001 patch conflicts with 71f4c8c6f74. Could you please rebase the
>> patchset?
>>
>> Thanks for the heads up.
>>
>> I still don't have a working patch to address the above mentioned
>> shortcoming of the previous approach, but here is a rebased version in
>> the meantime.
>>
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com
>>
>
>
> @Amit patch is not successfully applying, can you please rebase that.
>

Thanks for the reminder.

Masahiko Sawada, it's been a bit long since you reviewed the patch, are you
> still interested to review that?
>

Unfortunately, I don’t think I’ll have time in this CF to solve some very
fundamental issues I found in the patch during the last cycle.  I’m fine
with either marking this as RwF for now or move to the next CF.

> --
Amit Langote
EDB: http://www.enterprisedb.com