Re: Deferred constraint trigger semantics

2022-05-16 Thread Bryn Llewellyn
> laurenz.albe@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
> 
>> 
>> …I tried this:
>> 
>> create table t(
>>   k serial primary key,
>>   v int not null,
>>   constraint t_v_unq unique(v) initially deferred);

Here's a better test:

-- BLUE session
start transaction isolation level read committed;
insert into t(v) values (1), (2);

-- RED session
start transaction isolation level read committed;
insert into t(v) values (1), (3);

-- BLUE session
set constraints all immediate;

-- RED session (hangs until BLUE commits).
-- Then, when it does, gets ERROR... "Key (v)=(1) already exists"
set constraints all immediate;

-- BLUE session
commit;

-- RED session
-- There are no changes to commit 'cos they were rolled back.
commit;

-- "select k, v from t order by k" (in each session) now shows that both
-- sessions meet the constraint.

>> Where, in the PG doc, can I read the account of the proper mental model for 
>> the application programmer?
> 
> [See https://www.postgresql.org/docs/current/index-unique-checks.html.]

Thanks for referring me to the account "62.5. Index Uniqueness Checks". It's in 
the section "Part VII. Internals" (…contains assorted information that might be 
of use to PostgreSQL developers). I wouldn't have thought to read this because I 
don't intend to write code that might become part of PG's implementation.

> I'd say that the proper mental model is that you don't need to care… The 
> exact sequence of what happens during COMMIT is interesting, but irrelevant 
> to the programmer. All that counts is "a deferred constraint is checked 
> between the time that COMMIT starts processing and the time that it returns".

Yes, I very much like this stance. It seems that, for built-in constraints 
(like "unique" or "foreign key"), it's enough to understand that PG implements 
these at the "read committed" isolation level by using methods (that aren't 
exposed via SQL) to peep below the application programmer's MVCC view of the 
world to check the uncommitted state of other, concurrent, sessions.

This explains why, in the (new) test that I used above, the conflict is 
detected when the second session issues "set constraints all immediate" after 
the first already did this (i.e. long before COMMIT). In this case, the second 
session hangs until the first commits—at which point the second sees the 
uniqueness violation error.

In other words, the automagic implementation of the enforcement of built-in 
constraints allows the safe use of "set constraints all immediate" to provoke a 
possible early error that can, very usefully, be handled in PL/pgSQL code. This 
is the clue to understanding why the check of a built-in constraint, when it's 
performed as an implicit consequence of "commit", doesn't need to fall within 
the small, strictly serialized part of the work that "commit" performs. 
(You've explained how this helps performance in multi-session scenarios.)
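The early-check-and-handle pattern can be sketched in PL/pgSQL like this 
(assuming the table "t" with its deferred unique constraint from the test at 
the top; the corrective action is a placeholder):

```sql
do $body$
begin
  insert into t(v) values (1);
  begin
    -- Provoke the deferred uniqueness check now rather than at COMMIT.
    set constraints all immediate;
  exception
    when unique_violation then
      -- The failed subtransaction is rolled back; handle the conflict
      -- here instead of losing the whole transaction at COMMIT time.
      raise info 'duplicate value detected early';
  end;
end;
$body$;
```

(In a single-session run the check simply passes; the handler matters only 
when a concurrent session has already taken the conflicting value.)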

Critically, the special methods that implement the enforcement of built-in 
constraints aren't accessible in PL/pgSQL code and therefore not accessible in 
the "ordinary" implementation of trigger functions. This is the point that I 
failed to grasp. (Though I do see, now, that Laurenz's post says this clearly.)

I was able to demonstrate this by implementing a unique constraint with a 
deferred constraint trigger (and no use of "set constraints all immediate"). I 
simply introduced "pg_sleep(5)" between the trigger function's actual check and 
its final "return null". I copied the code below for completeness.

The "BLUE" session, because it reaches its serialized "commit" actions first, 
sees an outcome that meets the constraint. But the "RED" session has enough 
time to do its check before "BLUE" does its serialized "commit" actions. So its 
test passes too. This leaves the final database in conflict with the intended 
constraint.
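For reference, a minimal sketch of the kind of trigger-based uniqueness 
enforcement described above (the table, function, and trigger names here are 
mine, not the original code; the pg_sleep widens the race window between the 
check and COMMIT):

```sql
create table s(k serial primary key, v int not null);

create function s_v_unique_check()
  returns trigger
  language plpgsql
as $body$
begin
  -- The check fails only if this session can already *see* a duplicate;
  -- another session's uncommitted duplicate is invisible here.
  if (select count(*) from s where v = new.v) > 1 then
    raise exception 'duplicate value for v: %', new.v;
  end if;
  perform pg_sleep(5);  -- widen the window between the check and COMMIT
  return null;
end;
$body$;

create constraint trigger s_v_unique_trg
after insert on s
deferrable initially deferred
for each row
execute function s_v_unique_check();
```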

I see now that the only robust use of a constraint trigger implemented 
ordinarily (i.e. not in C) is to enforce a single-row constraint. (There's a 
caveat that maybe, after careful analysis, you can work out a cunning locking 
scheme to allow the safe implementation of an entity-level constraint without 
using C. But the "exactly one or two admins in a department" example shows that 
this isn't generally possible.) So it's reasonable that a constraint trigger 
must be AFTER EACH ROW. Further, it would make no sense to issue SQL from its 
implementation function because the only values that you might defensibly use 
are available simply via "old" and "new".
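A sketch of that one robust pattern, then: a row-level constraint trigger that 
touches only "new" and issues no SQL (the table "t" and the non-negativity rule 
are assumptions chosen for illustration):

```sql
create function t_row_check()
  returns trigger
  language plpgsql
as $body$
begin
  -- A pure single-row check: no queries, only "new".
  if new.v < 0 then
    raise exception 'v must be non-negative, got %', new.v;
  end if;
  return null;  -- the return value of an AFTER trigger is ignored
end;
$body$;

create constraint trigger t_row_check_trg
after insert or update on t
deferrable initially deferred
for each row
execute function t_row_check();
```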

So all that stuff I was concerned about where the deferred constraint fires 
many times when once is enough falls away because the larger endeavor makes no 
sense. (I suppose that it might matter if you implemented the trigger function 
robustly in C.)

It does strike me that the ordinary application programmer—who reads just the 
sections "CREATE TRIGGER", "Chapter 39. Triggers", "CREATE TABLE", and "SET 

Re: Deferred constraint trigger semantics

2022-05-13 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
> By "seeing" I didn't mean "show to the user". I mean that the code that 
> implements PostgreSQL constraints takes uncommitted data into account.
> 
> The documentation describes that for the case of uniqueness in some detail:
> 
> https://www.postgresql.org/docs/current/index-unique-checks.html
> 
> I'd say that the proper mental model is that you don't need to care. The ACID 
> properties are guarantees that the database makes, and these guarantees are 
> usually negative: "no sequence of actions can result in the violation of a 
> unique constraint" or "you don't get to see uncommitted data".
> 
> The exact sequence of what happens during COMMIT is interesting, but 
> irrelevant to the programmer.  All that counts is "a deferred constraint is 
> checked between the time that COMMIT starts processing and the time that it 
> returns".
> 
> If you want to know more, you have to start reading the code. It is open 
> source and well documented.

Thanks for the quick reply, Laurenz. I'm signing off now (US/Pacific) for a 
long weekend. I'll try to digest what you wrote on Monday.

Re: Deferred constraint trigger semantics

2022-05-13 Thread Laurenz Albe
On Thu, 2022-05-12 at 22:06 -0700, Bryn Llewellyn wrote:
> 
> > In the case of constraint triggers, yes. But there is no race condition for 
> > primary key,
> > unique and foreign key constraints, because they also "see" uncommitted 
> > data.
> 
> I can't follow you here, sorry. I tried this:
> 
> create table t(
>   k serial primary key,
>   v int not null,
>   constraint t_v_unq unique(v) initially deferred);
> 
> -- RED
> start transaction isolation level read committed;
> insert into t(v) values (1), (2);
> select k, v from t order by k;
> 
> -- BLUE
> start transaction isolation level read committed;
> insert into t(v) values (1), (3);
> select k, v from t order by k;
> 
> -- RED
> commit;
> select k, v from t order by k;
> 
> -- BLUE
> select k, v from t order by k;
> commit;
> 
> select k, v from t order by k;
> 
> The first "select" from the "BLUE" session at the very end produces this:
> 
>  k | v 
> ---+---
>  1 | 1
>  2 | 2
>  3 | 1
>  4 | 3
> 
> This doesn't surprise me. It's ultimately illegal. But not yet. (Before "RED" 
> committed, "BLUE"
> didn't see the rows with "k = 1" and "k = 2". So it isn't seeing any other 
> session's uncommitted
> data—only its own uncommitted data.)

By "seeing" I didn't mean "show to the user".
I mean that the code that implements PostgreSQL constraints takes uncommitted 
data into account.

The documentation describes that for the case of uniqueness in some detail:
https://www.postgresql.org/docs/current/index-unique-checks.html

> Then, when "BLUE" commits, it (of course) gets this:
> 
> ERROR:  duplicate key value violates unique constraint "t_v_unq"
> DETAIL:  Key (v)=(1) already exists.
> 
> Then it sees (of course, again) only the rows with "k = 1" and "k = 2"—the 
> same as what "RED" saw.
> 
> It seems to be impossible to do a test in slow motion where "RED" and "BLUE" 
> each issues "commit"
> at the exact same moment. So thinking about this scenario doesn't tell me if:
> 
> (a) Each session runs its constraint check and the rest of what "commit" 
> entails in a genuinely serialized fashion.
> 
> OR
> 
> (b) Each session first runs its constraint check (and some other stuff) 
> non-serializedly—and only
> then runs the small part of the total "commit" action (the WAL part) 
> serializedly.
> (This would result in bad data in the database at rest—just as my 
> contrived misuse of
> "set constraints all immediate" left things in my "one or two admins" 
> scenario.)

I'd say that (b) is a more accurate description.

> The (a) scheme sounds correct. And the (b) scheme sounds wrong. Why would PG 
> prefer to implement (b) rather than (a)?
> 
> I'm clearly missing something.

Because (a) would result in terrible performance if there are many concurrent 
transactions.

I don't see why (b) is wrong - as your example shows, the behavior is correct.

Perhaps you have to understand what a PostgreSQL "snapshot" is and that the 
exact moment at
which a row was created is not important - it is the transaction numbers in 
"xmin" and "xmax"
that count.
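For anyone who wants to look at those transaction numbers directly, the system 
columns are queryable (a sketch using the table "t" from the earlier example):

```sql
-- xmin is the ID of the inserting transaction; xmax is the ID of the
-- deleting/locking transaction (0 if none). Visibility is decided by
-- comparing these against the snapshot, not by wall-clock time.
select xmin, xmax, k, v from t order by k;
```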

> > 
> Where, in the PG doc, can I read the account of the proper mental model for 
> the application programmer?
> It seems to be impossible to conduct an experiment that would disprove the 
> hypothesis that one,
> or the other, of these mental models is correct.

I'd say that the proper mental model is that you don't need to care.
The ACID properties are guarantees that the database makes, and these 
guarantees are usually
negative: "no sequence of actions can result in the violation of a unique 
constraint" or
"you don't get to see uncommitted data".

The exact sequence of what happens during COMMIT is interesting, but irrelevant 
to the
programmer.  All that counts is "a deferred constraint is checked between the 
time that
COMMIT starts processing and the time that it returns".

If you want to know more, you have to start reading the code.  It is open 
source and
well documented.

> > > > 
> Your post's testcase used the condition "at least one guard on duty" [...]
> 
> My testcase used a stricter rule: the table of staff must have exactly one or 
> two rows where
> the job is "Admin". So, here, concurrent sessions can break the rule (when 
> the txn starts
> with one "Admin") by updating different rows to make them "Admin" or by 
> inserting different
> new "Admin" rows. I've convinced myself by experiment that an ordinary 
> trigger can enforce
> this rule when contending sessions use "serializable" isolation. Am I right 
> that you'd say
> that no pessimistic locking scheme can enforce the rule at lower isolation 
> levels except the
> brute-force "lock table"?

If the transaction can insert two rows, I'd agree.
If the transaction only inserts a single row, then it could SELECT ... FOR NO 
KEY UPDATE the
one existing row, thus serializing the concurrent transactions.
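Laurenz's single-row suggestion might be sketched like this (a hypothetical 
"staff" table with a "job" column; the row lock makes concurrent would-be 
inserters queue behind each other):

```sql
begin;

-- Take a row lock that conflicts with other FOR NO KEY UPDATE locks but
-- not with plain reads: a concurrent transaction running this same code
-- blocks here until we commit or roll back.
select k from staff
where job = 'Admin'
order by k
limit 1
for no key update;

-- Safe now to re-check the Admin count and insert the one new Admin row.
insert into staff(job) values ('Admin');

commit;
```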

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Deferred constraint trigger semantics

2022-05-12 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> …I have always understood that (in Postgres and any respectable RDBMS) 
>> commits in a multi-session environment are always strictly 
>> serialized—irrespective of the transaction's isolation level. Am I correct 
>> to assume this is the case for Postgres? I took "at COMMIT time" to mean "as 
>> part of the strictly serialized operations that implement a session's 
>> COMMIT".
> 
> I am not sure what you mean by serialized commits. Transactions are 
> concurrent, and so are commits. COMMIT takes some time, during which several 
> things happen, among them executing deferred constraints, writing a WAL 
> record and flushing the WAL. The only thing that is necessarily serialized is 
> writing the WAL record.

Oh. I was wrong, then. I'll say more on this below.

>> …I take what you say in your post to mean that each session executes its 
>> deferred constraint check (by extension, not just for constraint triggers 
>> but for all deferred constraint cases) momentarily *before* COMMIT so that 
>> the effect is only to reduce the duration of the race condition window 
>> rather than to eliminate it.
> 
> In the case of constraint triggers, yes. But there is no race condition for 
> primary key, unique and foreign key constraints, because they also "see" 
> uncommitted data.

I can't follow you here, sorry. I tried this:

create table t(
  k serial primary key,
  v int not null,
  constraint t_v_unq unique(v) initially deferred);

-- RED
start transaction isolation level read committed;
insert into t(v) values (1), (2);
select k, v from t order by k;

-- BLUE
start transaction isolation level read committed;
insert into t(v) values (1), (3);
select k, v from t order by k;

-- RED
commit;
select k, v from t order by k;

-- BLUE
select k, v from t order by k;
commit;
select k, v from t order by k;

The first "select" from the "BLUE" session at the very end produces this:

 k | v 
---+---
 1 | 1
 2 | 2
 3 | 1
 4 | 3

This doesn't surprise me. It's ultimately illegal. But not yet. (Before "RED" 
committed, "BLUE" didn't see the rows with "k = 1" and "k = 2". So it isn't 
seeing any other session's uncommitted data—only its own uncommitted data.)

Then, when "BLUE" commits, it (of course) gets this:

ERROR:  duplicate key value violates unique constraint "t_v_unq"
DETAIL:  Key (v)=(1) already exists.

Then it sees (of course, again) only the rows with "k = 1" and "k = 2"—the same 
as what "RED" saw.

It seems to be impossible to do a test in slow motion where "RED" and "BLUE" 
each issues "commit" at the exact same moment. So thinking about this scenario 
doesn't tell me if:

(a) Each session runs its constraint check and the rest of what "commit" 
entails in a genuinely serialized fashion.

OR

(b) Each session first runs its constraint check (and some other stuff) 
non-serializedly—and only then runs the small part of the total "commit" action 
(the WAL part) serializedly. (This would result in bad data in the database at 
rest—just as my contrived misuse of "set constraints all immediate" left things 
in my "one or two admins" scenario.)

I appreciate that this is just a different wording of what I wrote before—but 
now w.r.t. the system-implemented unique constraint use-case.

The (a) scheme sounds correct. And the (b) scheme sounds wrong. Why would PG 
prefer to implement (b) rather than (a)?

I'm clearly missing something.

>> So it all depends on a lawyerly reading of the wording "at COMMIT time". The 
>> current CREATE TABLE doc says this:
>> 
>> «
>> If the constraint is INITIALLY DEFERRED, it is checked only at the end of 
>> the transaction.
>> »
>> 
>> The wording "at the end of the transaction" is not precise enough to 
>> adjudicate—and so the key question remains: Is a deferred constraint checked:
>> 
>> (a) as part of the strictly serialized operations that implement a session's 
>> COMMIT?
>> 
>> or
>> 
>> (b) momentarily *before* COMMIT and not within the serialized COMMIT 
>> execution?
>> 
>> So… (asking the wider audience) is the answer (a) or (b)? And if it's (b), 
>> why? After all, (b) brings the race condition risk. Is (a) simply not 
>> feasible?
> 
> COMMITs are not serialized. You seem to think that as soon as one 
> transaction's COMMIT starts processing, no other transaction may COMMIT at 
> the same time. That is not the case.

Yes, I most certainly did think this.

Where, in the PG doc, can I read the account of the proper mental model for the 
application programmer? It seems to be impossible to conduct an experiment that 
would disprove the hypothesis that one, or the other, of these mental models is 
correct.

>>>> Is my entire concept (and Laurenz's too) fundamentally flawed? 
>>>> Specifically, is querying a trigger's base table in a "for each row" 
>>>> trigger fundamentally unsound and not supported?
>>> 
>>> My post claims that constraint triggers alone are *not* a sufficient 
>>> solution to validate 

Re: Deferred constraint trigger semantics

2022-05-12 Thread Bryn Llewellyn
> postgres.ro...@gmail.com wrote:
> 
> It adds the "initially deferred" decoration to the "create constraint 
> trigger" statement. This is (still) the result:
> 
> INFO:  trg fired. new.v = 10, n = 5
> INFO:  trg fired. new.v = 20, n = 5
> INFO:  trg fired. new.v = 30, n = 5
> INFO:  trg fired. new.v = 40, n = 5
> INFO:  trg fired. new.v = 50, n = 5
> INFO:  trg fired. new.v = 60, n = 8
> INFO:  trg fired. new.v = 70, n = 8
> INFO:  trg fired. new.v = 80, n = 8
> Because what you did was 
> create constraint trigger trg
> after insert on t2
> deferrable initially deferred
> for each row
> execute function trg_fn();
> 
> You didn't explicitly defer the trigger trg on t1! That means that after you 
> insert on t1, the trigger trg on t1 is invoked immediately rather than at 
> commit time. If you 
> create constraint trigger trg
> after insert on t1
> deferrable initially deferred
> for each row
> execute function trg_fn();
> 
> create constraint trigger trg
> after insert on t2
> deferrable initially deferred
> for each row
> execute function trg_fn();
> then you will get 
> INFO:  0: trg fired. new.v = 10, n = 8
> INFO:  0: trg fired. new.v = 20, n = 8
> INFO:  0: trg fired. new.v = 30, n = 8
> INFO:  0: trg fired. new.v = 40, n = 8
> INFO:  0: trg fired. new.v = 50, n = 8
> INFO:  0: trg fired. new.v = 60, n = 8
> INFO:  0: trg fired. new.v = 70, n = 8
> INFO:  0: trg fired. new.v = 80, n = 8 

Er… yes. David Johnston pointed that out too. I'm embarrassed beyond belief. 
Sorry to have wasted folks' time because of my mistake.



Re: Deferred constraint trigger semantics

2022-05-12 Thread alias
>
>
> It adds the "initially deferred" decoration to the "create constraint
> trigger" statement. This is (still) the result:
>
> INFO:  trg fired. new.v = 10, n = 5
> INFO:  trg fired. new.v = 20, n = 5
> INFO:  trg fired. new.v = 30, n = 5
> INFO:  trg fired. new.v = 40, n = 5
> INFO:  trg fired. new.v = 50, n = 5
> INFO:  trg fired. new.v = 60, n = 8
> INFO:  trg fired. new.v = 70, n = 8
> INFO:  trg fired. new.v = 80, n = 8
>
Because what you did was
create constraint trigger trg
after insert on t2
deferrable initially deferred
for each row
execute function trg_fn();

You didn't explicitly defer the trigger trg on t1! That means that after you
insert on t1, the trigger trg on t1 is invoked immediately rather than at
commit time.
If you
create constraint trigger trg
after insert on t1
deferrable initially deferred
for each row
execute function trg_fn();

create constraint trigger trg
after insert on t2
deferrable initially deferred
for each row
execute function trg_fn();
then you will get

> INFO:  0: trg fired. new.v = 10, n = 8
> INFO:  0: trg fired. new.v = 20, n = 8
> INFO:  0: trg fired. new.v = 30, n = 8
> INFO:  0: trg fired. new.v = 40, n = 8
> INFO:  0: trg fired. new.v = 50, n = 8
> INFO:  0: trg fired. new.v = 60, n = 8
> INFO:  0: trg fired. new.v = 70, n = 8
> INFO:  0: trg fired. new.v = 80, n = 8
>




On Thu, May 12, 2022 at 4:13 AM Bryn Llewellyn  wrote:

> *david.g.johns...@gmail.com  wrote:*
>
> *b...@yugabyte.com  wrote:*
>
>
> Thanks for the point-by-point reply, David.
>
> ...makes no mention of what you might expect to see in an AFTER EACH
> ROW trigger.
>
>
> ...the absence of a comment declaring a guarantee of order means that,
> like the comment for the row-level BEFORE trigger, the row-level AFTER row
> ordering is not guaranteed (even if one cannot produce a counter-example in
> today's codebase).
>
>
> Got it!
>
> ...unless anybody contradicts me.
>
>
> Caveat emptor...? I wouldn't be surprised that doing so is technically
> possible in all cases - as to whether a particular algorithm is sound, to
> some extent, isn't something we try to predict. We do try to describe all
> the known interactions though - and let the user work within what those
> mean for them.
>
>
> Got it again!
>
> ...implies that there's no such unpredictability in the AFTER EACH ROW
> cases.
>
>
> I would not read it that way. In general, absence of mention of
> predictability like this means there is none - that some other sentence
> goes into more detail doesn't change that.
>
>
> OK.
>
> But there has to be a subtle caveat here for the deferred constraint
> trigger when the txn changes two or more tables, all of which participate
> in the query that the trigger function issues… The "raise info"
> output below illustrates my point (n changes from 5 to 8).
>
>
> I'm failing to see the deferral aspect of that example. First statement
> finishes, sees the 5 inserts, next statement finishes, sees 3 more inserts.
> Not, both statements finish, triggers fire, triggers see all 8 inserts
> (which I suspect they will if you actually perform deferral).
>
>
> Oops. I did a copy-and-paste error on going from my test env. to email and
> missed out the "deferral" that I'd intended. For completeness, here's the
> test that I meant:
>
> create table t1(k serial primary key, v int not null);
> create table t2(k serial primary key, v int not null);
>
> create function trg_fn()
>   returns trigger
>   language plpgsql
> as $body$
> declare
>   n int not null := 0;
> begin
>   n := (select count(*) from t1) + (select count(*) from t2);
>   raise info 'trg fired. new.v = %, n = %', new.v, n;
>   return new;
> end;
> $body$;
>
> create constraint trigger trg
> after insert on t1
> for each row
> execute function trg_fn();
>
> create constraint trigger trg
> after insert on t2
> initially deferred
> for each row
> execute function trg_fn();
>
> set default_transaction_isolation = 'read committed';
> do $body$
> begin
>   insert into t1(v)
>   values (10), (20), (30), (40), (50);
>
>   insert into t2(v)
>   values (60), (70), (80);
> end;
> $body$;
>
> It adds the "initially deferred" decoration to the "create constraint
> trigger" statement. This is (still) the result:
>
> INFO:  trg fired. new.v = 10, n = 5
> INFO:  trg fired. new.v = 20, n = 5
> INFO:  trg fired. new.v = 30, n = 5
> INFO:  trg fired. new.v = 40, n = 5
> INFO:  trg fired. new.v = 50, n = 5
> INFO:  trg fired. new.v = 60, n = 8
> INFO:  trg fired. new.v = 70, n = 8
> INFO:  trg fired. new.v = 80, n = 8
>
> Even though both inserts have completed by commit time, only the trigger
> firing caused by the second statement sees the final state that obtains the
> moment before commit. The first statement sees only the state after it
> finishes and before the second statement executes. You said « I suspect
> [that both statements will see the final state] if you actually perform
> deferral ». My test shows that this is not the case.
>
> *Did I 

Re: Deferred constraint trigger semantics

2022-05-12 Thread Laurenz Albe
On Wed, 2022-05-11 at 15:54 -0700, Bryn Llewellyn wrote:
> I re-read the penultimate paragraph in Laurenz's post:
> 
> «
> By making the trigger INITIALLY DEFERRED, we tell PostgreSQL to check the 
> condition at COMMIT time.
> »
> 
> I have always understood that (in Postgres and any respectable RDBMS) commits 
> in a multi-session
> environment are always strictly serialized—irrespective of the transaction's 
> isolation level.
> Am I correct to assume this is the case for Postgres? I took "at COMMIT time" 
> to mean "as part
> of the strictly serialized operations that implement a session's COMMIT".

I am not sure what you mean by serialized commits.  Transactions are 
concurrent, and so are
commits.  COMMIT takes some time, during which several things happen, among 
them executing
deferred constraints, writing a WAL record and flushing the WAL.  The only 
thing that is
necessarily serialized is writing the WAL record.

> But I see (now) that you argue that this is not the case, thus:
> 
> «
> This will reduce the window for the race condition a little, but the problem 
> is still there.
> If concurrent transactions run the trigger function at the same time, they 
> won’t see each other’s modifications.
> »
> 
> I take what you say in your post to mean that each session executes its 
> deferred constraint
> check (by extension, not just for constraint triggers but for all deferred 
> constraint cases)
> momentarily *before* COMMIT so that the effect is only to reduce the duration 
> of the race condition
> window rather than to eliminate it.

In the case of constraint triggers, yes.
But there is no race condition for primary key, unique and foreign key 
constraints, because
they also "see" uncommitted data.

> So it all depends on a lawyerly reading of the wording "at COMMIT time". The 
> current CREATE TABLE doc says this:
> 
> «
> If the constraint is INITIALLY DEFERRED, it is checked only at the end of the 
> transaction.
> »
> 
> The wording "at the end of the transaction" is not precise enough to 
> adjudicate—and so the key
> question remains: Is a deferred constraint checked:
> 
> (a) as part of the strictly serialized operations that implement a session's 
> COMMIT?
> 
> or
> 
> (b) momentarily *before* COMMIT and not within the serialized COMMIT 
> execution?
> 
> So… (asking the wider audience) is the answer (a) or (b)? And if it's (b), 
> why? After all, (b) brings
> the race condition risk. Is (a) simply not feasible?

COMMITs are not serialized.  You seem to think that as soon as one 
transaction's COMMIT starts
processing, no other transaction may COMMIT at the same time.  That is not the 
case.

> 
> > > Is my entire concept (and Laurenz's too) fundamentally flawed? 
> > > Specifically, is querying
> > > a trigger's base table in a "for each row" trigger fundamentally unsound 
> > > and not supported?
> > 
> > My post claims that constraint triggers alone are *not* a sufficient 
> > solution to validate
> > constraints - you need additional locking or SERIALIZABLE isolation to make 
> > that work reliably.
> 
> This doesn't seem to be what you wrote. These two headings [...]

Then I must have been unclear.  Or you only looked at the headings.

> As I reason it, if you use the SERIALIZABLE approach, then an ordinary 
> immediate AFTER EACH
> STATEMENT trigger will work fine—precisely because of how that isolation 
> level is defined.
> So here, a deferred constraint trigger isn't needed and brings no value.

Now that is absolutely true.  If you use the big hammer of SERIALIZABLE, there 
can be no
anomaly, and it is unnecessary to keep the window for a race condition small.
Deferred triggers and constraints still have a value, because they see the state
of the database at the end of the whole transaction.

> This implies that if a deferred constraint trigger is to have any utility, it 
> must be safe
> to use it (as I tested it) at the READ COMMITTED level. I do see that, though 
> I appear to
> be testing this, I cannot do a reliable test because I cannot, in application 
> code, open up,
> and exploit, a race condition window after COMMIT has been issued. (I *am* 
> able to do this
> to expose the fact that "set constraints all immediate" is unsafe.)

This sentence lacks the definition of what you mean by "safe", on which all 
hinges.

If "safe" means that you can use them to make sure that a certain condition is 
always
satisfied (like in a constraint), they are not safe.  But that is not the only 
use for
a trigger.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Oops. I did a copy-and-paste error on going from my test env. to email and 
>> missed out the "deferral" that I'd intended. For completeness, here's the 
>> test that I meant:
>> 
>> create constraint trigger trg
>> after insert on t1
>> for each row
>> execute function trg_fn();
>> 
>> create constraint trigger trg
>> after insert on t2
>> initially deferred
>> for each row
>> execute function trg_fn();
>> 
>> It adds the "initially deferred" decoration to the "create constraint 
>> trigger" statement. This is (still) the result:
> 
> You only added it to the uninteresting trigger on t2.  It's the t1 trigger 
> where I'd expect the behavior to change.  I'm assuming your test does both 
> (not in a position to test it myself at the moment).

Damn. I'm horrified that, despite my best intentions, I still managed to do a 
typo. How embarrassing… With the correction in place, I now get this output:

INFO:  trg fired. new.v = 10, n = 8
INFO:  trg fired. new.v = 20, n = 8
INFO:  trg fired. new.v = 30, n = 8
INFO:  trg fired. new.v = 40, n = 8
INFO:  trg fired. new.v = 50, n = 8
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

This is exactly what you predicted. I'm delighted (finally) to see this outcome.

>> [What I wrote here was rubbish, given that my test code was not what I 
>> claimed it was.]
> 
> [David's response here is now moot.]
>  
>> With respect to « having to keep around a working set of what are the 
>> changed records » I think that the complexity that you envisaged is avoided 
>> by the (emergent) rule that an AFTER EACH STATEMENT trigger cannot see "old" 
>> and "new" values. In other words, all you can sensibly do in its function is 
>> ordinary SQL that sees the current state at the moment it fires.
>> 
>> To my surprise, it *is* legal to write code that accesses "old" and "new" 
>> values. But, because many rows can be affected by a single statement, and 
>> the trigger fires just once, the meanings of "old" and "new" are undefined. 
>> I've seen that, in any test that I do, both are always set to NULL (which 
>> seems reasonable).
> 
> I was thinking more about transition tables - though I admit it's not a 
> complete thought given their opt-in nature.

Ah… I hadn't considered transition tables. However, they don't seem to be 
valuable for a constraint trigger. So your concern could be removed at a stroke 
by adding a semantic rule to the account of the CREATE TRIGGER syntax 
specification thus:

« The SQL that creates a statement-level AFTER constraint trigger may not 
specify using transition tables. »

Might this device give me hope?

Finally, it seems that a non-deferrable constraint trigger has no value—except 
in that using the keyword CONSTRAINT is a nice bit of self-documentation. Did I 
miss something? Is this trigger flavor valuable for a use-case that I haven't 
spotted?



Re: Deferred constraint trigger semantics

2022-05-11 Thread David G. Johnston
On Wed, May 11, 2022 at 3:43 PM Bryn Llewellyn  wrote:

>
> Oops. I did a copy-and-paste error on going from my test env. to email and
> missed out the "deferral" that I'd intended. For completeness, here's the
> test that I meant:
>
> create constraint trigger trg
> after insert on t1
> for each row
> execute function trg_fn();
>
> create constraint trigger trg
> after insert on t2
> initially deferred
> for each row
> execute function trg_fn();
>
> It adds the "initially deferred" decoration to the "create constraint
> trigger" statement. This is (still) the result:
>

You only added it to the uninteresting trigger on t2.  It's the t1 trigger
where I'd expect the behavior to change.  I'm assuming your test does both
(not in a position to test it myself at the moment).


> Even though both inserts have completed by commit time, only the trigger
> firing caused by the second statement sees the final state that obtains the
> moment before commit. The first statement sees only the state after it
> finishes and before the second statement executes. You said « I suspect
> [that both statements will see the final state] if you actually perform
> deferral ». My test shows that this is not the case.
>
> *Did I misunderstand you? Or does this result surprise you? If it does, do
> you think that this is a bug*?
>

It both surprises me and fails to surprise me at the same time.  I have no
opinion on whether the observed behavior constitutes a bug or not.  I'd
lean toward not - just maybe a chance to improve the documentation.


>
> With respect to « having to keep around a working set of what are the
> changed records » I think that the complexity that you envisaged is avoided
> by the (emergent) rule that an AFTER EACH STATEMENT trigger cannot see
> "old" and "new" values. In other words, all you can sensibly do in its
> function is ordinary SQL that sees the current state at the moment it fires.
>
> To my surprise, it *is* legal to write code that accesses "old" and "new"
> values. But, because many rows can be affected by a single statement, and
> the trigger fires just once, the meanings of "old" and "new" are undefined.
> I've seen that, in any test that I do, both are always set to NULL (which
> seems reasonable).
>

I was thinking more about transition tables - though I admit it's not a
complete thought given their opt-in nature.

David J.


Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> …Internet Search turned up this 2019 post by Laurenz Albe—but nothing else 
>> at all.
>> 
>> https://www.cybertec-postgresql.com/en/triggers-to-enforce-constraints 
>> 
>> 
>> (This is why I CC'd you, Laurenz.)
> 
> So I guess I should answer.

Thanks for replying to my original post with this subject line, Laurenz. Sorry 
to bring up notions that you wrote about three years ago. I judged that, 
because I referred to those notions, it would be polite to copy you—especially 
because I interpreted (maybe incorrectly) what you had written.

> About the starting paragraph of your mail: Constraint triggers are a 
> syntactic leftover from the way that triggers are implemented in PostgreSQL. 
> There is different syntax now, but it was decided to leave constraint 
> triggers, since they may have some use.

If constraint triggers are, in the hands of an informed application programmer, 
to have some sound uses, then the semantics must be clearly defined. And you do 
say that they are supported. David Johnston argues that, as long as you read the 
right snippets from various parts of the doc and synthesize their joint 
meaning, then the semantics are defined. Yes, David, I can accept that—with a 
loud caveat about the answer to my (a) or (b) question below.

I re-read the penultimate paragraph in Laurenz's post:

«
By making the trigger INITIALLY DEFERRED, we tell PostgreSQL to check the 
condition at COMMIT time.
»

I have always understood that (in Postgres and any respectable RDBMS) commits 
in a multi-session environment are always strictly serialized—irrespective of 
the transaction's isolation level. Am I correct to assume this is the case for 
Postgres? I took "at COMMIT time" to mean "as part of the strictly serialized 
operations that implement a session's COMMIT". But I see (now) that you argue 
that this is not the case, thus:

«
This will reduce the window for the race condition a little, but the problem is 
still there. If concurrent transactions run the trigger function at the same 
time, they won’t see each other’s modifications.
»

I take what you say in your post to mean that each session executes its 
deferred constraint check (by extension, not just for constraint triggers but 
for all deferred constraint cases) momentarily *before* COMMIT so that the 
effect is only to reduce the duration of the race condition window rather than 
to eliminate it.
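Here is a hedged sketch of the interleaving that I understand your post to 
describe (the table "c" and the "at most one row" rule are purely illustrative), 
for a deferred constraint trigger whose function counts rows:

```sql
-- Session 1                          -- Session 2
begin;                                begin;
insert into c(v) values (1);          insert into c(v) values (1);
commit;                               commit;
-- Just before each COMMIT completes, each session's deferred trigger
-- function counts the rows its READ COMMITTED snapshot can see. If the
-- two checks run before either commit has finished, neither snapshot
-- includes the other session's row, both checks pass, and the intended
-- "at most one row" rule is silently violated.
```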

So it all depends on a lawyerly reading of the wording "at COMMIT time". The 
current CREATE TABLE doc says this:

«
If the constraint is INITIALLY DEFERRED, it is checked only at the end of the 
transaction.
»

The wording "at the end of the transaction" is not precise enough to 
adjudicate—and so the key question remains: Is a deferred constraint checked:

(a) as part of the strictly serialized operations that implement a session's 
COMMIT?

or

(b) momentarily *before* COMMIT and not within the serialized COMMIT execution?

So… (asking the wider audience) is the answer (a) or (b)? And if it's (b), why? 
After all, (b) brings the race condition risk. Is (a) simply not feasible?

> [Lots of ruminations and wandering thoughts]
> 
> Sorry, that was too much for me to comment on - that would require a 
> mid-sized article.

Oh… I'm sorry to hear that I rambled and lost clarity. I find it impossible to 
say what I want to, striving for precision, without sacrificing brevity. I 
always find that I can improve my wording with successive reads. But life is 
short and I have, eventually, just to publish and be damned.

>> Is my entire concept (and Laurenz's too) fundamentally flawed? Specifically, 
>> is querying a trigger's base table in a "for each row" trigger fundamentally 
>> unsound and not supported?
> 
> My post claims that constraint triggers alone are *not* a sufficient solution 
> to validate constraints - you need additional locking or SERIALIZABLE 
> isolation to make that work reliably.

This doesn't seem to be what you wrote. These two headings

> Solving the problem with “optimistic locking” (which you explain means using 
> SERIALIZABLE)

and

> What about these “constraint triggers”?

read as if they are orthogonal schemes where the constraint trigger approach 
does not rely on SERIALIZABLE.

As I reason it, if you use the SERIALIZABLE approach, then an ordinary 
immediate AFTER EACH STATEMENT trigger will work fine—precisely because of how 
that isolation level is defined. So here, a deferred constraint trigger isn't 
needed and brings no value.
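For example (a hedged sketch with illustrative names, enforcing a rule like "at 
most ten rows in t with v > 100"), this relies only on SERIALIZABLE, not on 
deferral:

```sql
create function check_v_limit()
  returns trigger
  language plpgsql
as $body$
begin
  if (select count(*) from t where v > 100) > 10 then
    raise exception 'too many rows with v > 100';
  end if;
  return null;  -- return value of an AFTER trigger is ignored
end;
$body$;

create trigger t_v_limit
after insert or update on t
for each statement
execute function check_v_limit();

-- Every writing session must use:
--   start transaction isolation level serializable;
-- If two concurrent transactions would jointly break the rule, the
-- predicate reads in count(*) conflict with the other session's writes,
-- so one transaction fails with a serialization error (SQLSTATE 40001)
-- and must be retried.
```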

This implies that if a deferred constraint trigger is to have any utility, it 
must be safe to use it (as I tested it) at the READ COMMITTED level. I do see 
that, though I appear to be testing this, I cannot do a reliable test because I 
cannot, in application code, open up, and exploit, a race condition window 
after COMMIT has been issued. (I *am* able 

Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:

Thanks for the point-by-point reply, David.

>> ...makes no mention of what you might expect to see in an AFTER EACH ROW 
>> trigger.
> 
> ...the absence of a comment declaring a guarantee of order means that, like 
> the comment for the row-level BEFORE trigger, the row-level AFTER row 
> ordering is not guaranteed (even if one cannot produce a counter-example in 
> today's codebase).

Got it!

>> ...unless anybody contradicts me.
> 
> Caveat emptor...? I wouldn't be surprised that doing so is technically 
> possible in all cases - as to whether a particular algorithm is sound, to 
> some extent, isn't something we try to predict. We do try to describe all the 
> known interactions though - and let the user work within what those mean for 
> them.

Got it again!

>> ...implies that there's no such unpredictability in the AFTER EACH ROW cases.
> 
> I would not read it that way. In general, absence of mention of 
> predictability like this means there is none - that some other sentence goes 
> into more detail doesn't change that.

OK.

>> But there has to be a subtle caveat here for the deferred constraint trigger 
>> when the txn changes two or more tables, all of which participate in the 
>> query that the trigger function issues… The "raise info" output below 
>> illustrates my point (n changes from 5 to 8).
> 
> I'm failing to see the deferral aspect of that example. First statement 
> finishes, sees the 5 inserts, next statement finishes, sees 3 more inserts. 
> Not, both statements finish, triggers fire, triggers see all 8 inserts (which 
> I suspect they will if you actually perform deferral).

Oops. I made a copy-and-paste error going from my test env. to email and left 
out the "deferral" that I'd intended. For completeness, here's the test 
that I meant:

create table t1(k serial primary key, v int not null);
create table t2(k serial primary key, v int not null);

create function trg_fn()
  returns trigger
  language plpgsql
as $body$
declare
  n int not null := 0;
begin
  n := (select count(*) from t1) + (select count(*) from t2);
  raise info 'trg fired. new.v = %, n = %', new.v, n;
  return new;
end;
$body$;

create constraint trigger trg
after insert on t1
for each row
execute function trg_fn();

create constraint trigger trg
after insert on t2
initially deferred
for each row
execute function trg_fn();

set default_transaction_isolation = 'read committed';
do $body$
begin
  insert into t1(v)
  values (10), (20), (30), (40), (50);

  insert into t2(v)
  values (60), (70), (80);
end;
$body$;

It adds the "initially deferred" decoration to the "create constraint trigger" 
statement. This is (still) the result:

INFO:  trg fired. new.v = 10, n = 5
INFO:  trg fired. new.v = 20, n = 5
INFO:  trg fired. new.v = 30, n = 5
INFO:  trg fired. new.v = 40, n = 5
INFO:  trg fired. new.v = 50, n = 5
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

Even though both inserts have completed by commit time, only the trigger firing 
caused by the second statement sees the final state that obtains the moment 
before commit. The first statement sees only the state after it finishes and 
before the second statement executes. You said « I suspect [that both 
statements will see the final state] if you actually perform deferral ». My 
test shows that this is not the case.

Did I misunderstand you? Or does this result surprise you? If it does, do you 
think that this is a bug?

> ...Your test case for the deferred constraint, that supposedly allows for the 
> insertion of invalid data per the specification of the constraint trigger, 
> isn't something I've worked through yet; and as written reads like a bug 
> report.

It's not a report of a PG bug. Rather, it shows how an application programmer 
might write a bug in *their* code. When "set constraints all immediate" is 
used, it opens a race condition window between its execution and the commit. 
I'd speculated on that earlier. So I felt that I should show a self-contained 
demo of this possible trap for the application programmer. As long as "set 
constraints all immediate" is not used, the demo shows proper behavior. Of 
course, it moves the constraint violation error to commit time—and this means 
that PL/pgSQL code cannot handle it (as discussed at length elsewhere).
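In contrast, with "set constraints all immediate" the error surfaces inside the 
block and *can* be handled. A minimal sketch, assuming the table "t" with its 
deferred unique constraint "t_v_unq" from my first message, and an existing row 
with v = 1:

```sql
do $body$
begin
  insert into t(v) values (1);    -- duplicate; the deferred check lets it pass
  set constraints all immediate;  -- the unique check fires here, inside the block
exception
  when unique_violation then
    raise info 'duplicate v caught and handled before commit';
end;
$body$;
```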

I appreciate that using "set constraints all immediate" is useful in many 
scenarios to allow handling the violation error in PL/pgSQL. (Thanks again for 
that tip, David.) For example, and as I reason it, the famous "mandatory 
one-to-one relationship" use-case is safe when you use this technique. This 
use-case needs mutual FK constraints between the two tables. But you have to 
insert one of the two rows (call it "main") that are so related before the 
other (call it "extra"). And so the mutual FK constraints cannot be satisfied 
until both new rows are in place. 
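A hedged sketch of that pattern (table and column names are illustrative):

```sql
create table main(
  k int primary key,
  extra_k int not null);

create table extra(
  k int primary key,
  main_k int not null references main(k));

-- The circular FK has to be added after both tables exist, and must be
-- deferred because a "main" row is necessarily inserted before its "extra" row.
alter table main
  add constraint main_extra_fk
  foreign key (extra_k) references extra(k)
  initially deferred;

start transaction;
insert into main(k, extra_k) values (1, 1);  -- FK to "extra" not yet satisfied
insert into extra(k, main_k) values (1, 1);
set constraints all immediate;  -- both FKs now checkable; errors are handleable here
commit;
```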

Re: Deferred constraint trigger semantics

2022-05-11 Thread Laurenz Albe
On Tue, 2022-05-10 at 17:46 -0700, Bryn Llewellyn wrote:
> I looked at the sections "CREATE TRIGGER" and "Chapter 39. Triggers" in the 
> Current PG doc.
> But I failed to find any information about the semantics of the deferred 
> constraint trigger
> or about the use cases that motivated this feature. Nor could I find any code 
> examples.
> Internet Search turned up this 2019 post by Laurenz Albe—but nothing else 
> at all.
> 
> https://www.cybertec-postgresql.com/en/triggers-to-enforce-constraints/
> 
> (This is why I CC'd you, Laurenz.)

So I guess I should answer.

About the starting paragraph of your mail: Constraint triggers are a syntactic 
leftover
from the way that triggers are implemented in PostgreSQL.  There is different 
syntax now,
but it was decided to leave constraint triggers, since they may have some use.

> [Lots of ruminations and wandering thoughts]

Sorry, that was too much for me to comment on - that would require a mid-sized
article.

> Is my entire concept (and Laurenz's too) fundamentally flawed? Specifically, 
> is
> querying a trigger's base table in a "for each row" trigger fundamentally 
> unsound
> and not supported? (In Oracle Database, it causes the notorious "mutating 
> table"
> runtime error.)

My post claims that constraint triggers alone are *not* a sufficient solution to
validate constraints - you need additional locking or SERIALIZABLE isolation to
make that work reliably.

That does not mean that using constraint triggers is unsound or unsupported,
and the fact that Oracle's implementation of transaction isolation is somewhat
shoddy has little impact on that.

Yours,
Laurenz Albe




Re: Deferred constraint trigger semantics

2022-05-10 Thread David G. Johnston
On Tue, May 10, 2022 at 7:52 PM Bryn Llewellyn  wrote:

>
> *david.g.johns...@gmail.com  wrote:*
>
> * However, SQL commands executed in a row-level BEFORE trigger will see
> the effects of data changes for rows previously processed in the same outer
> command. This requires caution, since the ordering of these change events
> is not in general predictable; an SQL command that affects multiple rows
> can visit the rows in any order.
> »
>
> Strangely, the wording here, explicit as it is, makes no mention of what
> you might expect to see in an AFTER EACH ROW trigger.
>

Because the previous, quoted, paragraph did that.  Because of the absence
of a comment declaring a guarantee of order means that, like the comment
for the row-level BEFORE trigger, the row-level AFTER row ordering is not
guaranteed (even if one cannot produce a counter-example in today's
codebase).

I'll take this to be the case unless anybody contradicts me.
>

Caveat emptor...?  I wouldn't be surprised that doing so is technically
possible in all cases - as to whether a particular algorithm is sound, to
some extent, isn't something we try to predict.  We do try to describe all
the known interactions though - and let the user work within what those
mean for them.


> The mention of unpredictable results in the third bullet in the BEFORE
> case implies that there's no such unpredictability in the AFTER EACH ROW
> cases.
>

I would not read it that way.  In general, absence of mention of
predictability like this means there is none - that some other sentence
goes into more detail doesn't change that.

But there has to be a subtle caveat here for the deferred constraint
> trigger when the txn changes two or more tables, all of which participate
> in the query that the trigger function issues. I'll assume that you (all)
> know what I mean. The "raise info" output below illustrates my point (n
> changes from 5 to 8).
>

I'm failing to see the deferral aspect of that example.  First statement
finishes, sees the 5 inserts, next statement finishes, sees 3 more
inserts.  Not, both statements finish, triggers fire, triggers see all 8
inserts (which I suspect they will if you actually perform deferral).


> the actual order in which I see the "raise info" output is unpredictable
> while the values shown in each *are* predictable. Can I rely on this rule?
>

Absent a deferred constraint I would say yes.  Your test case for the
deferred constraint, that supposedly allows for the insertion of invalid
data per the specification of the constraint trigger, isn't something I've
worked through yet; and as written reads like a bug report.


> Is there a fundamental reason why a deferred AFTER EACH STATEMENT
> constraint trigger is not allowed? Nothing in what is explained in the
> "Overview of Trigger Behavior" and "Visibility of Data Changes" sections
> lets me see why the present restriction is needed.
>
>
I imagine having to keep around a working set of what are the changed
records is both memory intensive and also problematic should a future
statement make yet more changes to the table.  This is also an area that
the SQL Standard does make rules in.  And given that constraints are
defined per-row everywhere else there is a pull to not push the envelope of
our extension too far.

David J.


Re: Deferred constraint trigger semantics

2022-05-10 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> INFO:  trg fired. new.v = 80, n = 8
>> 
>> It shows the expected "new" value each time it fires. And yet the query 
>> reflects the table content on statement completion. This seems to be very 
>> strange.
> 
> From the documentation:
> 
> "Row-level BEFORE triggers fire immediately before a particular row is 
> operated on, while row-level AFTER triggers fire at the end of the statement 
> (but before any statement-level AFTER triggers)."
> 
> https://www.postgresql.org/docs/current/trigger-definition.html

Thanks, David. Those sentences were too deeply buried, in a page with no 
subsections, for me to spot by ordinary searching in the page. The term 
"Row-level BEFORE trigger" subverted my search for "BEFORE EACH ROW trigger"—which is the term that I'm used to. This is another lesson for me to 
read every word in what looks like relevant doc, from start to finish, like a 
book. I'm chastened.

The next section, "39.2. Visibility of Data Changes"
https://www.postgresql.org/docs/current/trigger-datachanges.html

says this:

«
* Statement-level triggers follow simple visibility rules: none of the changes 
made by a statement are visible to statement-level BEFORE triggers, whereas all 
modifications are visible to statement-level AFTER triggers.

* The data change (insertion, update, or deletion) causing the trigger to fire 
is naturally not visible to SQL commands executed in a row-level BEFORE 
trigger, because it hasn't happened yet.

* However, SQL commands executed in a row-level BEFORE trigger will see the 
effects of data changes for rows previously processed in the same outer 
command. This requires caution, since the ordering of these change events is 
not in general predictable; an SQL command that affects multiple rows can visit 
the rows in any order.
»

Strangely, the wording here, explicit as it is, makes no mention of what you 
might expect to see in an AFTER EACH ROW trigger. It's a bit of a stretch, but 
putting the sentences that you quoted from the previous section together with 
these three bullets leads to this conclusion: querying the trigger's base 
table's content from the trigger's function *is* allowed at all five timing 
points: BEFORE and AFTER EACH ROW (not deferred), BEFORE and AFTER EACH 
STATEMENT (not deferred), and AFTER EACH ROW (deferred to commit time). I'll 
take this to be the case unless anybody contradicts me.
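As a quick sanity check of the statement-level rules in the first bullet (table 
and trigger names are illustrative):

```sql
create table s(x int);

create function s_count()
  returns trigger
  language plpgsql
as $f$
begin
  raise info '% statement trigger sees % rows', tg_when, (select count(*) from s);
  return null;
end;
$f$;

create trigger s_before before insert on s
for each statement execute function s_count();

create trigger s_after after insert on s
for each statement execute function s_count();

insert into s(x) values (1), (2), (3);
-- INFO:  BEFORE statement trigger sees 0 rows
-- INFO:  AFTER statement trigger sees 3 rows
```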

The mention of unpredictable results in the third bullet in the BEFORE case 
implies that there's no such unpredictability in the AFTER EACH ROW cases. But 
there has to be a subtle caveat here for the deferred constraint trigger when 
the txn changes two or more tables, all of which participate in the query that 
the trigger function issues. I'll assume that you (all) know what I mean. The 
"raise info" output below illustrates my point (n changes from 5 to 8). But 
this seems to be sound inference from the rules that were stated. I'll take 
this, too, to be the case unless anybody contradicts me.

I assume, though, that considering this output that I showed in my original 
mail:

INFO:  trg fired. new.v = 10, n = 5
INFO:  trg fired. new.v = 20, n = 5
INFO:  trg fired. new.v = 30, n = 5
INFO:  trg fired. new.v = 40, n = 5
INFO:  trg fired. new.v = 50, n = 5
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

the actual order in which I see the "raise info" output is unpredictable while 
the values shown in each *are* predictable. Can I rely on this rule?

Is there a fundamental reason why a deferred AFTER EACH STATEMENT constraint 
trigger is not allowed? Nothing in what is explained in the "Overview of 
Trigger Behavior" and "Visibility of Data Changes" sections lets me see why the 
present restriction is needed.



Re: Deferred constraint trigger semantics

2022-05-10 Thread David G. Johnston
On Tue, May 10, 2022 at 5:47 PM Bryn Llewellyn  wrote:

>
> INFO:  trg fired. new.v = 80, n = 8
>
> It shows the expected "new" value each time it fires. And yet the query
> reflects the table content on statement completion. This seems to be very
> strange.
>

From the documentation:

"Row-level BEFORE triggers fire immediately before a particular row is
operated on, while row-level AFTER triggers fire at the end of the
statement (but before any statement-level AFTER triggers)."

https://www.postgresql.org/docs/current/trigger-definition.html

David J.