Hi, While testing "[16a0039dc] Reduce lock level for ALTER DOMAIN ... VALIDATE CONSTRAINT", I found that it doesn't handle concurrent DML properly.
Here is a repro:
1. In session 1, set up and use an advisory lock to block a concurrent insert
from session 2:
```
evantest=# create domain d as int;
CREATE DOMAIN
evantest=# create table t (a d);
CREATE TABLE
evantest=# select pg_advisory_lock(8888);
pg_advisory_lock
------------------
(1 row)
```
2. In session 2, run a concurrent insert
```
evantest=# with wait as materialized (select pg_advisory_lock(8888)) insert
into t select (-1)::d from wait;
-- block here
```
3. In session 3, alter the domain to add and validate a check constraint
```
evantest=# alter domain d add constraint d_c check (value > 0) not valid;
ALTER DOMAIN
evantest=# alter domain d validate constraint d_c;
ALTER DOMAIN
evantest=# select convalidated from pg_constraint where conname = 'd_c';
convalidated
--------------
t
(1 row)
```
4. In session 1, unlock to let session’s insert continue
```
evantest=# select pg_advisory_unlock(8888);
pg_advisory_unlock
--------------------
t
(1 row)
```
5. Check the result in any session
```
evantest=# select convalidated from pg_constraint where conname = 'd_c';
convalidated
--------------
t
(1 row)
evantest=# select * from t;
a
----
-1
(1 row)
```
As we can see, the constraint is validated, but “-1", which violates the
constraint, has been inserted.
The commit message for 16a0039dc says table constraints have the same behavior:
```
Now we should still be able to perform DML operations on table t while
the domain constraint is being validated. The equivalent works
already on table constraints.
```
I don't think that is true for already-running DML. ALTER TABLE ADD CONSTRAINT
acquires AccessExclusiveLock, so it waits for a concurrent INSERT on the target
table. Here is a similar test with ALTER TABLE:
1. In session 1:
```
evantest=# create table t1 (a int);
CREATE TABLE
evantest=# select pg_advisory_lock(6666);
pg_advisory_lock
------------------
(1 row)
```
2. In session 2
```
evantest=# with wait as materialized (select pg_advisory_lock(6666)) insert
into t1 select -1 from wait;
-- block here
```
3. In session 3
```
evantest=# alter table t1 add constraint t_c check (a>0) not valid;
-- block here
```
4. In session 4, unlock
```
evantest=# select pg_advisory_unlock(6666);
pg_advisory_unlock
--------------------
t
(1 row)
```
5. ADD constraint succeeded in session 3:
```
ALTER TABLE
```
6. INSERT also succeeded in session 2:
```
INSERT 0 1
```
7. Then, VALIDATE CONSTRAINT failed, and the constraint’s convalidated remains
false:
```
evantest=# alter table t1 validate constraint t_c;
ERROR: check constraint "t_c" of relation "t1" is violated by some row
evantest=# select convalidated from pg_constraint where conname = 't_c';
convalidated
--------------
f
(1 row)
```
So the key difference is that ALTER TABLE ADD CONSTRAINT waits for
already-running DML on the target table before adding the NOT VALID constraint,
but ALTER DOMAIN
ADD CONSTRAINT does not wait for already-running DML on dependent tables.
The only fix I can see is to use ShareLock again while validating the domain
constraint. There is a follow up commit, a99c6b56ffa, and that change looks
sound, so maybe we should update the current code instead of reverting
16a0039dc.
With ShareLock restored, rerunning the repro makes session 2's INSERT succeed,
but session 3's ALTER DOMAIN VALIDATE CONSTRAINT fails and convalidated remains
false. Now the behavior matches the ALTER TABLE case.
See the attached patch for details. I also added an isolation test that follows
the repro above.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
v1-0001-Fix-ALTER-DOMAIN-VALIDATE-CONSTRAINT-locking.patch
Description: Binary data
