Hi,

TLDR: this email describes a serialization failure that happens (as I
understand it) due to too coarse predicate locks granularity for primary
key index.

I have a concurrent testsuite that runs 14 test cases. Each test case
operates on a disjoint set of records, doesn't retry transactions and is
run under 'serializable' isolation level. The test data is small and likely
fits within a single tuple page.

When I finished the test suite I was surprised that PostgreSQL 14.5 returns
serialization failure on every test suite run. I was even more surprised
when I tested the suite against the current CockroachDB and didn't get
serialization failures. Actually I was able to reproduce RETRY_SERIALIZABLE
errors a couple of times on CockroachDB but it required me to run the test
suite in a loop for more than a half hour.

I started to investigate the test behavior with PostgreSQL with more
simplified and shrinked code and found a serialization failure of two
concurrent `update_user` operations.

The test defines the following `Users` table:

CREATE TABLE Users (
>     id UUID,
>     title VARCHAR(255),
>     first_name VARCHAR(40),
>     last_name VARCHAR(80) NOT NULL,
>     email VARCHAR(255) NOT NULL,
>     lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
>     marketing_optin BOOLEAN,
>     mobile_phone VARCHAR(50),
>     phone VARCHAR(50),
>     phone_ext VARCHAR(40),
>     is_contact BOOLEAN DEFAULT false NOT NULL,
>     unlinked_link_ids UUID[],


>     CONSTRAINT unique_user_email UNIQUE(lower_email),
>     PRIMARY KEY (id)
> );


Concurrent `update_user` operation run the UPDATE query to change user
email to a unique value

UPDATE Users
> SET
>     title = CASE WHEN false= true THEN 'foo' ELSE title END,
>     first_name = CASE WHEN false= true THEN 'foo' ELSE first_name END,
>     last_name = CASE WHEN false= true THEN 'foo' ELSE last_name END,
>     email = CASE WHEN true = true THEN 'email2' ELSE email END,
>     marketing_optin = CASE WHEN false = true THEN true ELSE
> marketing_optin END,
>     mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
>     phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
>     phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
> WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
>

I use the following helper view to monitor locks:

> CREATE VIEW locks_v AS
> SELECT pid,
>         virtualtransaction,
>        locktype,
>        CASE locktype
>          WHEN 'relation' THEN relation::regclass::text
>          WHEN 'virtualxid' THEN virtualxid::text
>          WHEN 'transactionid' THEN transactionid::text
>          WHEN 'tuple' THEN
> relation::regclass::text||':'||page::text||':'||tuple::text
>          WHEN 'page' THEN relation::regclass::text||':'||page::text
>        END AS lockid,
>        mode,
>        granted
> FROM pg_locks;


 When the test Users table has only a few records the query uses a
sequential scan the serialization failure is reproducible without inserting
sleeps before `update_user` transaction commit.

This is caused by relation level predicate locks on Users table:

> select * from locks_v;
>  pid  | virtualtransaction |   locktype    |      lockid       |
> mode       | granted
>
> ------+--------------------+---------------+-------------------+------------------+---------
>  3676 | 5/2444             | relation      | unique_user_email |
> RowExclusiveLock | t
>  3676 | 5/2444             | relation      | users_pkey        |
> RowExclusiveLock | t
>  3676 | 5/2444             | relation      | users             |
> RowExclusiveLock | t
>  3676 | 5/2444             | virtualxid    | 5/2444            |
> ExclusiveLock    | t
>  3737 | 4/13470            | relation      | pg_locks          |
> AccessShareLock  | t
>  3737 | 4/13470            | relation      | locks_v           |
> AccessShareLock  | t
>  3737 | 4/13470            | virtualxid    | 4/13470           |
> ExclusiveLock    | t
>  3669 | 3/17334            | relation      | unique_user_email |
> RowExclusiveLock | t
>  3669 | 3/17334            | relation      | users_pkey        |
> RowExclusiveLock | t
>  3669 | 3/17334            | relation      | users             |
> RowExclusiveLock | t
>  3669 | 3/17334            | virtualxid    | 3/17334           |
> ExclusiveLock    | t
>  3676 | 5/2444             | transactionid | 6571              |
> ExclusiveLock    | t
>  3669 | 3/17334            | transactionid | 6570              |
> ExclusiveLock    | t
>  3676 | 5/2444             | relation      | users             |
> SIReadLock       | t
>  3669 | 3/17334            | relation      | users             |
> SIReadLock       | t
> (15 rows)
>

If I add ballast data to Users table (1000 records) the cost optimizer
switches to index scan and it's hard to reproduce the issue for two
concurrent `update_user` operations without sleeps. After adding long
sleeps after UPDATE query and before commit I could see page-level
predicates locks for the primary key index users_pkey:

select * from locks_v;
>  pid | virtualtransaction |   locktype    |      lockid       |       mode
>       | granted
>
> -----+--------------------+---------------+-------------------+------------------+---------
>  371 | 6/523              | relation      | unique_user_email |
> RowExclusiveLock | t
>  371 | 6/523              | relation      | users_pkey        |
> RowExclusiveLock | t
>  371 | 6/523              | relation      | users             |
> RowExclusiveLock | t
>  371 | 6/523              | virtualxid    | 6/523             |
> ExclusiveLock    | t
>  381 | 14/215             | relation      | unique_user_email |
> RowExclusiveLock | t
>  381 | 14/215             | relation      | users_pkey        |
> RowExclusiveLock | t
>  381 | 14/215             | relation      | users             |
> RowExclusiveLock | t
>  381 | 14/215             | virtualxid    | 14/215            |
> ExclusiveLock    | t
>  350 | 4/885              | relation      | pg_locks          |
> AccessShareLock  | t
>  350 | 4/885              | relation      | locks_v           |
> AccessShareLock  | t
>  350 | 4/885              | virtualxid    | 4/885             |
> ExclusiveLock    | t
>  371 | 6/523              | transactionid | 1439              |
> ExclusiveLock    | t
>  381 | 14/215             | transactionid | 1431              |
> ExclusiveLock    | t
>  381 | 14/215             | page          | users_pkey:5      | SIReadLock
>       | t
>  371 | 6/523              | page          | users_pkey:5      | SIReadLock
>       | t
> (15 rows)
>

With sleeps the serialization failure is reproduced on each run.

I started to read more about SSI implementation in PostgreSQL. The article
https://arxiv.org/pdf/1208.4179.pdf mentions that

> Currently, locks on B+-tree indexes are acquired at page granularity; we
> intend to refine this to next-key locking [16] in a future release.
>
[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency
> control of multiaction transactions operating on B-tree indexes. In VLDB,
> pages 392–405, 1990.


My question follows:

Does the current PostgreSQL release support B+ tree index predicate locks
more granular then page-level locks?

With kindest regards, Rinat Shigapov

Reply via email to