Re: Too coarse predicate locks granularity for B+ tree indexes
On Wed, Feb 8, 2023 at 10:44 AM Thomas Munro wrote:
> On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin wrote:
> > On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov wrote:
> > > Thomas, thank you for the details!
> > >
> > > Have you kept the branch that you used to generate the patch? Which
> > > commit should the patch apply to?
> >
> > You can try something like
> > git checkout 'master@{2018-05-13 13:37:00}'
> > to get a commit by date from rev-parse.
>
> I don't have time to work on this currently but if Rinat or others
> want to look into it... maybe I should rebase that experiment on top
> of current master.  Here's the branch:
>
> https://github.com/macdice/postgres/tree/ssi-index-locking-refinements

Erm, I guess I should also post the rebased patches here, for the
mailing list archives.

From 299e2dfb54ffabe45d73e3296856132169a7419b Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH v5 1/2] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples, previously
we'd predicate-lock the whole heap page (since commit cdf91edb).
Commits c01262a8 and 6f38d4da made it possible to lock the tuple
instead with only a TID, to match the behavior of regular index scans
and avoid some false conflicts.

Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c      | 12 ++++++++----
 .../isolation/expected/index-only-scan-2.out  | 19 +++++++++++++
 .../isolation/expected/index-only-scan-3.out  | 20 ++++++++++++++
 src/test/isolation/isolation_schedule         |  2 +
 .../isolation/specs/index-only-scan-2.spec    | 39 +++++++++++++++++++
 .../isolation/specs/index-only-scan-3.spec    | 33 ++++++++++++++
 6 files changed, 121 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..237f17b07a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -239,12 +239,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-							  ItemPointerGetBlockNumber(tid),
-							  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+							 tid,
+							 estate->es_snapshot,
+							 InvalidTransactionId);
 
 		return slot;
 	}
diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 0000000000..c541eaeb43
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ -0,0 +1,19 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 1;
+id1
+---
+  1
+(1 row)
+
+step r2: SELECT id2 FROM account WHERE id2 = 2;
+id2
+---
+  2
+(1 row)
+
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-3.out b/src/test/isolation/expected/index-only-scan-3.out
new file mode 100644
index 0000000000..0a7c5955d2
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-3.out
@@ -0,0 +1,20 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 2;
+id1
+---
+  2
+(1 row)
+
+step r2: SELECT id2 FROM account WHERE id2 = 1;
+id2
+---
+  1
+(1 row)
+
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index c11dc9a420..e1574ae9f7 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: index-only-scan-2
+test: index-only-scan-3
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple
diff --git
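To make the shape of the change in the patch above easier to follow outside the C code, here is a rough, illustrative sketch in plain Python (not PostgreSQL internals; the function names and lock-target tuples are invented for this example) of how the patch narrows the predicate-lock target for index-only scans from a whole heap page to a single TID:

```python
# Toy model of the predicate-lock targets involved in the patch above.
# A TID ("tuple identifier") is a (block_number, offset) pair; the old
# code locked only the block, the new code locks the exact tuple.

def lock_target_before(tid):
    """Old behaviour: page-level SIREAD lock for index-only scans."""
    block, _offset = tid
    return ("page", block)

def lock_target_after(tid):
    """New behaviour: tuple-level SIREAD lock, as for regular index scans."""
    block, offset = tid
    return ("tuple", block, offset)

# Two different tuples that happen to live on the same heap page:
t1, t2 = (7, 1), (7, 2)

# Before the patch, index-only reads of either tuple took the same lock,
# so a writer of t2 appeared to conflict with a reader of t1.
assert lock_target_before(t1) == lock_target_before(t2)

# After the patch the targets are distinct, removing that false conflict.
assert lock_target_after(t1) != lock_target_after(t2)
```

The isolation tests in the patch exercise exactly this: two sessions reading and updating disjoint rows no longer conflict via index-only scans (index-only-scan-2), while genuine cross dependencies still do (index-only-scan-3).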
Re: Too coarse predicate locks granularity for B+ tree indexes
On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin wrote:
> On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov wrote:
> > Thomas, thank you for the details!
> >
> > Have you kept the branch that you used to generate the patch? Which commit
> > should the patch apply to?
>
> You can try something like
> git checkout 'master@{2018-05-13 13:37:00}'
> to get a commit by date from rev-parse.

I don't have time to work on this currently but if Rinat or others
want to look into it... maybe I should rebase that experiment on top
of current master.  Here's the branch:

https://github.com/macdice/postgres/tree/ssi-index-locking-refinements
Re: Too coarse predicate locks granularity for B+ tree indexes
On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov wrote:
>
> Thomas, thank you for the details!
>
> Have you kept the branch that you used to generate the patch? Which commit
> should the patch apply to?

You can try something like

git checkout 'master@{2018-05-13 13:37:00}'

to get a commit by date from rev-parse.

Best regards, Andrey Borodin.
Re: Too coarse predicate locks granularity for B+ tree indexes
Thomas, thank you for the details!

Have you kept the branch that you used to generate the patch? Which commit
should the patch apply to?

With kindest regards,
Rinat Shigapov

On Tue, 7 Feb 2023 at 17:11, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov wrote:
> > Does the current PostgreSQL release support B+ tree index predicate
> > locks more granular than page-level locks?
>
> No.  I tried to follow some breadcrumbs left by Kevin and Dan that
> should allow unique index scans that find a match to skip the btree
> page lock, though, and p-lock just the heap tuple.  If you like
> half-baked experimental code, see the v4-0002 patch in this thread,
> where I took some shortcuts (jamming stuff that should be in the
> planner down into the executor) for a proof-of-concept:
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
>
> With that approach, if it *doesn't* find a match, then you're back to
> having to p-lock the whole index page to represent the "gap", so that
> you can conflict with anyone who tries to insert a matching value
> later.  I believe the next-key approach would allow for finer grained
> gap-locks (haven't studied that myself), but that's a secondary
> problem; the primary problem (it seems to me) is getting rid of index
> locks completely in the (common?) case that you have a qualifying
> match.
Re: Too coarse predicate locks granularity for B+ tree indexes
On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov wrote:
> Does the current PostgreSQL release support B+ tree index predicate locks
> more granular than page-level locks?

No.  I tried to follow some breadcrumbs left by Kevin and Dan that
should allow unique index scans that find a match to skip the btree
page lock, though, and p-lock just the heap tuple.  If you like
half-baked experimental code, see the v4-0002 patch in this thread,
where I took some shortcuts (jamming stuff that should be in the
planner down into the executor) for a proof-of-concept:

https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com

With that approach, if it *doesn't* find a match, then you're back to
having to p-lock the whole index page to represent the "gap", so that
you can conflict with anyone who tries to insert a matching value
later.  I believe the next-key approach would allow for finer grained
gap-locks (haven't studied that myself), but that's a secondary
problem; the primary problem (it seems to me) is getting rid of index
locks completely in the (common?) case that you have a qualifying
match.
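The match/no-match asymmetry described above can be sketched as follows. This is an illustrative Python model, not the actual executor or btree code; the dictionary-based "index", the `len(key) % 4` page number, and the lock tuples are all invented stand-ins for this example:

```python
# Toy model: a unique-index probe under SSI.  On a hit we can
# predicate-lock just the matching heap tuple; on a miss we must lock
# the index page to represent the "gap", so that a later insert of a
# matching key conflicts with this read.

def unique_index_probe(index, key):
    """Return the SIREAD lock this probe would take.

    index maps key -> (heap_tid, index_page).  On a miss we still know
    which index page the search landed on; here a deterministic
    stand-in (len(key) % 4) plays the role of the btree descent.
    """
    if key in index:
        heap_tid, _index_page = index[key]
        return ("heap-tuple", heap_tid)   # fine-grained: just this row
    search_page = len(key) % 4
    return ("index-page", search_page)    # coarse: the whole "gap" page

idx = {"alice": ((3, 1), 0), "bob": ((3, 2), 0)}

# A hit locks only the matching heap tuple...
assert unique_index_probe(idx, "alice") == ("heap-tuple", (3, 1))

# ...while a miss falls back to a page-level lock on the index.
assert unique_index_probe(idx, "carol")[0] == "index-page"
```

This is why the hit case is the attractive target: it is presumably the common one, and it needs no index lock at all, only the heap tuple lock.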
Too coarse predicate locks granularity for B+ tree indexes
Hi,

TL;DR: this email describes a serialization failure that happens (as I
understand it) due to too coarse predicate lock granularity for the primary
key index.

I have a concurrent test suite that runs 14 test cases. Each test case
operates on a disjoint set of records, doesn't retry transactions and is run
under the 'serializable' isolation level. The test data is small and likely
fits within a single tuple page.

When I finished the test suite I was surprised that PostgreSQL 14.5 returns a
serialization failure on every test suite run. I was even more surprised when
I tested the suite against the current CockroachDB and didn't get
serialization failures. Actually, I was able to reproduce RETRY_SERIALIZABLE
errors a couple of times on CockroachDB, but that required running the test
suite in a loop for more than half an hour.

I started to investigate the test behavior with PostgreSQL on more simplified
and shrunken code, and found a serialization failure between two concurrent
`update_user` operations.

The test defines the following `Users` table:

CREATE TABLE Users (
    id UUID,
    title VARCHAR(255),
    first_name VARCHAR(40),
    last_name VARCHAR(80) NOT NULL,
    email VARCHAR(255) NOT NULL,
    lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
    marketing_optin BOOLEAN,
    mobile_phone VARCHAR(50),
    phone VARCHAR(50),
    phone_ext VARCHAR(40),
    is_contact BOOLEAN DEFAULT false NOT NULL,
    unlinked_link_ids UUID[],
    CONSTRAINT unique_user_email UNIQUE(lower_email),
    PRIMARY KEY (id)
);

Concurrent `update_user` operations run the following UPDATE query to change
the user email to a unique value:

UPDATE Users
SET
    title = CASE WHEN false = true THEN 'foo' ELSE title END,
    first_name = CASE WHEN false = true THEN 'foo' ELSE first_name END,
    last_name = CASE WHEN false = true THEN 'foo' ELSE last_name END,
    email = CASE WHEN true = true THEN 'email2' ELSE email END,
    marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
    mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
    phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
    phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';

I use the following helper view to monitor locks:

CREATE VIEW locks_v AS
SELECT pid,
       virtualtransaction,
       locktype,
       CASE locktype
           WHEN 'relation' THEN relation::regclass::text
           WHEN 'virtualxid' THEN virtualxid::text
           WHEN 'transactionid' THEN transactionid::text
           WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
           WHEN 'page' THEN relation::regclass::text||':'||page::text
       END AS lockid,
       mode,
       granted
FROM pg_locks;

When the test Users table has only a few records the query uses a sequential
scan, and the serialization failure is reproducible without inserting sleeps
before the `update_user` transaction commit. This is caused by relation-level
predicate locks on the Users table:

select * from locks_v;

 pid  | virtualtransaction |   locktype    |      lockid       |       mode       | granted
------+--------------------+---------------+-------------------+------------------+---------
 3676 | 5/2444             | relation      | unique_user_email | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users_pkey        | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users             | RowExclusiveLock | t
 3676 | 5/2444             | virtualxid    | 5/2444            | ExclusiveLock    | t
 3737 | 4/13470            | relation      | pg_locks          | AccessShareLock  | t
 3737 | 4/13470            | relation      | locks_v           | AccessShareLock  | t
 3737 | 4/13470            | virtualxid    | 4/13470           | ExclusiveLock    | t
 3669 | 3/17334            | relation      | unique_user_email | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users_pkey        | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users             | RowExclusiveLock | t
 3669 | 3/17334            | virtualxid    | 3/17334           | ExclusiveLock    | t
 3676 | 5/2444             | transactionid | 6571              | ExclusiveLock    | t
 3669 | 3/17334            | transactionid | 6570              | ExclusiveLock    | t
 3676 | 5/2444             | relation      | users             | SIReadLock       | t
 3669 | 3/17334            | relation      | users             | SIReadLock       | t
(15 rows)

If I add ballast data to the Users table (1000 records) the cost optimizer
switches to an index scan, and it's hard to reproduce the issue for two
concurrent `update_user` operations without sleeps. After adding long sleeps
after the UPDATE query and before commit I could see page-level SIReadLocks.
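The lock listing above shows the core of the problem: with a sequential scan, each transaction holds a relation-level SIReadLock on users, so any concurrent write to the table looks like a conflicting write. A rough Python model (illustrative only; the three lock granularities and the `covers` rule are simplified stand-ins for PostgreSQL's predicate-lock hierarchy) of how coarse read locks turn updates of disjoint rows into rw-conflicts:

```python
# Toy model of SIREAD lock granularity.  A write to a tuple conflicts
# with a read lock if the lock "covers" that tuple: a relation-level
# lock covers every tuple, a page-level lock covers its page, and a
# tuple-level lock covers only itself.

def covers(read_lock, write_tid):
    """Does this SIREAD lock cover a write to the given (block, offset)?"""
    kind = read_lock[0]
    block, offset = write_tid
    if kind == "relation":
        return True
    if kind == "page":
        return read_lock[1] == block
    return read_lock[1:] == (block, offset)   # tuple-level lock

# Two transactions updating disjoint rows that share a heap page:
tid_a, tid_b = (0, 1), (0, 2)

# Sequential scan: both readers hold relation-level SIReadLocks, so each
# one's write conflicts with the other's read -- the crossed rw-dependency
# pattern that SSI reports as a serialization failure.
assert covers(("relation",), tid_b) and covers(("relation",), tid_a)

# Page-level locks (index scan on a small table) still conflict here,
# because both rows sit on page 0.
assert covers(("page", 0), tid_b)

# Tuple-level locks would not: each read lock covers only its own row.
assert not covers(("tuple", 0, 1), tid_b)
```

In this model only tuple-level read locks let the two disjoint `update_user` transactions proceed without a false conflict, which is what the tuple-granularity work discussed elsewhere in this thread aims at.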