Re: Too coarse predicate locks granularity for B+ tree indexes
On Wed, Feb 8, 2023 at 10:44 AM Thomas Munro wrote:
> On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin wrote:
> > On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov wrote:
> > > Thomas, thank you for the details!
> > >
> > > Have you kept the branch that you used to generate the patch? Which
> > > commit should the patch apply to?
> >
> > You can try something like
> > git checkout 'master@{2018-05-13 13:37:00}'
> > to get a commit by date from rev-parse.
>
> I don't have time to work on this currently but if Rinat or others
> want to look into it... maybe I should rebase that experiment on top
> of current master.  Here's the branch:
>
> https://github.com/macdice/postgres/tree/ssi-index-locking-refinements

Erm, I guess I should also post the rebased patches here, for the
mailing list archives.

From 299e2dfb54ffabe45d73e3296856132169a7419b Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH v5 1/2] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples, previously
we'd predicate-lock the whole heap page (since commit cdf91edb).
Commits c01262a8 and 6f38d4da made it possible to lock the tuple
instead with only a TID, to match the behavior of regular index scans
and avoid some false conflicts.

Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c      | 12 ++++++++----
 .../isolation/expected/index-only-scan-2.out  | 19 +++++++++++++
 .../isolation/expected/index-only-scan-3.out  | 20 ++++++++++++++
 src/test/isolation/isolation_schedule         |  2 +
 .../isolation/specs/index-only-scan-2.spec    | 39 +++++++++++++++++++
 .../isolation/specs/index-only-scan-3.spec    | 33 ++++++++++++++
 6 files changed, 121 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..237f17b07a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -239,12 +239,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-							  ItemPointerGetBlockNumber(tid),
-							  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+							 tid,
+							 estate->es_snapshot,
+							 InvalidTransactionId);
 
 		return slot;
 	}
diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 0000000000..c541eaeb43
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ -0,0 +1,19 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 1;
+id1
+---
+  1
+(1 row)
+
+step r2: SELECT id2 FROM account WHERE id2 = 2;
+id2
+---
+  2
+(1 row)
+
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-3.out b/src/test/isolation/expected/index-only-scan-3.out
new file mode 100644
index 0000000000..0a7c5955d2
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-3.out
@@ -0,0 +1,20 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 2;
+id1
+---
+  2
+(1 row)
+
+step r2: SELECT id2 FROM account WHERE id2 = 1;
+id2
+---
+  1
+(1 row)
+
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index c11dc9a420..e1574ae9f7 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: index-only-scan-2
+test: index-only-scan-3
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple
diff --git
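To make the shape of the change in the patch above easier to follow outside the C code, here is a rough, illustrative sketch in plain Python (not PostgreSQL internals; the function names and lock-target tuples are invented for this example) of how the patch narrows the predicate-lock target for index-only scans from a whole heap page to a single TID:

```python
# Toy model of the predicate-lock targets involved in the patch above.
# A TID ("tuple identifier") is a (block_number, offset) pair; the old
# code locked only the block, the new code locks the exact tuple.

def lock_target_before(tid):
    """Old behaviour: page-level SIREAD lock for index-only scans."""
    block, _offset = tid
    return ("page", block)

def lock_target_after(tid):
    """New behaviour: tuple-level SIREAD lock, as for regular index scans."""
    block, offset = tid
    return ("tuple", block, offset)

# Two different tuples that happen to live on the same heap page:
t1, t2 = (7, 1), (7, 2)

# Before the patch, index-only reads of either tuple took the same lock,
# so a writer of t2 appeared to conflict with a reader of t1.
assert lock_target_before(t1) == lock_target_before(t2)

# After the patch the targets are distinct, removing that false conflict.
assert lock_target_after(t1) != lock_target_after(t2)
```

The isolation tests in the patch exercise exactly this: two sessions reading and updating disjoint rows no longer conflict via index-only scans (index-only-scan-2), while genuine cross dependencies still do (index-only-scan-3).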
Re: Too coarse predicate locks granularity for B+ tree indexes
On Wed, Feb 8, 2023 at 5:25 AM Andrey Borodin wrote:
> On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov wrote:
> > Thomas, thank you for the details!
> >
> > Have you kept the branch that you used to generate the patch? Which commit
> > should the patch apply to?
>
> You can try something like
> git checkout 'master@{2018-05-13 13:37:00}'
> to get a commit by date from rev-parse.

I don't have time to work on this currently but if Rinat or others
want to look into it... maybe I should rebase that experiment on top
of current master.  Here's the branch:

https://github.com/macdice/postgres/tree/ssi-index-locking-refinements
Re: Too coarse predicate locks granularity for B+ tree indexes
On Tue, Feb 7, 2023 at 4:01 AM Rinat Shigapov wrote:
>
> Thomas, thank you for the details!
>
> Have you kept the branch that you used to generate the patch? Which commit
> should the patch apply to?

You can try something like

git checkout 'master@{2018-05-13 13:37:00}'

to get a commit by date from rev-parse.

Best regards, Andrey Borodin.
Re: Too coarse predicate locks granularity for B+ tree indexes
Thomas, thank you for the details!

Have you kept the branch that you used to generate the patch? Which commit
should the patch apply to?

With kindest regards,
Rinat Shigapov

On Tue, 7 Feb 2023 at 17:11, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov wrote:
> > Does the current PostgreSQL release support B+ tree index predicate
> > locks more granular than page-level locks?
>
> No.  I tried to follow some breadcrumbs left by Kevin and Dan that
> should allow unique index scans that find a match to skip the btree
> page lock, though, and p-lock just the heap tuple.  If you like
> half-baked experimental code, see the v4-0002 patch in this thread,
> where I took some shortcuts (jamming stuff that should be in the
> planner down into the executor) for a proof-of-concept:
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
>
> With that approach, if it *doesn't* find a match, then you're back to
> having to p-lock the whole index page to represent the "gap", so that
> you can conflict with anyone who tries to insert a matching value
> later.  I believe the next-key approach would allow for finer grained
> gap-locks (haven't studied that myself), but that's a secondary
> problem; the primary problem (it seems to me) is getting rid of index
> locks completely in the (common?) case that you have a qualifying
> match.
Re: Too coarse predicate locks granularity for B+ tree indexes
On Tue, Feb 7, 2023 at 11:24 PM Rinat Shigapov wrote:
> Does the current PostgreSQL release support B+ tree index predicate locks
> more granular than page-level locks?

No.  I tried to follow some breadcrumbs left by Kevin and Dan that
should allow unique index scans that find a match to skip the btree
page lock, though, and p-lock just the heap tuple.  If you like
half-baked experimental code, see the v4-0002 patch in this thread,
where I took some shortcuts (jamming stuff that should be in the
planner down into the executor) for a proof-of-concept:

https://www.postgresql.org/message-id/flat/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com

With that approach, if it *doesn't* find a match, then you're back to
having to p-lock the whole index page to represent the "gap", so that
you can conflict with anyone who tries to insert a matching value
later.  I believe the next-key approach would allow for finer grained
gap-locks (haven't studied that myself), but that's a secondary
problem; the primary problem (it seems to me) is getting rid of index
locks completely in the (common?) case that you have a qualifying
match.
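The match/no-match asymmetry described above can be sketched as follows. This is an illustrative Python model, not the actual executor or btree code; the dictionary-based "index", the `len(key) % 4` page number, and the lock tuples are all invented stand-ins for this example:

```python
# Toy model: a unique-index probe under SSI.  On a hit we can
# predicate-lock just the matching heap tuple; on a miss we must lock
# the index page to represent the "gap", so that a later insert of a
# matching key conflicts with this read.

def unique_index_probe(index, key):
    """Return the SIREAD lock this probe would take.

    index maps key -> (heap_tid, index_page).  On a miss we still know
    which index page the search landed on; here a deterministic
    stand-in (len(key) % 4) plays the role of the btree descent.
    """
    if key in index:
        heap_tid, _index_page = index[key]
        return ("heap-tuple", heap_tid)   # fine-grained: just this row
    search_page = len(key) % 4
    return ("index-page", search_page)    # coarse: the whole "gap" page

idx = {"alice": ((3, 1), 0), "bob": ((3, 2), 0)}

# A hit locks only the matching heap tuple...
assert unique_index_probe(idx, "alice") == ("heap-tuple", (3, 1))

# ...while a miss falls back to a page-level lock on the index.
assert unique_index_probe(idx, "carol")[0] == "index-page"
```

This is why the hit case is the attractive target: it is presumably the common one, and it needs no index lock at all, only the heap tuple lock.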
Too coarse predicate locks granularity for B+ tree indexes
Hi,

TL;DR: this email describes a serialization failure that happens (as I
understand it) due to too coarse predicate lock granularity for the primary
key index.

I have a concurrent test suite that runs 14 test cases. Each test case
operates on a disjoint set of records, doesn't retry transactions and is run
under the 'serializable' isolation level. The test data is small and likely
fits within a single tuple page.

When I finished the test suite I was surprised that PostgreSQL 14.5 returns a
serialization failure on every test suite run. I was even more surprised when
I tested the suite against the current CockroachDB and didn't get
serialization failures. Actually, I was able to reproduce RETRY_SERIALIZABLE
errors a couple of times on CockroachDB, but that required running the test
suite in a loop for more than half an hour.

I started to investigate the test behavior with PostgreSQL on more simplified
and shrunken code, and found a serialization failure between two concurrent
`update_user` operations.

The test defines the following `Users` table:

CREATE TABLE Users (
    id UUID,
    title VARCHAR(255),
    first_name VARCHAR(40),
    last_name VARCHAR(80) NOT NULL,
    email VARCHAR(255) NOT NULL,
    lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
    marketing_optin BOOLEAN,
    mobile_phone VARCHAR(50),
    phone VARCHAR(50),
    phone_ext VARCHAR(40),
    is_contact BOOLEAN DEFAULT false NOT NULL,
    unlinked_link_ids UUID[],
    CONSTRAINT unique_user_email UNIQUE(lower_email),
    PRIMARY KEY (id)
);

Concurrent `update_user` operations run the following UPDATE query to change
the user email to a unique value:

UPDATE Users
SET
    title = CASE WHEN false = true THEN 'foo' ELSE title END,
    first_name = CASE WHEN false = true THEN 'foo' ELSE first_name END,
    last_name = CASE WHEN false = true THEN 'foo' ELSE last_name END,
    email = CASE WHEN true = true THEN 'email2' ELSE email END,
    marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
    mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
    phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
    phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';

I use the following helper view to monitor locks:

CREATE VIEW locks_v AS
SELECT pid,
       virtualtransaction,
       locktype,
       CASE locktype
           WHEN 'relation' THEN relation::regclass::text
           WHEN 'virtualxid' THEN virtualxid::text
           WHEN 'transactionid' THEN transactionid::text
           WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
           WHEN 'page' THEN relation::regclass::text||':'||page::text
       END AS lockid,
       mode,
       granted
FROM pg_locks;

When the test Users table has only a few records the query uses a sequential
scan, and the serialization failure is reproducible without inserting sleeps
before the `update_user` transaction commit. This is caused by relation-level
predicate locks on the Users table:

select * from locks_v;

 pid  | virtualtransaction |   locktype    |      lockid       |       mode       | granted
------+--------------------+---------------+-------------------+------------------+---------
 3676 | 5/2444             | relation      | unique_user_email | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users_pkey        | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users             | RowExclusiveLock | t
 3676 | 5/2444             | virtualxid    | 5/2444            | ExclusiveLock    | t
 3737 | 4/13470            | relation      | pg_locks          | AccessShareLock  | t
 3737 | 4/13470            | relation      | locks_v           | AccessShareLock  | t
 3737 | 4/13470            | virtualxid    | 4/13470           | ExclusiveLock    | t
 3669 | 3/17334            | relation      | unique_user_email | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users_pkey        | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users             | RowExclusiveLock | t
 3669 | 3/17334            | virtualxid    | 3/17334           | ExclusiveLock    | t
 3676 | 5/2444             | transactionid | 6571              | ExclusiveLock    | t
 3669 | 3/17334            | transactionid | 6570              | ExclusiveLock    | t
 3676 | 5/2444             | relation      | users             | SIReadLock       | t
 3669 | 3/17334            | relation      | users             | SIReadLock       | t
(15 rows)

If I add ballast data to the Users table (1000 records) the cost optimizer
switches to an index scan, and it's hard to reproduce the issue for two
concurrent `update_user` operations without sleeps. After adding long sleeps
after the UPDATE query and before commit I could see page-level SIReadLocks.
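The lock listing above shows the core of the problem: with a sequential scan, each transaction holds a relation-level SIReadLock on users, so any concurrent write to the table looks like a conflicting write. A rough Python model (illustrative only; the three lock granularities and the `covers` rule are simplified stand-ins for PostgreSQL's predicate-lock hierarchy) of how coarse read locks turn updates of disjoint rows into rw-conflicts:

```python
# Toy model of SIREAD lock granularity.  A write to a tuple conflicts
# with a read lock if the lock "covers" that tuple: a relation-level
# lock covers every tuple, a page-level lock covers its page, and a
# tuple-level lock covers only itself.

def covers(read_lock, write_tid):
    """Does this SIREAD lock cover a write to the given (block, offset)?"""
    kind = read_lock[0]
    block, offset = write_tid
    if kind == "relation":
        return True
    if kind == "page":
        return read_lock[1] == block
    return read_lock[1:] == (block, offset)   # tuple-level lock

# Two transactions updating disjoint rows that share a heap page:
tid_a, tid_b = (0, 1), (0, 2)

# Sequential scan: both readers hold relation-level SIReadLocks, so each
# one's write conflicts with the other's read -- the crossed rw-dependency
# pattern that SSI reports as a serialization failure.
assert covers(("relation",), tid_b) and covers(("relation",), tid_a)

# Page-level locks (index scan on a small table) still conflict here,
# because both rows sit on page 0.
assert covers(("page", 0), tid_b)

# Tuple-level locks would not: each read lock covers only its own row.
assert not covers(("tuple", 0, 1), tid_b)
```

In this model only tuple-level read locks let the two disjoint `update_user` transactions proceed without a false conflict, which is what the tuple-granularity work discussed elsewhere in this thread aims at.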