On 2021/02/09 23:31, torikoshia wrote:
On 2021-02-09 22:54, Fujii Masao wrote:
On 2021/02/09 19:11, Fujii Masao wrote:


On 2021/02/09 18:13, Fujii Masao wrote:


On 2021/02/09 17:48, torikoshia wrote:
On 2021-02-05 18:49, Fujii Masao wrote:
On 2021/02/05 0:03, torikoshia wrote:
On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating 
"waitStart" without holding the partition lock? Also, GetLockStatusData() needs a spinlock 
when reading "waitStart"?

Also, it might be worth considering the use of 64-bit atomic operations like
pg_atomic_read_u64() for that.

Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().

waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx and 
pg_atomic_write_xxx only support unsigned integers, so I cast the type.

I may not be using these functions correctly, so if something is wrong, I would 
appreciate any comments.
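For reference, the pattern in question looks roughly like the following. This is
a minimal sketch only, not the patch itself: DemoProc and the demo_* functions
are made-up stand-ins for PGPROC and the real call sites.

#include "postgres.h"
#include "port/atomics.h"
#include "utils/timestamp.h"

/* made-up stand-in for the relevant part of PGPROC */
typedef struct DemoProc
{
	pg_atomic_uint64 waitStart;	/* TimestampTz stored as uint64 */
} DemoProc;

static void
demo_start_waiting(DemoProc *proc, TimestampTz now)
{
	/* TimestampTz is int64, so it round-trips through uint64 without loss */
	pg_atomic_write_u64(&proc->waitStart, (uint64) now);
}

static TimestampTz
demo_read_wait_start(DemoProc *proc)
{
	/* one atomic 64-bit load, even on platforms without native 8-byte reads */
	return (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
}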


As for the documentation, your suggestion seems better than v6, so I used it 
as is.

Thanks for updating the patch!

+    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+        pg_atomic_write_u64(&MyProc->waitStart,
+                            pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+        deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+        pg_atomic_write_u64(&MyProc->waitStart,
+                    pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+        /*
+         * Record waitStart reusing the deadlock timeout timer.
+         *
+         * It would be ideal if this could be done synchronously with updating
+         * the lock information. However, since holding the partitionLock
+         * longer would hurt performance, we do it here asynchronously.
+         */

IMO it's better to comment why we reuse the deadlock timeout timer.

     proc->waitStatus = waitStatus;
+    pg_atomic_init_u64(&MyProc->waitStart, 0);

pg_atomic_write_u64() should be used instead? Because waitStart can be
accessed concurrently there.
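To illustrate the distinction (a minimal sketch with made-up names, based on
the contract in port/atomics.h: the init function assumes no concurrent access,
while the write function is safe once the variable is visible to other
backends):

#include "postgres.h"
#include "port/atomics.h"

static pg_atomic_uint64 demo_shared;	/* imagine this lives in shared memory */

static void
demo_startup(void)
{
	/* OK: no other backend can see demo_shared yet */
	pg_atomic_init_u64(&demo_shared, 0);
}

static void
demo_reset_while_running(void)
{
	/* other backends may read concurrently, so use write, not init */
	pg_atomic_write_u64(&demo_shared, 0);
}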

I updated the patch and addressed the above review comments. Patch attached.
Barring any objection, I will commit this version.

Thanks for modifying the patch!
I agree with your comments.

BTW, I ran pgbench several times before and after applying
this patch.

The environment is a virtual machine (CentOS 8), so this is
just for reference, but there was no significant difference
in latency or tps (both differed by less than 1%).

Thanks for the test! I pushed the patch.

But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of these failures.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-    relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened in the server built with
--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that the "waitStart"
atomic variable in the dummy proc created at the end of a prepared transaction
was not initialized. I updated the patch so that pg_atomic_init_u64() is
called for "waitStart" in the dummy proc for a prepared transaction.
Patch attached. I confirmed that the patched server built with
--disable-atomics --disable-spinlocks passed all the regression tests.
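For the record, my understanding of why the uninitialized variable produced
that error: with --disable-atomics, pg_atomic_uint64 is emulated with a
spinlock embedded in the variable itself, and with --disable-spinlocks that
spinlock is in turn an index into a semaphore array assigned at
initialization. Uninitialized shared memory reads as 0, which is not a valid
semaphore index. A hypothetical sketch (demo_dummy_proc_field is a made-up
name):

#include "postgres.h"
#include "port/atomics.h"

static void
demo_dummy_proc_field(pg_atomic_uint64 *waitStart)
{
	/*
	 * Without this call, the embedded (semaphore-emulated) spinlock holds
	 * whatever the shared memory happened to contain, and the first
	 * pg_atomic_read_u64() fails with "invalid spinlock number".
	 */
	pg_atomic_init_u64(waitStart, 0);

	/* safe on every build, including --disable-atomics --disable-spinlocks */
	(void) pg_atomic_read_u64(waitStart);
}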

Thanks for fixing the bug! I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests have passed.

Thanks for the test!

I found another bug in the patch. InitProcess() initializes "waitStart",
but InitAuxiliaryProcess() previously did not. This could cause an "invalid
spinlock number" error when reading pg_locks in the standby server.
I fixed that. Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
     AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath 
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart 
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
 (0 rows)
 
 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..9085d2e64c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10604,6 +10604,19 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
        lock table
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>waitstart</structfield> <type>timestamptz</type>
+      </para>
+      <para>
+       Time when the server process started waiting for this lock,
+       or null if the lock is held.
+       Note that this can be null for a very short period of time after
+       the wait started even though <structfield>granted</structfield>
+       is <literal>false</literal>.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fc18b77832..70d22577ce 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -873,8 +873,15 @@ PGPROC *
 TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
 {
        GlobalTransaction gxact = TwoPhaseGetGXact(xid, lock_held);
+       PGPROC     *dummy = &ProcGlobal->allProcs[gxact->pgprocno];
 
-       return &ProcGlobal->allProcs[gxact->pgprocno];
+       /*
+        * Initialize atomic variable in dummy proc so that GetLockStatusData()
+        * can read it later.
+        */
+       pg_atomic_init_u64(&dummy->waitStart, 0);
+
+       return dummy;
 }
 
 /************************************************************************/
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..5877a60715 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -540,12 +540,34 @@ void
 ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
 {
        TimestampTz ltime;
+       TimestampTz now;
 
        Assert(InHotStandby);
 
        ltime = GetStandbyLimitTime();
+       now = GetCurrentTimestamp();
 
-       if (GetCurrentTimestamp() >= ltime && ltime != 0)
+       /*
+        * Update waitStart if first time through after the startup process
+        * started waiting for the lock. It should not be updated every time
+        * ResolveRecoveryConflictWithLock() is called during the wait.
+        *
+        * Use the current time obtained for comparison with ltime as waitStart
+        * (i.e., the time when this process started waiting for the lock).
+        * Since obtaining the current time again would add overhead, we reuse
+        * the already-obtained time.
+        *
+        * Note that waitStart is updated without holding the lock table's
+        * partition lock, to avoid the overhead of an additional lock
+        * acquisition. This can cause "waitstart" in pg_locks to become NULL
+        * for a very short period of time after the wait started even though
+        * "granted" is false. This is OK in practice because we can assume
+        * that users are likely to look at "waitstart" when waiting for the
+        * lock for a long time.
+        */
+       if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+               pg_atomic_write_u64(&MyProc->waitStart, now);
+
+       if (now >= ltime && ltime != 0)
        {
                /*
                 * We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 79c1cf9b8b..108b4d9023 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3619,6 +3619,12 @@ GetLockStatusData(void)
                        instance->leaderPid = proc->pid;
                        instance->fastpath = true;
 
+                       /*
+                        * Successfully taking a fast path lock means there were no
+                        * conflicting locks.
+                        */
+                       instance->waitStart = 0;
+
                        el++;
                }
 
@@ -3646,6 +3652,7 @@ GetLockStatusData(void)
                        instance->pid = proc->pid;
                        instance->leaderPid = proc->pid;
                        instance->fastpath = true;
+                       instance->waitStart = 0;
 
                        el++;
                }
@@ -3698,6 +3705,7 @@ GetLockStatusData(void)
                instance->pid = proc->pid;
                instance->leaderPid = proclock->groupLeader->pid;
                instance->fastpath = false;
+               instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
 
                el++;
        }
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..0884909a22 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
        MyProc->lwWaitMode = 0;
        MyProc->waitLock = NULL;
        MyProc->waitProcLock = NULL;
+       pg_atomic_init_u64(&MyProc->waitStart, 0);
 #ifdef USE_ASSERT_CHECKING
        {
                int                     i;
@@ -580,6 +581,7 @@ InitAuxiliaryProcess(void)
        MyProc->lwWaitMode = 0;
        MyProc->waitLock = NULL;
        MyProc->waitProcLock = NULL;
+       pg_atomic_init_u64(&MyProc->waitStart, 0);
 #ifdef USE_ASSERT_CHECKING
        {
                int                     i;
@@ -1262,6 +1264,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
                }
                else
                        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+               /*
+                * Use the current time obtained for the deadlock timeout timer as
+                * waitStart (i.e., the time when this process started waiting for
+                * the lock). Since obtaining the current time again would add
+                * overhead, we reuse the already-obtained time.
+                *
+                * Note that waitStart is updated without holding the lock table's
+                * partition lock, to avoid the overhead of an additional lock
+                * acquisition. This can cause "waitstart" in pg_locks to become
+                * NULL for a very short period of time after the wait started even
+                * though "granted" is false. This is OK in practice because we can
+                * assume that users are likely to look at "waitstart" when waiting
+                * for the lock for a long time.
+                */
+               pg_atomic_write_u64(&MyProc->waitStart,
+                                                       get_timeout_start_time(DEADLOCK_TIMEOUT));
        }
        else if (log_recovery_conflict_waits)
        {
@@ -1678,6 +1697,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
        proc->waitLock = NULL;
        proc->waitProcLock = NULL;
        proc->waitStatus = waitStatus;
+       pg_atomic_write_u64(&proc->waitStart, 0);
 
        /* And awaken it */
        SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
 } PG_Lock_Status;
 
 /* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS                15
+#define NUM_LOCK_STATUS_COLUMNS                16
 
 /*
  * VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
                                                   BOOLOID, -1, 0);
                TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
                                                   BOOLOID, -1, 0);
+               TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+                                                  TIMESTAMPTZOID, -1, 0);
 
                funcctx->tuple_desc = BlessTupleDesc(tupdesc);
 
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
                values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
                values[13] = BoolGetDatum(granted);
                values[14] = BoolGetDatum(instance->fastpath);
+               if (!granted && instance->waitStart != 0)
+                       values[15] = TimestampTzGetDatum(instance->waitStart);
+               else
+                       nulls[15] = true;
 
                tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
                result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
                values[12] = CStringGetTextDatum("SIReadLock");
                values[13] = BoolGetDatum(true);
                values[14] = BoolGetDatum(false);
+               nulls[15] = true;
 
                tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
                result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 638830aaac..c1a5dd4ee7 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*                                                     yyyymmddN */
-#define CATALOG_VERSION_NO     202102021
+#define CATALOG_VERSION_NO     202102091
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..1487710d59 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5994,9 +5994,9 @@
 { oid => '1371', descr => 'view system lock information',
   proname => 'pg_lock_status', prorows => '1000', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+  proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
   prosrc => 'pg_lock_status' },
 { oid => '2561',
  descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 68a3487d49..9b2a421c32 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
+#include "utils/timestamp.h"
 
 /* struct PGPROC is declared in proc.h, but must forward-reference it */
 typedef struct PGPROC PGPROC;
@@ -446,6 +447,8 @@ typedef struct LockInstanceData
        LOCKMODE        waitLockMode;   /* lock awaited by this PGPROC, if any */
        BackendId       backend;                /* backend ID of this PGPROC */
        LocalTransactionId lxid;        /* local transaction ID of this PGPROC */
+       TimestampTz waitStart;          /* time at which this PGPROC started waiting
+                                                                * for lock */
        int                     pid;                    /* pid of this PGPROC */
        int                     leaderPid;              /* pid of group leader; = pid if no group */
        bool            fastpath;               /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..a777cb64a1 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,8 @@ struct PGPROC
        LOCKMODE        waitLockMode;   /* type of lock we're waiting for */
        LOCKMASK        heldLocks;              /* bitmask for lock types already held on this
                                                                 * lock object by this backend */
+       pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition
+                                                                * started */
 
        bool            delayChkpt;             /* true if this proc delays checkpoint start */
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b632d9f2ea..10a1f34ebc 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
     l.pid,
     l.mode,
     l.granted,
-    l.fastpath
-   FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+    l.fastpath,
+    l.waitstart
+   FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
 pg_matviews| SELECT n.nspname AS schemaname,
     c.relname AS matviewname,
     pg_get_userbyid(c.relowner) AS matviewowner,
