Re: adding wait_start column to pg_locks
On 2021/02/18 16:26, torikoshia wrote: On 2021-02-16 16:59, Fujii Masao wrote: On 2021/02/15 15:17, Fujii Masao wrote: On 2021/02/10 10:43, Fujii Masao wrote: On 2021/02/09 23:31, torikoshia wrote: On 2021-02-09 22:54, Fujii Masao wrote: On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened on a server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug. I also tested v9.patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests passed.

Thanks for the test! I found another bug in the patch. InitProcess() initializes "waitStart", but InitAuxiliaryProcess() previously did not. This could cause an "invalid spinlock number" error when reading pg_locks on the standby server. I fixed that. Attached is the updated version of the patch.

I pushed this version. Thanks! While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart" should be initialized at postmaster startup rather than at the startup of each child process. I changed "waitStart" so that it's initialized in InitProcGlobal() and also reset to 0 with pg_atomic_write_u64() in InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above change, InitProcGlobal() initializes "waitStart" even in the PGPROC entries for prepared transactions. But those entries are zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be initialized again. Currently TwoPhaseGetDummyProc() initializes "waitStart"
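The signed-through-unsigned trick discussed above (waitStart is a TimestampTz, i.e. a signed int64, stored via the unsigned-only pg_atomic_uint64 API) and the init-once/reset-later split can be sketched outside PostgreSQL with C11 stdatomic. This is a minimal illustration, not the actual pg_atomic implementation; the type and function names below are stand-ins:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for PostgreSQL's TimestampTz: signed 64-bit
 * microseconds since the Postgres epoch. */
typedef int64_t TimestampTz;

/* Illustrative stand-in for pg_atomic_uint64.  The atomics API is
 * unsigned-only, which is why the patch casts when storing waitStart. */
typedef _Atomic uint64_t atomic_u64;

/* Store a signed timestamp through the unsigned atomic.  The integer
 * conversion preserves the bit pattern, so the round trip is lossless. */
static inline void waitstart_write(atomic_u64 *slot, TimestampTz ts)
{
    atomic_store(slot, (uint64_t) ts);
}

/* Read the timestamp back, converting to signed again. */
static inline TimestampTz waitstart_read(atomic_u64 *slot)
{
    return (TimestampTz) atomic_load(slot);
}

/* Mirror of the split discussed in the thread: atomic_init (like
 * pg_atomic_init_u64) may only run before any concurrent access, e.g.
 * once at postmaster startup in InitProcGlobal(); later resets of a
 * reused slot must be plain atomic stores (pg_atomic_write_u64). */
static inline void waitstart_reset(atomic_u64 *slot)
{
    atomic_store(slot, 0);
}
```

The init/write distinction matters most under --disable-atomics --disable-spinlocks, where the thread's "invalid spinlock number: 0" failures came from reading a pg_atomic_uint64 that had never been initialized.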
Re: adding wait_start column to pg_locks
On 2021-02-16 16:59, Fujii Masao wrote: On 2021/02/15 15:17, Fujii Masao wrote: On 2021/02/10 10:43, Fujii Masao wrote: On 2021/02/09 23:31, torikoshia wrote: On 2021-02-09 22:54, Fujii Masao wrote: On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened on a server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug. I also tested v9.patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests passed.

Thanks for the test! I found another bug in the patch. InitProcess() initializes "waitStart", but InitAuxiliaryProcess() previously did not. This could cause an "invalid spinlock number" error when reading pg_locks on the standby server. I fixed that. Attached is the updated version of the patch.

I pushed this version. Thanks! While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart" should be initialized at postmaster startup rather than at the startup of each child process. I changed "waitStart" so that it's initialized in InitProcGlobal() and also reset to 0 with pg_atomic_write_u64() in InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above change, InitProcGlobal() initializes "waitStart" even in the PGPROC entries for prepared transactions. But those entries are zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be initialized again. Currently
Re: adding wait_start column to pg_locks
On 2021/02/15 15:17, Fujii Masao wrote: On 2021/02/10 10:43, Fujii Masao wrote: On 2021/02/09 23:31, torikoshia wrote: On 2021-02-09 22:54, Fujii Masao wrote: On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened on a server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug. I also tested v9.patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests passed.

Thanks for the test! I found another bug in the patch. InitProcess() initializes "waitStart", but InitAuxiliaryProcess() previously did not. This could cause an "invalid spinlock number" error when reading pg_locks on the standby server. I fixed that. Attached is the updated version of the patch.

I pushed this version. Thanks! While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart" should be initialized at postmaster startup rather than at the startup of each child process. I changed "waitStart" so that it's initialized in InitProcGlobal() and also reset to 0 with pg_atomic_write_u64() in InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above change, InitProcGlobal() initializes "waitStart" even in the PGPROC entries for prepared transactions. But those entries are zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be initialized again. Currently TwoPhaseGetDummyProc() initializes "waitStart" in the PGPROC entry for a prepared transaction. But
Re: adding wait_start column to pg_locks
On 2021/02/09 23:31, torikoshia wrote: On 2021-02-09 22:54, Fujii Masao wrote: On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened on a server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug. I also tested v9.patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests passed.

Thanks for the test! I found another bug in the patch. InitProcess() initializes "waitStart", but InitAuxiliaryProcess() previously did not. This could cause an "invalid spinlock number" error when reading pg_locks on the standby server. I fixed that. Attached is the updated version of the patch.
Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
     AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
Re: adding wait_start column to pg_locks
On 2021-02-09 22:54, Fujii Masao wrote: On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened on a server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

Thanks for fixing the bug. I also tested v9.patch configured with --disable-atomics --disable-spinlocks on my environment and confirmed that all tests passed.

BTW, while investigating this issue, I found that pg_stat_wal_receiver could also cause this error even on the current master (without the patch). I will report that in a separate thread. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16

"prion" reported the following error. But I'm not sure how the pg_locks changes could have caused it. I found that Heikki also reported at [1] that "prion" failed with the same error but was not sure how it happened. This makes me think for now that this issue is not directly related to the pg_locks changes.

Thanks! I was wondering how these errors were related to the commit.

Regards,

-- 
Atsushi Torikoshi

- pg_dump: error: query failed: ERROR: missing chunk number 0 for toast value 1 in pg_toast_2619
- pg_dump: error: query was: SELECT a.attnum, a.attname, a.atttypmod, a.attstattarget, a.attstorage, t.typstorage, a.attnotnull, a.atthasdef, a.attisdropped, a.attlen, a.attalign, a.attislocal, pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname, array_to_string(a.attoptions, ', ') AS attoptions, CASE WHEN a.attcollation <>
Re: adding wait_start column to pg_locks
On 2021/02/09 19:11, Fujii Masao wrote: On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10

-     relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened on a server built with --disable-atomics --disable-spinlocks when reading pg_locks after the transaction was prepared. The cause of this issue is that the "waitStart" atomic variable in the dummy proc created at the end of prepare transaction was not initialized. I updated the patch so that pg_atomic_init_u64() is called for "waitStart" in the dummy proc for a prepared transaction. Patch attached. I confirmed that the patched server built with --disable-atomics --disable-spinlocks passed all the regression tests.

BTW, while investigating this issue, I found that pg_stat_wal_receiver could also cause this error even on the current master (without the patch). I will report that in a separate thread. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16

"prion" reported the following error. But I'm not sure how the pg_locks changes could have caused it. I found that Heikki also reported at [1] that "prion" failed with the same error but was not sure how it happened. This makes me think for now that this issue is not directly related to the pg_locks changes.

- pg_dump: error: query failed: ERROR: missing chunk number 0 for toast value 1 in pg_toast_2619
- pg_dump: error: query was: SELECT a.attnum, a.attname, a.atttypmod, a.attstattarget, a.attstorage, t.typstorage, a.attnotnull, a.atthasdef, a.attisdropped, a.attlen, a.attalign, a.attislocal, pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname, array_to_string(a.attoptions, ', ') AS attoptions, CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation, pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E', ') AS attfdwoptions, a.attidentity, CASE WHEN a.atthasmissing AND NOT
Re: adding wait_start column to pg_locks
On 2021/02/09 18:13, Fujii Masao wrote: On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So is a spinlock necessary when updating "waitStart" without holding the partition lock? Does GetLockStatusData() also need a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is a TimestampTz, i.e. int64, but pg_atomic_read_xxx and pg_atomic_write_xxx seem to support only unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. As for the documentation, your suggestion seems better than v6, so I used it as is.

Thanks for updating the patch!

+		if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+			pg_atomic_write_u64(&MyProc->waitStart,
+								pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */

IMO it's better to comment on why we reuse the deadlock timeout timer.

 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used instead? Because waitStart can be accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch! I agree with your comments. BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both within 1%).

Thanks for the test! I pushed the patch. But I then reverted it because buildfarm members rorqual and prion don't like it. I'm trying to investigate the cause of these failures.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021/02/09 17:48, torikoshia wrote: On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote: 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"? Also it might be worth thinking to use 64-bit atomic operations like pg_atomic_read_u64(), for that. Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type. I may be using these functions not correctly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is. Thanks for updating the patch! + if (pg_atomic_read_u64(>waitStart) == 0) + pg_atomic_write_u64(>waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) )); pg_atomic_read_u64() is really necessary? I think that "pg_atomic_write_u64(>waitStart, now)" is enough. + deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT); + pg_atomic_write_u64(>waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) )); Same as above. + /* + * Record waitStart reusing the deadlock timeout timer. + * + * It would be ideal this can be synchronously done with updating + * lock information. Howerver, since it gives performance impacts + * to hold partitionLock longer time, we do it here asynchronously. + */ IMO it's better to comment why we reuse the deadlock timeout timer. proc->waitStatus = waitStatus; + pg_atomic_init_u64(>waitStart, 0); pg_atomic_write_u64() should be used instead? Because waitStart can be accessed concurrently there. I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version. Thanks for modifying the patch! 
I agree with your comments.

BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both changed by less than 1%).

Thanks for the test! I pushed the patch.

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021-02-05 18:49, Fujii Masao wrote: On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating "waitStart" without holding the partition lock? Also, GetLockStatusData() needs a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is.

Thanks for updating the patch!

+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+     pg_atomic_write_u64(&MyProc->waitStart,
+                         pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think that "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+ pg_atomic_write_u64(&MyProc->waitStart,
+                     pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+ /*
+  * Record waitStart reusing the deadlock timeout timer.
+  *
+  * It would be ideal if this could be done synchronously with updating
+  * the lock information. However, since holding the partitionLock longer
+  * has a performance impact, we do it here asynchronously.
+  */

IMO it's better to comment on why we reuse the deadlock timeout timer.

proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used here instead, because waitStart can be accessed concurrently there?

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.

Thanks for modifying the patch!

I agree with your comments.
BTW, I ran pgbench several times before and after applying this patch. The environment is a virtual machine (CentOS 8), so this is just for reference, but there was no significant difference in latency or tps (both changed by less than 1%).

Regards, -- Atsushi Torikoshi
Re: adding wait_start column to pg_locks
On 2021/02/05 0:03, torikoshia wrote: On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating "waitStart" without holding the partition lock? Also, GetLockStatusData() needs a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is.

Thanks for updating the patch!

+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+     pg_atomic_write_u64(&MyProc->waitStart,
+                         pg_atomic_read_u64((pg_atomic_uint64 *) &now));

Is pg_atomic_read_u64() really necessary here? I think that "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+ pg_atomic_write_u64(&MyProc->waitStart,
+                     pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+ /*
+  * Record waitStart reusing the deadlock timeout timer.
+  *
+  * It would be ideal if this could be done synchronously with updating
+  * the lock information. However, since holding the partitionLock longer
+  * has a performance impact, we do it here asynchronously.
+  */

IMO it's better to comment on why we reuse the deadlock timeout timer.

proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&proc->waitStart, 0);

Shouldn't pg_atomic_write_u64() be used here instead, because waitStart can be accessed concurrently there?

I updated the patch and addressed the above review comments. Patch attached. Barring any objection, I will commit this version.
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index ea222c0464..9085d2e64c 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10604,6 +10604,19 @@ SCRAM-SHA-256$iteration count: lock table + + + + waitstart timestamptz + + + Time when the server process started waiting for this lock, + or null if the lock is held. + Note that this can be null for a very short period of time after + the wait started even though granted + is false. 
+ + diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 39a30c00f7..5877a60715 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -540,12 +540,34 @@ void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict) { TimestampTz ltime; + TimestampTz now; Assert(InHotStandby); ltime = GetStandbyLimitTime(); + now = GetCurrentTimestamp(); - if (GetCurrentTimestamp() >= ltime && ltime != 0) + /* +* Update waitStart if first time through after the startup process +* started waiting for the lock. It should not be updated every time +* ResolveRecoveryConflictWithLock() is called during the wait. +* +* Use the current time obtained for comparison with ltime as waitStart +* (i.e., the time when this process started waiting for the lock). Since +* getting the
Re: adding wait_start column to pg_locks
On 2021-02-03 11:23, Fujii Masao wrote:

64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating "waitStart" without holding the partition lock? Also, GetLockStatusData() needs a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Thanks for your suggestion and advice! In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64(). waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned integers, so I cast the type. I may not be using these functions correctly, so if something is wrong, I would appreciate any comments. About the documentation, since your suggestion seems better than v6, I used it as is.

Regards, -- Atsushi Torikoshi

From 38a3d8996c4b1690cf18cdb1015e270201d34330 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Thu, 4 Feb 2021 23:23:36 +0900
Subject: [PATCH v7] To examine the duration of locks, we used to join pg_locks and pg_stat_activity and use columns such as query_start or state_change. However, since those are the moments when queries started or their state changed, we could not get the lock duration this way. This patch adds a new field "waitstart" recording the lock acquisition wait start time. Note that updating this field and lock acquisition are not performed synchronously for performance reasons. Therefore, depending on the timing, it can happen that waitstart is NULL even though granted is false.
Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a...@oss.nttdata.com modifies --- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 13 + src/backend/storage/ipc/standby.c| 17 +++-- src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 14 ++ src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 10 files changed, 69 insertions(+), 10 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 865e826fb0..7df4c30a65 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10592,6 +10592,19 @@ SCRAM-SHA-256$iteration count: lock table + + + + waitstart timestamptz + + + Time when the server process started waiting for this lock, + or null if the lock is held. + Note that this can be null for a very short period of time after + the wait started even though granted + is false. 
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..1c8135ba74 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,26 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
 void
 ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
 {
-	TimestampTz ltime;
+	TimestampTz ltime, now;

 	Assert(InHotStandby);

 	ltime = GetStandbyLimitTime();
+	now = GetCurrentTimestamp();

-	if (GetCurrentTimestamp() >= ltime && ltime != 0)
+	/*
+	 * Record waitStart using the current time obtained for comparison
+	 * with ltime.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information. However, since holding the partitionLock longer
+	 * has a performance impact, we do it here asynchronously.
+	 */
+	if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+
Re: adding wait_start column to pg_locks
On 2021/02/03 1:49, Fujii Masao wrote: On 2021/02/02 22:00, torikoshia wrote: On 2021-01-25 23:44, Fujii Masao wrote:

Another comment: doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) to just after the change of MyProc->waitStart, but that lengthens the time the lwlock is held. So maybe we need another way to do that.

Thanks for your comments! It would be ideal for the consistency of the view to record "waitstart" while holding the lock table's partition lock. However, as you pointed out, that would have a non-negligible performance impact. I may be missing something, but as far as I can see, the only consequence of not holding the lock is that "waitstart" can be NULL even though "granted" is false. I think people want to know the start time of a lock wait when locks are held for a long time, and in that case "waitstart" should already have been recorded.

Sounds reasonable.

If this is true, I think the current implementation may be enough, on the condition that users understand that it can happen that "waitStart" is NULL and "granted" is false. Attached a patch describing this in the doc and comments. Any thoughts?

64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating "waitStart" without holding the partition lock? Also, GetLockStatusData() needs a spinlock when reading "waitStart"? It might also be worth considering 64-bit atomic operations like pg_atomic_read_u64() for that.

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021/02/02 22:00, torikoshia wrote: On 2021-01-25 23:44, Fujii Masao wrote:

Another comment: doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) to just after the change of MyProc->waitStart, but that lengthens the time the lwlock is held. So maybe we need another way to do that.

Thanks for your comments! It would be ideal for the consistency of the view to record "waitstart" while holding the lock table's partition lock. However, as you pointed out, that would have a non-negligible performance impact. I may be missing something, but as far as I can see, the only consequence of not holding the lock is that "waitstart" can be NULL even though "granted" is false. I think people want to know the start time of a lock wait when locks are held for a long time, and in that case "waitstart" should already have been recorded.

Sounds reasonable.

If this is true, I think the current implementation may be enough, on the condition that users understand that it can happen that "waitStart" is NULL and "granted" is false. Attached a patch describing this in the doc and comments. Any thoughts?

64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating "waitStart" without holding the partition lock? Also, GetLockStatusData() needs a spinlock when reading "waitStart"?

+ Lock acquisition wait start time.

Isn't it better to describe this more clearly? What about the following?

    Time when the server process started waiting for this lock, or null if the lock is held.

+ Note that updating this field and lock acquisition are not performed
+ synchronously for performance reasons. Therefore, depending on the
+ timing, it can happen that waitstart is
+ NULL even though
+ granted is false.

I agree that it's helpful to add a note that NULL can be returned even when "granted" is false. But IMO we don't need to document why this behavior can happen internally. So what about the following?
Note that this can be null for a very short period of time after the wait started even though granted is false. Since the document for pg_locks uses "null" instead of NULL (I'm not sure why, though), I used "null" for the sake of consistency. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021-01-25 23:44, Fujii Masao wrote:

Another comment: doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) to just after the change of MyProc->waitStart, but that lengthens the time the lwlock is held. So maybe we need another way to do that.

Thanks for your comments! It would be ideal for the consistency of the view to record "waitstart" while holding the lock table's partition lock. However, as you pointed out, that would have a non-negligible performance impact. I may be missing something, but as far as I can see, the only consequence of not holding the lock is that "waitstart" can be NULL even though "granted" is false. I think people want to know the start time of a lock wait when locks are held for a long time, and in that case "waitstart" should already have been recorded.

If this is true, I think the current implementation may be enough, on the condition that users understand that it can happen that "waitStart" is NULL and "granted" is false. Attached a patch describing this in the doc and comments. Any thoughts?

Regards, -- Atsushi Torikoshi

From 03c6e1ed6ffa215ee898b5a6a75d77277fb8e672 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Tue, 2 Feb 2021 21:32:36 +0900
Subject: [PATCH v6] To examine the duration of locks, we used to join pg_locks and pg_stat_activity and use columns such as query_start or state_change. However, since those are the moments when queries started or their state changed, we could not get the lock duration this way. This patch adds a new field "waitstart" recording the lock acquisition wait start time. Note that updating this field and lock acquisition are not performed synchronously for performance reasons. Therefore, depending on the timing, it can happen that waitstart is NULL even though granted is false.
Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a...@oss.nttdata.com --- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 14 ++ src/backend/storage/ipc/standby.c| 16 ++-- src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 10 ++ src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 10 files changed, 65 insertions(+), 10 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 865e826fb0..d81d6e1c52 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10592,6 +10592,20 @@ SCRAM-SHA-256$iteration count: lock table + + + + waitstart timestamptz + + + Lock acquisition wait start time. + Note that updating this field and lock acquisition are not performed + synchronously for performance reasons. 
Therefore, depending on the + timing, it can happen that waitstart is + NULL even though + granted is false. + + diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 39a30c00f7..2282229568 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -539,13 +539,25 @@ ResolveRecoveryConflictWithDatabase(Oid dbid) void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict) { - TimestampTz ltime; + TimestampTz ltime, now; Assert(InHotStandby); ltime = GetStandbyLimitTime(); + now = GetCurrentTimestamp(); - if (GetCurrentTimestamp() >= ltime && ltime != 0) +
Re: adding wait_start column to pg_locks
On 2021/01/22 18:11, Fujii Masao wrote: On 2021/01/22 14:37, torikoshia wrote: On 2021-01-21 12:48, Fujii Masao wrote:

Thanks for updating the patch! I think that this is a really useful feature!!

Thanks for reviewing!

I have two minor comments.

+ wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of consistency with other column names in pg_locks? pg_locks seems to avoid including an underscore in column names, so "locktype" is used instead of "lock_type", "virtualtransaction" is used instead of "virtual_transaction", etc.

+ Lock acquisition wait start time. NULL if
+ lock acquired.

Agreed. I also changed the variable name "wait_start" in struct PGPROC and LockInstanceData to "waitStart" for the same reason.

There seems to be a case where the wait start time is NULL even when "granted" is false. Isn't it better to add a note about that case to the docs? For example, I found that the wait start time is NULL while the startup process is waiting for the lock. Is this the only such case?

Thanks, this is because I set 'waitstart' under the following condition (src/backend/storage/lmgr/proc.c:1250):

    if (!InHotStandby)

As far as I can tell, the startup process would be the only such case. I also think that, in the case of the startup process, it seems possible to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I did that in the attached patch.

This change seems to cause "waitstart" to be reset every time ResolveRecoveryConflictWithLock() is called in the do-while loop. I guess this is not acceptable. Right? To avoid that issue, IMO the following change is better. Thought?

- else if (log_recovery_conflict_waits)
+ else
+ {
+     TimestampTz now = GetCurrentTimestamp();
+
+     MyProc->waitStart = now;
+
      /*
       * Set the wait start timestamp if logging is enabled and in hot
       * standby.
       */
-     standbyWaitStart = GetCurrentTimestamp();
+     if (log_recovery_conflict_waits)
+         standbyWaitStart = now;
  }

This change causes the startup process to call GetCurrentTimestamp() additionally even when log_recovery_conflict_waits is disabled, which might decrease the performance of the startup process. But that performance degradation can happen only when the startup process waits on an ACCESS EXCLUSIVE lock. So if this understanding of mine is right, IMO it's almost harmless to call GetCurrentTimestamp() additionally in that case. Thought?

According to the off-list discussion with you, this should not happen, because ResolveRecoveryConflictWithDatabase() sets MyProc->waitStart only when it's not set yet (i.e., = 0). That's good. So I'd withdraw my comment.

+ if (MyProc->waitStart == 0)
+     MyProc->waitStart = now;

+ MyProc->waitStart = get_timeout_start_time(DEADLOCK_TIMEOUT);

Another comment: doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) to just after the change of MyProc->waitStart, but that lengthens the time the lwlock is held. So maybe we need another way to do that.

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021/01/22 14:37, torikoshia wrote: On 2021-01-21 12:48, Fujii Masao wrote:

Thanks for updating the patch! I think that this is a really useful feature!!

Thanks for reviewing!

I have two minor comments.

+ wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of consistency with other column names in pg_locks? pg_locks seems to avoid including an underscore in column names, so "locktype" is used instead of "lock_type", "virtualtransaction" is used instead of "virtual_transaction", etc.

+ Lock acquisition wait start time. NULL if
+ lock acquired.

Agreed. I also changed the variable name "wait_start" in struct PGPROC and LockInstanceData to "waitStart" for the same reason.

There seems to be a case where the wait start time is NULL even when "granted" is false. Isn't it better to add a note about that case to the docs? For example, I found that the wait start time is NULL while the startup process is waiting for the lock. Is this the only such case?

Thanks, this is because I set 'waitstart' under the following condition (src/backend/storage/lmgr/proc.c:1250):

    if (!InHotStandby)

As far as I can tell, the startup process would be the only such case. I also think that, in the case of the startup process, it seems possible to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I did that in the attached patch.

This change seems to cause "waitstart" to be reset every time ResolveRecoveryConflictWithLock() is called in the do-while loop. I guess this is not acceptable. Right? To avoid that issue, IMO the following change is better. Thought?

- else if (log_recovery_conflict_waits)
+ else
+ {
+     TimestampTz now = GetCurrentTimestamp();
+
+     MyProc->waitStart = now;
+
      /*
       * Set the wait start timestamp if logging is enabled and in hot
       * standby.
       */
-     standbyWaitStart = GetCurrentTimestamp();
+     if (log_recovery_conflict_waits)
+         standbyWaitStart = now;
  }

This change causes the startup process to call GetCurrentTimestamp() additionally even when log_recovery_conflict_waits is disabled, which might decrease the performance of the startup process. But that performance degradation can happen only when the startup process waits on an ACCESS EXCLUSIVE lock. So if this understanding of mine is right, IMO it's almost harmless to call GetCurrentTimestamp() additionally in that case. Thought?

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021-01-21 12:48, Fujii Masao wrote:

Thanks for updating the patch! I think that this is a really useful feature!!

Thanks for reviewing!

I have two minor comments.

+ wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of consistency with other column names in pg_locks? pg_locks seems to avoid including an underscore in column names, so "locktype" is used instead of "lock_type", "virtualtransaction" is used instead of "virtual_transaction", etc.

+ Lock acquisition wait start time. NULL if
+ lock acquired.

Agreed. I also changed the variable name "wait_start" in struct PGPROC and LockInstanceData to "waitStart" for the same reason.

There seems to be a case where the wait start time is NULL even when "granted" is false. Isn't it better to add a note about that case to the docs? For example, I found that the wait start time is NULL while the startup process is waiting for the lock. Is this the only such case?

Thanks, this is because I set 'waitstart' under the following condition (src/backend/storage/lmgr/proc.c:1250):

    if (!InHotStandby)

As far as I can tell, the startup process would be the only such case. I also think that, in the case of the startup process, it seems possible to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I did that in the attached patch.

Any thoughts?

Regards, -- Atsushi Torikoshi

From 6beb1c61e72c797c915427ae4e36d6bab9e0594c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Fri, 22 Jan 2021 13:51:00 +0900
Subject: [PATCH v5] To examine the duration of locks, we used to join pg_locks and pg_stat_activity and use columns such as query_start or state_change. However, since those are the moments when queries started or their state changed, we could not get the exact lock duration this way.
--- contrib/amcheck/expected/check_btree.out | 4 ++-- doc/src/sgml/catalogs.sgml | 10 ++ src/backend/storage/ipc/standby.c| 8 ++-- src/backend/storage/lmgr/lock.c | 8 src/backend/storage/lmgr/proc.c | 4 src/backend/utils/adt/lockfuncs.c| 9 - src/include/catalog/pg_proc.dat | 6 +++--- src/include/storage/lock.h | 2 ++ src/include/storage/proc.h | 1 + src/test/regress/expected/rules.out | 5 +++-- 10 files changed, 47 insertions(+), 10 deletions(-) diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out index 13848b7449..5a3f1ef737 100644 --- a/contrib/amcheck/expected/check_btree.out +++ b/contrib/amcheck/expected/check_btree.out @@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx'); SELECT * FROM pg_locks WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[]) AND pid = pg_backend_pid(); - locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---+--+--+--+---++---+-+---+--++-+--+-+-- + locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart +--+--+--+--+---++---+-+---+--++-+--+-+--+--- (0 rows) COMMIT; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 43d7a1ad90..ba003ce393 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -10589,6 +10589,16 @@ SCRAM-SHA-256$iteration count: lock table + + + + waitstart timestamptz + + + Lock acquisition wait start time. 
NULL if + lock acquired + + diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 39a30c00f7..819e00e4ab 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -539,13 +539,17 @@ ResolveRecoveryConflictWithDatabase(Oid dbid) void ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict) { - TimestampTz ltime; + TimestampTz ltime, now; Assert(InHotStandby); ltime = GetStandbyLimitTime(); + now = GetCurrentTimestamp(); - if (GetCurrentTimestamp() >= ltime && ltime != 0) + if (MyProc->waitStart == 0) + MyProc->waitStart = now; + + if (now >= ltime && ltime != 0) { /* * We're already behind, so clear a path as quickly as possible. diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 20e50247ea..ffad4e94bc 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -3627,6 +3627,12 @@ GetLockStatusData(void)
Re: adding wait_start column to pg_locks
On 2021/01/18 12:00, torikoshia wrote: On 2021-01-15 15:23, torikoshia wrote:

Thanks for your review and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:

Looking at the code, this happens because the wait start time is being recorded in the lock record itself, so it always contains the value reported by the latest lock acquisition attempt.

I think you are right, and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

On Fri, Jan 15, 2021 at 3:45, Robert Haas wrote:

On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote:

It looks like the logical place to store the value is in the PROCLOCK structure; ...

That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted).

AFAIU, it seems possible to record wait_start in the PROCLOCK, but redundant, since each process can wait for at most one lock. To confirm my understanding, I'm going to make another patch that records wait_start in the PGPROC.

Attached a patch. I noticed previous patches left wait_start untouched even after the lock was acquired. The patch also fixes that. Any thoughts?

Thanks for updating the patch! I think that this is a really useful feature!! I have two minor comments.
+ wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of consistency with other column names in pg_locks? pg_locks seems to avoid including an underscore in column names, so "locktype" is used instead of "lock_type", "virtualtransaction" is used instead of "virtual_transaction", etc.

+ Lock acquisition wait start time. NULL if
+ lock acquired.

There seems to be a case where the wait start time is NULL even when "granted" is false. Isn't it better to add a note about that case to the docs? For example, I found that the wait start time is NULL while the startup process is waiting for the lock. Is this the only such case?

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: adding wait_start column to pg_locks
On 2021-01-15 15:23, torikoshia wrote:

Thanks for your reviewing and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:

Looking at the code, this happens as the wait start time is being recorded in the lock record itself, so always contains the value reported by the latest lock acquisition attempt.

I think you are right and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

On Fri, 15 Jan 2021 at 3:45, Robert Haas wrote:

On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote:

It looks like the logical place to store the value is in the PROCLOCK structure; ...

That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted).

AFAIU, it seems possible to record wait_start in the PROCLOCK, but redundant since each process can wait for at most one lock. To confirm my understanding, I'm going to make another patch that records wait_start in the PGPROC.

Attached a patch.

I noticed previous patches left wait_start untouched even after the lock was acquired. The patch also fixes that.

Any thoughts?
Regards,

--
Atsushi Torikoshi

From 62ff3e4dba7d45c260a62a33425cb2d1e6b822c9 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 18 Jan 2021 10:01:35 +0900
Subject: [PATCH v4] To examine the duration of locks, we did join on pg_locks
 and pg_stat_activity and used columns such as query_start or state_change.
 However, since they are the moment when queries have started or their state
 has changed, we could not get the exact lock duration in this way.

This patch adds a new field preserving the time at which locks started
waiting.
---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml               | 10 ++
 src/backend/storage/lmgr/lock.c          |  8
 src/backend/storage/lmgr/proc.c          |  4
 src/backend/utils/adt/lockfuncs.c        |  9 -
 src/include/catalog/pg_proc.dat          |  6 +++---
 src/include/storage/lock.h               |  2 ++
 src/include/storage/proc.h               |  1 +
 src/test/regress/expected/rules.out      |  5 +++--
 9 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
     AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start
+--+--+--+--+---++---+-+---+--++-+--+-+--+
(0 rows)

COMMIT;

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..a5ce0835a9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10589,6 +10589,16 @@ SCRAM-SHA-256$iteration count:
    lock table
+
+      wait_start timestamptz
+
+       Lock acquisition wait start time. NULL if
+       lock acquired.
+

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..5b5fb474e0 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3627,6 +3627,12 @@ GetLockStatusData(void)
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
+
+			/*
+			 * Successfully taking fast path lock means there were no
+			 * conflicting locks.
+			 */
+			instance->wait_start = 0;
+
 			el++;
 		}
@@ -3654,6 +3660,7 @@ GetLockStatusData(void)
 			instance->pid = proc->pid;
 			instance->leaderPid = proc->pid;
Re: adding wait_start column to pg_locks
Thanks for your reviewing and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:

Looking at the code, this happens as the wait start time is being recorded in the lock record itself, so always contains the value reported by the latest lock acquisition attempt.

I think you are right and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

On Fri, 15 Jan 2021 at 3:45, Robert Haas wrote:

On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote:

It looks like the logical place to store the value is in the PROCLOCK structure; ...

That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted).

AFAIU, it seems possible to record wait_start in the PROCLOCK, but redundant since each process can wait for at most one lock. To confirm my understanding, I'm going to make another patch that records wait_start in the PGPROC.

Regards,

--
Atsushi Torikoshi
Re: adding wait_start column to pg_locks
On Fri, 15 Jan 2021 at 3:45, Robert Haas wrote:
> On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
> wrote:
> > It looks like the logical place to store the value is in the PROCLOCK
> > structure; ...
>
> That seems surprising, because there's one PROCLOCK for every
> combination of a process and a lock. But, a process can't be waiting
> for more than one lock at the same time, because once it starts
> waiting to acquire the first one, it can't do anything else, and thus
> can't begin waiting for a second one. So I would have thought that
> this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted).

As mentioned, I hadn't really ever looked at the lock code before so might be barking up the wrong forest.

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com
Re: adding wait_start column to pg_locks
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote:
> It looks like the logical place to store the value is in the PROCLOCK
> structure; ...

That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But, a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC. But I haven't looked at the patch so maybe I'm dumb.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: adding wait_start column to pg_locks
Hi

On Mon, 4 Jan 2021 at 15:04, torikoshia wrote:
>
> On 2021-01-02 06:49, Justin Pryzby wrote:
> > On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
> >> So I'm now thinking about adding a new column in pg_locks which
> >> keeps the time at which locks started waiting.
> >>
> >> Attached a patch.
> >
> > This is failing make check-world, would you send an updated patch ?
> >
> > I added you as an author so it shows up here.
> > http://cfbot.cputube.org/atsushi-torikoshi.html
>
> Thanks!
>
> Attached an updated patch.

I took a look at this patch as it seems useful (and I have an item on my bucket list labelled "look at the locking code", which I am not at all familiar with).

I tested the patch by doing the following:

Session 1:

postgres=# CREATE TABLE foo (id int);
CREATE TABLE
postgres=# BEGIN ;
BEGIN
postgres=*# INSERT INTO foo VALUES (1);
INSERT 0 1

Session 2:

postgres=# BEGIN ;
BEGIN
postgres=*# LOCK TABLE foo;

Session 3:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start FROM pg_locks WHERE relation = 'foo'::regclass AND NOT granted\x\g\x
-[ RECORD 1 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3513935
mode       | AccessExclusiveLock
granted    | f
fastpath   | f
wait_start | 2021-01-14 12:03:06.683053+09

So far so good, but checking *all* the locks on this relation:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start FROM pg_locks WHERE relation = 'foo'::regclass\x\g\x
-[ RECORD 1 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3513824
mode       | RowExclusiveLock
granted    | t
fastpath   | f
wait_start | 2021-01-14 12:03:06.683053+09
-[ RECORD 2 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3513935
mode       | AccessExclusiveLock
granted    | f
fastpath   | f
wait_start | 2021-01-14 12:03:06.683053+09

shows the RowExclusiveLock granted in session 1 as apparently waiting from exactly the same time as session 2's attempt to acquire the lock, which is clearly not right.
Also, if a further session attempts to acquire a lock, we get:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start FROM pg_locks WHERE relation = 'foo'::regclass\x\g\x
-[ RECORD 1 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3513824
mode       | RowExclusiveLock
granted    | t
fastpath   | f
wait_start | 2021-01-14 12:05:53.747309+09
-[ RECORD 2 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3514240
mode       | AccessExclusiveLock
granted    | f
fastpath   | f
wait_start | 2021-01-14 12:05:53.747309+09
-[ RECORD 3 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3513935
mode       | AccessExclusiveLock
granted    | f
fastpath   | f
wait_start | 2021-01-14 12:05:53.747309+09

i.e. all entries now have "wait_start" set to the start time of the latest session's lock acquisition attempt.

Looking at the code, this happens as the wait start time is being recorded in the lock record itself, so it always contains the value reported by the latest lock acquisition attempt.

It looks like the logical place to store the value is in the PROCLOCK structure; the attached patch reworks your patch to do that, and given the above scenario produces the following output:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start FROM pg_locks WHERE relation = 'foo'::regclass\x\g\x
-[ RECORD 1 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3516391
mode       | RowExclusiveLock
granted    | t
fastpath   | f
wait_start |
-[ RECORD 2 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3516470
mode       | AccessExclusiveLock
granted    | f
fastpath   | f
wait_start | 2021-01-14 12:19:16.217163+09
-[ RECORD 3 ]-----------------------------
locktype   | relation
relation   | 16452
pid        | 3516968
mode       | AccessExclusiveLock
granted    | f
fastpath   | f
wait_start | 2021-01-14 12:18:08.195429+09

As mentioned, I'm not at all familiar with the locking code so it's quite possible that it's incomplete, there may be non-obvious side-effects, or it's the wrong approach altogether etc., so further review is necessary.
Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
Re: adding wait_start column to pg_locks
On 2021-01-02 06:49, Justin Pryzby wrote:

On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:

So I'm now thinking about adding a new column in pg_locks which keeps the time at which locks started waiting.

Attached a patch.

This is failing make check-world, would you send an updated patch?

I added you as an author so it shows up here.
http://cfbot.cputube.org/atsushi-torikoshi.html

Thanks!

Attached an updated patch.

Regards,

From 608bba31da1bc5d15db991662fa858cd4632d849 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Mon, 4 Jan 2021 09:53:17 +0900
Subject: [PATCH v2] To examine the duration of locks, we did join on pg_locks
 and pg_stat_activity and used columns such as query_start or state_change.
 However, since they are the moment when queries have started or their state
 has changed, we could not get the exact lock duration in this way.

This patch adds a new field preserving the time at which locks started
waiting.
---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml               |  9 +
 src/backend/storage/lmgr/lock.c          | 10 ++
 src/backend/storage/lmgr/proc.c          |  2 ++
 src/backend/utils/adt/lockfuncs.c        |  9 -
 src/include/catalog/pg_proc.dat          |  6 +++---
 src/include/storage/lock.h               |  3 +++
 src/test/regress/expected/rules.out      |  5 +++--
 8 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
     AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start
+--+--+--+--+---++---+-+---+--++-+--+-+--+
(0 rows)

COMMIT;

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..626e5672bd 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10578,6 +10578,15 @@ SCRAM-SHA-256$iteration count:
    lock table
+
+      wait_start timestamptz
+
+       The time at which lock started waiting
+

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..27969d3772 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1195,6 +1195,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 		lock->waitMask = 0;
 		SHMQueueInit(&(lock->procLocks));
 		ProcQueueInit(&(lock->waitProcs));
+		lock->wait_start = 0;
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3627,6 +3628,12 @@ GetLockStatusData(void)
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
+
+			/*
+			 * Successfully taking fast path lock means there were no
+			 * conflicting locks.
+			 */
+			instance->wait_start = 0;
+
 			el++;
 		}
@@ -3654,6 +3661,7 @@ GetLockStatusData(void)
 			instance->pid = proc->pid;
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
+			instance->wait_start = 0;
 			el++;
 		}
@@ -3706,6 +3714,7 @@ GetLockStatusData(void)
 			instance->pid = proc->pid;
 			instance->leaderPid = proclock->groupLeader->pid;
 			instance->fastpath = false;
+			instance->wait_start = lock->wait_start;
 			el++;
 		}
@@ -4183,6 +4192,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 		lock->waitMask = 0;
 		SHMQueueInit(&(lock->procLocks));
 		ProcQueueInit(&(lock->waitProcs));
+		lock->wait_start = 0;
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 57717f666d..56aa8b7f6b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	}
 	else
 		enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+	lock->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
 }

 /*
diff --git a/src/backend/utils/adt/lockfuncs.c index
Re: adding wait_start column to pg_locks
On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote: > So I'm now thinking about adding a new column in pg_locks which > keeps the time at which locks started waiting. > > Attached a patch. This is failing make check-world, would you send an updated patch ? I added you as an author so it shows up here. http://cfbot.cputube.org/atsushi-torikoshi.html -- Justin