Re: perform_spin_delay() vs wait events
Hi, On 2022-11-22 12:51:25 +0900, Michael Paquier wrote: > On Mon, Nov 21, 2022 at 07:01:18PM -0800, Andres Freund wrote: > > I plan to push this soon unless somebody has further comments. > > > @@ -146,7 +146,8 @@ typedef enum > > WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL, > > WAIT_EVENT_REGISTER_SYNC_REQUEST, > > WAIT_EVENT_VACUUM_DELAY, > > - WAIT_EVENT_VACUUM_TRUNCATE > > + WAIT_EVENT_VACUUM_TRUNCATE, > > + WAIT_EVENT_SPIN_DELAY > > } WaitEventTimeout; > > That would be fine for stable branches, but could you keep that in an > alphabetical order on HEAD? Fair point. I wasn't planning to backpatch. Greetings, Andres Freund
Re: perform_spin_delay() vs wait events
On Mon, Nov 21, 2022 at 07:01:18PM -0800, Andres Freund wrote: > I plan to push this soon unless somebody has further comments. > @@ -146,7 +146,8 @@ typedef enum > WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL, > WAIT_EVENT_REGISTER_SYNC_REQUEST, > WAIT_EVENT_VACUUM_DELAY, > - WAIT_EVENT_VACUUM_TRUNCATE > + WAIT_EVENT_VACUUM_TRUNCATE, > + WAIT_EVENT_SPIN_DELAY > } WaitEventTimeout; That would be fine for stable branches, but could you keep that in an alphabetical order on HEAD? -- Michael signature.asc Description: PGP signature
Re: perform_spin_delay() vs wait events
Hi, On 2022-11-22 00:03:23 +0300, Alexander Korotkov wrote: > On Tue, Nov 22, 2022 at 12:01 AM Andres Freund wrote: > > On November 21, 2022 12:58:16 PM PST, Alexander Korotkov > > wrote: > > >On Mon, Nov 21, 2022 at 2:10 AM Andres Freund wrote: > > >+1 for making a group of individual names spin delays. > > > > Personally I'm not interested in doing that work, tbh. > > Oh, then I have no objection to the "as is" state, because it doesn't > exclude the future improvements. But this is still my 2 cents though. I added a note about possibly extending this in the future to both code and commit message. Attached. I plan to push this soon unless somebody has further comments. Greetings, Andres Freund >From 5a83b778bc7fe015af70e8f56039c6cde7071b3f Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sun, 20 Nov 2022 12:39:14 -0800 Subject: [PATCH v2] Add wait event for pg_usleep() in perform_spin_delay() The lwlock wait queue scalability issue fixed in a4adc31f690 was quite hard to find because of the exponential backoff and because we adjust spins_per_delay over time within a backend. To make it easier to find similar issues in the future, add a wait event for the pg_usleep() in perform_spin_delay(). Showing a wait event while spinning without sleeping would increase the overhead of spinlocks, which we do not want. We may at some later point want to have more granular wait events, but that'd be a substantial amount of work. This provides at least some insights into something currently hard to observe. Reviewed-by: Robert Haas Reviewed-by: Alexander Korotkov https://postgr.es/m/20221120204310.xywrhyxyytsaj...@awork3.anarazel.de --- src/include/utils/wait_event.h | 3 ++- src/backend/storage/lmgr/s_lock.c | 11 +++ src/backend/utils/activity/wait_event.c | 3 +++ doc/src/sgml/monitoring.sgml| 4 4 files changed, 20 insertions(+), 1 deletion(-) diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h index 6f2d5612e06..3d87d550119 100644 --- a/src/include/utils/wait_event.h +++ b/src/include/utils/wait_event.h @@ -146,7 +146,8 @@ typedef enum WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL, WAIT_EVENT_REGISTER_SYNC_REQUEST, WAIT_EVENT_VACUUM_DELAY, - WAIT_EVENT_VACUUM_TRUNCATE + WAIT_EVENT_VACUUM_TRUNCATE, + WAIT_EVENT_SPIN_DELAY } WaitEventTimeout; /* -- diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c index 4e473ec27ec..c3227b11abc 100644 --- a/src/backend/storage/lmgr/s_lock.c +++ b/src/backend/storage/lmgr/s_lock.c @@ -53,6 +53,7 @@ #include "common/pg_prng.h" #include "port/atomics.h" #include "storage/s_lock.h" +#include "utils/wait_event.h" #define MIN_SPINS_PER_DELAY 10 #define MAX_SPINS_PER_DELAY 1000 @@ -136,7 +137,17 @@ perform_spin_delay(SpinDelayStatus *status) if (status->cur_delay == 0) /* first time to delay? */ status->cur_delay = MIN_DELAY_USEC; + /* + * Once we start sleeping, the overhead of reporting a wait event is + * justified. Actively spinning easily stands out in profilers, but + * sleeping with an exponential backoff is harder to spot... + * + * We might want to report something more granular at some point, but + * this is better than nothing. + */ + pgstat_report_wait_start(WAIT_EVENT_SPIN_DELAY); pg_usleep(status->cur_delay); + pgstat_report_wait_end(); #if defined(S_LOCK_TEST) fprintf(stdout, "*"); diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c index 92f24a6c9bc..02e953420e5 100644 --- a/src/backend/utils/activity/wait_event.c +++ b/src/backend/utils/activity/wait_event.c @@ -503,6 +503,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w) case WAIT_EVENT_VACUUM_TRUNCATE: event_name = "VacuumTruncate"; break; + case WAIT_EVENT_SPIN_DELAY: + event_name = "SpinDelay"; + break; /* no default case, so that compiler will warn */ } diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index e5d622d5147..781b5d5a6f8 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -2298,6 +2298,10 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting to acquire an exclusive lock to truncate off any empty pages at the end of a table vacuumed. + + SpinDelay + Waiting while acquiring a contended spinlock. + -- 2.38.0
Re: perform_spin_delay() vs wait events
On Tue, Nov 22, 2022 at 12:01 AM Andres Freund wrote: > On November 21, 2022 12:58:16 PM PST, Alexander Korotkov > wrote: > >On Mon, Nov 21, 2022 at 2:10 AM Andres Freund wrote: > >> On 2022-11-20 17:26:11 -0500, Robert Haas wrote: > >> > On Sun, Nov 20, 2022 at 3:43 PM Andres Freund wrote: > >> > > I couldn't quite decide what wait_event_type to best group this under? > >> > > In the > >> > > attached patch I put it under timeouts, which doesn't seem awful. > >> > > >> > I think it would be best to make it its own category, like we do with > >> > buffer pins. > >> > >> I was wondering about that too - but decided against it because it would > >> only > >> show a single wait event. And wouldn't really describe spinlocks as a > >> whole, > >> just the "extreme" delays. If we wanted to report the spin waits more > >> granular, we'd presumably have to fit the wait events into the lwlock, > >> buffers > >> and some new category where we name individual spinlocks. > > > >+1 for making a group of individual names spin delays. > > Personally I'm not interested in doing that work, tbh. Oh, then I have no objection to the "as is" state, because it doesn't exclude the future improvements. But this is still my 2 cents though. -- Regards, Alexander Korotkov
Re: perform_spin_delay() vs wait events
Hi, On November 21, 2022 12:58:16 PM PST, Alexander Korotkov wrote: >On Mon, Nov 21, 2022 at 2:10 AM Andres Freund wrote: >> On 2022-11-20 17:26:11 -0500, Robert Haas wrote: >> > On Sun, Nov 20, 2022 at 3:43 PM Andres Freund wrote: >> > > I couldn't quite decide what wait_event_type to best group this under? >> > > In the >> > > attached patch I put it under timeouts, which doesn't seem awful. >> > >> > I think it would be best to make it its own category, like we do with >> > buffer pins. >> >> I was wondering about that too - but decided against it because it would only >> show a single wait event. And wouldn't really describe spinlocks as a whole, >> just the "extreme" delays. If we wanted to report the spin waits more >> granular, we'd presumably have to fit the wait events into the lwlock, >> buffers >> and some new category where we name individual spinlocks. > >+1 for making a group of individual names spin delays. Personally I'm not interested in doing that work, tbh. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: perform_spin_delay() vs wait events
On Mon, Nov 21, 2022 at 2:10 AM Andres Freund wrote: > On 2022-11-20 17:26:11 -0500, Robert Haas wrote: > > On Sun, Nov 20, 2022 at 3:43 PM Andres Freund wrote: > > > I couldn't quite decide what wait_event_type to best group this under? In > > > the > > > attached patch I put it under timeouts, which doesn't seem awful. > > > > I think it would be best to make it its own category, like we do with > > buffer pins. > > I was wondering about that too - but decided against it because it would only > show a single wait event. And wouldn't really describe spinlocks as a whole, > just the "extreme" delays. If we wanted to report the spin waits more > granular, we'd presumably have to fit the wait events into the lwlock, buffers > and some new category where we name individual spinlocks. +1 for making a group of individual names spin delays. -- Regards, Alexander Korotkov
Re: perform_spin_delay() vs wait events
On Sun, Nov 20, 2022 at 6:10 PM Andres Freund wrote: > I was wondering about that too - but decided against it because it would only > show a single wait event. And wouldn't really describe spinlocks as a whole, > just the "extreme" delays. If we wanted to report the spin waits more > granular, we'd presumably have to fit the wait events into the lwlock, buffers > and some new category where we name individual spinlocks. > > But I guess a single spinlock wait event type with an ExponentialBackoff wait > event or such wouldn't be too bad. Oh, hmm. I guess it is actually bracketing a timed wait, now that I look closer at what you did. So perhaps your first idea was best after all. -- Robert Haas EDB: http://www.enterprisedb.com
Re: perform_spin_delay() vs wait events
Hi, On 2022-11-20 17:26:11 -0500, Robert Haas wrote: > On Sun, Nov 20, 2022 at 3:43 PM Andres Freund wrote: > > I couldn't quite decide what wait_event_type to best group this under? In > > the > > attached patch I put it under timeouts, which doesn't seem awful. > > I think it would be best to make it its own category, like we do with > buffer pins. I was wondering about that too - but decided against it because it would only show a single wait event. And wouldn't really describe spinlocks as a whole, just the "extreme" delays. If we wanted to report the spin waits more granular, we'd presumably have to fit the wait events into the lwlock, buffers and some new category where we name individual spinlocks. But I guess a single spinlock wait event type with an ExponentialBackoff wait event or such wouldn't be too bad. Greetings, Andres Freund
Re: perform_spin_delay() vs wait events
On Sun, Nov 20, 2022 at 3:43 PM Andres Freund wrote: > The lwlock wait queue scalability issue [1] was quite hard to find because of > the exponential backoff and because we adjust spins_per_delay over time within > a backend. > > I think the least we could do to make this easier would be to signal spin > delays as wait events. We surely don't want to do so for non-contended spins > because of the overhead, but once we get to the point of calling pg_usleep() > that's not an issue. > > I don't think it's worth trying to hand down more detailed information about > the specific spinlock we're waiting on, at least for now. We'd have to invent > a whole lot of new wait events because most spinlocks don't have ones yet. > > I couldn't quite decide what wait_event_type to best group this under? In the > attached patch I put it under timeouts, which doesn't seem awful. I think it would be best to make it its own category, like we do with buffer pins. -- Robert Haas EDB: http://www.enterprisedb.com
perform_spin_delay() vs wait events
Hi, The lwlock wait queue scalability issue [1] was quite hard to find because of the exponential backoff and because we adjust spins_per_delay over time within a backend. I think the least we could do to make this easier would be to signal spin delays as wait events. We surely don't want to do so for non-contended spins because of the overhead, but once we get to the point of calling pg_usleep() that's not an issue. I don't think it's worth trying to hand down more detailed information about the specific spinlock we're waiting on, at least for now. We'd have to invent a whole lot of new wait events because most spinlocks don't have ones yet. I couldn't quite decide what wait_event_type to best group this under? In the attached patch I put it under timeouts, which doesn't seem awful. I reverted a4adc31f690 and indeed it shows SpinDelay wait events before the fix and not after. Greetings, Andres Freund [1] https://postgr.es/m/20221027165914.2hofzp4cvutj6gin%40awork3.anarazel.de >From 6d7c62e5ebbe9bf906e5f4e1682ece6fff69cd37 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sun, 20 Nov 2022 12:39:14 -0800 Subject: [PATCH] Add WAIT_EVENT_SPIN_DELAY --- src/include/utils/wait_event.h | 3 ++- src/backend/storage/lmgr/s_lock.c | 8 src/backend/utils/activity/wait_event.c | 3 +++ doc/src/sgml/monitoring.sgml| 4 4 files changed, 17 insertions(+), 1 deletion(-) diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h index 6f2d5612e06..3d87d550119 100644 --- a/src/include/utils/wait_event.h +++ b/src/include/utils/wait_event.h @@ -146,7 +146,8 @@ typedef enum WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL, WAIT_EVENT_REGISTER_SYNC_REQUEST, WAIT_EVENT_VACUUM_DELAY, - WAIT_EVENT_VACUUM_TRUNCATE + WAIT_EVENT_VACUUM_TRUNCATE, + WAIT_EVENT_SPIN_DELAY } WaitEventTimeout; /* -- diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c index 4e473ec27ec..b04930db760 100644 --- a/src/backend/storage/lmgr/s_lock.c +++ b/src/backend/storage/lmgr/s_lock.c @@ -53,6 +53,7 @@ #include "common/pg_prng.h" #include "port/atomics.h" #include "storage/s_lock.h" +#include "utils/wait_event.h" #define MIN_SPINS_PER_DELAY 10 #define MAX_SPINS_PER_DELAY 1000 @@ -136,7 +137,14 @@ perform_spin_delay(SpinDelayStatus *status) if (status->cur_delay == 0) /* first time to delay? */ status->cur_delay = MIN_DELAY_USEC; + /* + * Once we start sleeping, the overhead of reporting a wait event is + * justified. Actively spinning easily stands out in profilers, but + * sleeping with an exponential backoff is harder to spot... + */ + pgstat_report_wait_start(WAIT_EVENT_SPIN_DELAY); pg_usleep(status->cur_delay); + pgstat_report_wait_end(); #if defined(S_LOCK_TEST) fprintf(stdout, "*"); diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c index 92f24a6c9bc..02e953420e5 100644 --- a/src/backend/utils/activity/wait_event.c +++ b/src/backend/utils/activity/wait_event.c @@ -503,6 +503,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w) case WAIT_EVENT_VACUUM_TRUNCATE: event_name = "VacuumTruncate"; break; + case WAIT_EVENT_SPIN_DELAY: + event_name = "SpinDelay"; + break; /* no default case, so that compiler will warn */ } diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index e5d622d5147..0599eac71a6 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -2298,6 +2298,10 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting to acquire an exclusive lock to truncate off any empty pages at the end of a table vacuumed. + + SpinDelay + Waiting to acquire a contended spinlock. + -- 2.38.0