On Sun, May 31, 2026 at 4:50 AM 신성준 <[email protected]> wrote:

> Hi hackers,
>
> The write(2) calls that flush server log output aren't covered by wait
> events. When a backend logs something, the writes go out in:
>
>   - write_pipe_chunks(): write(2) to the syslogger pipe
>   - write_console(): write(2) to stderr (WriteConsoleW() on Windows)
>
> If one of those blocks -- syslogger pipe full, slow console, slow log
> device -- pg_stat_activity just shows wait_event = NULL until it
> returns. Since NULL usually reads as "on CPU", a backend stuck writing
> logs looks like it's doing work, so logging-related stalls are easy to
> miss.
>
> Attached is a short series that adds two WaitEventIO events and reports
> them around those writes:
>
>   IO / SysloggerWrite - write(2) to the syslogger pipe
>   IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
>
> 0001 adds the events and covers the write(2) paths. 0002 does the
> Windows WriteConsoleW() path, split out since it's platform-specific.
>
> It only wraps the leaf write call and uses the existing
> pgstat_report_wait_start()/end() helpers, so it stays allocation-free
> and safe to call from inside the error-reporting path.
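
A minimal sketch of the wrapping pattern described above, not the patch itself. The sketch_* functions are stand-ins for pgstat_report_wait_start()/end() so the example builds outside the PostgreSQL tree, and the event constant is a hypothetical placeholder, since the patch's actual identifiers are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical event ID with an arbitrary value; not taken from the patch. */
#define SKETCH_WAIT_EVENT_SYSLOGGER_WRITE ((uint32_t) 0x0A000001U)

/* Stand-in for the per-backend slot that pg_stat_activity would read. */
static volatile uint32_t sketch_wait_event_info = 0;

/*
 * Stand-ins for pgstat_report_wait_start()/pgstat_report_wait_end().
 * Each is a single store: no allocation, no locking, no error reporting.
 */
static inline void
sketch_wait_start(uint32_t info)
{
    sketch_wait_event_info = info;
}

static inline void
sketch_wait_end(void)
{
    sketch_wait_event_info = 0;
}

/*
 * Wrap only the leaf write(2) call.  The event is set just before the
 * call and cleared right after it, so it is visible only while the
 * write is in progress (for example, blocked on a full pipe).
 */
static ssize_t
sketch_write_reported(int fd, const void *buf, size_t len)
{
    ssize_t rc;

    assert(buf != NULL);

    sketch_wait_start(SKETCH_WAIT_EVENT_SYSLOGGER_WRITE);
    rc = write(fd, buf, len);
    sketch_wait_end();

    return rc;
}
```

Because the stand-ins only store to a variable, errno from write(2) is left untouched for the caller to inspect.
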
>
> I did a quick before/after to make sure the events show up: 8 backends
> each emitting large RAISE LOG lines, sampling wait_event from
> pg_stat_activity every 50 ms for 20 s.
>
>   - logging_collector = on (syslogger pipe):
>     master:  NULL                100.0%  (2184/2184)
>     patched: IO/SysloggerWrite     99.1%  (2204/2224), NULL 0.9%
>
>   - logging_collector = off (stderr):
>     master:  NULL                100.0%  (2144/2144)
>     patched: IO/StderrWrite        90.7%  (1952/2152), NULL 9.3%
>
> On master that wait time is just invisible; with the patch it lands on
> the new events. I can send the scripts and raw samples if anyone wants
> to reproduce it.
>
+1
Nice. We have too many waits that show up as CPU time (wait_event = NULL).
