Hi,
On 2023-04-24 16:39:36 -0400, Melanie Plageman wrote:
> On Wed, Apr 19, 2023 at 10:23:26AM -0700, Andres Freund wrote:
> > Hi,
> >
> > I noticed that the numbers in pg_stat_io don't quite add up to what I
> > expected in write heavy workloads. Particularly for checkpointer, the
> > numbers for "write" in log_checkpoints output are larger than what is
> > visible in pg_stat_io.
> >
> > That partially is because log_checkpoints' "write" covers way too many
> > things, but there's an issue with pg_stat_io as well:
> >
> > Checkpoints, and some other sources of writes, will often end up doing a lot
> > of smgrwriteback() calls - which pg_stat_io doesn't track. Nor do any
> > pre-existing forms of IO statistics.
> >
> > It seems pretty clear that we should track writeback as well. I wonder
> > if it's worth doing so for 16? It'd give a more complete picture that
> > way. The counter-argument I see is that we didn't track the time for it
> > in existing stats either, and that nobody complained - but I suspect
> > that's mostly because nobody knew to look.
>
> Not complaining about making pg_stat_io more accurate, but what exactly
> would we be tracking for smgrwriteback()? I assume you are talking about
> IO timing. AFAICT, on Linux, it does sync_file_range() with
> SYNC_FILE_RANGE_WRITE, which is asynchronous. Wouldn't we just be
> tracking the system call overhead time?
It starts blocking once "enough" IO is in flight. For things like an
immediate checkpoint, that can happen fairly quickly, unless you have a very
fast IO subsystem. So often it'll not matter whether we track
smgrwriteback(), but when it matters, it can matter a lot.
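To illustrate why timing it is meaningful even though the call is nominally
asynchronous, here's a minimal standalone sketch (not from the PostgreSQL
sources; the file name is made up) of the Linux call that smgrwriteback()
boils down to. SYNC_FILE_RANGE_WRITE only initiates writeback, but the kernel
will block the caller once enough IO is already queued:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int			fd = open("some_relation_segment", O_RDWR);

	if (fd < 0)
	{
		perror("open");
		return 1;
	}

	/* ask the kernel to start writing back the first 1MB of the file */
	if (sync_file_range(fd, 0, 1024 * 1024, SYNC_FILE_RANGE_WRITE) != 0)
		perror("sync_file_range");

	close(fd);
	return 0;
}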
As an example, I initialized a pgbench database with scale 1000, on a decent
but not great NVMe SSD. Created dirty data with:

c=96;/srv/dev/build/m-opt/src/bin/pgbench/pgbench --random-seed=0 -n -M prepared -c$c -j$c -T30 -P1

and then measured the checkpoint:

perf trace -s -p $pid_of_checkpointer psql -x -c "SELECT pg_stat_reset_shared('io')" -c "checkpoint"
postgres (367444), 1891280 events, 100.0%

   syscall             calls  errors     total       min       avg       max   stddev
                                         (msec)    (msec)    (msec)    (msec)      (%)
   ---------------  --------  ------  --------  --------  --------  --------  -------
   sync_file_range    359176       0  4560.670     0.002     0.013   238.955   10.36%
   pwrite64           582964       0  2874.634     0.003     0.005     0.156    0.06%
   fsync                 242       0   251.631     0.001     1.040    42.166   18.81%
   openat                317      65     2.171     0.002     0.007     0.068    5.69%
   rename                 69       0     1.973     0.012     0.029     0.084    5.81%
   fdatasync               1       0     1.543     1.543     1.543     1.543    0.00%
   unlink                150     137     1.278     0.002     0.009     0.062   10.69%
   close                 250       0     0.694     0.001     0.003     0.007    3.14%
   newfstatat            140      68     0.667     0.001     0.005     0.022    7.26%
   write                   5       0     0.067     0.005     0.013     0.025   24.55%
   lseek                  14       0     0.050     0.001     0.004     0.018   33.87%
   getdents64              8       0     0.047     0.002     0.006     0.022   39.51%
   kill                    3       0     0.029     0.008     0.010     0.011   10.18%
   epoll_wait              2       1     0.006     0.000     0.003     0.006  100.00%
   read                    1       0     0.004     0.004     0.004     0.004    0.00%
Log output:
2023-04-24 14:11:59.234 PDT [367444][checkpointer][:0][] LOG: checkpoint
starting: immediate force wait
2023-04-24 14:12:09.236 PDT [367444][checkpointer][:0][] LOG: checkpoint
complete: wrote 595974 buffers (28.4%); 0 WAL file(s) added, 0 removed, 68
recycled; write=9.740 s, sync=0.057 s, total=10.002 s; sync files=27,
longest=0.043 s, average=0.003 s; distance=4467386 kB, estimate=4467386 kB;
lsn=6/E5D33F98, redo lsn=6/E5D33F28
# SELECT writes, write_time, fsyncs, fsync_time FROM pg_stat_io WHERE
backend_type = 'checkpointer';
┌────────┬────────────────────┬────────┬────────────┐
│ writes │ write_time │ fsyncs │ fsync_time │
├────────┼────────────────────┼────────┼────────────┤
│ 595914 │ 4002.1730000000002 │ 24 │ 46.359 │
└────────┴────────────────────┴────────┴────────────┘
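A rough sketch of what tracking this could look like (just to illustrate, not
an actual patch; IOOP_WRITEBACK is a hypothetical new IOOp value and the IO
context is hard-coded for simplicity): time the smgrwriteback() calls issued
from IssuePendingWritebacks() the same way pg_stat_io already times writes:

	instr_time	io_start;

	io_start = pgstat_prepare_io_time();

	/* issue the actual writeback request, as IssuePendingWritebacks() does */
	smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);

	/* hypothetical IOOP_WRITEBACK; IOCONTEXT_NORMAL is a simplification */
	pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
							IOOP_WRITEBACK, io_start, nblocks);

That would make both the number of writeback requests and the time spent in
them visible in pg_stat_io, alongside writes and fsyncs.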
Greetings,
Andres Freund