Hi,

On 2023-04-24 18:36:24 -0400, Melanie Plageman wrote:
> On Mon, Apr 24, 2023 at 6:13 PM Andres Freund <and...@anarazel.de> wrote:
> > > Also, it seems like this (given the current code) is only reachable for
> > > permanent relations (i.e. not for IO object temp relation). If other backend
> > > types than checkpointer may call smgrwriteback(), we likely have to consider
> > > the IO context.
> >
> > I think we should take it into account - it'd e.g. interesting to see a COPY
> > is bottlenecked on smgrwriteback() rather than just writing the data.
>
> With the quick and dirty attached patch and using your example but with a
> pgbench -T200 on my rather fast local NVMe SSD, you can still see quite
> a difference.

Quite a difference between what?

What scale of pgbench did you use?

-T200 is likely not a good idea, because a timed checkpoint might "interfere",
unless you use a non-default checkpoint_timeout. A timed checkpoint won't show
the issue as easily, because checkpointer spends most of the time sleeping.

> This is with a stats reset before the checkpoint.
>
> unpatched:
>
>  backend_type        | object        | context   | writes  | write_time | fsyncs  | fsync_time
> ---------------------+---------------+-----------+---------+------------+---------+------------
>  background writer   | relation      | normal    |     443 |      1.408 |       0 |          0
>  checkpointer        | relation      | normal    |  187804 |    396.829 |      47 |    254.226
>
> patched:
>
>  backend_type        | object        | context   | writes  | write_time         | fsyncs | fsync_time
> ---------------------+---------------+-----------+---------+--------------------+--------+------------
>  background writer   | relation      | normal    |     917 | 4.4670000000000005 |      0 |          0
>  checkpointer        | relation      | normal    |  375798 |            977.354 |     48 |    202.514
>
> I did compare client backend stats before and after pgbench and it made
> basically no difference. I'll do a COPY example like you mentioned.

> Patch needs cleanup/comments and a bit more work, but I could do with
> a sanity check review on the approach.

I was thinking we'd track writeback separately from the write, rather than
attributing the writeback to "write".

Otherwise it looks good, based on a quick skim.

Greetings,

Andres Freund