Re: pg_stat_io_histogram

Andres Freund Thu, 29 Jan 2026 08:27:49 -0800

Hi,

On 2026-01-28 12:12:10 +0100, Jakub Wartak wrote:
> On Tue, Jan 27, 2026 at 1:06 PM Jakub Wartak <[email protected]>
> > Not yet, I first wanted to hear if I'm not sailing into some plain stupid
> > direction somewhere with this idea or implementation (e.g.
> > that INSTR_TIME_GET_MICROSEC() was a really stupid omission from my side).
> >
> > I'll try to perform this test overhead measurement hopefully with v3 once
> > we settle on how to do that bit shifting/clz().
> >
>
> [..]
> Here's the answer: on a properly isolated perf test run (my
> old&legacy&predictable 4s32c64t NUMA box, s_b=8GB, DB size 16GB,
> hugepages, no turboboost, proper warmup, no THP, cpupower D0, no physical
> I/O, ~22k pread64() calls/sec combined to VFS cache)
>     and started on just a single NUMA node: numactl --membind=0 --cpunodebind=0
>     measured using: pgbench -M prepared -c 4 -j 4 postgres -T 20 -P 1 -S
>
> master+track_io_timing=on, 60s warmup and then 3x runs
>     tps = 44615.603668
>     tps = 44556.191492
>     tps = 44813.793981
>     avg = 44662
>
> master+track_io_timing=on+patch, 60s warmup and then 3x runs
>     tps = 44441.879384
>     tps = 44403.101737
>     tps = 45036.747418
>     avg = 44627
>
> so that's like 99.921% (so literally no overhead) and yields picture like:


I don't think that's a particularly useful assurance, unfortunately:

1) Using pgbench with an in-memory read-only workload is typically limited by
   context switch overhead and per-statement overhead. After a short while you
   have at most one IO per statement (the heap page), which obviously isn't
   going to be affected by a small per-IO overhead.

2) The per-core memory bandwidth on that old machine, if it's the quite old
   EDB machine I think it is, is so low that you'd be bottlenecked by memory
   bandwidth well before you'd be bottlenecked by actual CPU work (which the
   bucket computation is).
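
For reference, the per-IO CPU work in question is roughly of the following
shape. This is only a minimal sketch and not the code from the patch: the
names io_hist_bucket() and io_hist_record(), the bucket count and the
power-of-two bucket boundaries are all assumptions made for illustration,
and __builtin_clzll() is the GCC/Clang builtin rather than a PostgreSQL API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout (an assumption, not taken from the patch):
 * bucket 0 holds latencies of 0-1 us, bucket i >= 1 holds
 * [2^i, 2^(i+1)) us, and the last bucket absorbs everything larger.
 */
#define IO_HIST_NBUCKETS 16

/* Map an elapsed time in microseconds to a histogram bucket index. */
static inline int
io_hist_bucket(uint64_t elapsed_us)
{
    int bucket;

    /* __builtin_clzll() is undefined for 0, so handle tiny values first. */
    if (elapsed_us < 2)
        return 0;

    /* floor(log2(elapsed_us)) via count-leading-zeros. */
    bucket = 63 - __builtin_clzll(elapsed_us);

    /* Clamp outliers into the last bucket. */
    return bucket < IO_HIST_NBUCKETS ? bucket : IO_HIST_NBUCKETS - 1;
}

/* Per-IO bookkeeping: one bucket lookup plus one counter increment. */
static inline void
io_hist_record(uint64_t *hist, uint64_t elapsed_us)
{
    int bucket = io_hist_bucket(elapsed_us);

    assert(bucket >= 0 && bucket < IO_HIST_NBUCKETS);
    hist[bucket]++;
}
```

That is a handful of instructions per IO, which is why it only becomes
visible in a benchmark that is bound by per-IO CPU cost rather than by
memory bandwidth or per-statement overhead.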

I think you'd have to test something like pg_prewarm(), with
io_combine_limit=1, on a modern *client* CPU (client CPUs typically have much
higher per-core memory bandwidth than the more scalable server CPUs).

Greetings,

Andres Freund
