Hi Andres,

On Fri, May 29, 2026 at 5:56 PM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On 2026-05-29 13:38:17 +0200, Jakub Wartak wrote:
> > On Fri, May 29, 2026 at 11:51 AM Nazir Bilal Yavuz <[email protected]> 
> > wrote:
> > [..]
> > Hi, thanks to everybody for working on this.
> >
> > > https://github.com/nbyavuz/postgres/actions/runs/26628396798
> >
> > Windows (runs-on: windows-2022) seems kind of slow isn't it ?
> >
> > Maybe that's not related to the patch itself, but any idea why the windows
> > tests are so slow? Or will we able to somehow accelerate those?
> >
> > Windows - VS - Meson & ninja / succeeded [..] minutes ago in 31m 28s
> >
> > Processor(s):              1 Processor(s) Installed.
> > [..]
> > Total Physical Memory:     16,379 MB
> > [..]
> >
> > but:
> > NUMBER_OF_PROCESSORS=4
> > [..]
> > +      TEST_JOBS: 8
> >
> > vs
> >
> > 392/396 test_json_parser - postgresql:test_json_parser/002_inline
> >                  OK              152.56s   3712 subtests passed
> > 393/396 pgbench - postgresql:pgbench/001_pgbench_with_server
> >                  OK              574.61s   474 subtests passed
> > 394/396 pg_rewind - postgresql:pg_rewind/002_databases
> >                  OK              772.86s   10 subtests passed
> > 395/396 pg_waldump - postgresql:pg_waldump/001_basic
> >                  OK              771.19s   156 subtests passed
> > 396/396 libpq_pipeline - postgresql:libpq_pipeline/001_libpq_pipeline
> >                  OK              395.76s   23 subtests passed
> >
> > while last CirrusCI run for me for Windows took 19min 21s (4 CPUs / 4 GBs,
> > but sysinfo reported there "Total Physical Memory: 16,380 MB").
>
> The difference here likely is due to the different type of CPU cores. On
> cirrus, we got 4 non-SMT cores (because the type of CPU used didn't use SMT),
> whereas on GHA we have 4 hardware threads, but only two real cores.
>
>
> > If that's IO traffic as Andres described, maybe we could enable feature
> > called "Turn off Windows write-cache buffer flushing on the device"
> > in device manager -> disk -> policies, but dunno how much that would
> > help really as we seem to be already using fsync=off, maybe it helps
> > when saving other files too (???)
>
> I think I was wrong about IO being the main issue. I've measured the CPU
> utilization during a linux run, and basically it's 100% busy during the whole
> test run (baring the first and last few seconds).  Which does seem to mainly
> point to the difference being simply that we just have half the real cores as
> we had before.
>
> I do see higher %sys CPU utilization than I'd expect, so that may be worth
> investigating.

So I've spent half a day trying to see what makes the tests so slow, at
least in my case. I can also confirm the high combined %CPU (with a high
33% sys).

0. baseline was ~71s (stuff already hot)
1a. down to 64s with dirty writeback tuning (mostly to avoid NVMe/SSD wear)
1b. ~65s with tmpfs, so no better than 1a, and I've kept using the dirty
    writeback sysctls instead. The tmpfs mounts were:
    sudo mount -t tmpfs -o size=4G,uid=XXX,mode=755  tmpfs build/tmp_install
    sudo mount -t tmpfs -o size=16G,uid=XXX,mode=755 tmpfs /build/testrun
2. Splitting the longest tests (isolation, 027_stream_regress, pg_upgrade)
   into 4 parallel streams each did not help much.
3. I've spotted falcon-sensor (EDR agent, using eBPF) being very busy, so
   I've shut it down and got the duration down to 43s.
4. Even at 43s the dominant factor was still the mmap/page-fault/PTE work
   related to the number of backends we spawn. Later, when I put Claude to
   work on it, it said: "Backend startup costs roughly 2.5x as much as the
   actual queries". And when I pushed it to count connections using
   log_connections it said "Got 24,903 total connections in 46 s = 541
   backend forks/second." and produced this top report:
     8,610   subscription      - 35 % of all connections in the suite
     4,382   recovery          - 18 %
     1,100   pg_upgrade
       896   isolation
       694   pg_dump
       682   pg_basebackup

    Cutting the subscription tests down to ~5000 conns did not gain much
    (it saved about 5% of runtime, 43s -> 41s). The change is literally 10k
    lines of
    s/$node_subscriber->safe_psql/sub_bg->query_safe/g across dozens of files
    in src/test/subscription/t/. It's too big for review and I'm not sharing
    it as it could contain errors.

5. Spotted that we do plenty of initdb and cached-initdb (cp), so I had the
   idea of using cp --reflink=always on XFS in build/. I couldn't do that
   without /dev/loop, but even on a /dev/loop device XFS (reflink=1) vs
   ext4 (reflink=0) apparently halves the number of writes. That somehow
   does not directly reduce the duration of the test (well, we are
   bottlenecked on CPU anyway, so this is just a smarter(?) way of avoiding
   I/O; maybe with cold caches and on real VMs running XFS it would be
   faster).

   +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
   @@ -687,7 +687,13 @@ sub init
                  }
                  else
                  {
   -                       @copycmd = qw(cp -RPp);
   +                       @copycmd = qw(cp --reflink=always -RPp);
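
FWIW, as written the hunk above would make cp fail outright on a filesystem
without reflink support (e.g. ext4), because --reflink=always does not fall
back to a plain copy (--reflink=auto does). A rough sketch of probing for
support first, in Python rather than Perl purely for illustration;
reflink_supported() and copy_command() are made-up names, not anything that
exists in Cluster.pm:

```python
import os
import subprocess
import tempfile


def reflink_supported(directory: str) -> bool:
    """Return True if `cp --reflink=always` succeeds inside `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)
        try:
            result = subprocess.run(
                ["cp", "--reflink=always", src, dst],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except FileNotFoundError:
            return False  # no cp binary on PATH
        # Non-zero exit: unsupported filesystem, or a cp without --reflink.
        return result.returncode == 0


def copy_command(directory: str) -> list[str]:
    """Pick the cp invocation for copying the cached initdb template."""
    if reflink_supported(directory):
        return ["cp", "--reflink=always", "-RPp"]
    return ["cp", "-RPp"]
```

Simply using --reflink=auto would be the shorter way to get the fallback; the
probe is only useful if one wants to know (or log) which path was taken.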


Other interesting ideas: pg_regress with built-in connection pool (IMHO not
worth it), mitigations=off (to avoid syscalls being taxed; got no
improvement with this).
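
Coming back to the per-suite connection counts from point 4: this is not the
script that produced the report, just a minimal sketch of how such a count
could be reproduced. It assumes the servers ran with log_connections enabled
and that the logs end up as *.log files somewhere below one directory per
suite (e.g. build/testrun/<suite>/...); count_connections() and report() are
hypothetical helper names:

```python
import re
from collections import Counter
from pathlib import Path

# Message logged for each incoming connection when log_connections is enabled.
CONN_RE = re.compile(r"connection received:")


def count_connections(testrun_dir: str) -> Counter:
    """Count 'connection received' log lines per top-level suite directory."""
    root = Path(testrun_dir)
    counts = Counter()
    for log in root.rglob("*.log"):
        # First path component below testrun_dir is taken as the suite name.
        suite = log.relative_to(root).parts[0]
        with open(log, errors="replace") as f:
            counts[suite] += sum(1 for line in f if CONN_RE.search(line))
    return counts


def report(counts: Counter, elapsed_s: float) -> None:
    """Print total, connections per second and the per-suite share."""
    total = sum(counts.values())
    if total == 0 or elapsed_s <= 0:
        print("no connections found")
        return
    print(f"{total:,} connections in {elapsed_s:.0f} s "
          f"= {total / elapsed_s:.0f} per second")
    for suite, n in counts.most_common():
        print(f"{n:>8,}   {suite:<16} {100 * n / total:.0f} %")
```

As a sanity check on the quoted numbers: 24,903 / 46 s is ~541 per second,
8,610 / 24,903 is ~35 % and 4,382 / 24,903 is ~18 %.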

As for Windows, I don't have a better idea than to just avoid I/O if possible
("Turn off Windows write-cache buffer flushing on the device"), sorry(!), and
maybe throwing a bigger box at it... ;]

-J.

