Hi Nazir/Andres,

On Tue, Jun 2, 2026 at 12:13 PM Jakub Wartak
<[email protected]> wrote:
>
> Hi Andres/Nazir,
>
[..]
> Continuing on previous story...:
> Windows was still @ 31mins, and whatever I've tried was not helping it
> (but I cannot measure inside GHA Runner what was happening, so those were
> blind shots with fstweaks, etc). One important thing: although I failed to
> alter CacheIsPowerProtected (to avoid flushing the write cache), as it seems
> impossible for me to do so on D:\ (the paging file is there, and altering it
> also requires a reboot), at least we know stuff is way slower than it could
> be on those runners:
>
> "Get-PhysicalDisk | Get-StorageAdvancedProperty" reported:
>
> FriendlyName      SerialNumber IsPowerProtected IsDeviceCacheEnabled
> ------------      ------------ ---------------- --------------------
> Msft Virtual Disk                         False                False
> Msft Virtual Disk                         False                False
>
> Perhaps there's a way to use some custom image/template with different
> settings, especially for D:\, after all it's just volatile stuff. Thoughts?
> (not that I care that much for Win, but waiting half an hour for it to finish
> every time is not going to be nice...)
>
[..]

OK, so to close the loop: can no-write-flushing (and ReFS) help us here?
I've made it work, but the configuration I could get working is actually
slower (looking just at the "Test run" step) by 2 mins (26 vs 28 mins) :(

Longer:
* This is Windows Server 2022, so ReFS (MS next-gen fs) is available.
  Technically robocopy should do CoW (for our initdb clones out there).
* D:\ cannot be reformatted from NTFS to ReFS, mainly due to the active
  pagefile, and the GitHub agent places files there too.
* But (!) one can make a loop-image on D:\ with ReFS (sic!)
* And disable write-cache-flushing with some hacks (usually used with
  RAID cards with BBU)

And I've bumped TEST_JOBS 4->8 (even with 4 VCPUs), because my local runs
showed in taskmgr that after quite some time we ended up using just ~40%
CPU (also with 4 VCPUs) while not doing I/O (this is somewhat contrary to
what Andres was stating earlier). I cannot find a way to add observability of
CPU usage on the GHA runner, so I'm just going to leave it at that (but before
anybody wishes to add more CPUs, it would actually help to know whether such a
workload on GHA is really CPU- or I/O-bound there).
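
One possible way to get that visibility would be a background typeperf
sampler around the test step. This is only a minimal, untested sketch: the
step names and the D:\a\perfcounters.csv path are hypothetical and not part
of the attached patch, and it assumes that a process started in one step
survives until the later steps on the runner, which has not been verified.

```yaml
      # Hypothetical steps, not part of the attached patch (untested on GHA).
      - name: Start perf counter sampling
        shell: powershell
        run: |
          # typeperf ships with Windows: -si = sample interval in seconds,
          # -f = output format, -o = output file, -y = overwrite without asking.
          $typeperfArgs = '"\Processor(_Total)\% Processor Time" ' +
                          '"\PhysicalDisk(_Total)\% Idle Time" ' +
                          '-si 5 -f CSV -o "D:\a\perfcounters.csv" -y'
          Start-Process -FilePath typeperf.exe -ArgumentList $typeperfArgs -WindowStyle Hidden

      # ... the "Test world" step would run here ...

      - name: Show perf counters
        if: always()
        shell: powershell
        run: |
          # Print the most recent samples, if the sampler produced any.
          if (Test-Path "D:\a\perfcounters.csv") {
              Get-Content "D:\a\perfcounters.csv" -Tail 60
          }
```

If both CPU usage and disk busy time stay low during the test step, that
would suggest the time goes into waits inside the tests rather than into the
runner's hardware.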

So it appears that without going into the dragon's den (I mean deeply
analyzing our tests, especially subscription and recovery), we won't gain
much in such a setup.

Patch attached if anybody wants to experiment more.

-J.
From b12be7baf025287752b365cb59861f8d54fe2c0a Mon Sep 17 00:00:00 2001
From: Jakub Wartak <[email protected]>
Date: Tue, 2 Jun 2026 11:43:56 +0200
Subject: [PATCH v1] Try ReFS

ci-os-only: windows
---
 .github/workflows/postgresql-ci.yml | 52 ++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/postgresql-ci.yml b/.github/workflows/postgresql-ci.yml
index e2795ca0ffb..971fb9a705b 100644
--- a/.github/workflows/postgresql-ci.yml
+++ b/.github/workflows/postgresql-ci.yml
@@ -28,7 +28,7 @@ env:
 
   # It's possible that some jobs benefit from an increased test concurrency,
   # but a default of 4 is a safe bet. Individual jobs can override.
-  TEST_JOBS: 4
+  TEST_JOBS: 8
 
   CCACHE_MAXSIZE: "250M"
   CCACHE_DIR: ${{ github.workspace }}/ccache_dir
@@ -45,6 +45,7 @@ env:
 
   # Can be set to a non-empty value to run a limited set of tests
   # (e.g. --suite regress to only run the main regression tests).
+#  MTEST_TARGET: --suite regress --suite postgresql:recovery
   MTEST_TARGET:
 
   PGCTLTIMEOUT: 120  # avoids spurious failures during parallel tests
@@ -134,6 +135,9 @@ jobs:
       - &nix_sysinfo_step
         name: sysinfo
         run: |
+          mount
+          lsblk -O
+          ps auxww
           id
           uname -a
           ulimit -a -H && ulimit -a -S
@@ -307,10 +311,11 @@ jobs:
         with:
           name: logs-${{ github.job }}-${{ github.run_id }}-${{ github.run_attempt }}
           path: |
-              **/*.log
-              **/*.diffs
-              **/regress_log_*
-              **/crashlog-*.txt
+              # avoids R:/System Volume Information (EINVAL)
+              R:/build/**/*.log
+              R:/build/*/*.diffs
+              R:/build/**/regress_log_*
+              R:/build/**/crashlog-*.txt
           if-no-files-found: ignore
 
 
@@ -683,8 +688,35 @@ jobs:
         name: Disable Windows Defender
         shell: powershell
         run: |
+          $diskpartScript = @"
+          create vdisk file="D:\a\refs.vhd" maximum=16000 type=expandable
+          attach vdisk
+          create partition primary
+          format fs=refs quick
+          assign letter=R
+          "@
+          $diskpartScript | diskpart
+          Get-Volume -DriveLetter R
+
+          $DiskNumber = (Get-Partition -DriveLetter R).DiskNumber
+          $PnpId = (Get-CimInstance Win32_DiskDrive | Where-Object { $_.DeviceID -match "PhysicalDrive$DiskNumber" }).PNPDeviceID
+          $RegPath = "HKLM:\SYSTEM\CurrentControlSet\Enum\$PnpId\Device Parameters\Disk"
+          if (-not (Test-Path $RegPath)) {
+              New-Item -Path $RegPath -Force | Out-Null
+          }
+
+          # 4. Turn off write-cache buffer flushing (CacheAttributes = 1 tells Windows to ignore OS flush requests)
+          Set-ItemProperty -Path $RegPath -Name "CacheAttributes" -Value 1 -Type DWord
+          Set-ItemProperty -Path $RegPath -Name "WriteCacheSetting" -Value 1 -Type DWord
+
+          Set-Disk -Number $DiskNumber -IsOffline $true
+          Set-Disk -Number $DiskNumber -IsOffline $false
+          Write-Host "Success: Force-flushing disabled for Drive R: (Disk $DiskNumber)" -ForegroundColor Green
+          Get-PhysicalDisk | Where-Object { $_.DeviceID -eq $DiskNumber } | Get-StorageAdvancedProperty
+
           Set-MpPreference -DisableRealtimeMonitoring $true -SubmitSamplesConsent NeverSend -MAPSReporting Disable
           # Verify Defender status
+          Get-PhysicalDisk | Get-StorageAdvancedProperty
           $status = Get-MpComputerStatus -ErrorAction SilentlyContinue
           if ($status) {
               Write-Host "RealTimeProtectionEnabled: $($status.RealTimeProtectionEnabled)"
@@ -719,6 +751,9 @@ jobs:
         run: |
           icacls "${{ github.workspace }}" /grant "${env:USERNAME}:(OI)(CI)F" /Q | Out-Null
           Write-Host "Granted Full Control to $env:USERNAME on ${{ github.workspace }}"
+          mkdir R:\build
+          icacls "R:\build" /grant "${env:USERNAME}:(OI)(CI)F" /Q | Out-Null
+          Write-Host "Granted Full Control to $env:USERNAME on R:\build"
 
       # postgres' plpython3u loads python3.dll (the stable-ABI forwarder)
       # which in turn loads whichever python3NN.dll the Windows loader finds
@@ -792,18 +827,19 @@ jobs:
             -Db_pch=true ^
             -Dextra_lib_dirs=d:\openssl\1.1\lib -Dextra_include_dirs=d:\openssl\1.1\include ^
             -DTAR=${{env.TAR}} ^
-            build
+            R:/build
 
       - name: Build
         run: |
           call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
-          ninja -C build ${{env.MBUILD_TARGET}}
-          ninja -C build -t missingdeps
+          ninja -C R:/build ${{env.MBUILD_TARGET}}
+          ninja -C R:/build -t missingdeps
 
       - name: Test world
         env:
           ADDITIONAL_SETUP: |
             call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
+            R:
         run: *meson_test_world_cmd
 
       # FIX: We need to collect crashlogs but they are not collected. cdb.exe
-- 
2.43.0
