Hi Jakub, thanks for your valuable feedback.
> - on the flamegraphs BufferAlloc->GetVictimBuffer->FlushBuffer() is visible
> often in both scenarios p0/p1 . I recall from my old WAL IO prefetching stress
> tests experiments [1] that tuning bgwriter was playing some important role
> (most of the time bgwriter values are left on default). It's connected to
> the pipelining idea too for sure, and somehow we are not effective on it
> based on your results and mine, and some other field reports. Basically we
> have wrong defaults for bgwriter on standbys, and no-one is going to notice
> unless you measure it in-depth. We could also tweak the bgwriter in standby
> role to make the pipelining more effective that way too, but maybe I am
> missing something? Maybe bgwriter on standby should work full-steam instead
> of being rate limited, as long as there is work to be done (pressure)? It's
> not a critique of the patch, but maybe you guys have investigated that road
> too, as it is
> related to the pipelining concept?
>
Ok thanks, I will look into that.
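For reference, these are the bgwriter GUCs I would experiment with on the standby; the non-default values below are only illustrative starting points, not recommendations:

```
# postgresql.conf on the standby (defaults noted in comments)
bgwriter_delay = 50ms           # default 200ms; lower = wake up more often
bgwriter_lru_maxpages = 1000    # default 100; pages written per round, 0 disables
bgwriter_lru_multiplier = 4.0   # default 2.0; how far ahead of demand to clean
```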
> - there's plenty of ValidXLogRecord->pg_comp_crc32c_sb8(), probably even too
> much, so I think either old CPU was used or something happened that SSE4.2
> was not available (pg_comp_crc32c_sse42()). I'm speculating, but probably
> you would get even better results from the patch (which are impressive anyway)
> by using pg_comp_crc32c_sse42() because it wouldn't be such a
> bottleneck anymore.
>
As mentioned in the benchmarks PDF, the CPU was `Common KVM (4) @
2.593GHz`; I was running the tests inside a virtual machine, so SSE4.2
may not have been exposed to the guest, which could explain this
bottleneck. I will do some more research on it.
> - The flamegraphs themselves are for the whole PostgreSQL instance, right, or am I
> misunderstanding it? Probably in the long run it would be better to have just
> the PID of startup/recovering (but that's way harder to script for sure,
> due to the need for isolated PID: perf record -p <PID>)
>
Yes, the flame graphs were for the whole PostgreSQL instance. I will
try to improve the script so it profiles only the startup/recovery
process.
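A possible sketch of how to attach perf to just that process (the `ps` pattern and the FlameGraph script names are assumptions):

```shell
#!/bin/sh
# Extract the startup/recovery process PID from `ps` output on stdin,
# so perf can be attached to just that process instead of system-wide.
find_startup_pid() {
    awk '/postgres: startup/ { print $1; exit }'
}

pid=$(ps -eo pid,args | find_startup_pid)
if [ -n "$pid" ] && command -v perf >/dev/null 2>&1; then
    # Sample only the startup process for 60 seconds, with call graphs.
    perf record -F 99 -g -p "$pid" -- sleep 60
    perf script | stackcollapse-perf.pl | flamegraph.pl > startup.svg
fi
```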
> - You need to rebase due to 1eb09ed63a8 from a couple of days ago:
> CONFLICT (content): Merge conflict in src/backend/postmaster/bgworker.c
>
Noted, thanks.
> - naming: I don't have anything against wal_pipeline.c, it's just that
> the suite of C files is named src/backend/access/transam/xlog*.c
> (so perhaps xlogrecoverypipeline.c (?) to stay consistent)
>
I agree, I will change that in the next patch. Maybe `xlogpipeline.c`
would be enough.
> - I was getting lots of problems during linking (e.g. undefined reference to
> `WalPipeline_IsActive') and it appears that the patch is missing
> meson support:
>
> --- a/src/backend/access/transam/meson.build
> +++ b/src/backend/access/transam/meson.build
> @@ -25,6 +25,7 @@ backend_sources += files(
> 'xlogstats.c',
> 'xlogutils.c',
> 'xlogwait.c',
> +'wal_pipeline.c'
>
Noted, thanks. That was an initial effort; I haven't added meson
support or tested in a Windows environment yet.
>
> - nitpicking, but you need to add wal_pipeline to postgresql.conf.sample
> otherwise 003_check_guc.pl test fails.
>
Got it.
> - good news is that test with PG_TEST_EXTRA="wal_consistency_checking" passes
>
> - but bad news is that PG_TEST_EXTRA="wal_consistency_checking" with
> PG_TEST_INITDB_EXTRA_OPTS="-c wal_pipeline=on", which activates all of this,
> fails multiple tests (possibly I'll look into those in the following days,
> unless you are faster)
>
> - at least here, fresh after pg_basebackup -R to build a standby and a
> restart with wal_pipeline=on,
> I've got a bug:
> 2026-02-04 10:29:40.146 CET [335197] LOG: [walpipeline] started.
> 2026-02-04 10:29:40.146 CET [335197] LOG: redo starts at 0/02000028
> 2026-02-04 10:29:40.147 CET [335198] LOG: invalid record length
> at 0/03000060: expected at least 24, got 0
> 2026-02-04 10:29:40.147 CET [335197] LOG: consistent recovery
> state reached at 0/03000060
> 2026-02-04 10:29:40.147 CET [335197] LOG: [walpipeline] consumer:
> received shutdown message from the producer
> 2026-02-04 10:29:40.147 CET [335191] LOG: database system is
> ready to accept read-only connections
> 2026-02-04 10:29:40.157 CET [335198] LOG: [walpipeline] producer:
> exiting: sent=5 received=5
> 2026-02-04 10:29:40.167 CET [335197] LOG: WAL pipeline stopped
> 2026-02-04 10:29:40.167 CET [335197] LOG: redo done at 0/03000028
> system usage: CPU: user: 0.00 s, system: 0.03 s, elapsed: 0.04 s
> 2026-02-04 10:29:40.169 CET [335197] LOG: selected new timeline ID: 2
> 2026-02-04 10:29:40.184 CET [335197] LOG: archive recovery complete
>
> - even when starting from scratch with wal_pipeline = on it's similar:
> 2026-02-04 10:34:14.386 CET [335833] LOG: database system was
> interrupted; last known up at 2026-02-04 10:34:02 CET
> 2026-02-04 10:34:14.453 CET [335833] LOG: starting backup recovery
> with redo LSN 0/0B0000E0, checkpoint LSN 0/0B000138, on timeline ID 1
> 2026-02-04 10:34:14.453 CET [335833] LOG: entering standby mode
> 2026-02-04 10:34:14.482 CET [335833] LOG: [walpipeline] started.
> 2026-02-04 10:34:14.482 CET [335833] LOG: redo starts at 0/0B0000E0
> 2026-02-04 10:34:14.484 CET [335833] LOG: completed backup
> recovery with redo LSN 0/0B0000E0 and end LSN 0/0B0001E0
> 2026-02-04 10:34:14.484 CET [335833] LOG: consistent recovery
> state reached at 0/0B0001E0
> 2026-02-04 10:34:14.484 CET [335833] LOG: [walpipeline] consumer:
> received shutdown message from the producer
> 2026-02-04 10:34:14.484 CET [335827] LOG: database system is ready
> to accept read-only connections
> 2026-02-04 10:34:14.493 CET [335834] LOG: [walpipeline] producer:
> exiting: sent=4 received=4
> 2026-02-04 10:34:14.504 CET [335833] LOG: WAL pipeline stopped
> 2026-02-04 10:34:14.504 CET [335833] LOG: redo done at 0/0B0001E0
> system usage: CPU: user: 0.00 s, system: 0.03 s, elapsed: 0.04 s
> 2026-02-04 10:34:14.506 CET [335833] LOG: selected new timeline ID: 2
>
> it appears that "Cannot read more records, shut it down" ->
> WalPipeline_SendShutdown() path is taken but I haven't pursued further.
>
Yes, my initial testing was done only with archive recovery; I am
still working on adding support for streaming replication.
Also, many tests from `make -C src/test/recovery/ check
PG_TEST_INITDB_EXTRA_OPTS="-c wal_pipeline=on"` are failing and need
to be fixed.
> Also some very quick review comments:
>
> +bool
> +WalPipeline_SendRecord(XLogReaderState *record)
> +{
> [..]
> +
> + if (msglen > WAL_PIPELINE_MAX_MSG_SIZE)
> + {
> + elog(WARNING, "[walpipeline] producer: wal record at %X/%X
> too large (%zu bytes), skipping",
> + LSN_FORMAT_ARGS(record->ReadRecPtr), msglen);
> + pfree(buffer);
> + return true;
> + }
>
> When/why it could happen and if it would happen, shouldn't this be more
> like PANIC instead?
>
Noted, I will fix that.
> +/* Size of the shared memory queue (can be made configurable) */
> +#define WAL_PIPELINE_QUEUE_SIZE (128 * 1024 * 1024) /* 8 MB */
> +
> +/* Maximum size of a single message */
> +#define WAL_PIPELINE_MAX_MSG_SIZE (2 * 1024 * 1024) /* 1 MB */
>
> Maybe those should take into account (XLOG_)BLCKSZ too?
>
Yes, that's a good idea; I will size them as multiples of XLOG_BLCKSZ
(and fix the stale byte-size comments, which don't match the current
definitions).
Thanks for the valuable feedback. I will try to fix most of these
issues in the next patch.
Regards,
Imran Zaheer