Re: New WAL record to detect the checkpoint redo location

Robert Haas Tue, 03 Oct 2023 05:51:56 -0700

On Wed, Sep 20, 2023 at 4:20 PM Robert Haas <[email protected]> wrote:
> Here are some patches.


Here are some updated patches. Following some off-list conversation
with Andres, I restructured 0003 to put the common case first and use
likely(), and I fixed the brown-paper-bag noted by Amit. I then turned
my attention to performance testing. I was happy to find out when I
did a bunch of testing on Friday that my branch with these patches
applied outperformed master. I was then less happy to find that when I
repeated the same tests today, master outperformed the branch. So now
I don't know what is going on, but it doesn't seem like my test
results are stable enough to draw meaningful conclusions.

I was trying to think of a test case where XLogInsertRecord would be
exercised as heavily as possible, so I really wanted to generate a lot
of WAL while doing as little real work as possible. The best idea that
I had was to run pg_create_restore_point() in a loop. Initially,
performance was dominated by the log messages which that function
emits, so I set log_min_messages='FATAL' to suppress those. To try to
further reduce other bottlenecks, I also set max_wal_size='50GB',
fsync='off', synchronous_commit='off', and wal_buffers='256MB'. Then I
ran this query:

select count(*) from (SELECT pg_create_restore_point('banana') from
generate_series(1,100000000) g) x;

I can't help laughing at the comedy of creating 100 million
banana-named restore points with no fsyncs or logging, but here we
are. All of my test runs with master, and with the patches, and with
just the first patch run in between 34 and 39 seconds. As I say, I
can't really separate out which versions are faster and slower with
any confidence. Before I fixed the brown-paper bag that Amit pointed
out, it was using WALInsertLockAcquireExclusive() instead of
WALInsertLockAcquire() for *all* WAL records, and that created an
extremely large and obvious increase in the runtime of the tests. So
I'm relatively confident that this test case is sensitive to changes
in execution time of XLogInsertRecord(), but apparently the changes
caused by rearranging the branches are a bit too marginal for them to
show up here.

One possible conclusion is that the differences here aren't actually
big enough to get stressed about, but I don't want to jump to that
conclusion without investigating the competing hypothesis that this
isn't the right way to test this, and that some better test would show
clearer results. Suggestions?

--
Robert Haas
EDB: http://www.enterprisedb.com

v7-0001-Unify-two-isLogSwitch-tests-in-XLogInsertRecord.patch
Description: Binary data

v7-0003-WIP-Insert-XLOG_CHECKPOINT_REDO-at-the-redo-point.patch
Description: Binary data

v7-0002-Minimally-add-XLOG_CHECKPOINT_REDO.patch
Description: Binary data

Re: New WAL record to detect the checkpoint redo location

Reply via email to