Re: About to add WAL write/fsync statistics to pg_stat_wal view

Masahiro Ikeda Tue, 09 Mar 2021 21:12:13 -0800

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:
On 2021-03-05 01:02, Fujii Masao wrote:
On 2021/03/04 16:14, Masahiro Ikeda wrote:
On 2021-03-03 20:27, Masahiro Ikeda wrote:
On 2021-03-03 16:30, Fujii Masao wrote:
On 2021/03/03 14:33, Masahiro Ikeda wrote:
On 2021-02-24 16:14, Fujii Masao wrote:
On 2021/02/15 11:59, Masahiro Ikeda wrote:
On 2021-02-10 00:51, David G. Johnston wrote:
On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
<[email protected]> wrote:
I pgindented the patches.
... <function>XLogWrite</function>, which is invoked during an
<function>XLogFlush</function> request (see ...). This isalso
incremented by the WAL receiver during replication.
("which normally called" should be "which is normally called"or"which normally is called" if you want to keep true to theoriginal)You missed the adding the space before an opening parenthesishere and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly querythe
operating system..."
", because" -> "as"
Thanks, I fixed them.
wal_write_time and the sync items also need the note: "This isalso
incremented by the WAL receiver during replication."
I skipped changing it since I separated the stats for the WALreceiver
in pg_stat_wal_receiver.
"The number of times it happened..." -> " (the tally of thisevent isreported in wal_buffers_full in....) This is undesirablebecause ..."
Thanks, I fixed it.
I notice that the patch for WAL receiver doesn't requireexplicitlycomputing the sync statistics but does require computing thewritestatistics. This is because of the presence ofissue_xlog_fsync butabsence of an equivalent pg_xlog_pwrite. Additionally, Iobserve thatthe XLogWrite code path calls pgstat_report_wait_*() while theWALreceiver path does not. It seems technically straight-forwardtorefactor here to avoid the almost-duplicated logic in the twoplaces,though I suspect there may be a trade-off for not addinganotherfunction call to the stack given the importance of WALprocessing(though that seems marginalized compared to the cost ofactuallywriting the WAL). Or, as Fujii noted, go the other way anddon't haveany shared code between the two but instead implement the WALreceiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.
OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
Thanks for updating the patches!
I added the infrastructure code to communicate the WAL receiverstats messages between the WAL receiver and the statscollector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?
On second thought, this idea seems not good. Because those statsare
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver processrunningat that moment. IOW, it seems strange that some values showdynamicstats and the others show collected stats, even though they arein
the same view pg_stat_wal_receiver. Thought?
OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_walview in v11 patch.
Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+        (sync_method == SYNC_METHOD_FSYNC ||
+         sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+         sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?
Thanks for the comments.
I added the above code in v12 patch.
+        /*
+ * Send WAL statistics only if WalWriterDelay has elapsedto minimize
+         * the overhead in WAL-writing.
+         */
+        if (rc & WL_TIMEOUT)
+            pgstat_send_wal();
On second thought, this change means that it always takeswal_writer_delaybefore walwriter's WAL stats is sent after XLogBackgroundFlush()is called.For example, if wal_writer_delay is set to several seconds, somevalues inpg_stat_wal would be not up-to-date meaninglessly for thoseseconds.So I'm thinking to withdraw my previous comment and it's ok tosend
the stats every after XLogBackgroundFlush() is called. Thought?
Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.
Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.
Why don't to make another way to check the timestamp?

+               /*
+                * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+                * msec since we last sent one
+                */
+               now = GetCurrentTimestamp();
+               if (TimestampDifferenceExceeds(last_report, now,
PGSTAT_STAT_INTERVAL))
+               {
+                       pgstat_send_wal();
+                       last_report = now;
+               }
+
Although I worried that it's better to add the check code inpgstat_send_wal(),
Agreed.
I didn't do so because to avoid to double checkPGSTAT_STAT_INTERVAL.pgstat_send_wal() is invoked pg_report_stat() and it already checksthe
PGSTAT_STAT_INTERVAL.
I think that we can do that. What about the attached patch?
Thanks, I thought it's better.
I forgot to remove an unused variable.
The attached v13 patch is fixed.
Thanks for updating the patch!

+        w.wal_write,
+        w.wal_write_time,
+        w.wal_sync,
+        w.wal_sync_time,

It's more natural to put wal_write_time and wal_sync_time next to
each other? That is, what about the following order of columns?

wal_write
wal_sync
wal_write_time
wal_sync_time
Yes, I fixed it.
-        case SYNC_METHOD_OPEN:
-        case SYNC_METHOD_OPEN_DSYNC:
-            /* write synced it already */
-            break;
IMO it's better to add Assert(false) here to ensure that we neverreach
here, as follows. Thought?

+        case SYNC_METHOD_OPEN:
+        case SYNC_METHOD_OPEN_DSYNC:
+            /* not reachable */
+            Assert(false);
I agree.
Even when a backend exits, it sends the stats viapgstat_beshutdown_hook().On the other hand, walwriter doesn't do that. Walwriter also shouldsendthe stats even at its exit? Otherwise some stats can fail to becollected.But ISTM that this issue existed from before, for examplecheckpointerdoesn't call pgstat_send_bgwriter() at its exit, so it's overkill tofix
this issue in this patch?
Thanks, I thought it's better to do so.
I added the shutdown hook for the walwriter and the checkpointer inv14-0003 patch.
Thanks for 0003 patch!
Isn't it overkill to send the stats in the walwriter-exit-callback? IMOwe canjust send the stats only when ShutdownRequestPending is true in thewalwriter
main loop (maybe just before calling HandleMainLoopInterrupts()).
If we do this, we cannot send the stats when walwriter throws FATALerror.But that's ok because FATAL error on walwriter causes the server tocrash.
Thought?


Thanks for your comments!
Yes, I agree.

Also ISTM that we don't need to use the callback for that purpose in
checkpointer because of the same reason. That is, we can send the stats
just after calling ShutdownXLOG(0, 0) inHandleCheckpointerInterrupts().
Thought?


Yes, I think so too.

Since ShutdownXLOG() may create restartpoint or checkpoint,
it might generate WAL records.

I'm now not sure how much useful these changes are. As far as I readpgstat.c,
when shutdown is requested, the stats collector seems to exit even when
there are outstanding stats messages. So if checkpointer and walwritersend
the stats in their last cycles, those stats might not be collected.
On the other hand, I can think that sending the stats in the lastcycles wouldimprove the situation a bit than now. So I'm inclined to apply thosechanges...


I didn't notice that. I agree this is an important aspect.

I understood there is a case that the stats collector exits before thecheckpointer

or the walwriter exits and some stats might not be collected.

Of course, there is another direction; we can improve the statscollector sothat it guarantees to collect all the sent stats messages. But I'mafraid
this change might be big.

For example, implement to manage background process status in sharedmemory andthe stats collector collects the stats until another background processexits?


In my understanding, the statistics are not required high accuracy,
it's ok to ignore them if the impact is not big.

If we guarantee high accuracy, another background process likeautovacuum launchermust send the WAL stats because it accesses the system catalog and mightgenerate

WAL records due to HOT update even though the possibility is low.

I thought the impact is small because the time uncollected stats aregenerated isshort compared to the time from startup. So, it's ok to ignore theremaining stats

when the process exists.

BTW, I found BgWriterStats.m_timed_checkpoints is not counted inShutdownLOG()

and we need to count it if to collect stats before it exits.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Re: About to add WAL write/fsync statistics to pg_stat_wal view

Reply via email to