Fix comments of heap_prune_chain()

2021-07-12 Thread ikedamsh
Hi,While I’m reading source codes related to vacuum, I found comments whichdon’t seem to fit the reality. I think the commit[1] just forgot to fix them.What do you think?[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dc7420c2c9274a283779ec19718d2d16323640c0
Regards,-- Masahiro IkedaNTT DATA CORPORATION



v1-0001-fix_heap_prune_chain.patch
Description: Binary data


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-22 Thread ikedamsh


On 2021/03/22 16:50, Fujii Masao wrote:
> 
> 
> On 2021/03/22 9:50, ikedamsh wrote:
>> Agreed. I separated the patches.
>>
>> If only the former is committed, my trivial concern is that there may be
>> a disadvantage, but no advantage for the standby server. It may lead to
>> performance degradation to the wal receiver by calling
>> INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the
>> latter patch is committed.
> 
> Your concern is valid, so let's polish and commit also the 0003 patch to v14.
> I'm still thinking that it's better to separate wal_xxx columns into
> walreceiver's and the others. But if we count even walreceiver activity on
> the existing columns, regarding 0003 patch, we need to update the document?
> For example, "Number of times WAL buffers were written out to disk via
> XLogWrite request." should be "Number of times WAL buffers were written
> out to disk via XLogWrite request and by WAL receiver process."? Maybe
> we need to append some descriptions about this into "WAL configuration"
> section?

Agreed. Users can know whether the stats is for walreceiver or not. The
pg_stat_wal view in standby server shows for the walreceiver, and in primary
server it shows for the others. So, I updated the document.
(v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)

>> I followed the argument of pg_pwrite().
>> But, I think "char *" is better, so fixed it.
> 
> Thanks for updating the patch!
> 
> +extern int    XLogWriteFile(int fd, char *buf,
> +  size_t nbyte, off_t offset,
> +  TimeLineID timelineid, XLogSegNo segno,
> +  bool write_all);
> 
> write_all seems not to be necessary. You added this flag for walreceiver,
> I guess. But even without the argument, walreceiver seems to work expectedly.
> So, what about the attached patch? I applied some cosmetic changes to the 
> patch.

Thanks a lot. Yes, "write_all" is unnecessary.
Your patch is looks good to me.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..281b13b9fa 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3493,7 +3493,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
   
Number of times WAL buffers were written out to disk via
-   XLogWrite request.
+   XLogWrite request and WAL data were written
+   out to disk by the WAL receiver process.
See  for more information about
the internal WAL function XLogWrite.
   
@@ -3521,7 +3522,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
   
Total amount of time spent writing WAL buffers to disk via
-   XLogWrite request, in milliseconds
+   XLogWrite request and WAL data to disk
+   by the WAL receiver process, in milliseconds
(if  is enabled,
otherwise zero).  This includes the sync time when
wal_sync_method is either
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..39e7028c96 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -769,7 +769,7 @@
   
 
   
-   There are two internal functions to write WAL data to disk:
+   There are two internal functions to write generated WAL data to disk:
XLogWrite and issue_xlog_fsync.
When  is enabled, the total
amounts of time XLogWrite writes and
@@ -795,7 +795,19 @@
issue_xlog_fsync syncs WAL data to disk are also
counted as wal_write and wal_sync
in pg_stat_wal, respectively.
-  
+   To write replicated WAL data to disk by the WAL receiver is almost the same
+   as above except for some points. First, there is a dedicated code path for the
+   WAL receiver to write data although issue_xlog_fsync is
+   the same for syncing data.
+   Second, the WAL receiver writes replicated WAL data per bytes from the local
+   memory although the generated WAL data is written per WAL buffer pages.
+   The counters of wal_write, wal_sync,
+   wal_write_time, and wal_sync_time are
+   common statistics for writing/syncing both generated and replicated WAL data.
+   But, you can distinguish them because the generated WAL data is written/synced
+   in the primary server and the replicated WAL data is written/synced in
+   the standby server.
+   
  
 
  
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a7a94d2a83..df028c5039 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg)
 	/* Ensure that all WAL records received are flushed to disk */
 	XLogWalRcvFlush(true);
 
+	/* Send WAL statisti

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-21 Thread ikedamsh
On 2021-03-19 16:30, Fujii Masao wrote:
> On 2021/03/15 10:39, Masahiro Ikeda wrote:
>> Thanks, I understood get_sync_bit() checks the sync flags and
>> the write unit of generated wal data and replicated wal data is 
>> different.
>> (It's interesting optimization whether to use kernel cache or not.)
>> 
>> OK. Although I agree to separate the stats for the walrecever,
>> I want to hear opinions from other people too. I didn't change the 
>> patch.
>> 
>> Please feel to your comments.
> 
> What about applying the patch for common WAL write function like
> XLogWriteFile(), separately from the patch for walreceiver's stats?
> Seems the former reaches the consensus, so we can commit it firstly.
> Also even only the former change is useful because which allows
> walreceiver to report WALWrite wait event.

Agreed. I separated the patches.

If only the former is committed, my trivial concern is that there may be
a disadvantage, but no advantage for the standby server. It may lead to
performance degradation to the wal receiver by calling
INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the
latter patch is committed.

I think it's ok because this not happening in the case to disable the
"track_wal_io_timing" in the standby server. Although some users may start the
standby server using the backup which "track_wal_io_timing" is enabled in the
primary server, they will say it's ok since the users already accept the
performance degradation in the primary server.

>> OK. I agree.
>> 
>> I wonder to change the error check ways depending on who calls this 
>> function?
>> Now, only the walreceiver checks (1)errno==0 and doesn't check 
>> (2)errno==ENITR.
>> Other processes are the opposite.
>> 
>> IIUC, it's appropriate that every process checks (1)(2).
>> Please let me know my understanding is wrong.
> 
> I'm thinking the same. Regarding (2), commit 79ce29c734 introduced
> that code. According to the following commit log, it seems harmless
> to retry on EINTR even walreceiver.
> 
> Also retry on EINTR. All signals used in the backend are flagged 
> SA_RESTART
> nowadays, so it shouldn't happen, but better to be defensive.

Thanks, I understood.


>>> BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
>>> reports an error. But this is useless. Probably IO timing should be
>>> incremented after the return code of pg_pwrite() is checked, instead?
>> 
>> Yes, I agree. I fixed it.
>> (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)
> 
> Thanks for the patch!
> 
>   nleft = nbytes;
>   do
>   {
> - errno = 0;
> + written = XLogWriteFile(openLogFile, from, 
> nleft, (off_t) 
> startoffset,
> + 
> ThisTimeLineID, openLogSegNo, wal_segment_size);
> 
> Can we merge this do-while loop in XLogWrite() into the loop
> in XLogWriteFile()?
> If we do that, ISTM that the following codes are not necessary in 
> XLogWrite().
> 
>   nleft -= written;
>   from += written;

OK, I fixed it.


> + * 'segsize' is a segment size of WAL segment file.
> 
> Since segsize is always wal_segment_size, segsize argument seems
> not necessary in XLogWriteFile().

Right. I fixed it.


> +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset,
> +   TimeLineID timelineid, XLogSegNo segno, int segsize)
> 
> Why did you use "const void *" instead of "char *" for *buf?

I followed the argument of pg_pwrite().
But, I think "char *" is better, so fixed it.


> Regarding 0005 patch, I will review it later.

Thanks.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a7a94d2a83..df028c5039 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg)
 	/* Ensure that all WAL records received are flushed to disk */
 	XLogWalRcvFlush(true);
 
+	/* Send WAL statistics to the stats collector before terminating */
+	pgstat_send_wal(true);
+
 	/* Mark ourselves inactive in shared memory */
 	SpinLockAcquire(>mutex);
 	Assert(walrcv->walRcvState == WALRCV_STREAMING ||
@@ -910,6 +913,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 	XLogArchiveForceDone(xlogfname);
 else
 	XLogArchiveNotify(xlogfname);
+
+/*
+ * Send WAL statistics to the stats collector when finishing
+ * the current WAL segment file to avoid overloading it.
+ */
+pgstat_send_wal(false);
 			}
 			recvFile = -1;
 



diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bd5e787e55..4c7d90f1b9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2536,61 +2536,14 @@ 

It is not documented that pg_promote can exit standby mode

2020-04-16 Thread ikedamsh

Hi, 

The document(high-availability.sgml) says that there are only two ways 
to exit standby mode.


 26.2.2. Standby Server Operation
 Standby mode is exited and the server switches to normal operation when 
pg_ctl promote is run or a trigger file is found (promote_trigger_file).


But there is another way, by calling pg_promote function.
I think we need to document it, doesn't it?

I attached a patch. Please review and let me know your thoughts.

Regards,
Masahiro IkedaFrom 719b754c36f4537aaab7c6de1951d7b7ec4759f6 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Tue, 7 Apr 2020 08:34:55 +0900
Subject: [PATCH] fix doc about the way to exit standby mode

---
 doc/src/sgml/high-availability.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 4659b9e..88bf960 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -644,8 +644,8 @@ protocol to make nodes agree on a serializable transactional order.
 

 Standby mode is exited and the server switches to normal operation
-when pg_ctl promote is run or a trigger file is found
-(promote_trigger_file). Before failover,
+when pg_ctl promote is run, pg_promote is called,
+or a trigger file is found(promote_trigger_file). Before failover,
 any WAL immediately available in the archive or in pg_wal will be
 restored, but no attempt is made to connect to the master.

-- 
1.8.3.1