Re: outdated references to replication timeout

2021-01-14 Thread John Naylor
On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier  wrote:
>
> On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> > On Wed, Jan 13, 2021 at 10:51 PM John Naylor <
john.nay...@enterprisedb.com> wrote:
> >> It is strange, now that I think about it. My thinking was that the
> >> former wording of "replication timeout" was a less literal
> >> reference to the replication_timeout parameter, so I did the same
> >> for wal_sender_timeout. A quick look shows we are not consistent
> >> in the documentation as far as walsender vs. WAL sender. For the
> >> purpose of the patch I agree it should be consistent within a
> >> single message. Maybe the parameter should be spelled exactly as
> >> is, with underscores?
> >
> > I'm ok with that. But there seems no other timeout messages using
> > the parameter name.
>
> Could it be that nothing needs to change here because this refers to
> timeouts with the replication protocol?  The current state of things
> on HEAD does not sound completely incorrect to me either.

It was off enough to cause brief confusion during a customer emergency. To
show a bit more context around one of the proposed corrections:

 timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
 wal_sender_timeout);

if (wal_sender_timeout > 0 && last_processing >= timeout)
{
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
* standby.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));

My reading is, this is a case where the message didn't keep up with the
change in parameter name.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: outdated references to replication timeout

2021-01-13 Thread Michael Paquier
On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> On Wed, Jan 13, 2021 at 10:51 PM John Naylor  
> wrote:
>> It is strange, now that I think about it. My thinking was that the
>> former wording of "replication timeout" was a less literal
>> reference to the replication_timeout parameter, so I did the same
>> for wal_sender_timeout. A quick look shows we are not consistent
>> in the documentation as far as walsender vs. WAL sender. For the
>> purpose of the patch I agree it should be consistent within a
>> single message. Maybe the parameter should be spelled exactly as
>> is, with underscores? 
>
> I'm ok with that. But there seems no other timeout messages using
> the parameter name.

Could it be that nothing needs to change here because this refers to
timeouts with the replication protocol?  The current state of things
on HEAD does not sound completely incorrect to me either.
--
Michael


signature.asc
Description: PGP signature


Re: outdated references to replication timeout

2021-01-13 Thread Fujii Masao
On Wed, Jan 13, 2021 at 10:51 PM John Naylor
 wrote:
>
>
> On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao  wrote:
> >
> > Thanks for the patch! I think this change makes sense.
> >
> > -   (errmsg("terminating walsender process
> > due to replication timeout")));
> > +   (errmsg("terminating walsender process
> > due to WAL sender timeout")));
> >
> > Isn't it a bit strange to include different expressions "walsender" and
> > "WAL sender" for the same thing in one message?
>
> It is strange, now that I think about it. My thinking was that the former 
> wording of "replication timeout" was a less literal reference to the 
> replication_timeout parameter, so I did the same for wal_sender_timeout. A 
> quick look shows we are not consistent in the documentation as far as 
> walsender vs. WAL sender. For the purpose of the patch I agree it should be 
> consistent within a single message. Maybe the parameter should be spelled 
> exactly as is, with underscores?

I'm ok with that. But there seems no other timeout messages using
the parameter name.

src/backend/replication/logical/worker.c: (errmsg("terminating logical
replication worker due to timeout")));
src/backend/replication/walreceiver.c: (errmsg("terminating
walreceiver due to timeout")));
src/backend/replication/walsender.c: (errmsg("terminating walsender
process due to replication timeout")));
src/backend/tcop/postgres.c: errmsg("canceling authentication due to
timeout")));
src/backend/tcop/postgres.c: errmsg("canceling statement due to lock
timeout")));
src/backend/tcop/postgres.c: errmsg("canceling statement due to
statement timeout")));
src/backend/tcop/postgres.c: errmsg("terminating connection due to
idle-in-transaction timeout")));
src/backend/tcop/postgres.c: errmsg("terminating connection due to
idle-session timeout")));
src/backend/utils/misc/timeout.c: errmsg("cannot add more timeout reasons")));


> I'll take a broader look and send an updated patch.

Thanks!

Regards,

-- 
Fujii Masao




Re: outdated references to replication timeout

2021-01-13 Thread John Naylor
On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao  wrote:
>
> Thanks for the patch! I think this change makes sense.
>
> -   (errmsg("terminating walsender process
> due to replication timeout")));
> +   (errmsg("terminating walsender process
> due to WAL sender timeout")));
>
> Isn't it a bit strange to include different expressions "walsender" and
> "WAL sender" for the same thing in one message?

It is strange, now that I think about it. My thinking was that the former
wording of "replication timeout" was a less literal reference to the
replication_timeout parameter, so I did the same for wal_sender_timeout. A
quick look shows we are not consistent in the documentation as far
as walsender vs. WAL sender. For the purpose of the patch I agree it should
be consistent within a single message. Maybe the parameter should be
spelled exactly as is, with underscores? I'll take a broader look and send
an updated patch.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: outdated references to replication timeout

2021-01-12 Thread Fujii Masao
On Wed, Jan 13, 2021 at 5:39 AM John Naylor
 wrote:
>
> Hi,
>
> The parameter replication_timeout was retired in commit 6f60fdd701 back in 
> 2012, but some comments and error messages seem to refer to that old setting 
> instead of wal_sender_timeout or wal_receiver_timeout. The attached patch 
> replaces the old language with more specific references.

Thanks for the patch! I think this change makes sense.

-   (errmsg("terminating walsender process
due to replication timeout")));
+   (errmsg("terminating walsender process
due to WAL sender timeout")));

Isn't it a bit strange to include different expressions "walsender" and
"WAL sender" for the same thing in one message?


This is a bit related, but different topic, though. If we change the above
message about walsender timeout, I also want to change the message about
walreceiver timeout, so as to make them more consistent. For example,

- (errmsg("terminating walreceiver due to timeout")));
+ (errmsg("terminating WAL receiver process due to WAL receiver timeout")));

Regards,

-- 
Fujii Masao




outdated references to replication timeout

2021-01-12 Thread John Naylor
Hi,

The parameter replication_timeout was retired in commit 6f60fdd701 back in
2012, but some comments and error messages seem to refer to that old
setting instead of wal_sender_timeout or wal_receiver_timeout. The attached
patch replaces the old language with more specific references.

-- 
John Naylor
EDB: http://www.enterprisedb.com


stray-repl-timeout.patch
Description: Binary data