Fwd: [GENERAL] wal_retrieve_retry_interval

Ludovic Vaugeois-Pepin Sun, 28 May 2017 01:08:36 -0700

On Sun, May 28, 2017 at 6:54 AM, Michael Paquier <[email protected]>
wrote:


> On Sat, May 27, 2017 at 2:40 PM, Ludovic Vaugeois-Pepin
> <[email protected]> wrote:
> > Say, with 9.6.2,  a hot_standby fails to connect to a replication slot:
> >     FATAL:  could not start WAL streaming: ERROR:  replication slot
> "test3"
> > does not exist
> > or
> >     FATAL:  could not connect to the primary server: FATAL:  the database
> > system is starting up
> >
> > Is there a way to reduce the time it takes until the next attempt? I
> > assumed, wrongly I think, that this would be wal_retrieve_retry_interval,
> > but it seems that it won't make a difference. I tried setting it to 3s,
> but
> > it seems to take 15s still. Here are two log samples:
>
> Could you double-check your configuration? If I set
> wal_retrieve_retry_interval to 1s on a standby, I am able to see a
> connection attempt from a WAL receiver happening with this interval of
> time in the case of repetitive failures.
>

It really is set at 3s on all servers (master and standbies) earlier in the
"deployment" process at the same time "listen_addresses", "hot_standby",
and others are set. This doesn't seem to happen every time I run tests. I
increased logging to DEBUG1. This is what I got when the problem occurred
again. Note that the empty line is there in the log file.

< 2017-05-28 09:29:36.127 CEST > LOG:  database system was shut down in
recovery at 2017-05-28 09:29:03 CEST
< 2017-05-28 09:29:36.127 CEST > LOG:  entering standby mode
< 2017-05-28 09:29:36.127 CEST > DEBUG:  checkpoint record is at 0/80002B8
< 2017-05-28 09:29:36.128 CEST > DEBUG:  redo record is at 0/80002B8;
shutdown TRUE
< 2017-05-28 09:29:36.128 CEST > DEBUG:  next transaction ID: 0:2535; next
OID: 18660
< 2017-05-28 09:29:36.128 CEST > DEBUG:  next MultiXactId: 1; next
MultiXactOffset: 0
< 2017-05-28 09:29:36.128 CEST > DEBUG:  oldest unfrozen transaction ID:
1750, in database 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  oldest MultiXactId: 1, in database
1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  commit timestamp Xid
oldest/newest: 0/0
< 2017-05-28 09:29:36.128 CEST > DEBUG:  transaction ID wrap limit is
2147485397 <(214)%20748-5397>, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  MultiXactId wrap limit is
2147483648 <(214)%20748-3648>, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  starting up replication slots
< 2017-05-28 09:29:36.128 CEST > DEBUG:  resetting unlogged relations:
cleanup 1 init 0
< 2017-05-28 09:29:36.129 CEST > DEBUG:  initializing for hot standby
< 2017-05-28 09:29:36.129 CEST > DEBUG:  recovery snapshots are now enabled
< 2017-05-28 09:29:36.129 CEST > LOG:  consistent recovery state reached at
0/8000328
< 2017-05-28 09:29:36.129 CEST > LOG:  invalid record length at 0/8000328:
wanted 24, got 0
< 2017-05-28 09:29:36.129 CEST > LOG:  database system is ready to accept
read only connections
< 2017-05-28 09:29:36.135 CEST > FATAL:  could not connect to the primary
server: FATAL:  the database system is starting up

< 2017-05-28 09:29:36.135 CEST > DEBUG:  invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.153 CEST > DEBUG:  invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.158 CEST > LOG:  fetching timeline history file for
timeline 4 from primary server
< 2017-05-28 09:29:51.160 CEST > LOG:  started streaming WAL from primary
at 0/8000000 on timeline 3
< 2017-05-28 09:29:51.160 CEST > LOG:  replication terminated by primary
server
< 2017-05-28 09:29:51.160 CEST > DETAIL:  End of WAL reached on timeline 3
at 0/8000328.
< 2017-05-28 09:29:51.161 CEST > DEBUG:  walreceiver ended streaming and
awaits new instructions
< 2017-05-28 09:29:51.161 CEST > LOG:  new target timeline is 4
< 2017-05-28 09:29:51.161 CEST > DEBUG:  invalid record length at
0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.161 CEST > LOG:  restarted WAL streaming at 0/8000000
on timeline 4
< 2017-05-28 09:29:51.191 CEST > LOG:  redo starts at 0/8000328




> --
> Michael
>



-- 
Ludovic Vaugeois-Pepin



-- 
Ludovic Vaugeois-Pepin

Fwd: [GENERAL] wal_retrieve_retry_interval

Reply via email to