(please, the list policy is bottom-posting to keep history clean, thanks).

On Thu, 14 May 2020 07:18:33 +0500
godjan • <g0d...@gmail.com> wrote:

> -> Why do you kill -9 your standby?   
> Hi, it’s a Jepsen test for our HA solution. It checks that we don’t lose data
> in such a situation.

OK. This test is highly useful to stress data high availability and durability,
of course. However, how useful is it in the context of auto failover for
**service** high availability? If all your nodes are killed in the same
disaster, how and why should an automatic cluster manager take care of starting
all the nodes again and picking the right one to promote?

> So, now we updated the logic as Michael said. All alive HA standbys now wait
> until they have replayed all the WAL they have, and then we use
> pg_last_replay_lsn() to choose which standby will be promoted in the failover.
> 
> It fixed our trouble, but there is another one. Now we must wait until all
> alive HA hosts finish replaying their WAL before the failover. It might take a
> while (for example, when the WAL contains a record about splitting a b-tree
> page).

Indeed, this is the concern I wrote about yesterday in a second mail on this
thread.

> We are looking for options that would allow us to find a standby that contains
> all the data, and to replay all the WAL only on this standby before the
> failover.

Note that when you promote a node, it first replays all available WAL before
acting as a primary. So you can safely signal the promotion to the node and
wait for it to finish the replay and promote itself.
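For instance, a rough sketch (the connection string is a placeholder, and
pg_promote() requires PostgreSQL 12 or later):

```shell
#!/bin/sh
# Sketch: signal promotion on the chosen standby, then wait until it
# leaves recovery. pg_promote(false) returns immediately; the standby
# replays all the WAL it has before it starts accepting writes.
promote_and_wait() {
    conninfo="$1"   # placeholder, e.g. "host=standby1 dbname=postgres"

    psql "$conninfo" -Atc "SELECT pg_promote(false)" >/dev/null || return 1

    # Poll until recovery is over, i.e. the node acts as a primary.
    while [ "$(psql "$conninfo" -Atc 'SELECT pg_is_in_recovery()')" = "t" ]
    do
        sleep 1
    done
}
```

Alternatively, `SELECT pg_promote(true, 60)` blocks on the server side until
the promotion completes or the timeout expires.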

> Maybe you have ideas on how to keep the last actual value of
> pg_last_wal_receive_lsn()? 

Nope, no clean and elegant idea. Once your instances are killed, maybe you can
force-flush the system cache (to secure the in-memory-only data) and read the
latest received WAL using pg_waldump?
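Something along these lines, as a rough sketch (the WAL directory is a
placeholder, and the `lsn:` parsing assumes pg_waldump's usual one-record-per-
line output):

```shell
#!/bin/sh
# Sketch: after a crash, extract the LSN of the last WAL record that
# made it to disk, using pg_waldump on the newest segment in pg_wal/.
last_record_lsn() {
    waldir="$1"   # placeholder, e.g. /var/lib/postgresql/12/main/pg_wal

    # Newest segment file (24-hex-character names).
    seg=$(ls "$waldir" | grep -E '^[0-9A-F]{24}$' | sort | tail -n 1)
    [ -n "$seg" ] || return 1

    # Last decodable record: each pg_waldump line carries an "lsn: X/Y," field.
    pg_waldump "$waldir/$seg" 2>/dev/null \
        | sed -n 's/.*lsn: \([0-9A-F]*\/[0-9A-F]*\),.*/\1/p' \
        | tail -n 1
}
```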

But what if some more data is available from the archives, but was not received
through streaming replication because of a high lag?

> As I understand it, the WAL receiver doesn’t write walrcv->flushedUpto to disk.

I'm not sure I understand what you mean here.
pg_last_wal_receive_lsn() reports the current value of walrcv->flushedUpto,
which holds the latest LSN force-flushed to disk.
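As an aside, when comparing the LSNs reported by each standby from outside the
database (the pg_lsn type and its comparison operators are only available in
SQL), they can be converted to plain integers; a minimal sketch, assuming
well-formed "X/Y" input:

```shell
#!/bin/sh
# Sketch: turn an LSN such as "16/B374D848" into a 64-bit integer so
# that plain numeric comparison picks the most advanced standby.
lsn_to_int() {
    hi=${1%%/*}   # part before the slash, hexadecimal
    lo=${1##*/}   # part after the slash, hexadecimal
    echo $(( (0x$hi << 32) + 0x$lo ))
}
```

For example, `[ "$(lsn_to_int 16/B374D848)" -gt "$(lsn_to_int 16/0000C000)" ]`
holds, whereas comparing the two strings lexically would not be reliable.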


> > On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais <j...@dalibo.com>
> > wrote:
> > 
> > 
> > (too bad the history has been removed to keep context)
> > 
> > On Fri, 8 May 2020 15:02:26 +0500
> > godjan • <g0d...@gmail.com> wrote:
> >   
> >> I got it, thank you.
> >> Can you recommend what to use to determine which quorum standby should be
> >> promoted in such a case? We planned to use pg_last_wal_receive_lsn() to
> >> determine which one has the freshest data, but if it returns the beginning
> >> of the segment on both replicas, we can’t determine which standby confirmed
> >> the write transaction to disk.  
> > 
> > Wait, pg_last_wal_receive_lsn() only decreased because you killed your
> > standby.
> > 
> > pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The
> > latter is set to the beginning of the requested segment only during the
> > first walreceiver startup or after a timeline fork:
> > 
> >     /*
> >      * If this is the first startup of walreceiver (on this timeline),
> >      * initialize flushedUpto and latestChunkStart to the starting
> >      * point.
> >      */
> >     if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli)
> >     {
> >             walrcv->flushedUpto = recptr;
> >             walrcv->receivedTLI = tli;
> >             walrcv->latestChunkStart = recptr;
> >     }
> >     walrcv->receiveStart = recptr;
> >     walrcv->receiveStartTLI = tli;
> > 
> > After a primary loss, as long as the standbys are up and running, it is fine
> > to use pg_last_wal_receive_lsn().
> > 
> > Why do you kill -9 your standby? What am I missing? Could you explain the
> > use case you are working on to justify this?
> > 
> > Regards,  
-- 
Jehan-Guillaume de Rorthais
Dalibo

