Re: Using old master as new replica after clean switchover

2019-02-22 Thread Claudio Freire
On Fri, Feb 22, 2019 at 5:47 AM Jehan-Guillaume de Rorthais wrote:
>
> On Thu, 21 Feb 2019 15:38:21 -0300
> Claudio Freire wrote:
>
> > On Tue, Feb 19, 2019 at 9:44 PM Michael Paquier wrote:
> > >
> > > On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:
> > > > So I wanted to clarify: does this procedure really require the WAL
> > > > archive location to be on shared storage?
> > >
> > > Shared storage for WAL archives is not a requirement.  It is perfectly
> > > possible to use streaming replication to get correct WAL changes.
> > > Using an archive is recommended for some deployments; depending on
> > > your requirements and data retention policy, you could keep those
> > > archives on a different host and have the restore_command of the
> > > standby in recovery, or the archive_command of the primary, save the
> > > segments to it.  Of course, this depends on the frequency at which
> > > new WAL segments are generated.
> >
> > If I'm not mistaken, if you don't have a WAL archive set up (a shared
> > filesystem isn't necessary, but the standby has to be able to restore
> > WAL segments from the archive), a few transactions that haven't been
> > streamed at primary shutdown could be lost, since the secondary won't
> > be able to stream anything after the primary has shut down.
>
> This has been fixed in 9.3. The primary node waits for all WAL records to be
> streamed to the connected standbys before shutting down, including its
> shutdown checkpoint. See:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=985bd7d49726c9f178558491d31a570d47340459
>
> Because a standby could disconnect due to some failure during the shutdown
> process, you still need to make sure the standby-to-be-promoted received the
> shutdown checkpoint, though.
>
> > A WAL archive can always be restored even without a primary running,
> > which is why a WAL archive is needed.
>
> No. The primary does not force a WAL switch/archive during shutdown.

That's good to know, both of the above.



Re: Using old master as new replica after clean switchover

2019-02-22 Thread Jehan-Guillaume de Rorthais
On Thu, 21 Feb 2019 15:38:21 -0300
Claudio Freire wrote:

> On Tue, Feb 19, 2019 at 9:44 PM Michael Paquier wrote:
> >
> > On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:  
> > > So I wanted to clarify: does this procedure really require the WAL
> > > archive location to be on shared storage?
> >
> > Shared storage for WAL archives is not a requirement.  It is perfectly
> > possible to use streaming replication to get correct WAL changes.
> > Using an archive is recommended for some deployments; depending on
> > your requirements and data retention policy, you could keep those
> > archives on a different host and have the restore_command of the
> > standby in recovery, or the archive_command of the primary, save the
> > segments to it.  Of course, this depends on the frequency at which
> > new WAL segments are generated.
> 
> If I'm not mistaken, if you don't have a WAL archive set up (a shared
> filesystem isn't necessary, but the standby has to be able to restore
> WAL segments from the archive), a few transactions that haven't been
> streamed at primary shutdown could be lost, since the secondary won't
> be able to stream anything after the primary has shut down.

This has been fixed in 9.3. The primary node waits for all WAL records to be
streamed to the connected standbys before shutting down, including its
shutdown checkpoint. See:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=985bd7d49726c9f178558491d31a570d47340459

Because a standby could disconnect due to some failure during the shutdown
process, you still need to make sure the standby-to-be-promoted received the
shutdown checkpoint, though.

> A WAL archive can always be restored even without a primary running,
> which is why a WAL archive is needed.

No. The primary does not force a WAL switch/archive during shutdown.
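
For illustration only: if you do want the last segment to reach the archive
before stopping, you can force a WAL switch by hand first. A sketch, assuming
the archive is already configured; the function is pg_switch_xlog() up to 9.6
and pg_switch_wal() from v10 on:

    # on the primary, just before shutting it down:
    psql -c "SELECT pg_switch_xlog();"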

-- 
Jehan-Guillaume de Rorthais
Dalibo



Re: Using old master as new replica after clean switchover

2019-02-21 Thread Michael Paquier
On Thu, Feb 21, 2019 at 10:26:37AM -0800, RSR999GMAILCOM wrote:
> Is there any link where the required setup and the step-by-step procedure
> for performing a controlled switchover are listed?

Docs about failover are here:
https://www.postgresql.org/docs/current/warm-standby-failover.html

Now I don't recall that we have a section about a step-by-step
procedure for one case of failover or another.  The docs could perhaps
be improved in that regard, particularly for the case mentioned here,
where it is possible to relink a previous master to a promoted standby
without risk of corruption:
- Cleanly stop the primary, in smart or fast mode.
- Promote the standby.
- Add recovery.conf to the previous primary.
- Restart the previous primary as a new standby.
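
A rough sketch of those four steps (host name, user and paths are invented
for the example; this assumes a 9.x-era setup with recovery.conf):

    # 1. on the old primary:
    pg_ctl -D "$PGDATA" stop -m fast

    # 2. on the standby:
    pg_ctl -D "$PGDATA" promote

    # 3. on the old primary, write a recovery.conf pointing at the
    #    promoted node:
    {
      echo "standby_mode = 'on'"
      echo "primary_conninfo = 'host=new-primary port=5432 user=replicator'"
      echo "recovery_target_timeline = 'latest'"
    } > "$PGDATA"/recovery.conf

    # 4. restart the old primary as a new standby:
    pg_ctl -D "$PGDATA" start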
--
Michael




Re: Using old master as new replica after clean switchover

2019-02-21 Thread Michael Paquier
On Thu, Feb 21, 2019 at 03:38:21PM -0300, Claudio Freire wrote:
> If I'm not mistaken, if you don't have a WAL archive set up (a shared
> filesystem isn't necessary, but the standby has to be able to restore
> WAL segments from the archive), a few transactions that haven't been
> streamed at primary shutdown could be lost, since the secondary won't
> be able to stream anything after the primary has shut down. A WAL
> archive can always be restored even without a primary running, which
> is why a WAL archive is needed.
> 
> Or am I missing something?

Well, my point is that you may not need an archive if you are able to
stream the changes from the primary, either because the primary has a
replication slot or because a checkpoint has not yet recycled the
segments that a standby may need.  If the primary is offline, and you
need to recover a standby, then an archive is mandatory.  When
recovering from an archive, the standby would be able to catch up to
the end of the last segment archived, as we don't enforce a segment
switch when a node shuts down.  If using pg_receivewal as a form of
archiving with its --synchronous mode, it is also possible to stream up
to the point where the primary has generated its shutdown checkpoint,
so you would not lose data included in the last segment the primary was
working on when stopped.
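
For example, such a pg_receivewal-based archive could be run like this
(host, user and slot name are invented; before v10 the tool is called
pg_receivexlog):

    pg_receivewal -h primary-host -U replicator -S archive_slot \
        --synchronous -D /var/lib/pgsql/wal_archive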
--
Michael




Re: Using old master as new replica after clean switchover

2019-02-21 Thread Claudio Freire
On Tue, Feb 19, 2019 at 9:44 PM Michael Paquier wrote:
>
> On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:
> > So I wanted to clarify: does this procedure really require the WAL
> > archive location to be on shared storage?
>
> Shared storage for WAL archives is not a requirement.  It is perfectly
> possible to use streaming replication to get correct WAL changes.
> Using an archive is recommended for some deployments; depending on
> your requirements and data retention policy, you could keep those
> archives on a different host and have the restore_command of the
> standby in recovery, or the archive_command of the primary, save the
> segments to it.  Of course, this depends on the frequency at which
> new WAL segments are generated.

If I'm not mistaken, if you don't have a WAL archive set up (a shared
filesystem isn't necessary, but the standby has to be able to restore
WAL segments from the archive), a few transactions that haven't been
streamed at primary shutdown could be lost, since the secondary won't
be able to stream anything after the primary has shut down. A WAL
archive can always be restored even without a primary running, which
is why a WAL archive is needed.

Or am I missing something?



Re: Using old master as new replica after clean switchover

2019-02-21 Thread RSR999GMAILCOM
Is there any link where the required setup and the step-by-step procedure
for performing a controlled switchover are listed?

Thanks
Raj

On Tue, Feb 19, 2019 at 4:44 PM Michael Paquier wrote:

> On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:
> > So I wanted to clarify: does this procedure really require the WAL
> > archive location to be on shared storage?
>
> Shared storage for WAL archives is not a requirement.  It is perfectly
> possible to use streaming replication to get correct WAL changes.
> Using an archive is recommended for some deployments; depending on
> your requirements and data retention policy, you could keep those
> archives on a different host and have the restore_command of the
> standby in recovery, or the archive_command of the primary, save the
> segments to it.  Of course, this depends on the frequency at which
> new WAL segments are generated.
> --
> Michael
>


Re: Using old master as new replica after clean switchover

2019-02-19 Thread Michael Paquier
On Tue, Feb 19, 2019 at 04:27:02PM -0800, RSR999GMAILCOM wrote:
> So I wanted to clarify: does this procedure really require the WAL
> archive location to be on shared storage?

Shared storage for WAL archives is not a requirement.  It is perfectly
possible to use streaming replication to get correct WAL changes.
Using an archive is recommended for some deployments; depending on
your requirements and data retention policy, you could keep those
archives on a different host and have the restore_command of the
standby in recovery, or the archive_command of the primary, save the
segments to it.  Of course, this depends on the frequency at which
new WAL segments are generated.
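
To make that concrete, a minimal sketch of such a setup with the archive on
a third host (host name, paths and the use of rsync are arbitrary choices
for the example, not recommendations):

    # postgresql.conf on the primary:
    archive_mode = on
    archive_command = 'rsync -a %p archive-host:/wal_archive/%f'

    # recovery.conf on the standby:
    restore_command = 'rsync -a archive-host:/wal_archive/%f %p'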
--
Michael




Using old master as new replica after clean switchover

2019-02-19 Thread RSR999GMAILCOM
Hello Postgres Gurus,

After searching (on www.postgresql.org/Google) I found that the following
steps can be used to perform a switchover in Postgres (version 9.3):
*Step 1.* Do a clean shutdown of the primary (-m fast or smart).
*Step 2.* Check the sync status and recovery status of the standby before
promoting it. Once the standby is in complete sync, we are safe to promote
it as the new primary.
*Step 3.* Open the standby as the new primary with pg_ctl promote or by
creating a trigger file.
*Step 4.* Restart the old primary as a standby and allow it to follow the new
timeline by setting "recovery_target_timeline='latest'" in
$PGDATA/recovery.conf.
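
For Step 2, one possible sanity check is to compare WAL positions on the two
nodes (a sketch using the 9.3-era function names; on v10+ these became
pg_current_wal_lsn() and pg_last_wal_receive_lsn()/pg_last_wal_replay_lsn()):

    # on the primary:
    psql -c "SELECT pg_current_xlog_location();"

    # on the standby; should match the primary once fully caught up:
    psql -c "SELECT pg_last_xlog_receive_location(),
                    pg_last_xlog_replay_location();"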

But I also read in a post found via Google that this procedure requires the
WAL archive location to be on shared storage to which both the master and
the slave have access.

So I wanted to clarify: does this procedure really require the WAL
archive location to be on shared storage?

Thanks
Raj


Re: Using old master as new replica after clean switchover

2018-10-25 Thread Nikolay Samokhvalov
On Thu, Oct 25, 2018 at 6:03 AM Jehan-Guillaume de Rorthais wrote:

> What about logging the shutdown checkpoint on the old master?
> On the standby side, we could cross-check it with a function confirming:
> 1/ the very last XLogRecord received was the old master's shutdown checkpoint
> 2/ the received shutdown checkpoint has the same LSN
>

Additionally, the new instructions in the doc might include a recommendation
on what to do if we find that the shutdown checkpoint wasn't received and
replayed by the replica-to-promote. From my understanding, before promotion,
we could "manually" transfer the missing WAL data from the old, inactive
master and replay it on the replica-to-promote (of course, if
restore_command is properly configured on it). Right?
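
For instance (segment name and paths are invented), the missing segments
could be copied from the stopped old master into the location the replica's
restore_command reads from:

    # on the old, cleanly stopped master; pg_xlog is called pg_wal from
    # v10 on, and the segment name is only an example:
    scp "$PGDATA"/pg_xlog/0000000100000002000000A4 \
        archive-host:/wal_archive/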

By the way, it looks to me that it might be better to write more than just a
few sentences; what if it were a new chapter – say, "Switchover", next to
"Failover"? It would also give the reader a better understanding, explicitly
distinguishing planned and unplanned processes of master/replica role
changes.

Regards,
Nik


Re: Using old master as new replica after clean switchover

2018-10-25 Thread Michael Paquier
On Thu, Oct 25, 2018 at 11:15:51AM +0200, Jehan-Guillaume de Rorthais wrote:
> On Thu, 25 Oct 2018 02:57:18 -0400
> Nikolay Samokhvalov wrote:
>> My research shows that some people already rely on the following planned
>> failover (aka switchover) procedure, doing it in production:
>> 
>>  1) shut down the current master
>>  2) ensure that the "master candidate" replica has received all WAL data,
>> including the shutdown checkpoint from the old master
>>  3) promote the master candidate to make it the new master
>>  4) configure recovery.conf on the old master node, while it's inactive
>>  5) start the old master node as a new replica following the new master.
> 
> Indeed.

The important point here is that the primary will wait for the shutdown
checkpoint record to be replayed on the standbys before it finishes
shutting down.

> The only additional nice step would be to be able to run some more safety
> tests AFTER the switchover process on the old master. The only way I can
> think of would be to run pg_rewind even if it doesn't do much.

Do you have something specific in mind here?  I am curious if you're
thinking about things like page-level checks for LSN matches under some
threshold or such, because you should not have pages on the previous
primary which have LSNs newer than the point up to which the standby has
replayed.

>> If so, let's add it to the documentation, making it official. The patch is
>> attached.
> 
> I suppose we should add the technical steps in a sample procedure?

If an addition to the docs is done, laying out the steps in a list
would be cleaner, with perhaps something in a dedicated section or a new
sub-section.  The failover flow you are mentioning is good practice
because that's safe, and there is always room for improvements in the
docs.
--
Michael




Re: Using old master as new replica after clean switchover

2018-10-25 Thread Jehan-Guillaume de Rorthais
On Thu, 25 Oct 2018 20:45:57 +0900
Michael Paquier wrote:

> On Thu, Oct 25, 2018 at 11:15:51AM +0200, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 25 Oct 2018 02:57:18 -0400
> > Nikolay Samokhvalov wrote:
> >> My research shows that some people already rely on the following planned
> >> failover (aka switchover) procedure, doing it in production:
> >> 
> >>  1) shut down the current master
> >>  2) ensure that the "master candidate" replica has received all WAL data,
> >> including the shutdown checkpoint from the old master
> >>  3) promote the master candidate to make it the new master
> >>  4) configure recovery.conf on the old master node, while it's inactive
> >>  5) start the old master node as a new replica following the new master.
> > 
> > Indeed.  
> 
> The important point here is that the primary will wait for the shutdown
> checkpoint record to be replayed on the standbys before it finishes
> shutting down.

Yes. However, it gives up if the connection to the standby fails. This is
obvious, but that's why we really need to double-check on the standby that
the shutdown checkpoint has been received, just in case of some network
trouble or such.

> > The only additional nice step would be to be able to run some more safety
> > tests AFTER the switchover process on the old master. The only way I can
> > think of would be to run pg_rewind even if it doesn't do much.  
> 
> Do you have something specific in mind here?  I am curious if you're
> thinking about things like page-level checks for LSN matches under some
> threshold or such, because you should not have pages on the previous
> primary which have LSNs newer than the point up to which the standby has
> replayed.

This could be a decent check. Heavy and slow, but safe.

Other ideas I have (see below) are only about easing the existing
procedure.

Both are interesting projects I could hopefully work on.

> >> If so, let's add it to the documentation, making it official. The patch is
> >> attached.  
> > 
> > I suppose we should add the technical steps in a sample procedure?  
> 
> If an addition to the docs is done, laying out the steps in a list
> would be cleaner, with perhaps something in a dedicated section or a new
> sub-section.  The failover flow you are mentioning is good practice
> because that's safe, and there is always room for improvements in the
> docs.

The hardest part to explain here is how to check that the shutdown checkpoint
reached the standby-to-promote.
* in PAF, I'm using pg_waldump to check if the shutdown checkpoint has been
  received. 
* in manual operation, I force a checkpoint on the standby and compare "Latest
  checkpoint's REDO location" from the controldata file with the one on the old
  master.
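
For illustration, hedged sketches of both checks (paths and the segment name
are invented; before v10 pg_waldump is called pg_xlogdump):

    # method 1: look for the shutdown checkpoint record in the last
    # segment received on the standby:
    pg_waldump "$PGDATA"/pg_wal/0000000100000002000000A4 | \
        grep CHECKPOINT_SHUTDOWN

    # method 2: force a checkpoint (restartpoint) on the standby, then
    # compare with the old master's control file:
    psql -c "CHECKPOINT;"
    pg_controldata "$PGDATA" | grep "Latest checkpoint's REDO location"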

I'm not sure how to explain one or the other method clearly in the doc.

Two ideas come to mind to improve this.

What about logging the shutdown checkpoint on the old master?
On the standby side, we could cross-check it with a function confirming:
1/ the very last XLogRecord received was the old master's shutdown checkpoint
2/ the received shutdown checkpoint has the same LSN

The second idea would be for the old master to detect that it has been
started as a new standby, and to replay XLogRecords from the new master only
if its timeline fork follows its previous timeline and shutdown checkpoint?



Re: Using old master as new replica after clean switchover

2018-10-25 Thread Jehan-Guillaume de Rorthais
On Thu, 25 Oct 2018 02:57:18 -0400
Nikolay Samokhvalov wrote:
...
> My research shows that some people already rely on the following planned
> failover (aka switchover) procedure, doing it in production:
> 
>  1) shut down the current master
>  2) ensure that the "master candidate" replica has received all WAL data,
> including the shutdown checkpoint from the old master
>  3) promote the master candidate to make it the new master
>  4) configure recovery.conf on the old master node, while it's inactive
>  5) start the old master node as a new replica following the new master.

Indeed.

> It looks to me now that, if no steps are missed in the procedure, this
> approach is usable for Postgres versions 9.3+ (for older versions like 9.3
> maybe not really always – people who know the details better will correct me
> here). Am I right? Or am I missing some risks here?

As far as I know, this is correct.

> Two changes were made in 9.3 which allowed this approach in general [1]
> [2]. Also, I see from the code [3] that during the shutdown process, the
> walsenders are the last to be stopped, which allows replicas to get the
> shutdown checkpoint information.

I had the same conclusions when I was studying controlled failover some years
ago to implement it in the PAF project (allowing a controlled switchover in
one command). Here is a discussion around switchover that took place three
years ago on the Pacemaker mailing list:

https://lists.clusterlabs.org/pipermail/users/2016-October/011568.html

> Is this approach considered safe now?

Considering the above points, I do think so.

The only additional nice step would be to be able to run some more safety tests
AFTER the switchover process on the old master. The only way I can think of
would be to run pg_rewind even if it doesn't do much.

> If so, let's add it to the documentation, making it official. The patch is
> attached.

I suppose we should add the technical steps in a sample procedure?



Using old master as new replica after clean switchover

2018-10-25 Thread Nikolay Samokhvalov
Currently, the documentation explicitly states that after failover, the
old master must be recreated from scratch, or pg_rewind should be used
(requiring wal_log_hints to be on, which is off by default):

> The former standby is now the primary, but the former primary is down and
> might stay down. To return to normal operation, a standby server must be
> recreated, either on the former primary system when it comes up, or on a
> third, possibly new, system. The pg_rewind utility can be used to speed up
> this process on large clusters.

My research shows that some people already rely on the following planned
failover (aka switchover) procedure, doing it in production:

 1) shut down the current master
 2) ensure that the "master candidate" replica has received all WAL data,
including the shutdown checkpoint from the old master
 3) promote the master candidate to make it the new master
 4) configure recovery.conf on the old master node, while it's inactive
 5) start the old master node as a new replica following the new master.

It looks to me now that, if no steps are missed in the procedure, this
approach is usable for Postgres versions 9.3+ (for older versions like 9.3
maybe not really always – people who know the details better will correct me
here). Am I right? Or am I missing some risks here?

Two changes were made in 9.3 which allowed this approach in general [1]
[2]. Also, I see from the code [3] that during the shutdown process, the
walsenders are the last to be stopped, which allows replicas to get the
shutdown checkpoint information.

Is this approach considered safe now?

If so, let's add it to the documentation, making it official. The patch is
attached.

Links:
[0] 26.3 Failover
https://www.postgresql.org/docs/current/static/warm-standby-failover.html
[1] Support clean switchover
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=985bd7d49726c9f178558491d31a570d47340459
[2] Allow a streaming replication standby to follow a timeline switch
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=abfd192b1b5ba5216ac4b1f31dcd553106304b19
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/walsender.c;hb=HEAD#l276


Regards,
Nik


failover_doc.patch
Description: Binary data