Re: Re: Re: Enabling checksums on a streaming replica

2019-06-28 Thread Michael Paquier
On Fri, Jun 28, 2019 at 07:43:30AM -0400, Brad Nicholson wrote:
> So if all the checksums are being recalculated on the replica, this
> approach should be relatively safe, should it not?

Yep.

> Assuming pg_checksums is doing the right thing (and it looks to me like it
> should be).

The constraint that a cluster needs to be cleanly shut down to be able
to enable checksums with pg_checksums is the actual deal here.  After
that of course comes the WAL retention on the primary or in the WAL
archives that a standby would need again to catch up while it was
offline.
--
Michael


signature.asc
Description: PGP signature


Re: Re: Re: Enabling checksums on a streaming replica

2019-06-28 Thread Brad Nicholson
Michael Paquier  wrote on 06/27/2019 11:17:10 PM:

> From: Michael Paquier 
> To: Brad Nicholson 
> Cc: Grigory Smolkin , pgsql-
> gene...@lists.postgresql.org
> Date: 06/27/2019 11:17 PM
> Subject: [EXTERNAL] Re: Re: Enabling checksums on a streaming replica
>
> On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote:
> > In testing, it doesn't appear to matter.  I've ensured that I've
generated
> > some full page writes (confirmed via pg_waldump), and those apply
> > fine.
>
> Full pages writes are first written from shared buffers to WAL, where
> their checksums does not actually apply.  When the WAL records are
> read an applied, a full page image is recovered in shared buffers.
> The checksum of the page would get updated once the shared buffer page
> used is evicted and written back to disk.
>

So if all the checksums are being recalculated on the replica, this
approach should be relatively safe, should it not?

Assuming pg_checksums is doing the right thing (and it looks to me like it
should be).

Brad.


Re: Re: Enabling checksums on a streaming replica

2019-06-27 Thread Michael Paquier
On Wed, Jun 26, 2019 at 02:08:19PM -0400, Brad Nicholson wrote:
> In testing, it doesn't appear to matter.  I've ensured that I've generated
> some full page writes (confirmed via pg_waldump), and those apply
> fine.

Full pages writes are first written from shared buffers to WAL, where
their checksums does not actually apply.  When the WAL records are
read an applied, a full page image is recovered in shared buffers.
The checksum of the page would get updated once the shared buffer page
used is evicted and written back to disk.

> The one thing I'm not sure of, when verifying checksums via pg_checksums I
> see blocks being skipped.  I'm not sure what or why it is skipping blocks,
> and if that indicates a problem or not.

Some files can get skipped entirely, but all the blocks of a file
are basically checked.  Well, except for new pages of course which
have no checksums to look at yet.
--
Michael


signature.asc
Description: PGP signature


Re: Re: Enabling checksums on a streaming replica

2019-06-26 Thread Brad Nicholson
Grigory Smolkin  wrote on 06/26/2019 12:37:10 PM:

> From: Grigory Smolkin 
> To: pgsql-general@lists.postgresql.org
> Date: 06/26/2019 12:37 PM
> Subject: [EXTERNAL] Re: Enabling checksums on a streaming replica
>
> Hello!
> Very interesting idea, but what about full page writes, that comes
> from master?
> Can they be a problem?


In testing, it doesn't appear to matter.  I've ensured that I've generated
some full page writes (confirmed via pg_waldump), and those apply fine.

The one thing I'm not sure of, when verifying checksums via pg_checksums I
see blocks being skipped.  I'm not sure what or why it is skipping blocks,
and if that indicates a problem or not.

pg_checksums -c /data/10_repl/

Checksum scan completed
Files scanned:  1530
Blocks scanned: 100782
Blocks skipped: 3
Bad checksums:  0
Data checksum version: 1

Brad


Re: Enabling checksums on a streaming replica

2019-06-26 Thread Grigory Smolkin

Hello!

Very interesting idea, but what about full page writes, that comes from 
master?

Can they be a problem?

On 6/26/19 6:32 PM, Brad Nicholson wrote:


I'm wondering about the validity of using the pg_checksums utility to 
enable checksum's on a streaming replication standby, and then 
promoting that standby as a way to enable checksums on existing clusters.


I've tested the process out, and it "works" (by works I mean doesn't 
blow up or log any errors). But this seems far enough outside of 
supported territory that I'm curious what others think.


The process is:

- Primary with checksums off
- create a streaming replica off that primary
- stop the secondary
- enable checksums on the secondary with the pg_checksums utility
- start the replica
- promote the replica

I've thrown load at it while the it was replicating from checksums off 
-> checksums on, promoted it, and verified the checksums after with 
pg_checksums.


Any thoughts on this approach?

Thanks,
Brad


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company