Re: [HACKERS] Online enabling of page level checksums

2017-01-24 Thread David Steele
On 1/23/17 9:45 PM, Jim Nasby wrote:

> To put it another way, I think it's entirely reasonable *from a
> technical standpoint* to enable by default in 10, with only the ability
> to dynamically disable. Given the concerns that keep popping up about
> dynamically enabling, I'm not at all sure that we could get that into 10.

+1.  I think this is a reasonable suggestion and realistic to complete
for inclusion in 10.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread Jim Nasby

On 1/23/17 12:03 PM, Greg Stark wrote:

On Jan 22, 2017 11:13 AM, "Magnus Hagander" mailto:mag...@hagander.net>> wrote:


Yes, this means the entire db will end up in the transaction log
since everything is rewritten. That's not great, but for a lot of
people that will be a trade they're willing to make since it's a
one-time thing. Yes, this background process might take days or
weeks - that's OK as long as it happens online.


I'm not sure that's actually necessary. You could just log a wal record
saying "checksum this block" and if it gets replayed then recalculate
the checksum on that block again. This record could be exempt from the
usual rules for having a fpw.

There's no danger of torn pages from the checksum alone. The danger
would be if some other operation does dirty that page then your need to
know that the page is in this weird in between state where it's dirty
but not yet had a fpw written.

I'm not sure whether it's worth the infrastructure to have such a state
just for this or not. On the other hand it sounds like something that
would be useful.


I'm a bit concerned about how much fancy we're trying to put into a 
first pass at this.


I think it's reasonable to require a fairly significant amount of effort 
on the part of an admin to enable checksums. For that matter, I think 
it'd be a significant improvement if there was NO way to enable 
checksums, only disable them. That would at least allow us to enable 
them by default in initdb and provide DBAs an easy way to disable them 
if desired. That would get us a lot more data about whether checksums 
help with corruption than we're going to get otherwise.


To put it another way, I think it's entirely reasonable *from a 
technical standpoint* to enable by default in 10, with only the ability 
to dynamically disable. Given the concerns that keep popping up about 
dynamically enabling, I'm not at all sure that we could get that into 10.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread Tom Lane
Jim Nasby  writes:
> For a first pass, I think it's acceptable for autovac and vac to notice 
> if a relation needs checksums computed and ignore the VM in that case 
> (make sure it's ignoring all frozen bits too).

> Likewise, for right now I think it's OK to force users that are enabling 
> this to manually connect to datallowcon=false and run vacuum.

I think it's a *complete* mistake to overload vacuum with this
responsibility.  Build a separate tool with a separate scheduling policy.

As one reason why not: vacuum doesn't generally attempt to scan indexes
sequentially at all.  Some of the index AMs might happen to do that, but
touching every page of an index is nowhere in vacuum's charter.  Nor is
there a good reason for index AMs to be involved in the job of placing
checksums, but they'd all have to know about this if you insist on going
through vacuum.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread Jim Nasby

On 1/23/17 1:11 PM, David Christensen wrote:

I’m not sure that this will work as-is, unless we’re forcing VACUUM to ignore 
the visibility map.  I had originally considered having this sit on top of 
VACUUm though, we just need a way to iterate over all relations and read every 
page.

Another issue with this (that I think would still exist with the bgworker 
approach) is how to handle databases with datallowconn = 0.  `vacuumdb`, at 
least, explicitly filters out these rows when iterating over databases to 
connect to, so while we could enable them for all databases, we can’t enable 
for the cluster without verifying that these disallowed dbs are already 
checksummed.


For a first pass, I think it's acceptable for autovac and vac to notice 
if a relation needs checksums computed and ignore the VM in that case 
(make sure it's ignoring all frozen bits too).


Likewise, for right now I think it's OK to force users that are enabling 
this to manually connect to datallowcon=false and run vacuum.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread David Christensen

> On Jan 23, 2017, at 10:59 AM, David Christensen  wrote:
> 
>> 
>> On Jan 23, 2017, at 10:53 AM, Simon Riggs  wrote:
>> 
>> On 23 January 2017 at 16:32, David Christensen  wrote:
>> 
>>> ** Handling checksums on a standby:
>>> 
>>> How to handle checksums on a standby is a bit trickier since checksums are 
>>> inherently a local cluster state and not WAL logged but we are storing 
>>> state in the system tables for each database we need to make sure that the 
>>> replicas reflect truthful state for the checksums for the cluster.
>> 
>> Not WAL logged? If the objective of this feature is robustness, it
>> will need to be WAL logged.
>> 
>> Relation options aren't accessed by the startup process during
>> recovery, so that part won't work in recovery. Perhaps disable
>> checksumming until the everything is enabled rather than relation by
>> relation.
>> 
>> If y'all serious about this then you're pretty late in the cycle for
>> such a huge/critical patch. So lets think of ways of reducing the
>> complexity...
>> 
>> Seems most sensible to add the "enable checksums for table" function
>> into VACUUM because then it will happen automatically via autovacuum,
>> or you could force the issue using something like
>> vacuumdb --jobs 4 --enable-checksums
>> That gives you vacuum_delay and a background worker for no effort
> 
> I’m not sure that this will work as-is, unless we’re forcing VACUUM to ignore 
> the visibility map.  I had originally considered having this sit on top of 
> VACUUm though, we just need a way to iterate over all relations and read 
> every page.

Another issue with this (that I think would still exist with the bgworker 
approach) is how to handle databases with datallowconn = 0.  `vacuumdb`, at 
least, explicitly filters out these rows when iterating over databases to 
connect to, so while we could enable them for all databases, we can’t enable 
for the cluster without verifying that these disallowed dbs are already 
checksummed.
--
David Christensen
End Point Corporation
da...@endpoint.com
785-727-1171





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread Greg Stark
On Jan 22, 2017 11:13 AM, "Magnus Hagander"  wrote:


Yes, this means the entire db will end up in the transaction log since
everything is rewritten. That's not great, but for a lot of people that
will be a trade they're willing to make since it's a one-time thing. Yes,
this background process might take days or weeks - that's OK as long as it
happens online.


I'm not sure that's actually necessary. You could just log a wal record
saying "checksum this block" and if it gets replayed then recalculate the
checksum on that block again. This record could be exempt from the usual
rules for having a fpw.

There's no danger of torn pages from the checksum alone. The danger would
be if some other operation does dirty that page then your need to know that
the page is in this weird in between state where it's dirty but not yet had
a fpw written.

I'm not sure whether it's worth the infrastructure to have such a state
just for this or not. On the other hand it sounds like something that would
be useful.


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread David Christensen

> On Jan 23, 2017, at 10:53 AM, Simon Riggs  wrote:
> 
> On 23 January 2017 at 16:32, David Christensen  wrote:
> 
>> ** Handling checksums on a standby:
>> 
>> How to handle checksums on a standby is a bit trickier since checksums are 
>> inherently a local cluster state and not WAL logged but we are storing state 
>> in the system tables for each database we need to make sure that the 
>> replicas reflect truthful state for the checksums for the cluster.
> 
> Not WAL logged? If the objective of this feature is robustness, it
> will need to be WAL logged.
> 
> Relation options aren't accessed by the startup process during
> recovery, so that part won't work in recovery. Perhaps disable
> checksumming until the everything is enabled rather than relation by
> relation.
> 
> If y'all serious about this then you're pretty late in the cycle for
> such a huge/critical patch. So lets think of ways of reducing the
> complexity...
> 
> Seems most sensible to add the "enable checksums for table" function
> into VACUUM because then it will happen automatically via autovacuum,
> or you could force the issue using something like
> vacuumdb --jobs 4 --enable-checksums
> That gives you vacuum_delay and a background worker for no effort

I’m not sure that this will work as-is, unless we’re forcing VACUUM to ignore 
the visibility map.  I had originally considered having this sit on top of 
VACUUm though, we just need a way to iterate over all relations and read every 
page.

--
David Christensen
End Point Corporation
da...@endpoint.com
785-727-1171





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread Simon Riggs
On 23 January 2017 at 16:32, David Christensen  wrote:

> ** Handling checksums on a standby:
>
> How to handle checksums on a standby is a bit trickier since checksums are 
> inherently a local cluster state and not WAL logged but we are storing state 
> in the system tables for each database we need to make sure that the replicas 
> reflect truthful state for the checksums for the cluster.

Not WAL logged? If the objective of this feature is robustness, it
will need to be WAL logged.

Relation options aren't accessed by the startup process during
recovery, so that part won't work in recovery. Perhaps disable
checksumming until the everything is enabled rather than relation by
relation.

If y'all serious about this then you're pretty late in the cycle for
such a huge/critical patch. So lets think of ways of reducing the
complexity...

Seems most sensible to add the "enable checksums for table" function
into VACUUM because then it will happen automatically via autovacuum,
or you could force the issue using something like
vacuumdb --jobs 4 --enable-checksums
That gives you vacuum_delay and a background worker for no effort

Hope that helps

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-23 Thread David Christensen
So as mentioned on IRC, I have a patch that I am working to rebase on HEAD with 
the following design.  It is very similar to what you have proposed, so maybe 
we can use my development notes as a jumping-off point for 
discussion/refinement.

* Incremental Checksums

PostgreSQL users should have a way up upgrading their cluster to use data 
checksums without having to do a costly pg_dump/pg_restore; in particular, 
checksums should be able to be enabled/disabled at will, with the database 
enforcing the logic of whether the pages considered for a given database are 
valid.

Considered approaches for this are having additional flags to pg_upgrade to set 
up the new cluster to use checksums where they did not before (or optionally 
turning these off).  This approach is a nice tool to have, but in order to be 
able to support this process in a manner which has the database online while 
the database is going throught the initial checksum process.

In order to support the idea of incremental checksums, this design adds the 
following things:

** pg_control:

Keep "data_checksum_version", but have it indicate *only* the algorithm version 
for checksums. i.e., it's no longer used for the data_checksum enabled/disabled 
state.

Add "data_checksum_state", an enum with multiple states: "disabled", 
"enabling", "enforcing" (perhaps "revalidating" too; something to indicate that 
we are reprocessing a database that purports to have been completely 
checksummed already)

An explanation of the states as well as the behavior of the checksums for each.

- disabled => not in a checksum cycle; no read validation, no checksums 
written.  This is the current behavior for Postgres *without* checksums.

- enabling => in a checksum cycle; no read validation, write checksums.  Any 
page that gets written to disk will be a valid checksum.  This is required when 
transitioning a cluster which has never had checksums, as the page reads would 
normally fail since they are uninitialized.

- enforcing => not in a checksum cycle; read validation, write checksums.  This 
is the current behavior of Postgres *with* checksums.

 (caveat: I'm not certain the following state is needed (and the current 
version of this patch doesn't have it)):

- revalidating => in a checksum cycle; read validation, write checksums.  The 
difference between this and "enabling" is that we care if page reads fail, 
since by definition they should have been validly checksummed, as we should 
verify this.

Add "data_checksum_cycle", a counter that gets incremented with every checksum 
cycle change.  This is used as a flag to verify when new checksum actions take 
place, for instance if we wanted to upgrade/change the checksum algorithm, or 
if we just want to support periodic checksum validation.

This variable will be compared against new values in the system tables to keep 
track of which relations still need to be checksummed in the cluster.

** pg_database:

Add a field "datlastchecksum" which will be the last checksum cycle which has 
completed for all relations in that database.

** pg_class:

Add a field "rellastchecksum" which stores the last successful checksum cycle 
for each relation.

** The checksum bgworker:

When the enabling event is initiated, we will iterate over all databases, 
checking for all databases where the "datlastchecksum" field is < the current 
checksum cycle.  For each of these, we will spawn a bgworker to connect to 
these dbs and iterate over pg_class looking for "rellastchecksum < 
data_checksum_cycle".  If it finds none (i.e., every record has rellastchecksum 
== data_checksum_cycle) then it marks the containing database as up-to-date by 
updating "datlastchecksum = data_checksum_cycle".  We can presumably skip over 
temporary and unlogged relations here.

For any relation that it finds in the database which is not checksummed, it 
starts an actual worker to handle the checksum process for this table.  Since 
the state of the cluster is already either "enforcing" or "revalidating", any 
block writes will get checksums added automatically, so the only thing the 
bgworker needs to do is load each block in the relation and explicitly mark as 
dirty (unless that's not required for FlushBuffer() to do its thing).  After 
every block in the relation is visited this way and checksummed, its pg_class 
record will have "rellastchecksum" updated.

(XXX: how to handle databases where connections are disabled, like "template1"?)

When all database have "datlastchecksum" == data_checksum_cycle, we initiate 
checksumming of any global cluster heap files.  When the global cluster tables 
heap files have been checksummed, then we consider the checksum cycle complete, 
change pg_control's "data_checksum_state" to "enforcing" and consider things 
fully up-to-date.


** Function API:

Interface to the functionality will be via the following Utility functions:

  - pg_enable_checksums(void) => turn checksums on for a cluster.  Will error 
if the s

Re: [HACKERS] Online enabling of page level checksums

2017-01-22 Thread Jim Nasby

On 1/22/17 5:13 AM, Magnus Hagander wrote:

If the system is interrupted before the background worker is done, it
starts over from the beginning. Previously touched blocks will be read
and verified, but not written (because their checksum is already
correct). This will take time, but not re-generate the WAL.


Another option would be to store a watermark of the largest block ID 
that had been verified for each relation/fork. That would be used to 
determine if checksums could be trusted, and more usefully would allow 
you to start where you left off if the verification process got interrupted.



I think the actual functions and background worker could go in an
extension that's installed and loaded only by those who need it. But the
core functionality of being able to have "checksum in progress" would
have to be in the core codebase.


If it was in contrib I think that'd be fine.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Online enabling of page level checksums

2017-01-22 Thread Magnus Hagander
So, that post I made about checksums certainly spurred a lot of discussion
:) One summary is that life would be a lot easier if we could turn
checksums on (and off) without re-initdbing.  I'm breaking out this
question into this thread to talk about it separately.


I've been toying with a branch to work on this, but haven't had a time to
get it even to compiling state. But instead of waiting until I have some
code to show, let me outline the idea I had.

My general idea is this:

Take one bit in the checksum version field and make it mean "in progress".
That means chat checksums can now be "on", "off", or "in progress".

When checksums are "in progress", PostgreSQL will compute and write
checksums whenever it writes out a buffer, but it will *not* verify
checksums on read.

This state would be set by calling a function (or an external command with
the system shut down if need be - I can accept a restart for this, but I'd
rather avoid it if possible).

This function would also launch a background worker. This worker would
enumerate the entire database block by block. Read a block, verify if the
checksum is set and correct. If it is, ignore it (because any further
updates will keep it in state ok when we're in state "in progress"). If not
then mark it as dirty and write it out through regular means, which will
include computing and writing the checksum since we're "in progress". With
something similar to vacuum cost delay to control how quickly it writes.

Yes, this means the entire db will end up in the transaction log since
everything is rewritten. That's not great, but for a lot of people that
will be a trade they're willing to make since it's a one-time thing. Yes,
this background process might take days or weeks - that's OK as long as it
happens online.

Once the background worker is done, it flips the checksum state to "on",
and the system starts verifying checksums as well.

If the system is interrupted before the background worker is done, it
starts over from the beginning. Previously touched blocks will be read and
verified, but not written (because their checksum is already correct). This
will take time, but not re-generate the WAL.

I think the actual functions and background worker could go in an extension
that's installed and loaded only by those who need it. But the core
functionality of being able to have "checksum in progress" would have to be
in the core codebase.

So, is there something obviously missing in this plan? Or just the code to
do it :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/