Re: [HACKERS] WAL consistency check facility

2017-02-14 Thread Michael Paquier
On Wed, Feb 15, 2017 at 11:08 AM, Robert Haas  wrote:
> On Tue, Feb 14, 2017 at 7:12 PM, Tom Lane  wrote:
>> FWIW, my own habit when creating new PG files is generally to write
>>
>>  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
>>  * Portions Copyright (c) 1994, Regents of the University of California
>>
>> even if it's "all new" code.  The main reason being that it's hardly ever
>> the case that you didn't copy-and-paste some amount of stuff out of a
>> pre-existing file, and trying to sort out how much of what originated
>> exactly when is an unrewarding exercise.  Even if it is basically all
>> new code, this feels like giving an appropriate amount of credit to
>> Those Who Went Before Us.
>
> Right.  I tend to do the same, and wonder if we shouldn't make that a
> general practice.

This looks sensible to me. No-brainer rules that make sense are less
things to worry about.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-14 Thread Robert Haas
On Tue, Feb 14, 2017 at 7:12 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Feb 14, 2017 at 5:16 PM, Michael Paquier
>>  wrote:
>>> Just for curiosity: does the moment when the code has been written or
>>> committed counts? It's no big deal seeing how liberal the Postgres
>>> license is, but this makes me wonder...
>
>> IANAL, but I think if you ask one, he or she will tell you that what
>> matters is the date the work was created.  In the case of code, that
>> means when the code was written.
>
> FWIW, my own habit when creating new PG files is generally to write
>
>  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
>  * Portions Copyright (c) 1994, Regents of the University of California
>
> even if it's "all new" code.  The main reason being that it's hardly ever
> the case that you didn't copy-and-paste some amount of stuff out of a
> pre-existing file, and trying to sort out how much of what originated
> exactly when is an unrewarding exercise.  Even if it is basically all
> new code, this feels like giving an appropriate amount of credit to
> Those Who Went Before Us.

Right.  I tend to do the same, and wonder if we shouldn't make that a
general practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-14 Thread Tom Lane
Robert Haas  writes:
> On Tue, Feb 14, 2017 at 5:16 PM, Michael Paquier
>  wrote:
>> Just for curiosity: does the moment when the code has been written or
>> committed counts? It's no big deal seeing how liberal the Postgres
>> license is, but this makes me wonder...

> IANAL, but I think if you ask one, he or she will tell you that what
> matters is the date the work was created.  In the case of code, that
> means when the code was written.

FWIW, my own habit when creating new PG files is generally to write

 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California

even if it's "all new" code.  The main reason being that it's hardly ever
the case that you didn't copy-and-paste some amount of stuff out of a
pre-existing file, and trying to sort out how much of what originated
exactly when is an unrewarding exercise.  Even if it is basically all
new code, this feels like giving an appropriate amount of credit to
Those Who Went Before Us.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-14 Thread Robert Haas
On Tue, Feb 14, 2017 at 5:16 PM, Michael Paquier
 wrote:
> On Wed, Feb 15, 2017 at 2:43 AM, Robert Haas  wrote:
>> On Thu, Feb 9, 2017 at 8:17 PM, Michael Paquier
>>  wrote:
>>> Please find attached a patch with those fixes.
>>
>> Committed, but I changed the copyright dates to 2016-2017 rather than
>> just 2017 since surely some of the code was originally written before
>> 2017.  Even that might not really be going back far enough, but it
>> doesn't matter too much.
>
> Just for curiosity: does the moment when the code has been written or
> committed counts? It's no big deal seeing how liberal the Postgres
> license is, but this makes me wonder...

IANAL, but I think if you ask one, he or she will tell you that what
matters is the date the work was created.  In the case of code, that
means when the code was written.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-14 Thread Michael Paquier
On Wed, Feb 15, 2017 at 2:43 AM, Robert Haas  wrote:
> On Thu, Feb 9, 2017 at 8:17 PM, Michael Paquier
>  wrote:
>> Please find attached a patch with those fixes.
>
> Committed, but I changed the copyright dates to 2016-2017 rather than
> just 2017 since surely some of the code was originally written before
> 2017.  Even that might not really be going back far enough, but it
> doesn't matter too much.

Just for curiosity: does the moment when the code has been written or
committed counts? It's no big deal seeing how liberal the Postgres
license is, but this makes me wonder...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-14 Thread Robert Haas
On Thu, Feb 9, 2017 at 8:17 PM, Michael Paquier
 wrote:
> Please find attached a patch with those fixes.

Committed, but I changed the copyright dates to 2016-2017 rather than
just 2017 since surely some of the code was originally written before
2017.  Even that might not really be going back far enough, but it
doesn't matter too much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-09 Thread Michael Paquier
On Thu, Feb 9, 2017 at 5:56 AM, Robert Haas  wrote:
> On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh  
> wrote:
>> Thank you Robert for the review. Please find the updated patch in the
>> attachment.
>
> I have committed this patch after fairly extensive revisions:

Cool. I had finally a look at what has been committed in a507b869.
Running regression tests with all RMGRs enabled, a single installcheck
generates 7GB of WAL. Woah.

> * Rewrote the documentation to give some idea what the underlying
> mechanism of operation of the feature is, so that users who choose to
> enable this will hopefully have some understanding of what they've
> turned on.

Thanks, those look good to me.

> * Renamed "char *page" arguments to "char *pagedata" and "Page page",
> because tempPage doesn't seem to be to be any better a name than
> page_norm.

> * Removed assertion in checkXLogConsistency that consistency checking
> is enabled for the RM; that's trivially false if
> wal_consistency_checking is not the same on the master and the
> standby.  (Note that quite apart from the issue of whether this
> function should exist at all, adding it to a header file after the
> closing #endif guard is certainly not right.)

I recall doing those two things the same way as in the commit. Not
sure at which point they have been re-introduced.

> * Changed checkXLogConsistency to skip pages whose LSN is newer than
> that of the record.  Without this, if you shut down recovery and
> restart it, it complains of inconsistent pages and dies.  (I'm not
> sure this is the only scenario that needs to be covered; it would be
> smart to do more testing of restarting the standby.)

Good point.

> -- a WAL consistency failure causes your standby to die a hard death.
> (Maybe there should be a way to suppress consistency checking on the
> standby -- but I think not just by requiring wal_consistency_checking
> on both ends.  Or maybe we should just downgrade the FATAL to WARNING;
> blowing up the standby irrevocably seems like poor behavior.)

Having a FATAL is useful for buildfarm members, that would show up in
red. Having a switch to generate a warning would be useful for live
deployments I agree. Now I think that we need as well two things:
- A recovery test to run regression tests with a standby behind.
- Extend the TAP tests so as it is possible to fill in postgresql.conf
with custom variables.
- have the buildfarm client run recovery tests!
I am fine to write those patches.

> I also bumped XLOG_PAGE_MAGIC (which is normally done by the
> committer, not the patch author, so I wasn't expecting that to be in
> the patch as submitted).

Here are a couple of things I have noticed while looking at the code.

+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
s/2016/2017/ in bufmask.c and bufmask.h.

+   if (ItemIdIsNormal(iid))
+   {
+
+   HeapTupleHeader page_htup = (HeapTupleHeader) page_item;
Unnecessary newline here.

+* Read the contents from the backup copy, stored in WAL record and
+* store it in a temporary page. There is not need to allocate a new
+* page here, a local buffer is fine to hold its contents and a mask
+* can be directly applied on it.
s/not need/no need/.

In checkXLogConsistency(), FPWs that have the flag BKPIMAGE_APPLY set
will still be checked, resulting in a FPW being compared to itself. I
think that those had better be bypassed.

Please find attached a patch with those fixes.
-- 
Michael


consistency-checks-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-08 Thread Kuntal Ghosh
On Thu, Feb 9, 2017 at 2:26 AM, Robert Haas  wrote:
> On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh  
> wrote:
>> Thank you Robert for the review. Please find the updated patch in the
>> attachment.
>
> I have committed this patch after fairly extensive revisions:
>
Thank you, Robert, for the above corrections and commit. Thanks to
Michael Paquier, Peter Eisentraut, Amit Kapila, Álvaro Herrera, and
Simon Riggs for taking their time to complete the patch. It was a
great learning experience for me.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-08 Thread Robert Haas
On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh  wrote:
> Thank you Robert for the review. Please find the updated patch in the
> attachment.

I have committed this patch after fairly extensive revisions:

* Rewrote the documentation to give some idea what the underlying
mechanism of operation of the feature is, so that users who choose to
enable this will hopefully have some understanding of what they've
turned on.
* Renamed "char *page" arguments to "char *pagedata" and "Page page",
because tempPage doesn't seem to be to be any better a name than
page_norm.
* Moved bufmask.c to src/backend/access/common, because there's no
code in src/backend/storage/buffer that knows anything about the
format of pages; that is the job of AMs, hence src/backend/access.
* Improved some comments in bufmask.c
* Removed consistencyCheck_is_enabled in favor of determining which
RMs support masking by the presence of absence of an rm_mask function.
* Removed assertion in checkXLogConsistency that consistency checking
is enabled for the RM; that's trivially false if
wal_consistency_checking is not the same on the master and the
standby.  (Note that quite apart from the issue of whether this
function should exist at all, adding it to a header file after the
closing #endif guard is certainly not right.)
* Changed checkXLogConsistency to use RBM_NORMAL_NO_LOG instead of
RBM_NORMAL.  I'm not sure if there are any cases where this makes a
difference, but it seems safer.
* Changed checkXLogConsistency to skip pages whose LSN is newer than
that of the record.  Without this, if you shut down recovery and
restart it, it complains of inconsistent pages and dies.  (I'm not
sure this is the only scenario that needs to be covered; it would be
smart to do more testing of restarting the standby.)
* Made wal_consistency_checking a developer option instead of a WAL
option.  Even though it CAN be used in production, we don't
particularly want to encourage that; enabling WAL consistency checking
has a big performance cost and makes your system more fragile not less
-- a WAL consistency failure causes your standby to die a hard death.
(Maybe there should be a way to suppress consistency checking on the
standby -- but I think not just by requiring wal_consistency_checking
on both ends.  Or maybe we should just downgrade the FATAL to WARNING;
blowing up the standby irrevocably seems like poor behavior.)
* Coding style improvement in check_wal_consistency_checking.
* Removed commas in messages added to pg_xlogdump; those didn't look
good to me, on further review.
* Comment improvements in xlog_internal.h and xlogreader.h

I also bumped XLOG_PAGE_MAGIC (which is normally done by the
committer, not the patch author, so I wasn't expecting that to be in
the patch as submitted).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-07 Thread Kuntal Ghosh
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
>
> Can this patch check contrib/bloom?
>
Added support for generic resource manager type.

> +/*
> + * Mask some line pointer bits, particularly those marked as
> + * used on a master and unused on a standby.
> + */
>
> Comment doesn't explain why we need to do this.
>
Added comments.

> +/*
> + * For GIN_DELETED page, the page is initialized to empty.
> + * Hence mask everything.
> + */
> +if (opaque->flags & GIN_DELETED)
> +memset(page_norm, MASK_MARKER, BLCKSZ);
> +else
> +mask_unused_space(page_norm);
>
> If the page is initialized to empty, why do we need to mask
> anything/everything?  Anyway, it doesn't seem right to mask the
> GinPageOpaque structure itself. Or at least it doesn't seem right to
> mask the flags word.
>
Modified accordingly.

>
> +if (!HeapTupleHeaderXminFrozen(page_htup))
> +page_htup->t_infomask |= HEAP_XACT_MASK;
> +else
> +page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
> HEAP_XMAX_INVALID;
>
> Comment doesn't address this logic.  Also, the "else" case shouldn't
> exist at all, I think.
Added comments. I think that "Else" part is needed for xmax.

>
> +/*
> + * For a speculative tuple, the content of t_ctid is conflicting
> + * between the backup page and current page. Hence, we set it
> + * to the current block number and current offset.
> + */
>
> Why does it differ?  Is that a bug?
>
Added comments.

>
> +/*
> + * During replay of a btree page split, we don't set the BTP_SPLIT_END
> + * flag of the right sibling and initialize th cycle_id to 0 for the same
> + * page.
> + */
>
> A reference to the name of the redo function might be helpful here
> (and in some other places), unless it's just ${AMNAME}_redo.  "th" is
> a typo for "the".
Corrected.

> +appendStringInfoString(buf, " (full page
> image, apply)");
> +else
> +appendStringInfoString(buf, " (full page image)");
>
> How about "(full page image)" and "(full page image, for WAL verification)"?
>
> Similarly in XLogDumpDisplayRecord, I think we should assume that
> "FPW" means what it has always meant, and leave that output alone.
> Instead, distinguish the WAL-consistency-checking case when it
> happens, e.g. "(FPW for consistency check)".
>
Corrected.

> +checkConsistency(XLogReaderState *record)
>
> How about CheckXLogConsistency()?
>
> + * If needs_backup is true or wal consistency check is enabled for
>
> ...or WAL checking is enabled...
>
> + * If WAL consistency is enabled for the resource manager of
>
> If WAL consistency checking is enabled...
>
> + * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag
>
> with the BKPIMAGE_APPLY flag
Modified accordingly.

>
> - * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended
> - * with all-zeroes pages up to the referenced block number.  In
> - * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
> + * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag
> + * is not set for the backup block, the relation is extended with all-zeroes
> + * pages up to the referenced block number.
>
> OK, I'm puzzled by this.  Surely we don't want the WAL consistency
> checking facility to cause the relation to be extended like this.  And
> I don't see why it would, because the WAL consistency checking happens
> after the page changes have already been applied.  I wonder if this
> hunk is in error and should be dropped.
>
Dropped comments.

> +PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX);
> +phdr->pd_prune_xid = PG_UINT32_MAX;
> +phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES;
> +phdr->pd_flags |= PD_ALL_VISIBLE;
> +#define MASK_MARKER0xFF
> (and many others)
>
> Why do we mask by setting bits rather than clearing bits?  My
> intuition would have been to zero things we want to mask, rather than
> setting them to one.
>
Modified accordingly so that we can zero things during masking instead
of setting them to one.

> +{
> +newwalconsistency[i] = true;
> +}
>
> Superfluous braces.
>
Corrected.

> +/*
> + * Leave if no masking functions defined, this is possible in the case
> + * resource managers generating just full page writes, comparing an
> + * image to itself has no meaning in those cases.
> + */
> +if (RmgrTable[rmid].rm_mask == NULL)
> +return;
>
> ...and also...
>
> +/*
> + * This feature is enabled only for the resource managers where
> + * a masking function is defined.

Re: [HACKERS] WAL consistency check facility

2017-02-07 Thread Robert Haas
On Tue, Feb 7, 2017 at 6:32 AM, Amit Kapila  wrote:
> On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas  wrote:
>>
>> +if (!HeapTupleHeaderXminFrozen(page_htup))
>> +page_htup->t_infomask |= HEAP_XACT_MASK;
>> +else
>> +page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
>> HEAP_XMAX_INVALID;
>>
>> Comment doesn't address this logic.  Also, the "else" case shouldn't
>> exist at all, I think.
>>
>
> In the *if* check, it just checks frozen status of xmin, so I think
> you need to mask xmax related bits in else check.  Can you explain
> what makes you think that the else case shouldn't exist?

Oh, you're right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-07 Thread Amit Kapila
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas  wrote:
>
> +if (!HeapTupleHeaderXminFrozen(page_htup))
> +page_htup->t_infomask |= HEAP_XACT_MASK;
> +else
> +page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
> HEAP_XMAX_INVALID;
>
> Comment doesn't address this logic.  Also, the "else" case shouldn't
> exist at all, I think.
>

In the *if* check, it just checks frozen status of xmin, so I think
you need to mask xmax related bits in else check.  Can you explain
what makes you think that the else case shouldn't exist?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 9:31 PM, Michael Paquier
 wrote:
> On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:
>> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>>  wrote:
>>> I've attached the patch with the modified changes. PFA.
>>
>> Can this patch check contrib/bloom?
>
> Only full pages are applied at redo by the generic WAL facility. So
> you would finish by comparing a page with itself in generic_redo.

generic_redo is more complicated than just restoring an FPI.  I admit
that generic_redo *probably* has no bugs, but I don't see why it would
hurt to allow it to be checked.  It's not IMPOSSIBLE that restoring
the page delta could go wrong somehow.

>> +/*
>> + * For a speculative tuple, the content of t_ctid is conflicting
>> + * between the backup page and current page. Hence, we set it
>> + * to the current block number and current offset.
>> + */
>>
>> Why does it differ?  Is that a bug?
>
> This has been discussed twice in this thread, once by me, once by Alvaro:
> https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com
> https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com

Sorry, I missed/forgot about that.  I think this is evidence that the
comment isn't really good enough.  It says "hey, this might differ"
... with no real explanation of why it might differ or why that's OK.
If there were an explanation there, I wouldn't have flagged it.

>> +/*
>> + * Leave if no masking functions defined, this is possible in the case
>> + * resource managers generating just full page writes, comparing an
>> + * image to itself has no meaning in those cases.
>> + */
>> +if (RmgrTable[rmid].rm_mask == NULL)
>> +return;
>>
>> ...and also...
>>
>> +/*
>> + * This feature is enabled only for the resource managers where
>> + * a masking function is defined.
>> + */
>> +for (i = 0; i <= RM_MAX_ID; i++)
>> +{
>> +if (RmgrTable[i].rm_mask != NULL)
>>
>> Why do we assume that the feature is only enabled for RMs that have a
>> mask function?  Why not instead assume that if there's no masking
>> function, no masking is required?
>
> Not all RMs work on full pages. Tracking in smgr.h the list of RMs
> that are no-ops when masking things is easier than having empty
> routines declared all over the code base. Don't you think so>

Sure, but I don't think that's what the code is doing.  If an RM is a
no-op when masking things, it must define an empty function.  If it
defines no function, checking is disabled.  I think we want to be able
to check any RM.  No?

>> +void(*rm_mask) (char *page, BlockNumber blkno);
>>
>> Could the page be passed as type "Page" rather than a "char *" to make
>> things more convenient for the masking functions?  If not, could those
>> functions at least do something like "Page page = (Page) pagebytes;"
>> rather than "Page page_norm = (Page) page;"?
>
> xlog_internal.h is used as well by frontends, this makes the header
> dependencies cleaner. (I have looked at using Page when hacking this
> stuff, but the header dependencies are not worth it, I don't recall
> all the details though this was a couple of months back).

OK.  I still think page_norm is a lame variable name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Wed, Feb 1, 2017 at 8:01 AM, Michael Paquier
 wrote:
> On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:

>> +/*
>> + * Leave if no masking functions defined, this is possible in the case
>> + * resource managers generating just full page writes, comparing an
>> + * image to itself has no meaning in those cases.
>> + */
>> +if (RmgrTable[rmid].rm_mask == NULL)
>> +return;
>>
>> ...and also...
>>
>> +/*
>> + * This feature is enabled only for the resource managers where
>> + * a masking function is defined.
>> + */
>> +for (i = 0; i <= RM_MAX_ID; i++)
>> +{
>> +if (RmgrTable[i].rm_mask != NULL)
>>
>> Why do we assume that the feature is only enabled for RMs that have a
>> mask function?  Why not instead assume that if there's no masking
>> function, no masking is required?
>
> Not all RMs work on full pages. Tracking in smgr.h the list of RMs
> that are no-ops when masking things is easier than having empty
> routines declared all over the code base. Don't you think so>
>
Robert's suggestion surely makes the approach more general. But,  the
existing approach makes it easier to decide the RM's for which we
support the consistency check facility. Surely, we can use a list to
track the RMs which should (/not) be masked. But, I think we always
have to mask the lsn of the pages at least. Isn't it?

>> +void(*rm_mask) (char *page, BlockNumber blkno);
>>
>> Could the page be passed as type "Page" rather than a "char *" to make
>> things more convenient for the masking functions?  If not, could those
>> functions at least do something like "Page page = (Page) pagebytes;"
>> rather than "Page page_norm = (Page) page;"?
>
> xlog_internal.h is used as well by frontends, this makes the header
> dependencies cleaner. (I have looked at using Page when hacking this
> stuff, but the header dependencies are not worth it, I don't recall
> all the details though this was a couple of months back).
+ 1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
Thanks Robert for taking your time for the review.  I'll update the
patch with the changes suggested by you.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
>
> Can this patch check contrib/bloom?

Only full pages are applied at redo by the generic WAL facility. So
you would finish by comparing a page with itself in generic_redo.

> +/*
> + * For a speculative tuple, the content of t_ctid is conflicting
> + * between the backup page and current page. Hence, we set it
> + * to the current block number and current offset.
> + */
>
> Why does it differ?  Is that a bug?

This has been discussed twice in this thread, once by me, once by Alvaro:
https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com
https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com

> +/*
> + * Leave if no masking functions defined, this is possible in the case
> + * resource managers generating just full page writes, comparing an
> + * image to itself has no meaning in those cases.
> + */
> +if (RmgrTable[rmid].rm_mask == NULL)
> +return;
>
> ...and also...
>
> +/*
> + * This feature is enabled only for the resource managers where
> + * a masking function is defined.
> + */
> +for (i = 0; i <= RM_MAX_ID; i++)
> +{
> +if (RmgrTable[i].rm_mask != NULL)
>
> Why do we assume that the feature is only enabled for RMs that have a
> mask function?  Why not instead assume that if there's no masking
> function, no masking is required?

Not all RMs work on full pages. Tracking in smgr.h the list of RMs
that are no-ops when masking things is easier than having empty
routines declared all over the code base. Don't you think so>

> +void(*rm_mask) (char *page, BlockNumber blkno);
>
> Could the page be passed as type "Page" rather than a "char *" to make
> things more convenient for the masking functions?  If not, could those
> functions at least do something like "Page page = (Page) pagebytes;"
> rather than "Page page_norm = (Page) page;"?

xlog_internal.h is used as well by frontends, this makes the header
dependencies cleaner. (I have looked at using Page when hacking this
stuff, but the header dependencies are not worth it, I don't recall
all the details though this was a couple of months back).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Robert Haas
On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
 wrote:
> I've attached the patch with the modified changes. PFA.

Can this patch check contrib/bloom?

+/*
+ * Mask some line pointer bits, particularly those marked as
+ * used on a master and unused on a standby.
+ */

Comment doesn't explain why we need to do this.

+/*
+ * For GIN_DELETED page, the page is initialized to empty.
+ * Hence mask everything.
+ */
+if (opaque->flags & GIN_DELETED)
+memset(page_norm, MASK_MARKER, BLCKSZ);
+else
+mask_unused_space(page_norm);

If the page is initialized to empty, why do we need to mask
anything/everything?  Anyway, it doesn't seem right to mask the
GinPageOpaque structure itself. Or at least it doesn't seem right to
mask the flags word.

+/*
+ * For gist leaf pages, mask some line pointer bits, particularly
+ * those marked as used on a master and unused on a standby.
+ */

Comment doesn't explain why we need to do this.

+if (!HeapTupleHeaderXminFrozen(page_htup))
+page_htup->t_infomask |= HEAP_XACT_MASK;
+else
+page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
HEAP_XMAX_INVALID;

Comment doesn't address this logic.  Also, the "else" case shouldn't
exist at all, I think.

+/*
+ * For a speculative tuple, the content of t_ctid is conflicting
+ * between the backup page and current page. Hence, we set it
+ * to the current block number and current offset.
+ */

Why does it differ?  Is that a bug?

+/*
+ * Mask everything on a DELETED page since it will be re-initialized
+ * during replay.
+ */
+if ((maskopaq->btpo_flags & BTP_DELETED) != 0)
+{
+/* Mask Page Content */
+memset(page_norm + SizeOfPageHeaderData, MASK_MARKER,
+   BLCKSZ - SizeOfPageHeaderData);
+
+/* Mask pd_lower and pd_upper */
+memset(&((PageHeader) page_norm)->pd_lower, MASK_MARKER,
+   sizeof(uint16));
+memset(&((PageHeader) page_norm)->pd_upper, MASK_MARKER,
+   sizeof(uint16));

This isn't consistent with the GIN_DELETE case - it is more selective
about what it masks.  Probably that logic should be adapted to look
more like this.

+/*
+ * Mask some line pointer bits, particularly those marked as
+ * used on a master and unused on a standby.
+ */

Comment (still) doesn't explain why we need to do this.

+/*
+ * During replay of a btree page split, we don't set the BTP_SPLIT_END
+ * flag of the right sibling and initialize th cycle_id to 0 for the same
+ * page.
+ */

A reference to the name of the redo function might be helpful here
(and in some other places), unless it's just ${AMNAME}_redo.  "th" is
a typo for "the".

+appendStringInfoString(buf, " (full page
image, apply)");
+else
+appendStringInfoString(buf, " (full page image)");

How about "(full page image)" and "(full page image, for WAL verification)"?

Similarly in XLogDumpDisplayRecord, I think we should assume that
"FPW" means what it has always meant, and leave that output alone.
Instead, distinguish the WAL-consistency-checking case when it
happens, e.g. "(FPW for consistency check)".

+checkConsistency(XLogReaderState *record)

How about CheckXLogConsistency()?

+ * If needs_backup is true or wal consistency check is enabled for

...or WAL checking is enabled...

+ * If WAL consistency is enabled for the resource manager of

If WAL consistency checking is enabled...

+ * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag

with the BKPIMAGE_APPLY flag

- * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended
- * with all-zeroes pages up to the referenced block number.  In
- * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
+ * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag
+ * is not set for the backup block, the relation is extended with all-zeroes
+ * pages up to the referenced block number.

OK, I'm puzzled by this.  Surely we don't want the WAL consistency
checking facility to cause the relation to be extended like this.  And
I don't see why it would, because the WAL consistency checking happens
after the page changes have already been applied.  I wonder if this
hunk is in error and should be dropped.

+PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX);
+phdr->pd_prune_xid = PG_UINT32_MAX;
+phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES;
+phdr->pd_flags |= PD_ALL_VISIBLE;
+#define MASK_MARKER0xFF
(and many others)

Why do we mask by setting bits rather than clearing bits?  My
intuition would have been to zero things we want to mask, rather than
setting them to one.

+{
+ 

Re: [HACKERS] WAL consistency check facility

2017-01-30 Thread Michael Paquier
On Thu, Jan 5, 2017 at 2:54 PM, Kuntal Ghosh  wrote:
> On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas  wrote:
>
>> On a first read-through of this patch -- I have not studied it in
>> detail yet -- this looks pretty good to me.  One concern is that this
>> patch adds a bit of code to XLogInsert(), which is a very hot piece of
>> code.  Conceivably, that might produce a regression even when this is
>> disabled; if so, we'd probably need to make it a build-time option.  I
>> hope that's not necessary, because I think it would be great to
>> compile this into the server by default, but we better make sure it's
>> not a problem.  A bulk load into an existing table might be a good
>> test case.
>>
> I've done some bulk load testing with 16,32,64 clients. I didn't
> notice any regression
> in the results.
>
>> Aside from that, I think the biggest issue here is that the masking
>> functions are virtually free of comments, whereas I think they should
>> have extensive and detailed comments.  For example, in heap_mask, you
>> have this:
>>
>> +page_htup->t_infomask =
>> +HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID |
>> +HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;
>>
>> For something like this, you could write "We want to ignore
>> differences in hint bits, since they can be set by SetHintBits without
>> emitting WAL.  Force them all to be set so that we don't notice
>> discrepancies."  Actually, though, I think that you could be a bit
>> more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID =
>> HEAP_XMIN_FROZEN, so maybe what you should do is clear
>> HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if
>> one is set but not both.
>>
> I've modified it as follows:
> +
> +   if (!HeapTupleHeaderXminFrozen(page_htup))
> +   page_htup->t_infomask |= HEAP_XACT_MASK;
> +   else
> +   page_htup->t_infomask |=
> HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;
>
>> Anyway, leaving that aside, I think every single change that gets
>> masked in every single masking routine needs a similar comment,
>> explaining why that change can happen on the master without also
>> happening on the standby and hopefully referring to the code that
>> makes that unlogged change.
>>
> I've added comments for all the masking routines.
>
>> I think wal_consistency_checking, as proposed by Peter, is better than
>> wal_consistency_check, as implemented.
>>
> Modified to wal_consistency_checking.
>
>> Having StartupXLOG() call pfree() on the masking buffers is a waste of
>> code.  The process is going to exit anyway.
>>
>> + "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno 
>> %u",
>>
> Done.
>
>> Primary error messages aren't capitalized.
>>
>> +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
>> +{
>> +/* Caller specified a bogus block_id. Do nothing. */
>> +continue;
>> +}
>>
>> Why would the caller do something so dastardly?
>>
> Modified to following comment:
> +   /*
> +* WAL record doesn't contain a block reference
> +* with the given id. Do nothing.
> +*/
>
> I've attached the patch with the modified changes. PFA.

Moved to CF 2017-03 with same status, "ready for committer".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-04 Thread Kuntal Ghosh
On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas  wrote:

> On a first read-through of this patch -- I have not studied it in
> detail yet -- this looks pretty good to me.  One concern is that this
> patch adds a bit of code to XLogInsert(), which is a very hot piece of
> code.  Conceivably, that might produce a regression even when this is
> disabled; if so, we'd probably need to make it a build-time option.  I
> hope that's not necessary, because I think it would be great to
> compile this into the server by default, but we better make sure it's
> not a problem.  A bulk load into an existing table might be a good
> test case.
>
I've done some bulk load testing with 16,32,64 clients. I didn't
notice any regression
in the results.

> Aside from that, I think the biggest issue here is that the masking
> functions are virtually free of comments, whereas I think they should
> have extensive and detailed comments.  For example, in heap_mask, you
> have this:
>
> +page_htup->t_infomask =
> +HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID |
> +HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;
>
> For something like this, you could write "We want to ignore
> differences in hint bits, since they can be set by SetHintBits without
> emitting WAL.  Force them all to be set so that we don't notice
> discrepancies."  Actually, though, I think that you could be a bit
> more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID =
> HEAP_XMIN_FROZEN, so maybe what you should do is clear
> HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if
> one is set but not both.
>
I've modified it as follows:
+
+   if (!HeapTupleHeaderXminFrozen(page_htup))
+   page_htup->t_infomask |= HEAP_XACT_MASK;
+   else
+   page_htup->t_infomask |=
HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;

> Anyway, leaving that aside, I think every single change that gets
> masked in every single masking routine needs a similar comment,
> explaining why that change can happen on the master without also
> happening on the standby and hopefully referring to the code that
> makes that unlogged change.
>
I've added comments for all the masking routines.

> I think wal_consistency_checking, as proposed by Peter, is better than
> wal_consistency_check, as implemented.
>
Modified to wal_consistency_checking.

> Having StartupXLOG() call pfree() on the masking buffers is a waste of
> code.  The process is going to exit anyway.
>
> + "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno 
> %u",
>
Done.

> Primary error messages aren't capitalized.
>
> +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
> +{
> +/* Caller specified a bogus block_id. Do nothing. */
> +continue;
> +}
>
> Why would the caller do something so dastardly?
>
Modified to following comment:
+   /*
+* WAL record doesn't contain a block reference
+* with the given id. Do nothing.
+*/

I've attached the patch with the modified changes. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v16.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-12-21 Thread Kuntal Ghosh
On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas  wrote:
> On Mon, Nov 28, 2016 at 11:31 PM, Michael Paquier
>  wrote:
>> Moved to CF 2017-01, as no committers have showed up yet :(
>
> Seeing no other volunteers, here I am.
>
Thanks Robert for looking into the patch.

> On a first read-through of this patch -- I have not studied it in
> detail yet -- this looks pretty good to me.  One concern is that this
> patch adds a bit of code to XLogInsert(), which is a very hot piece of
> code.  Conceivably, that might produce a regression even when this is
> disabled; if so, we'd probably need to make it a build-time option.  I
> hope that's not necessary, because I think it would be great to
> compile this into the server by default, but we better make sure it's
> not a problem.  A bulk load into an existing table might be a good
> test case.
>
I'll do this test and post the results.

> +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
> +{
> +/* Caller specified a bogus block_id. Do nothing. */
> +continue;
> +}
>
> Why would the caller do something so dastardly?
>
Sorry, it's my bad. I've copied the code from somewhere else, but forgot
to modify the comment. It should be something like
/* block_id is not used. */

I'll modify the above along with other suggested changes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-12-21 Thread Robert Haas
On Mon, Nov 28, 2016 at 11:31 PM, Michael Paquier
 wrote:
> Moved to CF 2017-01, as no committers have showed up yet :(

Seeing no other volunteers, here I am.

On a first read-through of this patch -- I have not studied it in
detail yet -- this looks pretty good to me.  One concern is that this
patch adds a bit of code to XLogInsert(), which is a very hot piece of
code.  Conceivably, that might produce a regression even when this is
disabled; if so, we'd probably need to make it a build-time option.  I
hope that's not necessary, because I think it would be great to
compile this into the server by default, but we better make sure it's
not a problem.  A bulk load into an existing table might be a good
test case.

Aside from that, I think the biggest issue here is that the masking
functions are virtually free of comments, whereas I think they should
have extensive and detailed comments.  For example, in heap_mask, you
have this:

+page_htup->t_infomask =
+HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID |
+HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;

For something like this, you could write "We want to ignore
differences in hint bits, since they can be set by SetHintBits without
emitting WAL.  Force them all to be set so that we don't notice
discrepancies."  Actually, though, I think that you could be a bit
more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID =
HEAP_XMIN_FROZEN, so maybe what you should do is clear
HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if
one is set but not both.

Anyway, leaving that aside, I think every single change that gets
masked in every single masking routine needs a similar comment,
explaining why that change can happen on the master without also
happening on the standby and hopefully referring to the code that
makes that unlogged change.

I think wal_consistency_checking, as proposed by Peter, is better than
wal_consistency_check, as implemented.

Having StartupXLOG() call pfree() on the masking buffers is a waste of
code.  The process is going to exit anyway.

+ "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u",

Primary error messages aren't capitalized.

+if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
+{
+/* Caller specified a bogus block_id. Do nothing. */
+continue;
+}

Why would the caller do something so dastardly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-28 Thread Michael Paquier
On Wed, Nov 16, 2016 at 2:15 AM, Michael Paquier
 wrote:
> On Tue, Nov 15, 2016 at 7:50 AM, Kuntal Ghosh
>  wrote:
>> I've modified the guc parameter name as wal_consistency_check (little
>> hesitant for a participle in suffix :) ). Also, updated the sgml and
>> variable name accordingly.
>
> The changes look good to me.

Moved to CF 2017-01, as no committers have showed up yet :(
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-15 Thread Michael Paquier
On Tue, Nov 15, 2016 at 7:50 AM, Kuntal Ghosh
 wrote:
> I've modified the guc parameter name as wal_consistency_check (little
> hesitant for a participle in suffix :) ). Also, updated the sgml and
> variable name accordingly.

The changes look good to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-15 Thread Kuntal Ghosh
On Tue, Nov 15, 2016 at 7:23 PM, Robert Haas  wrote:
> On Sat, Nov 12, 2016 at 10:06 PM, Peter Eisentraut
>  wrote:
>> On 11/9/16 11:55 PM, Michael Paquier wrote:
>>> + 
>>> +  wal_consistency (string)
>>> +  
>>> +   wal_consistency configuration 
>>> parameter
>>> +  
>>> +  
>>> +  
>>> +   
>>> +This parameter is used to check the consistency of WAL records, 
>>> i.e,
>>> +whether the WAL records are inserted and applied correctly. When
>>> +wal_consistency is enabled for a WAL record, it
>>> +stores a full-page image along with the record. When a full-page 
>>> image
>>> +arrives during redo, it compares against the current page to check 
>>> whether
>>> +both are consistent.
>>> +   
>>
>> Could we name this something like wal_consistency_checking?
>>
>> Otherwise it sounds like you use this to select the amount of
>> consistency in the WAL (similar to, say, wal_level).
>
> +1.  I like that name.
I've modified the guc parameter name as wal_consistency_check (little
hesitant for a participle in suffix :) ). Also, updated the sgml and
variable name accordingly.
FYI, regression test will fail because of an inconsistency in brin
page. I've already submitted a patch for that. Following is the thread
for the same:
https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v15.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-15 Thread Robert Haas
On Sat, Nov 12, 2016 at 10:06 PM, Peter Eisentraut
 wrote:
> On 11/9/16 11:55 PM, Michael Paquier wrote:
>> + 
>> +  wal_consistency (string)
>> +  
>> +   wal_consistency configuration 
>> parameter
>> +  
>> +  
>> +  
>> +   
>> +This parameter is used to check the consistency of WAL records, i.e,
>> +whether the WAL records are inserted and applied correctly. When
>> +wal_consistency is enabled for a WAL record, it
>> +stores a full-page image along with the record. When a full-page 
>> image
>> +arrives during redo, it compares against the current page to check 
>> whether
>> +both are consistent.
>> +   
>
> Could we name this something like wal_consistency_checking?
>
> Otherwise it sounds like you use this to select the amount of
> consistency in the WAL (similar to, say, wal_level).

+1.  I like that name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-12 Thread Michael Paquier
On Sun, Nov 13, 2016 at 12:06 PM, Peter Eisentraut
 wrote:
> Could we name this something like wal_consistency_checking?
>
> Otherwise it sounds like you use this to select the amount of
> consistency in the WAL (similar to, say, wal_level).

Or wal_check? Or wal_consistency_check?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-12 Thread Peter Eisentraut
On 11/9/16 11:55 PM, Michael Paquier wrote:
> + 
> +  wal_consistency (string)
> +  
> +   wal_consistency configuration parameter
> +  
> +  
> +  
> +   
> +This parameter is used to check the consistency of WAL records, i.e,
> +whether the WAL records are inserted and applied correctly. When
> +wal_consistency is enabled for a WAL record, it
> +stores a full-page image along with the record. When a full-page 
> image
> +arrives during redo, it compares against the current page to check 
> whether
> +both are consistent.
> +   

Could we name this something like wal_consistency_checking?

Otherwise it sounds like you use this to select the amount of
consistency in the WAL (similar to, say, wal_level).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-11 Thread Kuntal Ghosh
On Fri, Nov 11, 2016 at 3:36 AM, Michael Paquier
 wrote:
> On Fri, Nov 11, 2016 at 1:36 AM, Robert Haas  wrote:
>> So, who are all of the people involved in the effort to produce this
>> patch, and what's the right way to attribute credit?
>
> The original idea was from Heikki as he has introduced the idea of
> doing such checks if you look at the original thread. I added on top
> of it a couple of things like the concept of page masking, and hacked
> an early version of the versoin we have now
> (https://www.postgresql.org/message-id/cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com).
> So it seems to me that marking Heikki as an author would be fair as
> the original idea is from him. I don't mind being marked only as a
> reviewer of the feature, or as someone from which has written an
> earlier version of the patch, but I let that up to your judgement.
> Kuntai is definitely an author.
Although lot of changes have been done later, but I've started with the patch
attached in the above thread. Hence, I feel the author of that patch should
also get the credit.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-10 Thread Michael Paquier
On Fri, Nov 11, 2016 at 1:36 AM, Robert Haas  wrote:
> So, who are all of the people involved in the effort to produce this
> patch, and what's the right way to attribute credit?

The original idea was from Heikki as he has introduced the idea of
doing such checks if you look at the original thread. I added on top
of it a couple of things like the concept of page masking, and hacked
an early version of the versoin we have now
(https://www.postgresql.org/message-id/cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com).
So it seems to me that marking Heikki as an author would be fair as
the original idea is from him. I don't mind being marked only as a
reviewer of the feature, or as someone from which has written an
earlier version of the patch, but I let that up to your judgement.
Kuntai is definitely an author.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-10 Thread Robert Haas
On Thu, Nov 10, 2016 at 10:02 AM, Kuntal Ghosh
 wrote:
>> With the patch for BRIN applied, I am able to get installcheck-world
>> working with wal_consistency = all and a standby doing the consistency
>> checks behind. Adding wal_consistency = all in PostgresNode.pm, the
>> recovery tests are passing. This patch is switched as "Ready for
>> Committer". Thanks for completing this effort begun 3 years ago!
> Thanks to you for reviewing all the patches in so much detail. Amit, Robert
> and Dilip also helped me a lot in developing the feature. Thanks to them
> as well.

So, who should be credited as co-authors of this patch and in what
order, if and when it gets committed?  If X started this patch and
then Kuntal did a little more work on it, I would credit it as:

X and Kuntal Ghosh

If Kuntal did major work on it, though, then I would think of
something more like:

Kuntal Ghosh, based on an earlier patch from X

If he didn't use any of the old code but just the idea, then I would
do something like this:

Kuntal Ghosh, inspired by a previous patch from X

So, who are all of the people involved in the effort to produce this
patch, and what's the right way to attribute credit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-10 Thread Kuntal Ghosh
On Thu, Nov 10, 2016 at 10:25 AM, Michael Paquier
 wrote:
> On Wed, Nov 9, 2016 at 11:32 PM, Kuntal Ghosh
>> Thanks a lot for reviewing the patch. Based on your review, I've attached the

>> I've done a fair amount of testing which includes regression tests
>> and manual creation of inconsistencies in the page. I've also found the
>> reason behind inconsistency in brin revmap page.
>> Brin revmap page doesn't have standard page layout. Besides, It doesn't 
>> update
>> pd_upper and pd_lower in its operations as well. But, during WAL
>> insertions, it uses
>> REGBUF_STANDARD to register a reference in the WAL record. Hence, when we
>> restore image before consistency check, RestoreBlockImage fills the space
>> between pd_upper and pd_lower(page hole) with zero. I've posted this as a
>> separate thread.
>> https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com
>
> Nice to have spotted the inconsistency. This tool is going to be useful..
>
> I have spent a couple of hours playing with the patch, and worked on
> it a bit more with a couple of minor changes:
> - In gindesc.c, the if blocks are more readable with brackets.
> - Addition of a comment when info is set, to mention that this is done
> at the beginning of the routine so as callers of XLogInsert() can pass
> the flag for consistency checks.
> - apply_image should be reset in ResetDecoder().
> - The BRIN problem is here:
> 2016-11-10 12:24:10 JST [65776]: [23-1] db=,user=,app=,client= FATAL:
> Inconsistent page found, rel 1663/16385/30625, forknum 0, blkno 1
> 2016-11-10 12:24:10 JST [65776]: [24-1] db=,user=,app=,client=
> CONTEXT:  xlog redo at 0/9BD31148 for BRIN/UPDATE+INIT: heapBlk 100
> pagesPerRange 1 old offnum 11, new offnum 1
> 2016-11-10 12:24:10 JST [65776]: [25-1] db=,user=,app=,client=
> WARNING:  buffer refcount leak: [4540] (rel=base/16385/30625,
> blockNum=1, flags=0x9380, refcount=1 1)
> TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)
> Now the buffer refcount leak is not normal! The safest thing to do
> here is to release the buffer once a copy of it has been taken, and
> the leaks goes away when calling FATAL to report the inconsistency.
> - When checking for XLR_CHECK_CONSISTENCY, better to add a != 0 to get
> a boolean out of it.
> - guc_malloc() should be done as late as possible, this simplifies the
> code and prevents any memory leaks which is what your patch is doing
> when there is an error. (I have finally put my finger on what was
> itching me here).
> - In btree_mask, the lookup of BTP_DELETED can be deadly simplified.
> - I wondered also about putting assign_wal_consistency and
> check_wal_consistency in a separate file, say xlogparams.c, concluding
> that the current code does nothing bad either even if it adds rmgr.h
> in the list of headers in guc.c.
> - Some comment blocks are longer than 72~80 characters.
> - Typos!
All the changes make perfect sense to me.

> With the patch for BRIN applied, I am able to get installcheck-world
> working with wal_consistency = all and a standby doing the consistency
> checks behind. Adding wal_consistency = all in PostgresNode.pm, the
> recovery tests are passing. This patch is switched as "Ready for
> Committer". Thanks for completing this effort begun 3 years ago!
Thanks to you for reviewing all the patches in so much detail. Amit, Robert
and Dilip also helped me a lot in developing the feature. Thanks to them
as well.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-09 Thread Michael Paquier
On Wed, Nov 9, 2016 at 11:32 PM, Kuntal Ghosh
 wrote:
> On Fri, Nov 4, 2016 at 1:32 PM, Michael Paquier
>  wrote:
>> Thank you for the new patch. This will be hopefully the last round of
>> reviews, we are getting close to something that has an acceptable
>> shape.
>
> Thanks a lot for reviewing the patch. Based on your review, I've attached the
> updated patch along with few comments.

Thanks for the new version. pg_xlogdump is really helpful now for debugging.

>> I haven't performed any tests with the patch, and that's all I have
>> regarding the code. With that done we should be in good shape
>> code-speaking I think...
>
> I've done a fair amount of testing which includes regression tests
> and manual creation of inconsistencies in the page. I've also found the
> reason behind inconsistency in brin revmap page.
> Brin revmap page doesn't have standard page layout. Besides, It doesn't update
> pd_upper and pd_lower in its operations as well. But, during WAL
> insertions, it uses
> REGBUF_STANDARD to register a reference in the WAL record. Hence, when we
> restore image before consistency check, RestoreBlockImage fills the space
> between pd_upper and pd_lower(page hole) with zero. I've posted this as a
> separate thread.
> https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com

Nice to have spotted the inconsistency. This tool is going to be useful..

I have spent a couple of hours playing with the patch, and worked on
it a bit more with a couple of minor changes:
- In gindesc.c, the if blocks are more readable with brackets.
- Addition of a comment when info is set, to mention that this is done
at the beginning of the routine so as callers of XLogInsert() can pass
the flag for consistency checks.
- apply_image should be reset in ResetDecoder().
- The BRIN problem is here:
2016-11-10 12:24:10 JST [65776]: [23-1] db=,user=,app=,client= FATAL:
Inconsistent page found, rel 1663/16385/30625, forknum 0, blkno 1
2016-11-10 12:24:10 JST [65776]: [24-1] db=,user=,app=,client=
CONTEXT:  xlog redo at 0/9BD31148 for BRIN/UPDATE+INIT: heapBlk 100
pagesPerRange 1 old offnum 11, new offnum 1
2016-11-10 12:24:10 JST [65776]: [25-1] db=,user=,app=,client=
WARNING:  buffer refcount leak: [4540] (rel=base/16385/30625,
blockNum=1, flags=0x9380, refcount=1 1)
TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)
Now the buffer refcount leak is not normal! The safest thing to do
here is to release the buffer once a copy of it has been taken, and
the leaks goes away when calling FATAL to report the inconsistency.
- When checking for XLR_CHECK_CONSISTENCY, better to add a != 0 to get
a boolean out of it.
- guc_malloc() should be done as late as possible, this simplifies the
code and prevents any memory leaks which is what your patch is doing
when there is an error. (I have finally put my finger on what was
itching me here).
- In btree_mask, the lookup of BTP_DELETED can be deadly simplified.
- I wondered also about putting assign_wal_consistency and
check_wal_consistency in a separate file, say xlogparams.c, concluding
that the current code does nothing bad either even if it adds rmgr.h
in the list of headers in guc.c.
- Some comment blocks are longer than 72~80 characters.
- Typos!

With the patch for BRIN applied, I am able to get installcheck-world
working with wal_consistency = all and a standby doing the consistency
checks behind. Adding wal_consistency = all in PostgresNode.pm, the
recovery tests are passing. This patch is switched as "Ready for
Committer". Thanks for completing this effort begun 3 years ago!
-- 
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index adab2f8..57660d3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2476,6 +2476,35 @@ include_dir 'conf.d'
   
  
 
+ 
+  wal_consistency (string)
+  
+   wal_consistency configuration parameter
+  
+  
+  
+   
+This parameter is used to check the consistency of WAL records, i.e,
+whether the WAL records are inserted and applied correctly. When
+wal_consistency is enabled for a WAL record, it
+stores a full-page image along with the record. When a full-page image
+arrives during redo, it compares against the current page to check 
whether
+both are consistent.
+   
+
+   
+By default, this setting does not contain any value. To check
+all records written to the write-ahead log, set this parameter to
+all. To check only some records, specify a
+comma-separated list of resource managers. The resource managers
+which are currently supported are heap2, heap,
+btree, gin, gist,
+spgist, sequence and brin. Only
+superusers can change this setting.
+   
+  
+ 
+
  
   wal_buffers (int

Re: [HACKERS] WAL consistency check facility

2016-11-09 Thread Kuntal Ghosh
On Fri, Nov 4, 2016 at 1:32 PM, Michael Paquier
 wrote:
> Thank you for the new patch. This will be hopefully the last round of
> reviews, we are getting close to something that has an acceptable
> shape.
Thanks a lot for reviewing the patch. Based on your review, I've attached the
updated patch along with few comments.

> In DecodeXLogRecord@xlogreader.c, please add a boolean flag "apply"
> and then please could you do some error checks on it. Only one is
> needed: if "apply" is true and has_image is false, xlogreader.c should
> complain about an inconsistency!
Added a flag named apply_image in DecodedBkpBlock and used it to
check whether image apply is required or not.

> I would still for the removal of blkno in the list of arguments of the
> masking functions. This is used just for speculative inserts, where we
> could just enforce the page number to 0 because this does not matter,
> as Peter has mentioned upthread.
It just doesn't feel right to me to enforce the number manually when
I can use the blkno without any harm.

> I haven't performed any tests with the patch, and that's all I have
> regarding the code. With that done we should be in good shape
> code-speaking I think...
I've done a fair amount of testing which includes regression tests
and manual creation of inconsistencies in the page. I've also found the
reason behind inconsistency in brin revmap page.
Brin revmap page doesn't have standard page layout. Besides, It doesn't update
pd_upper and pd_lower in its operations as well. But, during WAL
insertions, it uses
REGBUF_STANDARD to register a reference in the WAL record. Hence, when we
restore image before consistency check, RestoreBlockImage fills the space
between pd_upper and pd_lower(page hole) with zero. I've posted this as a
separate thread.
https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v13.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-04 Thread Michael Paquier
On Fri, Nov 4, 2016 at 6:02 PM, Michael Paquier
 wrote:
> On Fri, Nov 4, 2016 at 5:02 PM, Michael Paquier
>  wrote:
>> On Thu, Nov 3, 2016 at 11:17 PM, Kuntal Ghosh
>>  wrote:
>>> I've updated the patch for review.
>>
>> Thank you for the new patch. This will be hopefully the last round of
>> reviews, we are getting close to something that has an acceptable
>> shape.
>
> One last thing: in XLogRecordAssemble(), could you enforce the value
> of info at the beginning of the routine when wal_consistency[rmid] is
> true? And then use the value of info to decide if include_image is
> true or not? The idea here is to allow callers of XLogInsert() to pass
> by themselves XLR_CHECK_CONSISTENCY and still have consistency checks
> enabled for a given record even if wal_consistency is false for the
> rmgr of the record happening. This would be potentially useful for
> extension and feature developers when debugging some stuff, for some
> builds compiled with a DEBUG flag, or whatever.

And you need to rebase the patch, d5f6f13 has fixed the handling of
xl_info with XLR_INFO_MASK.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-04 Thread Michael Paquier
On Fri, Nov 4, 2016 at 5:02 PM, Michael Paquier
 wrote:
> On Thu, Nov 3, 2016 at 11:17 PM, Kuntal Ghosh
>  wrote:
>> I've updated the patch for review.
>
> Thank you for the new patch. This will be hopefully the last round of
> reviews, we are getting close to something that has an acceptable
> shape.

One last thing: in XLogRecordAssemble(), could you enforce the value
of info at the beginning of the routine when wal_consistency[rmid] is
true? And then use the value of info to decide if include_image is
true or not? The idea here is to allow callers of XLogInsert() to pass
by themselves XLR_CHECK_CONSISTENCY and still have consistency checks
enabled for a given record even if wal_consistency is false for the
rmgr of the record happening. This would be potentially useful for
extension and feature developers when debugging some stuff, for some
builds compiled with a DEBUG flag, or whatever.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-04 Thread Michael Paquier
On Thu, Nov 3, 2016 at 11:17 PM, Kuntal Ghosh
 wrote:
> I've updated the patch for review.

Thank you for the new patch. This will be hopefully the last round of
reviews, we are getting close to something that has an acceptable
shape.

+   
+  
+ 
+
+  
+ 
Did you try to compile the docs? Because that will break. (Likely my
fault). What needs to be done is removing one  and one
 markup.

+/*-
+ *
+ * bufmask.h
+ *   Buffer masking definitions.
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/bufmask.h
+ */
We could likely survive here with just a copyright mention as 2016,
PGDG... Same remark for bufmask.c.

--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -25,6 +25,7 @@
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "optimizer/plancat.h"
+#include "storage/bufmask.h"
 #include "utils/index_selfuncs.h"
 #include "utils/rel.h"
This header declaration is not necessary.

+   /*
+* Mask the Page LSN. Because, we store the page before updating the LSN.
+* Hence, LSNs of both pages will always be different.
+*/
+   mask_page_lsn(page_norm);
I don't fully understand this comment if phrased this way. Well, I do
understand it, but people who would read this code for the first time
may have a hard time understanding it. So I would suggest removing it,
but add a comment on top of mask_page_lsn() to mention that in
consistency checks the LSN of the two pages compared will likely be
different because of concurrent operations when the WAL is generated
and the state of the page where WAL is applied.

+   maskopaq = (BTPageOpaque)
+   (((char *) page_norm) + ((PageHeader) page_norm)->pd_special);
+   /*
+* Mask everything on a DELETED page.
+*/
Let's make the code breath and add a space here.

+/* Aligned Buffers dedicated to consistency checks of size BLCKSZ */
+static char *new_page_masked = NULL;
+static char *old_page_masked = NULL;
palloc'd buffers are aligned, so you could just remove the work
"Aligned" in this comment?

+   /* If we've just restored the block from backup image, skip
consistency check. */
+   if (XLogRecHasBlockImage(record, block_id) &&
+   XLogRecBlockImageApply(record, block_id))
+   continue;
Here you could just check for Apply() to decide if continue should be
called or not, and Assert afterwards on HasBlockImage(). The assertion
would help checking for inconsistency errors.

@@ -7810,6 +7929,7 @@ ReadCheckpointRecord(XLogReaderState
*xlogreader, XLogRecPtr RecPtr,
}

record = ReadRecord(xlogreader, RecPtr, LOG, true);
+   info = record->xl_info & ~XLR_INFO_MASK;

if (record == NULL)
{
@@ -7852,8 +7972,8 @@ ReadCheckpointRecord(XLogReaderState
*xlogreader, XLogRecPtr RecPtr,
}
return NULL;
}
-   if (record->xl_info != XLOG_CHECKPOINT_SHUTDOWN &&
-   record->xl_info != XLOG_CHECKPOINT_ONLINE)
+   if (info != XLOG_CHECKPOINT_SHUTDOWN &&
+   info != XLOG_CHECKPOINT_ONLINE)
Those changes are not directly related to this patch, but make sure
that record checks are done correctly or this patch would just fail.
It may be better to apply those changes independently first per the
patch on this thread:
https://www.postgresql.org/message-id/CAB7nPqSWVyaZJg7_amRKVqRpEP=_=54e+762+2pf9u3q3+z...@mail.gmail.com
My recommendation is to do so.

+   /*
+* Remember that, if WAL consistency check is enabled for
the current rmid,
+* we always include backup image with the WAL record.
But, during redo we
+* restore the backup block only if needs_backup is set.
+*/
This could be rewritten a bit:
"if WAL consistency is enabled for the resource manager of this WAL
record, a full-page image is included in the record for the block
modified. During redo, the full-page is replayed only of
BKPIMAGE_APPLY is set."

- * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended
- * with all-zeroes pages up to the referenced block number.  In
- * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
+ * In RBM_ZERO_* modes, if BKPIMAGE_APPLY flag is not set for the backup block,
+ * the relation is extended with all-zeroes pages up to the
referenced block number.
+ * In RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
  * is always BLK_NEEDS_REDO
You are forgetting to mention "if the page does not exist" in the new
comment block.

+   /* If it has a full-page image and it should be restored, do it. */
+   if (XLogRecHasBlockImage(record, block_id) &&
XLogRecBlockImageApply(record, block_id))
{
Perhaps on two lines?

The headers of the functions in bufmask.c could be more descriptive,
there should be explanations regarding in which aspect they are useful
t

Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Fri, Nov 4, 2016 at 4:16 AM, Kuntal Ghosh  wrote:
> On Thu, Nov 3, 2016 at 8:24 PM, Robert Haas  wrote:
>> On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh
>>  wrote:
>>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>>> because small buildfarm machines may suffer on it. Although I've no
>>> experience in this matter, I would like to be certain that nothings breaks
>>> in recovery tests after some modifications.
>>
>> I think running the whole test suite with this enabled is going to
>> provoke complaints from buildfarm owners.  That's too bad, because I
>> agree with you that it would be nice to have the test coverage, but it
>> seems that many of the buildfarm machines are VMs with very minimal
>> resource allocations -- or very old physical machines -- or running
>> with settings like CLOBBER_CACHE_ALWAYS that make runs very slow.  If
>> you blow on them too hard, they fall over.

Count me in. My RPIs won't like that! Actually I have a couple of
things internally mimicking the buildfarm client code on machines with
far higher capacity. And FWIW I am definitely going to enable this
option in the test suite, finishing with reports here.

> Thanks Robert. I got your point. Then, as Michael has suggested, it is nice to
> have some environment variable to pass optional conf parameters during
> tap-tests.
> Implementing this feature actually solves the problem.

We just need make PostgresNode.pm aware of something like PGTAPOPTIONS
to enforce a server started by TAP tests to append options to it.
There is already PGCTLTIMEOUT that behaves similarly. Even if this
brings extra load to buildfarm owners, that will limit complaints.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 8:24 PM, Robert Haas  wrote:
> On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh
>  wrote:
>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>> because small buildfarm machines may suffer on it. Although I've no
>> experience in this matter, I would like to be certain that nothings breaks
>> in recovery tests after some modifications.
>
> I think running the whole test suite with this enabled is going to
> provoke complaints from buildfarm owners.  That's too bad, because I
> agree with you that it would be nice to have the test coverage, but it
> seems that many of the buildfarm machines are VMs with very minimal
> resource allocations -- or very old physical machines -- or running
> with settings like CLOBBER_CACHE_ALWAYS that make runs very slow.  If
> you blow on them too hard, they fall over.
>
Thanks Robert. I got your point. Then, as Michael has suggested, it is nice to
have some environment variable to pass optional conf parameters during
tap-tests.
Implementing this feature actually solves the problem.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh
 wrote:
> - Another suggestion was to remove wal_consistency from PostgresNode.pm
> because small buildfarm machines may suffer on it. Although I've no
> experience in this matter, I would like to be certain that nothings breaks
> in recovery tests after some modifications.

I think running the whole test suite with this enabled is going to
provoke complaints from buildfarm owners.  That's too bad, because I
agree with you that it would be nice to have the test coverage, but it
seems that many of the buildfarm machines are VMs with very minimal
resource allocations -- or very old physical machines -- or running
with settings like CLOBBER_CACHE_ALWAYS that make runs very slow.  If
you blow on them too hard, they fall over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 7:47 PM, Kuntal Ghosh  wrote:
> I've updated the patch for review.
>
If an inconsistency is found, it'll just log it for now. Once, the
patch is finalized, we can
change it to FATAL as before. I was making sure that all regression
tests should pass with the patch.
It seems that there is some inconsistency in regression tests for BRIN index.

LOG:  Inconsistent page found, rel 1663/16384/30607, forknum 0, blkno 1
CONTEXT:  xlog redo at 0/9BAE08C8 for BRIN/UPDATE+INIT: heapBlk 100
pagesPerRange 1 old offnum 11, new offnum 1

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
I've updated the patch for review.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v12.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 6:48 PM, Kuntal Ghosh  wrote:
> So, whenever we are required to use bimg_info flag, we should make
> sure that has_image
> is set.

OK, we are taking past each other here. There are two possible patterns:
- has_image is set, not apply, meaning that the image block is used
for consistency checks.
- has_image is set, as well as apply, meaning that the block needs to
be applied at redo.
So I mean exactly the same thing as you do. The point I am trying to
raise is that it would be meaningful to put in some code paths checks
of the type (apply && !has_image) and ERROR on them. Perhaps we could
just do that in xlogreader.c though. If having those checks external
to xlogreader.c makes sense, then using separate macros is more
portable.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 2:52 PM, Michael Paquier
 wrote:
> On Thu, Nov 3, 2016 at 6:15 PM, Kuntal Ghosh  
> wrote:
>> Actually, I just verified that bimg_info is not even valid if
>> has_image is not set.
>> In DecodeXLogRecord, we initialize bimg_info only when has_image flag
>> is set. So, keeping them
>> separate doesn't look a good approach to me. If we keep them separate,
>> the output
>> of the following assert is undefined:
>> Assert(XLogRecHasBlockImage(record, block_id) ||
>> !XLogRecBlockImageApply(record, block_id)).
>>
>> Thoughts??
>
> Yes, that's exactly the reason why we should keep both macros as
> checking for separate things: apply implies that has_image is set and
> that's normal, hence we could use sanity checks by just using those
> macros and not propagating xlogreader.h.
>
No, apply doesn't mean has_image is set. If has_image is not set,
apply/bimg_info
is invalid(/undefined) and we should not use that. For example, in
XLogDumpDisplayRecord we use
bimg_info as following,
if (XLogRecHasBlockImage(record, block_id))
{
   if (record->blocks[block_id].bimg_info & BKPIMAGE_IS_COMPRESSED)
   {
   }
}
So, whenever we are required to use bimg_info flag, we should make
sure that has_image
is set.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 6:15 PM, Kuntal Ghosh  wrote:
> Actually, I just verified that bimg_info is not even valid if
> has_image is not set.
> In DecodeXLogRecord, we initialize bimg_info only when has_image flag
> is set. So, keeping them
> separate doesn't look a good approach to me. If we keep them separate,
> the output
> of the following assert is undefined:
> Assert(XLogRecHasBlockImage(record, block_id) ||
> !XLogRecBlockImageApply(record, block_id)).
>
> Thoughts??

Yes, that's exactly the reason why we should keep both macros as
checking for separate things: apply implies that has_image is set and
that's normal, hence we could use sanity checks by just using those
macros and not propagating xlogreader.h.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 5:56 PM, Kuntal Ghosh  wrote:
> I'm not getting why we should introduce a new redo action and return
> from the function beforehand.

Per my last email, same conclusion from here :)
Sorry if I am picky and noisy on many points, I am trying to think
about the value of each change introduced in this patch, particularly
if they are meaningful, can be improved in some way, or can be
simplified and make the code more simple.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 2:27 PM, Michael Paquier
 wrote:
> On Thu, Nov 3, 2016 at 4:04 PM, Michael Paquier
>  wrote:
>> Wouldn't the definition of a new redo action make sense then? Say
>> SKIPPED. None of the existing actions match the non-apply case.
>
> I just took 5 minutes to look at the code and reason about it, and
> something like what your patch is doing would be actually fine. Still
> I don't think that checking for the apply flag in the macro routine
> should look for has_image. Let's keep things separate.
Actually, I just verified that bimg_info is not even valid if
has_image is not set.
In DecodeXLogRecord, we initialize bimg_info only when has_image flag
is set. So, keeping them
separate doesn't look a good approach to me. If we keep them separate,
the output
of the following assert is undefined:
Assert(XLogRecHasBlockImage(record, block_id) ||
!XLogRecBlockImageApply(record, block_id)).

Thoughts??
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 4:04 PM, Michael Paquier
 wrote:
> Wouldn't the definition of a new redo action make sense then? Say
> SKIPPED. None of the existing actions match the non-apply case.

I just took 5 minutes to look at the code and reason about it, and
something like what your patch is doing would be actually fine. Still
I don't think that checking for the apply flag in the macro routine
should look for has_image. Let's keep things separate.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 12:34 PM, Michael Paquier
 wrote:
> On Thu, Nov 3, 2016 at 3:24 PM, Kuntal Ghosh  
> wrote:
>> On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier
>>> -/* If it's a full-page image, restore it. */
>>> -if (XLogRecHasBlockImage(record, block_id))
>>> +/* If full-page image should be restored, do it. */
>>> +if (XLogRecBlockImageApply(record, block_id))
>>> Hm. It seems to me that this modification is incorrect. If the page
>>> does not need to be applied, aka if it needs to be used for
>>> consistency checks, what should be done is more something like the
>>> following in XLogReadBufferForRedoExtended:
>>> if (Apply(record, block_id))
>>> return;
>>> if (HasImage)
>>> {
>>> //do what needs to be done with an image
>>> }
>>> else
>>> {
>>> //do default things
>>> }
>>>
>> XLogReadBufferForRedoExtended should return a redo action
>> (block restored, done, block needs redo or block not found). So, we
>> can't just return
>> from the function if BLKIMAGE_APPLY flag is not set. It still has to
>> check whether a
>> redo is required or not.
>
> Wouldn't the definition of a new redo action make sense then? Say
> SKIPPED. None of the existing actions match the non-apply case.

As per my understanding, XLogReadBufferForRedoExtended works as follows:
1.  If wal record has backup block
2.  {
3.   restore the backup block;
4.   return BLK_RESTORED;
5.  }
6.  else
7.  {
8.   If block found in buffer
10.  If lsn of block is less than last replayed record
11.   return BLK_DONE;
12.  else
13.   return BLK_NEEDS_REDO;
14. else
15.  return BLK_NOT_FOUND;
16. }
Now, we can just change step 1 as follows:
1.  If wal record has backup block and it needs to be restored.

I'm not getting why we should introduce a new redo action and return
from the function beforehand.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-03 Thread Michael Paquier
On Thu, Nov 3, 2016 at 3:24 PM, Kuntal Ghosh  wrote:
> On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier
>  wrote:
>>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>>> because small buildfarm machines may suffer on it. Although I've no
>>> experience in this matter, I would like to be certain that nothings breaks
>>> in recovery tests after some modifications.
>>
>> An extra idea that I have here would be to extend the TAP tests to
>> accept an environment variable that would be used to specify extra
>> options when starting Postgres instances. Buildfarm machines could use
>> it.
>>
> It can be added as a separate feature.
>
>>
>> -/* If it's a full-page image, restore it. */
>> -if (XLogRecHasBlockImage(record, block_id))
>> +/* If full-page image should be restored, do it. */
>> +if (XLogRecBlockImageApply(record, block_id))
>> Hm. It seems to me that this modification is incorrect. If the page
>> does not need to be applied, aka if it needs to be used for
>> consistency checks, what should be done is more something like the
>> following in XLogReadBufferForRedoExtended:
>> if (Apply(record, block_id))
>> return;
>> if (HasImage)
>> {
>> //do what needs to be done with an image
>> }
>> else
>> {
>> //do default things
>> }
>>
> XLogReadBufferForRedoExtended should return a redo action
> (block restored, done, block needs redo or block not found). So, we
> can't just return
> from the function if BLKIMAGE_APPLY flag is not set. It still has to
> check whether a
> redo is required or not.

Wouldn't the definition of a new redo action make sense then? Say
SKIPPED. None of the existing actions match the non-apply case.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-02 Thread Kuntal Ghosh
On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier
 wrote:
>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>> because small buildfarm machines may suffer on it. Although I've no
>> experience in this matter, I would like to be certain that nothings breaks
>> in recovery tests after some modifications.
>
> An extra idea that I have here would be to extend the TAP tests to
> accept an environment variable that would be used to specify extra
> options when starting Postgres instances. Buildfarm machines could use
> it.
>
It can be added as a separate feature.

>
> -/* If it's a full-page image, restore it. */
> -if (XLogRecHasBlockImage(record, block_id))
> +/* If full-page image should be restored, do it. */
> +if (XLogRecBlockImageApply(record, block_id))
> Hm. It seems to me that this modification is incorrect. If the page
> does not need to be applied, aka if it needs to be used for
> consistency checks, what should be done is more something like the
> following in XLogReadBufferForRedoExtended:
> if (Apply(record, block_id))
> return;
> if (HasImage)
> {
> //do what needs to be done with an image
> }
> else
> {
> //do default things
> }
>
XLogReadBufferForRedoExtended should return a redo action
(block restored, done, block needs redo or block not found). So, we
can't just return
from the function if BLKIMAGE_APPLY flag is not set. It still has to
check whether a
redo is required or not.

> XLogRecBlockImageApply() should only check for BKP_APPLY and not imply
> HasImage(). This will be more flexible when for example using it for
> assertions.
seems reasonable.

> It would be more consistent to have a boolean flag to treat
> BKPIMAGE_APPLY in xlogreader.c. Have a look at has_image for example.
For flags in bimg_info, we directly check if the mask bit is set in bimg_info
(ex: hole, compressed). Besides, we use this flag only at
XLogReadBufferForRedoExtended.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-02 Thread Michael Paquier
On Wed, Nov 2, 2016 at 11:30 PM, Kuntal Ghosh
 wrote:
> On Wed, Nov 2, 2016 at 1:11 PM, Kuntal Ghosh  
> wrote:
>> Rest of the suggestions are well-taken. I'll update the patch accordingly.
> I've updated the last submitted patch(v10) with the following changes:
> - added a block level flag BKPIMAGE_APPLY to distinguish backup image
> blocks which needs to be restored during replay.
> - at present, hash index operations are not WAL-logged. Hence, I've removed
> the consistency check option for hash indices. It can be added later.

Both make sense.

> - Another suggestion was to remove wal_consistency from PostgresNode.pm
> because small buildfarm machines may suffer on it. Although I've no
> experience in this matter, I would like to be certain that nothings breaks
> in recovery tests after some modifications.

An extra idea that I have here would be to extend the TAP tests to
accept an environment variable that would be used to specify extra
options when starting Postgres instances. Buildfarm machines could use
it.

+/*
+ * Remember that, if WAL consistency check is enabled for
the current rmid,
+ * we always include backup image with the WAL record.
But, during redo we
+ * restore the backup block only if needs_backup is set.
+ */
+if (needs_backup)
+bimg.bimg_info |= BKPIMAGE_APPLY;
+
+
You should be careful about extra newlines and noise in the code.

-/* If it's a full-page image, restore it. */
-if (XLogRecHasBlockImage(record, block_id))
+/* If full-page image should be restored, do it. */
+if (XLogRecBlockImageApply(record, block_id))
Hm. It seems to me that this modification is incorrect. If the page
does not need to be applied, aka if it needs to be used for
consistency checks, what should be done is more something like the
following in XLogReadBufferForRedoExtended:
if (Apply(record, block_id))
return;
if (HasImage)
{
//do what needs to be done with an image
}
else
{
//do default things
}

XLogRecBlockImageApply() should only check for BKP_APPLY and not imply
HasImage(). This will be more flexible when for example using it for
assertions.

With this patch the comments on top of XLogReadBufferForRedo are
wrong. A block is not unconditionally applied.

+#define XLogRecBlockImageApply(decoder, block_id) \
+(XLogRecHasBlockImage(decoder, block_id) && \
+(((decoder)->blocks[block_id].bimg_info & BKPIMAGE_APPLY) > 0))
Maybe != 0? That's the most common practice in the code.

It would be more consistent to have a boolean flag to treat
BKPIMAGE_APPLY in xlogreader.c. Have a look at has_image for example.
This will as well reduce dependencies on the header xlog_record.h

+/*
+ * For a speculative tuple, the content of t_ctid is conflicting
+ * between the backup page and current page. Hence, we set it
+ * to the current block number and current offset.
+ */
+if (HeapTupleHeaderIsSpeculative(page_htup))
+ItemPointerSet(&page_htup->t_ctid, blkno, off);
In the set of masking functions this is the only portion of the code
depending on blkno. But isn't that actually a bug in speculative
inserts? Andres (added now in CC), Peter, could you provide some input
regarding that? The masking functions should not prevent the detection
of future errors, and this code is making me uneasy.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-02 Thread Kuntal Ghosh
On Wed, Nov 2, 2016 at 1:11 PM, Kuntal Ghosh  wrote:
> Rest of the suggestions are well-taken. I'll update the patch accordingly.
I've updated the last submitted patch(v10) with the following changes:
- added a block level flag BKPIMAGE_APPLY to distinguish backup image
blocks which needs to be restored during replay.
- at present, hash index operations are not WAL-logged. Hence, I've removed
the consistency check option for hash indices. It can be added later.

Few comments:
- Michael suggested to use an integer variable and bitwise-shift
operation to store
the RMGR values instead of using a boolean array. But, boolean array
implementation looks cleaner to me. For example,
+if (wal_consistency[rmid])
+   rechdr->xl_info |= XLR_CHECK_CONSISTENCY;

+include_image = needs_backup || wal_consistency[rmid];

- Another suggestion was to remove wal_consistency from PostgresNode.pm
because small buildfarm machines may suffer on it. Although I've no
experience in this matter, I would like to be certain that nothings breaks
in recovery tests after some modifications.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v11.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-02 Thread Kuntal Ghosh
On Wed, Nov 2, 2016 at 1:34 PM, Michael Paquier
 wrote:
> Hm... Right. That was broken. And actually, while the record-level
> flag is useful so as you don't need to rely on checking
> wal_consistency when doing WAL redo, the block-level flag is useful to
> make a distinction between blocks that have to be replayed and the
> ones that are used only for consistency, and both types could be mixed
> in a record. Using it in bimg_info would be fine... Perhaps a better
> name for the flag would be something like BKPIMAGE_APPLY, to mean that
> the FPW needs to be applied at redo. Or BKPIMAGE_IGNORE, to bypass it
> when replaying it. IS_REQUIRED_FOR_REDO is quite confusing.
BKPIMAGE_APPLY seems reasonable.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-02 Thread Michael Paquier
On Wed, Nov 2, 2016 at 4:41 PM, Kuntal Ghosh  wrote:
> On Wed, Nov 2, 2016 at 10:23 AM, Michael Paquier
>  wrote:
>> On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas  wrote:
>>> IMHO, your rewrite of this patch was a bit heavy-handed.
>>
>> OK... Sorry for that.
>>
>>> I haven't
>>> scrutinized the code here so maybe it was a big improvement, and if so
>>> fine, but if not it's better to collaborate with the author than to
>>> take over.
>>
>> While reviewing the code, that has finished by being a large rewrite,
>> and that was more understandable than a review looking at all the
>> small tweaks and things I have been through while reading it. I have
>> also experimented a couple of ideas with the patch that I added, so at
>> the end it proves to be a gain for everybody. I think that the last
>> patch is an improvement, if you want to make your own opinion on the
>> matter looking at the differences between both patches would be the
>> most direct way to go.
>>
> If my understanding is correct regarding this feature, last two patches
> completely break the fundamental idea of wal consistency check feature.
> I mentioned this in my last reply as well that we've to use some flag
> to indicate
> whether an image should be restored during replay or not. Otherwise,
> XLogReadBufferForRedoExtended will always restore the image skipping the usual
> redo operation. What's happening now is the following:
> 1. If wal_consistency is on, include backup block image with the wal record.
> 2. During replay, XLogReadBufferForRedoExtended always restores the backup 
> block
> image in local buffer since XLogRecHasBlockImage is true for each block.
> 3. In checkConsistency, you compare the local buffer with the backup block 
> image
> from the wal record. It'll always be consistent.
> This feature aims to validate whether wal replay operation is
> happening correctly or not.
> To achieve that aim, we should not alter the wal replay operation itself.

Hm... Right. That was broken. And actually, while the record-level
flag is useful so as you don't need to rely on checking
wal_consistency when doing WAL redo, the block-level flag is useful to
make a distinction between blocks that have to be replayed and the
ones that are used only for consistency, and both types could be mixed
in a record. Using it in bimg_info would be fine... Perhaps a better
name for the flag would be something like BKPIMAGE_APPLY, to mean that
the FPW needs to be applied at redo. Or BKPIMAGE_IGNORE, to bypass it
when replaying it. IS_REQUIRED_FOR_REDO is quite confusing.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-02 Thread Kuntal Ghosh
On Wed, Nov 2, 2016 at 10:23 AM, Michael Paquier
 wrote:
> On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas  wrote:
>> IMHO, your rewrite of this patch was a bit heavy-handed.
>
> OK... Sorry for that.
>
>> I haven't
>> scrutinized the code here so maybe it was a big improvement, and if so
>> fine, but if not it's better to collaborate with the author than to
>> take over.
>
> While reviewing the code, that has finished by being a large rewrite,
> and that was more understandable than a review looking at all the
> small tweaks and things I have been through while reading it. I have
> also experimented a couple of ideas with the patch that I added, so at
> the end it proves to be a gain for everybody. I think that the last
> patch is an improvement, if you want to make your own opinion on the
> matter looking at the differences between both patches would be the
> most direct way to go.
>
If my understanding is correct regarding this feature, last two patches
completely break the fundamental idea of wal consistency check feature.
I mentioned this in my last reply as well that we've to use some flag
to indicate
whether an image should be restored during replay or not. Otherwise,
XLogReadBufferForRedoExtended will always restore the image skipping the usual
redo operation. What's happening now is the following:
1. If wal_consistency is on, include backup block image with the wal record.
2. During replay, XLogReadBufferForRedoExtended always restores the backup block
image in local buffer since XLogRecHasBlockImage is true for each block.
3. In checkConsistency, you compare the local buffer with the backup block image
from the wal record. It'll always be consistent.
This feature aims to validate whether wal replay operation is
happening correctly or not.
To achieve that aim, we should not alter the wal replay operation itself.

Rest of the suggestions are well-taken. I'll update the patch accordingly.
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-01 Thread Michael Paquier
On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas  wrote:
> IMHO, your rewrite of this patch was a bit heavy-handed.

OK... Sorry for that.

> I haven't
> scrutinized the code here so maybe it was a big improvement, and if so
> fine, but if not it's better to collaborate with the author than to
> take over.

While reviewing the code, that has finished by being a large rewrite,
and that was more understandable than a review looking at all the
small tweaks and things I have been through while reading it. I have
also experimented a couple of ideas with the patch that I added, so at
the end it proves to be a gain for everybody. I think that the last
patch is an improvement, if you want to make your own opinion on the
matter looking at the differences between both patches would be the
most direct way to go.

> In any case, yeah, I think you should put that back.

Here you go with this parameter back and the allocation of the masked
buffers done beforehand, close to the moment the XLogReader is
allocated actually. I have also removed wal_consistency from
PostgresNode.pm, small buildfarm machines would really suffer on it,
and hamster is very good to track race conditions when running TAP
tests. On top of that I have replaced a bunch of 0xF thingies by
their PG_UINT_MAX equivalents to keep things cleaner.

Now, I have put back the GUC-related code exactly to the same shape as
it was originally. Here are a couple of comments regarding it after
review:
- Let's drop 'none' as a magic keyword. Users are going to use an
empty string, and the default should be defined as such IMO.
- Using an allocated array of booleans to store the values of each
RMGRs could be replaced by an integer using bitwise shifts. Your
option looks better and makes the code cleaner.

A more nitpick remark: code comments don't refer much to RMIDs, but
they use the term "resource managers" more generally. I'd suggest to
do the same.
-- 
Michael


walconsistency_v10.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-01 Thread Peter Geoghegan
On Mon, Oct 31, 2016 at 5:31 AM, Robert Haas  wrote:
> On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier
>  wrote:
>> And here we go. Here is a review as well as a large brush-up for this
>> patch. A couple of things:
>> - wal_consistency is using a list of RMGRs, at the cost of being
>> PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I
>> have been thinking hard about that, and still I don't see the point).
>> It is rather easy to for example default it to false, and enable it to
>> true to check if a certain code path is correctly exercised or not for
>> WAL consistency. Note that this simplification reduces the patch size
>> by 100~150 lines. I know, I know, I'd expect some complains about
>> that
>
> I don't understand how you can fail to see the point of that.  As you
> yourself said, this facility generates a ton of WAL.  If you're
> focusing on one AM, why would you want to be forced to incur the
> overhead for every other AM?  A good deal has been written about this
> upthread already, and just saying "I don't see the point" seems to be
> ignoring the explanations already given.

+1. I strongly agree.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-01 Thread Robert Haas
On Mon, Oct 31, 2016 at 5:51 PM, Michael Paquier
 wrote:
> Hehe, I was expecting you to jump on those lines. While looking at the
> patch I have simplified it first to focus on the core engine of the
> thing. Adding back this code sounds fine to me as there is a wall of
> contestation. I offer to do it myself if the effort is the problem.

IMHO, your rewrite of this patch was a bit heavy-handed.  I haven't
scrutinized the code here so maybe it was a big improvement, and if so
fine, but if not it's better to collaborate with the author than to
take over.  In any case, yeah, I think you should put that back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-10-31 Thread Michael Paquier
On Mon, Oct 31, 2016 at 9:31 PM, Robert Haas  wrote:
> On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier
>> - Instead of palloc'ing the old and new pages to compare, it would be
>> more performant to keep around two static buffers worth of BLCKSZ and
>> just use that. This way there is no need as well to perform any palloc
>> calls in the masking functions, limiting the risk of errors (those
>> code paths had better avoid errors IMO). It would be also less costly
>> to just pass to the masking function a pointer to a buffer of size
>> BLCKSZ and just do the masking on it.
>
> We always palloc buffers like this so that they will be aligned.  But
> we could arrange not to repeat the palloc every time (see, e.g.,
> BootstrapXLOG()).

Yeah, we could go with that and there is clearly no reason to not do so.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-10-31 Thread Michael Paquier
On Mon, Oct 31, 2016 at 9:31 PM, Robert Haas  wrote:
> On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier
>  wrote:
>> And here we go. Here is a review as well as a large brush-up for this
>> patch. A couple of things:
>> - wal_consistency is using a list of RMGRs, at the cost of being
>> PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I
>> have been thinking hard about that, and still I don't see the point).
>> It is rather easy to for example default it to false, and enable it to
>> true to check if a certain code path is correctly exercised or not for
>> WAL consistency. Note that this simplification reduces the patch size
>> by 100~150 lines. I know, I know, I'd expect some complains about
>> that
>
> I don't understand how you can fail to see the point of that.  As you
> yourself said, this facility generates a ton of WAL.  If you're
> focusing on one AM, why would you want to be forced to incur the
> overhead for every other AM?  A good deal has been written about this
> upthread already, and just saying "I don't see the point" seems to be
> ignoring the explanations already given.

Hehe, I was expecting you to jump on those lines. While looking at the
patch I have simplified it first to focus on the core engine of the
thing. Adding back this code sounds fine to me as there is a wall of
contestation. I offer to do it myself if the effort is the problem.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-10-31 Thread Kuntal Ghosh
On Fri, Oct 28, 2016 at 11:35 AM, Michael Paquier
 wrote:
> And here we go. Here is a review as well as a large brush-up for this
> patch. A couple of things:
Thanks for reviewing the patch in detail.

> - Speaking of which using BKPIMAGE_IS_REQUIRED_FOR_REDO stored in the
> block definition is sort of weird because we want to know if
> consistency should be checked at a higher level.
A full page image can be included in the WAL record because of the following
reasons:
1. It needs to be restored during replay.
2. WAL consistency should be checked for the record.
3. Both of above.
In your patch, you've included a full page image whenever wal_consistency
is true. So, XLogReadBufferForRedoExtended always restores the image
and returns BLK_RESTORED, which is unacceptable. We can't change
the default WAL replay behaviour. A full image should only be restored if it is
necessary to do so. Although, I agree that BKPIMAGE_IS_REQUIRED_FOR_REDO
doesn't look a clean way to implement this feature.

> - wal_consistency is using a list of RMGRs, at the cost of being
> PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I
> have been thinking hard about that, and still I don't see the point).
> It is rather easy to for example default it to false, and enable it to
> true to check if a certain code path is correctly exercised or not for
> WAL consistency. Note that this simplification reduces the patch size
> by 100~150 lines. I know, I know, I'd expect some complains about
> that
As Robert also told, if I'm focusing on a single AM, I really don't
want to store
full images and perform consistency check for other AMs.

> On top of that, I have done a fair amount of testing, creating
> manually some inconsistencies in the REDO routines to trigger failures
> on standbys. And that was sort of fun to break things intentionally.
I know the feeling. :)

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-10-31 Thread Robert Haas
On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier
 wrote:
> And here we go. Here is a review as well as a large brush-up for this
> patch. A couple of things:
> - wal_consistency is using a list of RMGRs, at the cost of being
> PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I
> have been thinking hard about that, and still I don't see the point).
> It is rather easy to for example default it to false, and enable it to
> true to check if a certain code path is correctly exercised or not for
> WAL consistency. Note that this simplification reduces the patch size
> by 100~150 lines. I know, I know, I'd expect some complains about
> that

I don't understand how you can fail to see the point of that.  As you
yourself said, this facility generates a ton of WAL.  If you're
focusing on one AM, why would you want to be forced to incur the
overhead for every other AM?  A good deal has been written about this
upthread already, and just saying "I don't see the point" seems to be
ignoring the explanations already given.

> - Looking for wal_consistency at replay has no real value. What if on
> a standby the parameter value is inconsistent than the one on the
> master? This logic adds a whole new level of complications and
> potential bugs. So instead my suggestion is to add a marker at WAL
> record level to check if this record should be checked for consistency
> at replay or not.

Agreed.

> This is also quite flexible if you think about it,
> the standby is made independent of the WAL generated on the master and
> just applies, or checks what it sees is fit checking for.

+1.

> The best
> match here is to add a flag for XLogRecord->xl_info and make use of
> one of the low 4 bits and only one is used now for
> XLR_SPECIAL_REL_UPDATE.

Seems reasonable.

> - in maskPage, the new rmgr routine, there is no need for the info and
> blkno arguments. info is not used at all to begin with. blkno is used
> for gin pages to detect meta pages but this can be guessed using the
> opaque pointer. For heap pages and speculative inserts, masking the
> blkno would be fine. That's not worth it.

Passing the blkno doesn't cost anything.  If it avoids guessing,
that's entirely worth it.

> - Instead of palloc'ing the old and new pages to compare, it would be
> more performant to keep around two static buffers worth of BLCKSZ and
> just use that. This way there is no need as well to perform any palloc
> calls in the masking functions, limiting the risk of errors (those
> code paths had better avoid errors IMO). It would be also less costly
> to just pass to the masking function a pointer to a buffer of size
> BLCKSZ and just do the masking on it.

We always palloc buffers like this so that they will be aligned.  But
we could arrange not to repeat the palloc every time (see, e.g.,
BootstrapXLOG()).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-10-27 Thread Michael Paquier
On Thu, Oct 27, 2016 at 5:08 PM, Michael Paquier
 wrote:
> On Thu, Sep 29, 2016 at 12:49 PM, Michael Paquier
>  wrote:
>> Seeing nothing happening, I have moved the patch to next CF as there
>> is a new version, but no reviews for it.
>
> Just a note for anybody potentially looking at this patch. I am
> currently looking at it in depth, and will post a new version of the
> patch in a couple of days with review comments. Thanks.

And here we go. Here is a review as well as a large brush-up for this
patch. A couple of things:
- wal_consistency is using a list of RMGRs, at the cost of being
PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I
have been thinking hard about that, and still I don't see the point).
It is rather easy to for example default it to false, and enable it to
true to check if a certain code path is correctly exercised or not for
WAL consistency. Note that this simplification reduces the patch size
by 100~150 lines. I know, I know, I'd expect some complains about
that
- Looking for wal_consistency at replay has no real value. What if on
a standby the parameter value is inconsistent than the one on the
master? This logic adds a whole new level of complications and
potential bugs. So instead my suggestion is to add a marker at WAL
record level to check if this record should be checked for consistency
at replay or not. This is also quite flexible if you think about it,
the standby is made independent of the WAL generated on the master and
just applies, or checks what it sees is fit checking for. The best
match here is to add a flag for XLogRecord->xl_info and make use of
one of the low 4 bits and only one is used now for
XLR_SPECIAL_REL_UPDATE. An interesting side effect of this approach is
that callers of XLogInsert can set XLR_CHECK_CONSISTENCY to enforce a
consistency check even if wal_consistency is off. It is true that we
could register such a data via XLogRegisterBuffer() instead, though
the 4 bits with the BKPBLOCK_* flags are already occupied so that
would induce a record penalty length and I have a hard time believing
that one would like to check the consistency of a record in
particular.
- Speaking of which using BKPIMAGE_IS_REQUIRED_FOR_REDO stored in the
block definition is sort of weird because we want to know if
consistency should be checked at a higher level.
- in maskPage, the new rmgr routine, there is no need for the info and
blkno arguments. info is not used at all to begin with. blkno is used
for gin pages to detect meta pages but this can be guessed using the
opaque pointer. For heap pages and speculative inserts, masking the
blkno would be fine. That's not worth it.
- Instead of palloc'ing the old and new pages to compare, it would be
more performant to keep around two static buffers worth of BLCKSZ and
just use that. This way there is no need as well to perform any palloc
calls in the masking functions, limiting the risk of errors (those
code paths had better avoid errors IMO). It would be also less costly
to just pass to the masking function a pointer to a buffer of size
BLCKSZ and just do the masking on it.
- The masking routine names can be more generic, like XXX_mask(char
*page). No need to say page, we already know they work on it via the
argument provided.
- mask_xlog_page and mask_generic_page are useless as the block
restored comes directly from a FPW, so you are comparing basically a
FPW with itself.
- In checkConsistency, there is no need to allocate the old page. As
RestorebackupImage stores the data in an already allocated buffer, you
can reuse the same location as the buffer masked afterwards.
- Removed comparePages(), using memcmp instead for simplicity(). This
does not show up the exact location of the inconsistency, still that
won't be a win as there could be more than one inconsistency across a
page. So this gives an invitation to user to look at the exact
context. memcmp can be used anyway to understand where is the
inconsistency if need be.
- I have noticed that mask_common_page is meaningfull just for the
sequence RMGR, and just that does not justify its existence so I
ripped it off.
- PostgresNode.pm enables wal_consistency. Seeing the high amount of
WAL this produces, I feel cold about doing that, the patch does
include it btw...
- Standbys now stop with FATAL when an inconsistency is found. This
makes error detection easier on buildfarm machines.
- A couple of masking functions still use 0xFF or similar marks.
Those should be replaced by MASK_MARKING. Not done that yet.
- Some of the masking routines should be refined, particularly the
heap and GIn functions. I did not spend time yet to do it.

On top of that, I have done a fair amount of testing, creating
manually some inconsistencies in the REDO routines to trigger failures
on standbys. And that was sort of fun to break things intentionally.

Another fun thing is the large amount of WAL that this generates (!),
so anyone willing to enable that in production would be cra

Re: [HACKERS] WAL consistency check facility

2016-10-27 Thread Michael Paquier
On Thu, Sep 29, 2016 at 12:49 PM, Michael Paquier
 wrote:
> Seeing nothing happening, I have moved the patch to next CF as there
> is a new version, but no reviews for it.

Just a note for anybody potentially looking at this patch. I am
currently looking at it in depth, and will post a new version of the
patch in a couple of days with review comments. Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-28 Thread Michael Paquier
On Fri, Sep 16, 2016 at 10:36 PM, Michael Paquier
 wrote:
> On Fri, Sep 16, 2016 at 10:30 PM, Robert Haas  wrote:
>> I don't think you have the right to tell Kuntal that he has to move
>> the patch to the next CommitFest because there are unspecified things
>> about the current version you don't like.  If you don't have time to
>> review further, that's your call, but he can leave the patch as Needs
>> Review and see if someone else has time.
>
> No complain from here if done this way. I don't mean any offense :)

Seeing nothing happening, I have moved the patch to next CF as there
is a new version, but no reviews for it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-16 Thread Michael Paquier
On Fri, Sep 16, 2016 at 10:30 PM, Robert Haas  wrote:
> I don't think you have the right to tell Kuntal that he has to move
> the patch to the next CommitFest because there are unspecified things
> about the current version you don't like.  If you don't have time to
> review further, that's your call, but he can leave the patch as Needs
> Review and see if someone else has time.

No complain from here if done this way. I don't mean any offense :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-16 Thread Robert Haas
On Thu, Sep 15, 2016 at 9:23 PM, Michael Paquier
 wrote:
> On Thu, Sep 15, 2016 at 7:30 PM, Kuntal Ghosh
>  wrote:
>> Thoughts?
>
> There are still a couple of things that this patch makes me unhappy,
> particularly the handling of the GUC with the xlogreader flags. I am
> not sure if I'll be able to look at that again within the next couple
> of weeks, but please be sure that this is registered in the next
> commit fest. You could for example do that by changing the patch from
> "Returned with Feedback" to "Moved to next CF" in the commit fest app.
> Be sure as well to spend a couple of cycles in reviewing patches.
> Usually for one patch sent, that's one patch of equal difficulty to
> review, and there are many patch still waiting for feedback.

I don't think you have the right to tell Kuntal that he has to move
the patch to the next CommitFest because there are unspecified things
about the current version you don't like.  If you don't have time to
review further, that's your call, but he can leave the patch as Needs
Review and see if someone else has time.

You are right that he should review some other people's patches, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-15 Thread Michael Paquier
On Thu, Sep 15, 2016 at 7:30 PM, Kuntal Ghosh
 wrote:
> Thoughts?

There are still a couple of things that this patch makes me unhappy,
particularly the handling of the GUC with the xlogreader flags. I am
not sure if I'll be able to look at that again within the next couple
of weeks, but please be sure that this is registered in the next
commit fest. You could for example do that by changing the patch from
"Returned with Feedback" to "Moved to next CF" in the commit fest app.
Be sure as well to spend a couple of cycles in reviewing patches.
Usually for one patch sent, that's one patch of equal difficulty to
review, and there are many patch still waiting for feedback.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-15 Thread Kuntal Ghosh
Hello,

I've added the updated the patch with the necessary documentation and comments.
I've referenced Robert's reply in this thread and Simon's reply in
Production block comparison facility thread to write the documentation.

This feature is used to check the consistency of WAL records, i.e,
whether the WAL records are inserted and applied correctly.
A guc parameter named wal_consistency is added to enable this feature.
When wal_consistency is enabled for a WAL record, it stores a full-page image
along with the record. When a full-page image arrives during redo, it compares
against the current page to check whether both are consistent.

The default value for this setting is none. To check all records written to the
write-ahead log, set this parameter to all. To check only some records, specify
a comma-separated list of resource managers. The resource managers which
are currently supported are xlog, heap2, heap, btree, hash, gin, gist, spgist,
sequence, brin and generic.

If any inconsistency is detected, it throws a WARNING. But, as per discussions
in the earlier threads, it can be changed to ERROR./FATAL(just a one
word change).
I've kept this as warning because of some inconsistency in BRIN VACUUM
during gmake check.

In recovery tests, I've enabled this feature in PostgresNode.pm.

Thanks to Amit, Dilip, Michael, Simon and Robert for their valuable feedbacks.

Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cd66abc..8b251b7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2470,6 +2470,35 @@ include_dir 'conf.d'
   
  
 
+ 
+  wal_consistency (string)
+  
+   wal_consistency configuration parameter
+  
+  
+  
+   
+This parameter is used to check the consistency of WAL records, i.e,
+whether the WAL records are inserted and applied correctly. When
+wal_consistency is enabled for a WAL record, it
+stores a full-page image along with the record. When a full-page image
+arrives during redo, it compares against the current page to check whether
+both are consistent.
+   
+
+   
+The default value for this setting is none. To check
+all records written to the write-ahead log, set this parameter to
+all. To check only some records, specify a
+comma-separated list of resource managers. The resource managers
+which are currently supported are xlog, heap2,
+heap, btree, hash, gin,
+gist, spgist, sequence, brin
+and generic.
+   
+  
+ 
+
  
   wal_buffers (integer)
   
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 5a6b728..f4feb88 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -14,6 +14,7 @@
 #include "access/brin_pageops.h"
 #include "access/brin_xlog.h"
 #include "access/xlogutils.h"
+#include "storage/bufmask.h"
 
 
 /*
@@ -279,3 +280,45 @@ brin_redo(XLogReaderState *record)
 			elog(PANIC, "brin_redo: unknown op code %u", info);
 	}
 }
+
+/*
+ * Mask a BRIN page
+ */
+char *
+mask_brin_page(uint8 info, BlockNumber blkno, const char *page)
+{
+	Page	page_norm;
+	OffsetNumber offnum,
+maxoff;
+
+	page_norm = (Page) palloc(BLCKSZ);
+	memcpy(page_norm, page, BLCKSZ);
+
+	/*
+	 * Mask the Page LSN. Because, we store the page before updating the LSN.
+	 * Hence, LSNs of both pages will always be different.
+	 */
+	mask_page_lsn(page_norm);
+
+	mask_page_hint_bits(page_norm);
+
+	if (BRIN_IS_REGULAR_PAGE(page_norm))
+	{
+		mask_unused_space(page_norm);
+
+		maxoff = PageGetMaxOffsetNumber(page_norm);
+		for (offnum = FirstOffsetNumber;
+			 offnum <= maxoff;
+			 offnum = OffsetNumberNext(offnum))
+		{
+			ItemId		itemId = PageGetItemId(page_norm, offnum);
+
+			if (ItemIdIsUsed(itemId))
+itemId->lp_flags = LP_UNUSED;
+		}
+	}
+
+	/* We need to handle brin pages of type Meta and Revmap if needed */
+
+	return (char *)page_norm;
+}
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index a40f168..a247807 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -15,6 +15,7 @@
 
 #include "access/gin_private.h"
 #include "access/xlogutils.h"
+#include "storage/bufmask.h"
 #include "utils/memutils.h"
 
 static MemoryContext opCtx;		/* working memory for operations */
@@ -758,3 +759,40 @@ gin_xlog_cleanup(void)
 	MemoryContextDelete(opCtx);
 	opCtx = NULL;
 }
+
+/*
+ * Mask a Gin page
+ */
+char *
+mask_gin_page(uint8 info, BlockNumber blkno, const char *page)
+{
+	Page	page_norm;
+	GinPageOpaque opaque;
+
+	page_norm = (Page) palloc(BLCKSZ);
+	memcpy(page_norm, page, BLCKSZ);
+
+	/*
+	 * Mask the Page LSN. Because, we store the page before updating the LSN.
+	 * Hence, LSNs of both pages will always be different.
+	 */
+	mask_page_lsn(page_norm

Re: [HACKERS] WAL consistency check facility

2016-09-14 Thread Robert Haas
On Tue, Sep 13, 2016 at 9:04 PM, Michael Paquier
 wrote:
> It seems to me that you need to think about the way to document things
> properly first, with for example:
> - Have a first documentation patch that explains what is a resource
> manager for WAL, and what are the types available with a nice table.
> - Add in your patch documentation to explain what are the benefits of
> using this facility, the main purpose is testing, but there are also
> mention upthread about users that would like to get that into
> production, assuming that the overhead is minimal.

So, I don't think that this patch should be required to document all
of the currently-undocumented stuff that somebody might want to know
that it is related to this patch.  It should be enough to documented
the patch itself.   One paragraph in config.sgml in the usual format
should be fine.  Maybe two paragraphs.  We do need to list the
resource managers, but that can just be something like this:

The default value of for this setting is off.  To check
all records written to the write-ahead log, set this parameter to
all.  To check only same records, specify a
comma-separated list of resource managers.  The resource managers
which are currently supported are heap, btree,
hash, BLAH, and BLAH.

If somebody wants to write some user-facing documentation of the
write-ahead log format, great.  That could certainly be very helpful
for people who are running pg_xlogdump.  But I don't think that stuff
goes in this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-14 Thread Kuntal Ghosh
On Thu, Sep 8, 2016 at 1:20 PM, Michael Paquier
 wrote:
>
>> 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in
>> REVMAP page. This happens only for two cases. I'm not sure what the
>> reason can be.
>
> Hm? This smells like a block reference bug. What are the cases you are
> referring to?
>
Following is the only case where the backup page stored in the wal
record and the current page after redo are not consistent.

test:BRIN using gmake-check

Master
-
STATEMENT:  VACUUM brintest;
LOG:  INSERT @ 0/59E1E0F8:  - BRIN/UPDATE+INIT: heapBlk 100
pagesPerRange 1 old offnum 11, new offnum 1

Standby
--
LOG:  REDO @ 0/59E1B500; LSN 0/59E1E0F8: prev 0/59E17578; xid 0; len
14; blkref #0: rel 1663/16384/30556, blk 12; blkref #1: rel
1663/16384/30556, blk 1; blkref #2: rel 1663/16384/30556, blk 2 -
BRIN/UPDATE+INIT: heapBlk 100 pagesPerRange 1 old offnum 11, new
offnum 1

WARNING:  Inconsistent page (at byte 26) found, rel 1663/16384/30556,
forknum 0, blkno 1
CONTEXT:  xlog redo at 0/59E1B500 for BRIN/UPDATE+INIT: heapBlk 100
pagesPerRange 1 old offnum 11, new offnum 1

thoughts?
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-14 Thread Kuntal Ghosh
On Wed, Sep 14, 2016 at 11:31 AM, Michael Paquier
 wrote:
> On Wed, Sep 14, 2016 at 2:56 PM, Kuntal Ghosh
>  wrote:
>> Master
>> ---
>> - If wal_consistency check is enabled or needs_backup is set in
>> XLogRecordAssemble(), we do a fpw.
>> - If a fpw is to be done, then fork_flags is set with BKPBLOCK_HAS_IMAGE,
>> which in turns set has_image flag while decoding the record.
>> - If a fpw needs to be restored during redo, i.e., needs_backup is true,
>> then bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO.
>
> Here that should be if wal_consistency is true, no?
Nope. I'll try to explain using some pseudo-code:
XLogRecordAssemble()
{

 include_image = needs_backup || wal_consistency[rmid];
 if (include_image)
 {
  
  set XLogRecordBlockHeader.fork_flags |= BKPBLOCK_HAS_IMAGE;
  if (needs_backup)
set XLogRecordBlockImageHeader.bimg_info
  |=BKPIMAGE_IS_REQUIRED_FOR_REDO;
  
 }
 .
}

XLogReadBufferForRedoExtended()
{
 ..
 if (XLogRecHasBlockImage() && XLogRecHasBlockImageForRedo())
 {
  RestoreBlockImage();
  
  return BLK_RESTORED;
 }
 ..
}

checkConsistency()
{
 
 if (wal_consistency[rmid] && !XLogRecHasBlockImage())
  throw error;
 .
}

*XLogRecHasBlockImageForRedo() checks whether bimg_info is set with
 BKPIMAGE_IS_REQUIRED_FOR_REDO.

For a backup image any of the followings is possible:
1. consistency should be checked.
2. page should restored.
3. both 1 and 2.

Consistency check can be controlled by a guc parameter. But, standby
should be conveyed whether an image should be restored. For that, we
have used the new flag.
Suggestions?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-13 Thread Michael Paquier
On Wed, Sep 14, 2016 at 2:56 PM, Kuntal Ghosh
 wrote:
> Master
> ---
> - If wal_consistency check is enabled or needs_backup is set in
> XLogRecordAssemble(), we do a fpw.
> - If a fpw is to be done, then fork_flags is set with BKPBLOCK_HAS_IMAGE,
> which in turns set has_image flag while decoding the record.
> - If a fpw needs to be restored during redo, i.e., needs_backup is true,
> then bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO.

Here that should be if wal_consistency is true, no?

> Standby
> ---
> - In XLogReadBufferForRedoExtended(), if both XLogRecHasBlockImage() and
> XLogRecHasBlockImageForRedo()(added by me*) return true, we restore the
> backup image.
> - In checkConsistency, we only check if XLogRecHasBlockImage() returns true
> when wal_consistency check is enabled for this rmid.

My guess would have been that you do not need to check anymore for
wal_consistency in checkConsistency, making the GUC value only used on
master node.

> *XLogRecHasBlockImageForRedo() checks whether bimg_info is set with
> BKPIMAGE_IS_REQUIRED_FOR_REDO.

Yes, that's more or less what you should have.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-13 Thread Kuntal Ghosh
On Wed, Sep 14, 2016 at 6:34 AM, Michael Paquier
 wrote:
>>>2) Regarding page comparison:
>>>We could just use memcpy() here. compareImages was useful to get a
>>>clear image of what the inconsistencies were, but you don't do that
>>>anymore.
>> memcmp(), right?
>
>Yep :)
If I use memcmp(), I won't get the byte location where the first mismatch
has occurred. It will be helpful to display the byte location which causes
an inconsistency.

>>>6)
>>>+   /*
>>>+* Remember that, if WAL consistency check is enabled for
>>>the current rmid,
>>>+* we always include backup image with the WAL record.
>>>But, during redo we
>>>+* restore the backup block only if needs_backup is set.
>>>+*/
>>>+   if (needs_backup)
>>>+   bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO;
>>>This should use wal_consistency[rmid]?
>>
>> needs_backup is set when XLogRecordAssemble decides that backup image
>> should be included in the record for redo purpose. This image will be
>> restored during
>> redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included
>> image should be restored during redo(or has_image should be set or not).
>
>When decoding a record, I think that you had better not use has_image
>to assume that a FPW has to be double-checked. This has better be a
>different boolean flag, say check_page or similar. This way after
>decoding a record it is possible to know if there is a PFW, and if a
>check on it is needed or not.
I've done some modifications which discards the necessity of adding
anything in DecodeXLogRecord().

Master
---
- If wal_consistency check is enabled or needs_backup is set in
XLogRecordAssemble(), we do a fpw.
- If a fpw is to be done, then fork_flags is set with BKPBLOCK_HAS_IMAGE,
which in turns set has_image flag while decoding the record.
- If a fpw needs to be restored during redo, i.e., needs_backup is true,
then bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO.

Standby
---
- In XLogReadBufferForRedoExtended(), if both XLogRecHasBlockImage() and
XLogRecHasBlockImageForRedo()(added by me*) return true, we restore the
backup image.
- In checkConsistency, we only check if XLogRecHasBlockImage() returns true
when wal_consistency check is enabled for this rmid.

*XLogRecHasBlockImageForRedo() checks whether bimg_info is set with
BKPIMAGE_IS_REQUIRED_FOR_REDO.

Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-13 Thread Michael Paquier
On Tue, Sep 13, 2016 at 6:07 PM, Kuntal Ghosh
 wrote:
>> For now, I've kept this as a WARNING message to detect all inconsistencies
>> at once. Once, the patch is finalized, I'll modify it as an ERROR message.
>
> Or say FATAL. This way the server is taken down.

What I'd really like to see here is a way to quickly identify in the
buildfarm the moment an inconsistent WAL has been introduced by a new
feature, some bug fix, or perhaps a deficiency in the masking
routines. We could definitely tune that later on, by controlling with
a GUC if this generates a WARNING instead of a FATAL, the former being
more useful for production environments, and the latter for tests. It
would be good to think as well about a set of tests, one rough thing
would be to modify an on-disk page for a table, and work on that to
force an inconsistency to be triggered..

>>A couple of extra thoughts:
>>1) The routines of each rmgr are located in a dedicated file, for
>>example GIN stuff is in ginxlog.c, etc. It seems to me that it would
>>be better to move each masking function where it should be instead
>>being centralized. A couple of routines need to be centralized, so I'd
>>suggest putting them in a new file, like xlogmask.c, xlog because now
>>this is part of WAL replay completely, including the lsn, the hint
>>bint and the other common routines.
> Sounds good. But, I think that the file name for common masking routines
> should be as bufmask.c since we are masking the buffers only.

That makes sense as well. No objections to that.

>>2) Regarding page comparison:
>>We could just use memcpy() here. compareImages was useful to get a
>>clear image of what the inconsistencies were, but you don't do that
>>anymore.
> memcmp(), right?

Yep :)

>>6)
>>+   /*
>>+* Remember that, if WAL consistency check is enabled for
>>the current rmid,
>>+* we always include backup image with the WAL record.
>>But, during redo we
>>+* restore the backup block only if needs_backup is set.
>>+*/
>>+   if (needs_backup)
>>+   bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO;
>>This should use wal_consistency[rmid]?
>
> needs_backup is set when XLogRecordAssemble decides that backup image
> should be included in the record for redo purpose. This image will be
> restored during
> redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included
> image should be restored during redo(or has_image should be set or not).

When decoding a record, I think that you had better not use has_image
to assume that a FPW has to be double-checked. This has better be a
different boolean flag, say check_page or similar. This way after
decoding a record it is possible to know if there is a PFW, and if a
check on it is needed or not.

>>7) This patch has zero documentation. Please add some. Any human being
>>on this list other than those who worked on the first versions
>>(Heikki, Simon and I?) is going to have a hard time to review this
>>patch in details moving on if there is no reference to tell what this
>>feature does for the user...
>>
>>This patch is going to the good direction, but I don't think it's far
>>from being ready for commit yet. So I am going to mark it as returned
>>with feedback if there are no objections
>
> I think only major change that this patch needs a proper and detailed
> documentation. Other than that there are very minor changes which can
> be done quickly. Right?

It seems to me that you need to think about the way to document things
properly first, with for example:
- Have a first documentation patch that explains what is a resource
manager for WAL, and what are the types available with a nice table.
- Add in your patch documentation to explain what are the benefits of
using this facility, the main purpose is testing, but there are also
mention upthread about users that would like to get that into
production, assuming that the overhead is minimal.
- Add more comments in your code to finish. One example is
checkConsistency() that is here, but explains nothing.

Well, if you'd simply use an on/off switch to control the feature, the
documentation load for rmgrs would be zero, but as I am visibly
outnumbered in this fight... We could also have an off/on switch
implemented first, and extend that later on depending on the feedback
from other users. We discussed rmgr-level or relation-level tuning of
FPW compression at some point, but we've finished with the most simple
approach, and we still stick with it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-13 Thread Kuntal Ghosh
 - If WAL consistency check is enabled for a rmgrID, we always include
 the backup image in the WAL record.
>>>
>>>What happens if wal_consistency has different settings on a standby
>>>and its master? If for example it is set to 'all' on the standby, and
>>>'none' on the master, or vice-versa, how do things react? An update of
>>>this parameter should be WAL-logged, no?
>>
>> If wal_consistency is enabled for a rmid, standby will always check whether
>> backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not.
>> (I guess Amit and Robert also suggested the same in the thread)
>> Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and
>> BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup
>> image is required during redo. When we decode a wal record, has_image
>> flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO.
>
>Ah I see. But do we actually store the status in the record itself,
>then at replay we don't care of the value of wal_consistency at
>replay. That's the same concept used by wal_compression. So shouldn't
>you have more specific checks related to that in checkConsistency? You
>actually don't need to check for anything in xlogreader.c, just check
>for the consistency if there is a need to do so, or do nothing.
I'm sorry, but I don't quite follow you here. If a wal record contains
an image, has_image should be set since it helps decoding the
record. But, during redo if XLogRecHasBlockImage() returns true, i.e.,
has_image is set, then it always restore the block. But, in our case,
a record can have a backup image which should not be restored. So, we need
to decide two things:
1. Does a record contain backup image? (required for decoding the record)
2. If it has an image, should we restore it during redo?
I think we sould decide these in DecodeXLogRecord() only. BKPBLOCK_HAS_IMAGE
answers the first question whereas BKPIMAGE_IS_REQUIRED_FOR_REDO
answers the second one. In DecodeXLogRecord(), we check that
BKPBLOCK_HAS_IMAGE should be set if wal_consistency is enabled for
this record. The flag has_image is set to
BKPIMAGE_IS_REQUIRED_FOR_REDO which is later used to decide whether we
want to restore a block or not.

> For now, I've kept this as a WARNING message to detect all inconsistencies
> at once. Once, the patch is finalized, I'll modify it as an ERROR message.

Or say FATAL. This way the server is taken down.

> Thoughts?
+1. I'll do that.

>A couple of extra thoughts:
>1) The routines of each rmgr are located in a dedicated file, for
>example GIN stuff is in ginxlog.c, etc. It seems to me that it would
>be better to move each masking function where it should be instead
>being centralized. A couple of routines need to be centralized, so I'd
>suggest putting them in a new file, like xlogmask.c, xlog because now
>this is part of WAL replay completely, including the lsn, the hint
>bint and the other common routines.
Sounds good. But, I think that the file name for common masking routines
should be as bufmask.c since we are masking the buffers only.

>2) Regarding page comparison:
>+/*
>+ * Compare the contents of two pages.
>+ * If the two pages are exactly same, it returns BLCKSZ. Otherwise,
>+ * it returns the location where the first mismatch has occurred.
>+ */
>+int
>+comparePages(char *page1, char *page2)
>We could just use memcpy() here. compareImages was useful to get a
>clear image of what the inconsistencies were, but you don't do that
>anymore.
memcmp(), right?

>5)
>+   has_image = record->blocks[block_id].has_image;
>+   record->blocks[block_id].has_image = true;
>+   if (!RestoreBlockImage(record, block_id, old_page))
>+   elog(ERROR, "failed to restore block image");
>+   record->blocks[block_id].has_image = has_image;
>Er, what? And BKPIMAGE_IS_REQUIRED_FOR_REDO?
Sorry, I completely missed this.

>6)
>+   /*
>+* Remember that, if WAL consistency check is enabled for
>the current rmid,
>+* we always include backup image with the WAL record.
>But, during redo we
>+* restore the backup block only if needs_backup is set.
>+*/
>+   if (needs_backup)
>+   bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO;
>This should use wal_consistency[rmid]?
needs_backup is set when XLogRecordAssemble decides that backup image
should be included in the record for redo purpose. This image will be
restored during
redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included
image should be restored during redo(or has_image should be set or not).

>7) This patch has zero documentation. Please add some. Any human being
>on this list other than those who worked on the first versions
>(Heikki, Simon and I?) is going to have a hard time to review this
>patch in details moving on if there is no reference to tell what this
>feature does for the user...
>
>This patch is going to the good direction, but I don't think it's far
>fro

Re: [HACKERS] WAL consistency check facility

2016-09-12 Thread Michael Paquier
On Sun, Sep 11, 2016 at 12:03 AM, Robert Haas  wrote:
> It seems entirely unnecessary for the master and the standby to agree
> here.  I think what we need is two GUCs.  One of them, which affects
> only the master, controls whether the validation information is
> including in the WAL, and the other, which affects only the standby,
> affects whether validation is performed when the necessary information
> is present.  Or maybe skip the second one and just decree that
> standbys will always validate if the necessary information is present.
> Using the same GUC on both the master and the standby but making it
> mean different things in each of those places (whether to log the
> validation info in one case, whether to perform validation in the
> other case) is another option that also avoids needing to enforce that
> the setting is the same in both places, but probably an inferior one.

Thinking more about that, there is no actual need to do anything
complicated here. We could just track at the record level if a
consistency check is needs to be done at replay and do it. If nothing
is set, just do nothing. That would allow us to promote this parameter
to SIGHUP. wal_compression does something similar.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-12 Thread Michael Paquier
On Mon, Sep 12, 2016 at 5:06 AM, Kuntal Ghosh
 wrote:
>>+   void(*rm_checkConsistency) (XLogReaderState *record);
>>All your _checkConsistency functions share the same pattern, in short
>>they all use a for loop for each block, call each time
>>XLogReadBufferExtended, etc. And this leads to a *lot* of duplication.
>>You would get a reduction by a couple of hundreds of lines by having a
>>smarter refactoring. And to be honest, if I look at your patch what I
>>think is the correct way of doing things is to add to the rmgr not
>>this check consistency function, but just a pointer to the masking
>>function.
>
> +1. In rmgrlist, I've added a pointer to the masking function for each rmid.
> A common function named checkConsistency calls these masking functions
> based on their rmid and does comparison for each block.

The patch size is down from 79kB to 38kB. That gets better :)

>>> - If WAL consistency check is enabled for a rmgrID, we always include
>>> the backup image in the WAL record.
>>
>>What happens if wal_consistency has different settings on a standby
>>and its master? If for example it is set to 'all' on the standby, and
>>'none' on the master, or vice-versa, how do things react? An update of
>>this parameter should be WAL-logged, no?
>
> If wal_consistency is enabled for a rmid, standby will always check whether
> backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not.
> (I guess Amit and Robert also suggested the same in the thread)
> Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and
> BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup
> image is required during redo. When we decode a wal record, has_image
> flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO.

Ah I see. But do we actually store the status in the record itself,
then at replay we don't care of the value of wal_consistency at
replay. That's the same concept used by wal_compression. So shouldn't
you have more specific checks related to that in checkConsistency? You
actually don't need to check for anything in xlogreader.c, just check
for the consistency if there is a need to do so, or do nothing.

> For now, I've kept this as a WARNING message to detect all inconsistencies
> at once. Once, the patch is finalized, I'll modify it as an ERROR message.

Or say FATAL. This way the server is taken down.

> Thoughts?

A couple of extra thoughts:
1) The routines of each rmgr are located in a dedicated file, for
example GIN stuff is in ginxlog.c, etc. It seems to me that it would
be better to move each masking function where it should be instead
being centralized. A couple of routines need to be centralized, so I'd
suggest putting them in a new file, like xlogmask.c, xlog because now
this is part of WAL replay completely, including the lsn, the hint
bint and the other common routines.

2) Regarding page comparison:
+/*
+ * Compare the contents of two pages.
+ * If the two pages are exactly same, it returns BLCKSZ. Otherwise,
+ * it returns the location where the first mismatch has occurred.
+ */
+int
+comparePages(char *page1, char *page2)
We could just use memcpy() here. compareImages was useful to get a
clear image of what the inconsistencies were, but you don't do that
anymore.

3)
+static void checkConsistency(RmgrId rmid, XLogReaderState *record);
The RMGR if is part of the record decoded, so you could just remove
RmgrId from the list of arguments and simplify this interface.

4) If this patch still goes with the possibility to set up a list of
RMGRs, documentation is needed for that. I'd suggest writing first a
patch to explain what are RMGRs for WAL, then apply the WAL
consistency facility on top of it and link wal_consistency to it.

5)
+   has_image = record->blocks[block_id].has_image;
+   record->blocks[block_id].has_image = true;
+   if (!RestoreBlockImage(record, block_id, old_page))
+   elog(ERROR, "failed to restore block image");
+   record->blocks[block_id].has_image = has_image;
Er, what? And BKPIMAGE_IS_REQUIRED_FOR_REDO?

6)
+   /*
+* Remember that, if WAL consistency check is enabled for
the current rmid,
+* we always include backup image with the WAL record.
But, during redo we
+* restore the backup block only if needs_backup is set.
+*/
+   if (needs_backup)
+   bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO;
This should use wal_consistency[rmid]?

7) This patch has zero documentation. Please add some. Any human being
on this list other than those who worked on the first versions
(Heikki, Simon and I?) is going to have a hard time to review this
patch in details moving on if there is no reference to tell what this
feature does for the user...

This patch is going to the good direction, but I don't think it's far
from being ready for commit yet. So I am going to mark it as returned
with feedback if there are no objections.
-- 
Michae

Re: [HACKERS] WAL consistency check facility

2016-09-11 Thread Kuntal Ghosh
Hello,

Based on the previous discussions, I've modified the existing patch.

>+   void(*rm_checkConsistency) (XLogReaderState *record);
>All your _checkConsistency functions share the same pattern, in short
>they all use a for loop for each block, call each time
>XLogReadBufferExtended, etc. And this leads to a *lot* of duplication.
>You would get a reduction by a couple of hundreds of lines by having a
>smarter refactoring. And to be honest, if I look at your patch what I
>think is the correct way of doing things is to add to the rmgr not
>this check consistency function, but just a pointer to the masking
>function.
+1. In rmgrlist, I've added a pointer to the masking function for each rmid.
A common function named checkConsistency calls these masking functions
based on their rmid and does comparison for each block.

>> - If WAL consistency check is enabled for a rmgrID, we always include
>> the backup image in the WAL record.
>
>What happens if wal_consistency has different settings on a standby
>and its master? If for example it is set to 'all' on the standby, and
>'none' on the master, or vice-versa, how do things react? An update of
>this parameter should be WAL-logged, no?
If wal_consistency is enabled for a rmid, standby will always check whether
backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not.
(I guess Amit and Robert also suggested the same in the thread)
Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and
BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup
image is required during redo. When we decode a wal record, has_image
flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO.

>+   if (pg_strcasecmp(tok, "Heap2") == 0)
>+   {
>+   newwalconsistency[RM_HEAP2_ID] = true;
>+   }
>Thinking more about it, I guess that we had better change the
>definition list of rmgrs in rmgr.h and get something closer to
>RmgrDescData that pg_xlogdump has to avoid all this stanza by
>completing it with the name of the rmgr. The only special cases that
>this code path would need to take care of would be then 'none' and
>'all'. You could do this refactoring on top of the main patch to
>simplify it as it is rather big (1.7k lines).
I've modified it exactly like pg_xlogdump does. Additionally, it checks
whether masking function is defined for the rmid or not. Hence, in future,
if we want to include any other rmid for wal consistency check, we just need
to define its masking function.

>> - In recovery tests (src/test/recovery/t), I've added wal_consistency
>> parameter in the existing scripts. This feature doesn't change the
>> expected output. If there is any inconsistency, it can be verified in
>> corresponding log file.
>
>I am afraid that just generating a WARNING message is going to be
>useless for the buildfarm. If we want to detect errors, we could for
>example have an additional GUC to trigger an ERROR or a FATAL, taking
>down the cluster, and allowing things to show in red on a platform.
For now, I've kept this as a WARNING message to detect all inconsistencies
at once. Once, the patch is finalized, I'll modify it as an ERROR message.

Thoughts?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9bb1362..3ca64d1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -26,12 +26,13 @@
 #include "commands/tablespace.h"
 #include "replication/message.h"
 #include "replication/origin.h"
+#include "storage/bufmask.h"
 #include "storage/standby.h"
 #include "utils/relmapper.h"
 
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup) \
-	{ name, redo, desc, identify, startup, cleanup },
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,maskPage) \
+	{ name, redo, desc, identify, startup, cleanup, maskPage },
 
 const RmgrData RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2189c22..77e79f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -25,6 +25,7 @@
 #include "access/commit_ts.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
+#include "access/rmgr.h"
 #include "access/subtrans.h"
 #include "access/timeline.h"
 #include "access/transam.h"
@@ -95,6 +96,8 @@ bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
 bool		wal_compression = false;
+char		*wal_consistency_string = NULL;
+bool		*wal_consistency = NULL;
 bool		log_checkpoints = false;
 int			sync_method = DEFAULT_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_MINIMAL;
@@ -870,6 +873,7 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr ins

Re: [HACKERS] WAL consistency check facility

2016-09-10 Thread Amit Kapila
On Sat, Sep 10, 2016 at 8:33 PM, Robert Haas  wrote:
> On Sat, Sep 10, 2016 at 3:19 AM, Michael Paquier
>  wrote:
>> On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh  
>> wrote:
> - If WAL consistency check is enabled for a rmgrID, we always include
> the backup image in the WAL record.

 What happens if wal_consistency has different settings on a standby
 and its master? If for example it is set to 'all' on the standby, and
 'none' on the master, or vice-versa, how do things react? An update of
 this parameter should be WAL-logged, no?

>>> It is possible to set wal_consistency to 'All' in master and any other
>>> values in standby. But, the scenario you mentioned will cause error in
>>> standby since it may not get the required backup image for wal
>>> consistency check. I think that user should be responsible to set
>>> this value correctly. We can improve the error message to make the
>>> user aware of the situation.
>>
>> Let's be careful here. You should as well consider things from the
>> angle that some parameter updates are WAL-logged as well, like
>> wal_level with the WAL record XLOG_PARAMETER_CHANGE.
>
> It seems entirely unnecessary for the master and the standby to agree
> here.  I think what we need is two GUCs.  One of them, which affects
> only the master, controls whether the validation information is
> including in the WAL, and the other, which affects only the standby,
> affects whether validation is performed when the necessary information
> is present.
>

I think from the clarity perspective, this option sounds good, but I
am slightly afraid that it might be inconvenient for users to set the
different values for these two parameters.

>  Or maybe skip the second one and just decree that
> standbys will always validate if the necessary information is present.

Sounds like a better alternative and probably easier to configure for users.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-10 Thread Amit Kapila
On Sat, Sep 10, 2016 at 12:49 PM, Michael Paquier
 wrote:
> On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh  
> wrote:
>
 - In recovery tests (src/test/recovery/t), I've added wal_consistency
 parameter in the existing scripts. This feature doesn't change the
 expected output. If there is any inconsistency, it can be verified in
 corresponding log file.
>>>
>>> I am afraid that just generating a WARNING message is going to be
>>> useless for the buildfarm. If we want to detect errors, we could for
>>> example have an additional GUC to trigger an ERROR or a FATAL, taking
>>> down the cluster, and allowing things to show in red on a platform.
>>>
>> Yes, we can include an additional GUC to trigger an ERROR for any 
>> inconsistency.
>
> I'd like to hear extra opinions about that, but IMO just having an
> ERROR would be fine for the first implementation. Once you've bumped
> into an ERROR, you are likely going to fix it first.
>

+1 for just an ERROR to detect the inconsistency.  I think adding
additional GUC just to raise error level doesn't seem to be advisable.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-10 Thread Robert Haas
On Sat, Sep 10, 2016 at 3:19 AM, Michael Paquier
 wrote:
> On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh  
> wrote:
 - If WAL consistency check is enabled for a rmgrID, we always include
 the backup image in the WAL record.
>>>
>>> What happens if wal_consistency has different settings on a standby
>>> and its master? If for example it is set to 'all' on the standby, and
>>> 'none' on the master, or vice-versa, how do things react? An update of
>>> this parameter should be WAL-logged, no?
>>>
>> It is possible to set wal_consistency to 'All' in master and any other
>> values in standby. But, the scenario you mentioned will cause error in
>> standby since it may not get the required backup image for wal
>> consistency check. I think that user should be responsible to set
>> this value correctly. We can improve the error message to make the
>> user aware of the situation.
>
> Let's be careful here. You should as well consider things from the
> angle that some parameter updates are WAL-logged as well, like
> wal_level with the WAL record XLOG_PARAMETER_CHANGE.

It seems entirely unnecessary for the master and the standby to agree
here.  I think what we need is two GUCs.  One of them, which affects
only the master, controls whether the validation information is
including in the WAL, and the other, which affects only the standby,
affects whether validation is performed when the necessary information
is present.  Or maybe skip the second one and just decree that
standbys will always validate if the necessary information is present.
Using the same GUC on both the master and the standby but making it
mean different things in each of those places (whether to log the
validation info in one case, whether to perform validation in the
other case) is another option that also avoids needing to enforce that
the setting is the same in both places, but probably an inferior one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-10 Thread Michael Paquier
On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh  wrote:
>>> - If WAL consistency check is enabled for a rmgrID, we always include
>>> the backup image in the WAL record.
>>
>> What happens if wal_consistency has different settings on a standby
>> and its master? If for example it is set to 'all' on the standby, and
>> 'none' on the master, or vice-versa, how do things react? An update of
>> this parameter should be WAL-logged, no?
>>
> It is possible to set wal_consistency to 'All' in master and any other
> values in standby. But, the scenario you mentioned will cause error in
> standby since it may not get the required backup image for wal
> consistency check. I think that user should be responsible to set
> this value correctly. We can improve the error message to make the
> user aware of the situation.

Let's be careful here. You should as well consider things from the
angle that some parameter updates are WAL-logged as well, like
wal_level with the WAL record XLOG_PARAMETER_CHANGE.

>>> - In recovery tests (src/test/recovery/t), I've added wal_consistency
>>> parameter in the existing scripts. This feature doesn't change the
>>> expected output. If there is any inconsistency, it can be verified in
>>> corresponding log file.
>>
>> I am afraid that just generating a WARNING message is going to be
>> useless for the buildfarm. If we want to detect errors, we could for
>> example have an additional GUC to trigger an ERROR or a FATAL, taking
>> down the cluster, and allowing things to show in red on a platform.
>>
> Yes, we can include an additional GUC to trigger an ERROR for any 
> inconsistency.

I'd like to hear extra opinions about that, but IMO just having an
ERROR would be fine for the first implementation. Once you've bumped
into an ERROR, you are likely going to fix it first.

>> +   /*
>> +* Followings are the rmids which can have backup blocks.
>> +* We'll enable this feature only for these rmids.
>> +*/
>> +   newwalconsistency[RM_HEAP2_ID] = true;
>> +   newwalconsistency[RM_HEAP_ID] = true;
>> +   newwalconsistency[RM_BTREE_ID] = true;
>> +   newwalconsistency[RM_HASH_ID] = true;
>> +   newwalconsistency[RM_GIN_ID] = true;
>> +   newwalconsistency[RM_GIST_ID] = true;
>> +   newwalconsistency[RM_SEQ_ID] = true;
>> +   newwalconsistency[RM_SPGIST_ID] = true;
>> +   newwalconsistency[RM_BRIN_ID] = true;
>> +   newwalconsistency[RM_GENERIC_ID] = true;
>> +   newwalconsistency[RM_XLOG_ID] = true;
>> Here you can just use MemSet with RM_MAX_ID and simplify this code 
>> maintenance.
>>
> Not all rmids can have backup blocks. So, for wal_consistency = 'all',
> I've enabled only those rmids which can have backup blocks.

Even if some rmgrs do not support FPWs, I don't think that it is safe
to assume that the existing ones would never support it. Imagine for
example that feature X is implemented. Feature X adds rmgs Y, but rmgr
Y does not use FPWs. At a later point a new feature is added, which
makes rmgr Y using FPWs. We'd increase the number of places to update
with your patch, increasing the likelyness to introduce bugs. It would
be better to use a safe implementation from the maintenance point of
view to be honest (maintenance load of masking functions is somewhat
leveraged by the fact that on-disk format is kept compatible).

>>
>> +   if (pg_strcasecmp(tok, "Heap2") == 0)
>> +   {
>> +   newwalconsistency[RM_HEAP2_ID] = true;
>> +   }
>> Thinking more about it, I guess that we had better change the
>> definition list of rmgrs in rmgr.h and get something closer to
>> RmgrDescData that pg_xlogdump has to avoid all this stanza by
>> completing it with the name of the rmgr. The only special cases that
>> this code path would need to take care of would be then 'none' and
>> 'all'. You could do this refactoring on top of the main patch to
>> simplify it as it is rather big (1.7k lines).
>>
> I'm not sure about this point. wal_consistency doesn't support  all
> the rmids. We should have some way to check this.

I'd rather see this code done in such a way that all the rmgrs can be
handled, this approach being particularly attractive for the fact that
there is no need to change it if new rmgrs are added in the future.
(This was a reason as well why I still think that a simple on/off
switch would be plain enough, users have mostly control of the SQLs
triggering WAL. And if you run tests, you'll likely have the mind to
turn autovacuum to off to avoid it to generate FPWs and pollute the
logs at least at the second run of your tests).

And if you move forward with the approach of making this parameter a
list, I think that it would be better to add a section in the WAL
documentation about resource managers, like what they are, and list
them in this section of the docs. Then your parameter could link to
this documentation part and users would be able to see what k

Re: [HACKERS] WAL consistency check facility

2016-09-09 Thread Kuntal Ghosh
Hello Michael,

Thanks for your detailed review.

>> - If WAL consistency check is enabled for a rmgrID, we always include
>> the backup image in the WAL record.
>
> What happens if wal_consistency has different settings on a standby
> and its master? If for example it is set to 'all' on the standby, and
> 'none' on the master, or vice-versa, how do things react? An update of
> this parameter should be WAL-logged, no?
>
It is possible to set wal_consistency to 'All' in master and any other
values in standby. But, the scenario you mentioned will cause error in
standby since it may not get the required backup image for wal
consistency check. I think that user should be responsible to set
this value correctly. We can improve the error message to make the
user aware of the situation.

>> - I've extended the RmgrTable with a new function pointer
>> rm_checkConsistency, which is called after rm_redo. (only when WAL
>> consistency check is enabled for this rmgrID)
>> - In each rm_checkConsistency, both backup pages and buffer pages are
>> masked accordingly before any comparison.
>
> This leads to heavy code duplication...
>
> +   void(*rm_checkConsistency) (XLogReaderState *record);
> All your _checkConsistency functions share the same pattern, in short
> they all use a for loop for each block, call each time
> XLogReadBufferExtended, etc. And this leads to a *lot* of duplication.
> You would get a reduction by a couple of hundreds of lines by having a
> smarter refactoring. And to be honest, if I look at your patch what I
> think is the correct way of doing things is to add to the rmgr not
> this check consistency function, but just a pointer to the masking
> function.
Pointer to the masking function will certainly reduce a lot of redundant
code. I'll modify it accordingly.

>> - In recovery tests (src/test/recovery/t), I've added wal_consistency
>> parameter in the existing scripts. This feature doesn't change the
>> expected output. If there is any inconsistency, it can be verified in
>> corresponding log file.
>
> I am afraid that just generating a WARNING message is going to be
> useless for the buildfarm. If we want to detect errors, we could for
> example have an additional GUC to trigger an ERROR or a FATAL, taking
> down the cluster, and allowing things to show in red on a platform.
>
Yes, we can include an additional GUC to trigger an ERROR for any inconsistency.

>> Results
>> 
>>
>> I've tested with installcheck and installcheck-world in master-standby
>> set-up. Followings are the configuration parameters.
>
> So you tested as well the recovery tests, right?
>
Yes, I've done the recovery tests after enabling tap-test.

>
> +   /*
> +* Followings are the rmids which can have backup blocks.
> +* We'll enable this feature only for these rmids.
> +*/
> +   newwalconsistency[RM_HEAP2_ID] = true;
> +   newwalconsistency[RM_HEAP_ID] = true;
> +   newwalconsistency[RM_BTREE_ID] = true;
> +   newwalconsistency[RM_HASH_ID] = true;
> +   newwalconsistency[RM_GIN_ID] = true;
> +   newwalconsistency[RM_GIST_ID] = true;
> +   newwalconsistency[RM_SEQ_ID] = true;
> +   newwalconsistency[RM_SPGIST_ID] = true;
> +   newwalconsistency[RM_BRIN_ID] = true;
> +   newwalconsistency[RM_GENERIC_ID] = true;
> +   newwalconsistency[RM_XLOG_ID] = true;
> Here you can just use MemSet with RM_MAX_ID and simplify this code 
> maintenance.
>
Not all rmids can have backup blocks. So, for wal_consistency = 'all',
I've enabled only those rmids which can have backup blocks.

>
> +   if (pg_strcasecmp(tok, "Heap2") == 0)
> +   {
> +   newwalconsistency[RM_HEAP2_ID] = true;
> +   }
> Thinking more about it, I guess that we had better change the
> definition list of rmgrs in rmgr.h and get something closer to
> RmgrDescData that pg_xlogdump has to avoid all this stanza by
> completing it with the name of the rmgr. The only special cases that
> this code path would need to take care of would be then 'none' and
> 'all'. You could do this refactoring on top of the main patch to
> simplify it as it is rather big (1.7k lines).
>
I'm not sure about this point. wal_consistency doesn't support  all
the rmids. We should have some way to check this.

I'll update rest of the things as mentioned by you accordingly.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-08 Thread Michael Paquier
On Wed, Sep 7, 2016 at 7:22 PM, Kuntal Ghosh  wrote:
> Hello,

Could you avoid top-posting please? More reference here:
http://www.idallen.com/topposting.html

> - If WAL consistency check is enabled for a rmgrID, we always include
> the backup image in the WAL record.

What happens if wal_consistency has different settings on a standby
and its master? If for example it is set to 'all' on the standby, and
'none' on the master, or vice-versa, how do things react? An update of
this parameter should be WAL-logged, no?

> - I've extended the RmgrTable with a new function pointer
> rm_checkConsistency, which is called after rm_redo. (only when WAL
> consistency check is enabled for this rmgrID)
> - In each rm_checkConsistency, both backup pages and buffer pages are
> masked accordingly before any comparison.

This leads to heavy code duplication...

> - In postgresql.conf, a new guc variable named 'wal_consistency' is
> added. Default value of this variable is 'None'. Valid values are
> combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist,
> BRIN, Generic and XLOG. It can also be set to 'All' to enable all the
> values.

Lower-case is the usual policy for parameter values for GUC parameters.

> - In recovery tests (src/test/recovery/t), I've added wal_consistency
> parameter in the existing scripts. This feature doesn't change the
> expected output. If there is any inconsistency, it can be verified in
> corresponding log file.

I am afraid that just generating a WARNING message is going to be
useless for the buildfarm. If we want to detect errors, we could for
example have an additional GUC to trigger an ERROR or a FATAL, taking
down the cluster, and allowing things to show in red on a platform.

> Results
> 
>
> I've tested with installcheck and installcheck-world in master-standby
> set-up. Followings are the configuration parameters.

So you tested as well the recovery tests, right?

> I got two types of inconsistencies as following:
>
> 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup
> page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after
> redo, only BTP_DELETED flag is set in buffer page. I assume that we
> should clear all btpo_flags before setting BTP_DELETED in
> _bt_unlink_halfdead_page().

The page is deleted, it does not matter, so you could just mask all
the flags for a deleted page...
[...]
+   /*
+* Mask everything on a DELETED page.
+*/
+   if (((BTPageOpaque) PageGetSpecialPointer(page_norm))->btpo_flags
& BTP_DELETED)
And that's what is happening.

> 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in
> REVMAP page. This happens only for two cases. I'm not sure what the
> reason can be.

Hm? This smells like a block reference bug. What are the cases you are
referring to?

> I haven't done sufficient tests yet to measure the overhead of this
> modification. I'll do that next.

I did a first pass on your patch, and I think that things could be
really reduced. There is much code duplication, but see below for the
details..

 #include "access/xlogutils.h"
-
+#include "storage/bufmask.h"
I know that I am a noisy one on the matter, but please double-check
for such useless noise in your patch. And there is not only one.

+   newwalconsistency = (bool *) guc_malloc(ERROR,(RM_MAX_ID + 1)*sizeof(bool));
This spacing is not project-style. You may want to go through that:
https://www.postgresql.org/docs/devel/static/source.html

+$node_master->append_conf(
+   'postgresql.conf', qq(
+wal_consistency = 'All'
+));
Instead of duplicating that 7 times, you could just do it once in the
init() method of PostgresNode.pm. This really has meaning if enabled
by default.

+   /*
+* Followings are the rmids which can have backup blocks.
+* We'll enable this feature only for these rmids.
+*/
+   newwalconsistency[RM_HEAP2_ID] = true;
+   newwalconsistency[RM_HEAP_ID] = true;
+   newwalconsistency[RM_BTREE_ID] = true;
+   newwalconsistency[RM_HASH_ID] = true;
+   newwalconsistency[RM_GIN_ID] = true;
+   newwalconsistency[RM_GIST_ID] = true;
+   newwalconsistency[RM_SEQ_ID] = true;
+   newwalconsistency[RM_SPGIST_ID] = true;
+   newwalconsistency[RM_BRIN_ID] = true;
+   newwalconsistency[RM_GENERIC_ID] = true;
+   newwalconsistency[RM_XLOG_ID] = true;
Here you can just use MemSet with RM_MAX_ID and simplify this code maintenance.

+   for(i = 0; i < RM_MAX_ID + 1 ; i++)
+   newwalconsistency[i] = false;
+   break;
Same here you can just use MemSet.

+   else if (pg_strcasecmp(tok, "NONE") == 0)
[...]
+   else if (pg_strcasecmp(tok, "ALL") == 0)
It seems to me that using NONE or ALL with any other keywords should
not be allowed.

+   if (pg_strcasecmp(tok, "Heap2") == 0)
+   {
+   newwalconsistency[RM_HEAP2_ID] = true;
+   }
Thinking mo

Re: [HACKERS] WAL consistency check facility

2016-09-07 Thread Amit Kapila
On Wed, Sep 7, 2016 at 3:52 PM, Kuntal Ghosh  wrote:
>
> I got two types of inconsistencies as following:
>
> 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup
> page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after
> redo, only BTP_DELETED flag is set in buffer page.
>

I see that inconsistency in code as well.  I think this is harmless,
because after the page is marked as deleted, it is not used for any
purpose other than to recycle it for re-use.  After re-using it, the
caller always suppose to initialize the flags based on it's usage and
I see that is happening in the code unless I am missing something.

> I assume that we
> should clear all btpo_flags before setting BTP_DELETED in
> _bt_unlink_halfdead_page().
>

Yeah, we can do that for consistency.  If we see any problem in doing
so, then I think we can log the flags and set them during replay.


Note - Please post your replies inline rather than top posting them.
It breaks the discussion link, if you top post it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-07 Thread Kuntal Ghosh
On Wed, Sep 7, 2016 at 3:52 PM, Kuntal Ghosh  wrote:
> Hello,
>
> As per the earlier discussions, I've attached the updated patch for
> WAL consistency check feature. This is how the patch works:
>

The earlier patch (wal_consistency_v6.patch) was based on the commit
id 67e1e2aaff.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-07 Thread Kuntal Ghosh
Hello,

As per the earlier discussions, I've attached the updated patch for
WAL consistency check feature. This is how the patch works:

- If WAL consistency check is enabled for a rmgrID, we always include
the backup image in the WAL record.
- I've extended the RmgrTable with a new function pointer
rm_checkConsistency, which is called after rm_redo. (only when WAL
consistency check is enabled for this rmgrID)
- In each rm_checkConsistency, both backup pages and buffer pages are
masked accordingly before any comparison.
- In postgresql.conf, a new guc variable named 'wal_consistency' is
added. Default value of this variable is 'None'. Valid values are
combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist,
BRIN, Generic and XLOG. It can also be set to 'All' to enable all the
values.
- In recovery tests (src/test/recovery/t), I've added wal_consistency
parameter in the existing scripts. This feature doesn't change the
expected output. If there is any inconsistency, it can be verified in
corresponding log file.

Results


I've tested with installcheck and installcheck-world in master-standby
set-up. Followings are the configuration parameters.

Master:
wal_level = replica
max_wal_senders = 3
wal_keep_segments = 4000
hot_standby = on
wal_consistency = 'All'

Standby:
wal_consistency = 'All'

I got two types of inconsistencies as following:

1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup
page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after
redo, only BTP_DELETED flag is set in buffer page. I assume that we
should clear all btpo_flags before setting BTP_DELETED in
_bt_unlink_halfdead_page().

2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in
REVMAP page. This happens only for two cases. I'm not sure what the
reason can be.

I haven't done sufficient tests yet to measure the overhead of this
modification. I'll do that next.


Thanks to Amit Kapila, Dilip Kumar and Robert Haas for their off-line
suggestions.

Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

On Thu, Sep 1, 2016 at 11:34 PM, Peter Geoghegan  wrote:
> On Thu, Sep 1, 2016 at 9:23 AM, Robert Haas  wrote:
>> Indeed, it had occurred to me that we might not even want to compile
>> this code into the server unless WAL_DEBUG is defined; after all, how
>> does it help a regular user to detect that the server has a bug?  Bug
>> or no bug, that's the code they've got.  But on further reflection, it
>> seems like it could be useful: if we suspect a bug in the redo code
>> but we can't reproduce it here, we could ask the customer to turn this
>> option on to see whether it produces logging indicating the nature of
>> the problem.  However, because of the likely expensive of enabling the
>> feature, it seems like it would be quite desirable to limit the
>> expense of generating many extra FPWs to the affected rmgr.  For
>> example, if a user has a table with a btree index and a gin index, and
>> we suspect a bug in GIN, it would be nice for the user to be able to
>> enable the feature *only for GIN* rather than paying the cost of
>> enabling it for btree and heap as well.[2]
>
> Yes, that would be rather a large advantage.
>
> I think that there really is no hard distinction between users and
> hackers. Some people will want to run this in production, and it would
> be a lot better if performance was at least not atrocious. If amcheck
> couldn't do the majority of its verification with only an
> AccessShareLock, then users probably just couldn't use it. Heroku
> wouldn't have been able to use it on all production databases. It
> wouldn't have mattered that the verification was no less effective,
> since the bugs it found would simply never have been observed in
> practice.
>
> --
> Peter Geoghegan



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 27ba0a9..4c63ded 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -14,7 +14,7 @@
 #include "access/brin_pageops.h"
 #include "access/brin_xlog.h"
 #include "access/xlogutils.h"
-
+#include "storage/bufmask.h"
 
 /*
  * xlog replay routines
@@ -286,3 +286,84 @@ brin_redo(XLogReaderState *record)
 			elog(PANIC, "brin_redo: unknown op code %u", info);
 	}
 }
+
+/*
+ * It checks whether the current buffer page and backup page stored in the
+ * WAL record are consistent or not. Before comparing the two pages, it applies
+ * appropiate masking to the pages to ignore certain areas like hint bits,
+ * unused space between pd_lower and pd_upper etc. For more information about
+ * masking, see the masking function.
+ * This function should be called once WAL replay has been completed.
+ */
+void
+brin_checkConsistency(XLogReaderState *record)
+{
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	int block_id;
+	RelFileNode

Re: [HACKERS] WAL consistency check facility

2016-09-01 Thread Peter Geoghegan
On Thu, Sep 1, 2016 at 9:23 AM, Robert Haas  wrote:
> Indeed, it had occurred to me that we might not even want to compile
> this code into the server unless WAL_DEBUG is defined; after all, how
> does it help a regular user to detect that the server has a bug?  Bug
> or no bug, that's the code they've got.  But on further reflection, it
> seems like it could be useful: if we suspect a bug in the redo code
> but we can't reproduce it here, we could ask the customer to turn this
> option on to see whether it produces logging indicating the nature of
> the problem.  However, because of the likely expensive of enabling the
> feature, it seems like it would be quite desirable to limit the
> expense of generating many extra FPWs to the affected rmgr.  For
> example, if a user has a table with a btree index and a gin index, and
> we suspect a bug in GIN, it would be nice for the user to be able to
> enable the feature *only for GIN* rather than paying the cost of
> enabling it for btree and heap as well.[2]

Yes, that would be rather a large advantage.

I think that there really is no hard distinction between users and
hackers. Some people will want to run this in production, and it would
be a lot better if performance was at least not atrocious. If amcheck
couldn't do the majority of its verification with only an
AccessShareLock, then users probably just couldn't use it. Heroku
wouldn't have been able to use it on all production databases. It
wouldn't have mattered that the verification was no less effective,
since the bugs it found would simply never have been observed in
practice.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-01 Thread Simon Riggs
On 1 September 2016 at 17:23, Robert Haas  wrote:

> The primary audience of this feature is PostgreSQL developers

I have spoken to users who are waiting for this feature to run in
production, which is why I suggested it.

Some people care more about correctness than they do about loss of performance.

Obviously, this would be expensive and those with a super high
performance requirement may not be able to take advantage of this. I'm
sure many people will turn it off once if they hit a performance
issue, but running it in production for the first few months will give
people a very safe feeling.

I think the primary use for an rmgr filter might well be PostgreSQL developers.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-01 Thread Robert Haas
On Thu, Sep 1, 2016 at 4:12 PM, Simon Riggs  wrote:
> So in my current understanding, a hinted change has by definition no
> WAL record, so we just ship a FPW.

Hmm.  An FPW would have to be contained in a WAL record, so it can't
be right to say that we ship an FPW for lack of a WAL record.  I think
what we ship is nothing at all when wal_log_hints is disabled, and
when wal_log_hints is enabled we log an FDW once per checkpoint.

> A non-hint change has a WAL record
> and it is my (possibly naive) hope that all changes to a page are
> reflected in the WAL record/replay,

I hope for this, too.

> so we can just make a simple
> comparison without caring what is the rmgr of the WAL record.

Sure, that is 100% possible, and likely a good idea as far as the
behavior on the standby is concerned.  What's not so clear is whether
a simple on/off switch is a wise plan on the master.

The purpose of this code, as I understand it, is to check for
discrepancies between "do" and "redo"; that is, to verify that the
changes made to the buffer at the time the WAL record is generated
produce the same result as replay of that WAL record on the standby.
To accomplish this purpose, a post-image of the affected buffers is
included in each and every WAL record.  On replay, that post-image can
be compared with the result of replay.  If they differ, PostgreSQL has
a bug.  I would not expect many users to run this in production,
because it will presumably be wicked expensive.  If I recall
correctly, doing full page writes once per buffer per checkpoint, the
space taken up by FPWs is >75% of WAL volume.  Doing it for every
record will be exponentially more expensive.  The primary audience of
this feature is PostgreSQL developers, who might want to use it to try
to verify that, for example, Amit's patch to add write-ahead logging
for hash indexes does not have bugs.[1]

Indeed, it had occurred to me that we might not even want to compile
this code into the server unless WAL_DEBUG is defined; after all, how
does it help a regular user to detect that the server has a bug?  Bug
or no bug, that's the code they've got.  But on further reflection, it
seems like it could be useful: if we suspect a bug in the redo code
but we can't reproduce it here, we could ask the customer to turn this
option on to see whether it produces logging indicating the nature of
the problem.  However, because of the likely expensive of enabling the
feature, it seems like it would be quite desirable to limit the
expense of generating many extra FPWs to the affected rmgr.  For
example, if a user has a table with a btree index and a gin index, and
we suspect a bug in GIN, it would be nice for the user to be able to
enable the feature *only for GIN* rather than paying the cost of
enabling it for btree and heap as well.[2]

Similarly, when we imagine a developer using this feature to test for
bugs, it may at times be useful to enable it across-the-board to look
for bugs in any aspect of the write-ahead logging system. However, at
other times, when the goal is to find bugs in a particular AM, it
might be useful to enable it only for the corresponding rmgr.  It is
altogether likely that this feature will slow the system down quite a
lot.  If enabling this feature for hash indexes also means enabling it
for the heap, the incremental performance hit might be sufficient to
mask concurrency-related bugs in the hash index code that would
otherwise have been found.  So, I think having at least some basic
filtering is probably a pretty smart idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] It probably has bugs.
[2] One could of course add filtering considerably more complex than
per-rmgr - e.g. enabling it for only one particular relfilenode on a
busy production system might be rather desirable.  But I'm not sure we
really want to go there.  It adds a fair amount of complexity to a
feature that many people are obviously hoping will be quite simple to
use.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-01 Thread Simon Riggs
On 1 September 2016 at 11:16, Robert Haas  wrote:
> On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs  wrote:
>> I'd prefer a solution that was not dependent upon RmgrID at all.
>>
>> If there are various special cases that we need to cater for, ISTM
>> they would be flaws in the existing WAL implementation rather than
>> anything we would want to perpetuate. I hope we'll spend time fixing
>> them rather than add loads of weird code to work around the
>> imperfections.
>>
>> Underdocumented special case code is going to be unbelievably
>> difficult to get right in the long term.
>
> It seems to me that you may be conflating the issue of which changes
> should be masked out as hints (which is, indeed, special case code,
> whether underdocumented or not) with the issue of which rmgrs the user
> may want to verify (which is just a case of matching the rmgr ID in
> the WAL record against a list provided by the user, and is not special
> case code at all).

Yep, it seems entirely likely that I am misunderstanding what is
happening here. I'd like to see an analysis/discussion before we write
code. As you might expect, I'm excited by this feature and the
discoveries it appears likely to bring.

We've got wal_log_hints and that causes lots of extra traffic. I'm
happy with assuming that is switched on in this case also. (Perhaps we
might have a wal_log_level with various levels of logging.)

So in my current understanding, a hinted change has by definition no
WAL record, so we just ship a FPW. A non-hint change has a WAL record
and it is my (possibly naive) hope that all changes to a page are
reflected in the WAL record/replay, so we can just make a simple
comparison without caring what is the rmgr of the WAL record.

If we can start by discussing which special cases we know about that
require extra code, that will help. We can then decide whether to fix
the WAL record/replay or fix the comparison logic, possibly on a case
by case basis. My current preference is to generate lots of small
fixes to existing WAL code and then have a very, very simple patch for
this actual feature, but am willing to discuss alternate approaches.

IMV this would be a feature certain users would want turned on all the
time for everything. So I'm not bothered much about making this
feature settable by rmgr. I might question why this particular feature
would be settable by rmgr, when features like wal_log_hints and
wal_compression are not, but such discussion is a minor point in
comparison to discussing the main feature.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-01 Thread Robert Haas
On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs  wrote:
> I'd prefer a solution that was not dependent upon RmgrID at all.
>
> If there are various special cases that we need to cater for, ISTM
> they would be flaws in the existing WAL implementation rather than
> anything we would want to perpetuate. I hope we'll spend time fixing
> them rather than add loads of weird code to work around the
> imperfections.
>
> Underdocumented special case code is going to be unbelievably
> difficult to get right in the long term.

It seems to me that you may be conflating the issue of which changes
should be masked out as hints (which is, indeed, special case code,
whether underdocumented or not) with the issue of which rmgrs the user
may want to verify (which is just a case of matching the rmgr ID in
the WAL record against a list provided by the user, and is not special
case code at all).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Amit Kapila
On Thu, Sep 1, 2016 at 9:43 AM, Peter Geoghegan  wrote:
> On Wed, Aug 31, 2016 at 8:26 PM, Amit Kapila  wrote:
>>> As far as I am understanding things, we are aiming at something that
>>> could be used on production systems.
>>>
>>
>> I don't think you can enable it by default in production systems.
>> Enabling it will lead to significant performance drop as it writes the
>> whole page after each record for most type of RMGR ID's.
>>
>>> And, honestly, any people
>>> enabling it would just do it for all RMGRs because that's a
>>> no-brainer.
>>
>> Agreed, but remember enabling it for all is not free.
>
> I have sympathy for the idea that this should be as low overhead as
> possible, even if that means adding complexity to the interface --
> within reason. I would like to hear a practical example of where this
> RMGR id interface could be put to good use, when starting with little
> initial information about a problem.
>

One example that comes to mind is for the cases where the problem
reproduces only under high concurrency or some stress test. Now assume
the problem is with index, enabling it for all rmgr's could reduce the
probability of problem due to it's performance impact.  The second
advantage which I have already listed is it helps in future
development like the one I am doing now for hash indexes (making them
logged).

> And, ideally, we'd also have some
> indication of how big a difference that would make, it terms of
> measurable performance impact.
>

Yes, that's a valid point. I think we can do some tests to see the difference.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Peter Geoghegan
On Wed, Aug 31, 2016 at 8:26 PM, Amit Kapila  wrote:
>> As far as I am understanding things, we are aiming at something that
>> could be used on production systems.
>>
>
> I don't think you can enable it by default in production systems.
> Enabling it will lead to significant performance drop as it writes the
> whole page after each record for most type of RMGR ID's.
>
>> And, honestly, any people
>> enabling it would just do it for all RMGRs because that's a
>> no-brainer.
>
> Agreed, but remember enabling it for all is not free.

I have sympathy for the idea that this should be as low overhead as
possible, even if that means adding complexity to the interface --
within reason. I would like to hear a practical example of where this
RMGR id interface could be put to good use, when starting with little
initial information about a problem. And, ideally, we'd also have some
indication of how big a difference that would make, it terms of
measurable performance impact.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Amit Kapila
On Thu, Sep 1, 2016 at 8:02 AM, Amit Kapila  wrote:
> On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs  wrote:
>> On 27 August 2016 at 12:09, Kuntal Ghosh  wrote:
>>
> * wal_consistency_mask = 511  /* Enable consistency check mask bit*/

 What does this mean? (No docs)
>>>
>>> I was using this parameter as a masking integer to indicate the
>>> operations(rmgr list) for which we need this feature to be enabled.
>>> Since, this could be confusing, I've changed it accordingly so that it
>>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert)
>>
>> Why would we want that?
>>
>
> It would be easier to test and develop the various modules separately.
> As an example, if we develop a new AM which needs WAL facility or
> adding WAL capability to an existing system (say Hash Index), we can
> just test that module, rather than whole system.  I think it can help
> us in narrowing down the problem, if we have facility to enable it at
> RMGR ID level.  Having said that, I think this must have the facility
> to enable it for all the RMGR ID's (say ALL) and probably that should
> be default.
>

oops, I think having an option of specifying 'ALL' is good, but that
shouldn't be default, because it could have serious performance
implications.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Amit Kapila
On Thu, Sep 1, 2016 at 8:30 AM, Michael Paquier
 wrote:
> On Thu, Sep 1, 2016 at 11:32 AM, Amit Kapila  wrote:
>> On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs  wrote:
>>> On 27 August 2016 at 12:09, Kuntal Ghosh  wrote:
>>>
>> * wal_consistency_mask = 511  /* Enable consistency check mask bit*/
>
> What does this mean? (No docs)

 I was using this parameter as a masking integer to indicate the
 operations(rmgr list) for which we need this feature to be enabled.
 Since, this could be confusing, I've changed it accordingly so that it
 accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert)
>>>
>>> Why would we want that?
>>
>> It would be easier to test and develop the various modules separately.
>> As an example, if we develop a new AM which needs WAL facility or
>> adding WAL capability to an existing system (say Hash Index), we can
>> just test that module, rather than whole system.  I think it can help
>> us in narrowing down the problem, if we have facility to enable it at
>> RMGR ID level.  Having said that, I think this must have the facility
>> to enable it for all the RMGR ID's (say ALL) and probably that should
>> be default.
>
> As far as I am understanding things, we are aiming at something that
> could be used on production systems.
>

I don't think you can enable it by default in production systems.
Enabling it will lead to significant performance drop as it writes the
whole page after each record for most type of RMGR ID's.

> And, honestly, any people
> enabling it would just do it for all RMGRs because that's a
> no-brainer.

Agreed, but remember enabling it for all is not free.

> If we are designing something for testing purposes
> instead, something is wrong with this patch then.
>

What is wrong?

> Doing filtering at RMGR level for testing and development purposes
> will be done by somebody who has the skills to filter out which
> records he should look at.
>

Right, but in that way, if you see many of our guc parameters needs a
good level of understanding to set the correct  values for them.  For
example, do you think it is easy for user to set value for
"replacement_sort_tuples" without reading the description or
understanding the meaning of same.  This example might not be the best
example, but I think there are other parameters which do require some
deeper understanding of system.  The main thing is default values for
such parameters should be chosen carefully such that it represents
most common usage.

> Or he'll bump into an existing bump. So I'd
> rather keep this thing simple.
>

It seems to me that having an option of 'ALL' would make it easier for
users to set it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Michael Paquier
On Thu, Sep 1, 2016 at 11:32 AM, Amit Kapila  wrote:
> On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs  wrote:
>> On 27 August 2016 at 12:09, Kuntal Ghosh  wrote:
>>
> * wal_consistency_mask = 511  /* Enable consistency check mask bit*/

 What does this mean? (No docs)
>>>
>>> I was using this parameter as a masking integer to indicate the
>>> operations(rmgr list) for which we need this feature to be enabled.
>>> Since, this could be confusing, I've changed it accordingly so that it
>>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert)
>>
>> Why would we want that?
>
> It would be easier to test and develop the various modules separately.
> As an example, if we develop a new AM which needs WAL facility or
> adding WAL capability to an existing system (say Hash Index), we can
> just test that module, rather than whole system.  I think it can help
> us in narrowing down the problem, if we have facility to enable it at
> RMGR ID level.  Having said that, I think this must have the facility
> to enable it for all the RMGR ID's (say ALL) and probably that should
> be default.

As far as I am understanding things, we are aiming at something that
could be used on production systems. And, honestly, any people
enabling it would just do it for all RMGRs because that's a
no-brainer. If we are designing something for testing purposes
instead, something is wrong with this patch then.

Doing filtering at RMGR level for testing and development purposes
will be done by somebody who has the skills to filter out which
records he should look at. Or he'll bump into an existing bump. So I'd
rather keep this thing simple.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Amit Kapila
On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs  wrote:
> On 27 August 2016 at 12:09, Kuntal Ghosh  wrote:
>
 * wal_consistency_mask = 511  /* Enable consistency check mask bit*/
>>>
>>> What does this mean? (No docs)
>>
>> I was using this parameter as a masking integer to indicate the
>> operations(rmgr list) for which we need this feature to be enabled.
>> Since, this could be confusing, I've changed it accordingly so that it
>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert)
>
> Why would we want that?
>

It would be easier to test and develop the various modules separately.
As an example, if we develop a new AM which needs WAL facility or
adding WAL capability to an existing system (say Hash Index), we can
just test that module, rather than whole system.  I think it can help
us in narrowing down the problem, if we have facility to enable it at
RMGR ID level.  Having said that, I think this must have the facility
to enable it for all the RMGR ID's (say ALL) and probably that should
be default.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-31 Thread Michael Paquier
On Wed, Aug 31, 2016 at 10:32 PM, Simon Riggs  wrote:
> On 27 August 2016 at 12:09, Kuntal Ghosh  wrote:
>
 * wal_consistency_mask = 511  /* Enable consistency check mask bit*/
>>>
>>> What does this mean? (No docs)
>>
>> I was using this parameter as a masking integer to indicate the
>> operations(rmgr list) for which we need this feature to be enabled.
>> Since, this could be confusing, I've changed it accordingly so that it
>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert)
>
> Why would we want that?

I am still in for just an on/off switch instead of this complication.
An all-or-nothing feature is what we are looking at here. Still a list
is an improvement compared to a bitmap.

 1. Add support for other Resource Managers.
>>>
>>> We probably need to have a discussion as to why you think this should
>>> be Rmgr dependent?
>>> Code comments would help there.
>>>
>>> If it does, then you should probably do this by extending RmgrTable
>>> with an rm_check, so you can call it like this...
>>>
>>> RmgrTable[record->xl_rmid].rm_check
>>
>> +1.
>> I'm modifying it accordingly. I'm calling this function after
>> RmgrTable[record->xl_rmid].rm_redo.
>>
 5. Generalize the page type identification technique.
>>>
>>> Why not do this first?
>>>
>>
>> At present, I'm using special page size and page ID to identify page
>> type. But, I've noticed some cases where the entire page is
>> initialized to zero (Ex: hash_xlog_squeeze_page). RmgrID and info bit
>> can help us to identify those pages.
>
> I'd prefer a solution that was not dependent upon RmgrID at all.

So you'd rather identify the page types by looking at pd_special? That
seems worse to me but..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >