Re: [HACKERS] WAL consistency check facility
On Wed, Feb 15, 2017 at 11:08 AM, Robert Haas wrote: > On Tue, Feb 14, 2017 at 7:12 PM, Tom Lane wrote: >> FWIW, my own habit when creating new PG files is generally to write >> >> * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group >> * Portions Copyright (c) 1994, Regents of the University of California >> >> even if it's "all new" code. The main reason being that it's hardly ever >> the case that you didn't copy-and-paste some amount of stuff out of a >> pre-existing file, and trying to sort out how much of what originated >> exactly when is an unrewarding exercise. Even if it is basically all >> new code, this feels like giving an appropriate amount of credit to >> Those Who Went Before Us. > > Right. I tend to do the same, and wonder if we shouldn't make that a > general practice. This looks sensible to me. No-brainer rules that make sense are less things to worry about. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Feb 14, 2017 at 7:12 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Feb 14, 2017 at 5:16 PM, Michael Paquier >> wrote: >>> Just for curiosity: does the moment when the code has been written or >>> committed counts? It's no big deal seeing how liberal the Postgres >>> license is, but this makes me wonder... > >> IANAL, but I think if you ask one, he or she will tell you that what >> matters is the date the work was created. In the case of code, that >> means when the code was written. > > FWIW, my own habit when creating new PG files is generally to write > > * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group > * Portions Copyright (c) 1994, Regents of the University of California > > even if it's "all new" code. The main reason being that it's hardly ever > the case that you didn't copy-and-paste some amount of stuff out of a > pre-existing file, and trying to sort out how much of what originated > exactly when is an unrewarding exercise. Even if it is basically all > new code, this feels like giving an appropriate amount of credit to > Those Who Went Before Us. Right. I tend to do the same, and wonder if we shouldn't make that a general practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
Robert Haas writes: > On Tue, Feb 14, 2017 at 5:16 PM, Michael Paquier > wrote: >> Just for curiosity: does the moment when the code has been written or >> committed counts? It's no big deal seeing how liberal the Postgres >> license is, but this makes me wonder... > IANAL, but I think if you ask one, he or she will tell you that what > matters is the date the work was created. In the case of code, that > means when the code was written. FWIW, my own habit when creating new PG files is generally to write * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California even if it's "all new" code. The main reason being that it's hardly ever the case that you didn't copy-and-paste some amount of stuff out of a pre-existing file, and trying to sort out how much of what originated exactly when is an unrewarding exercise. Even if it is basically all new code, this feels like giving an appropriate amount of credit to Those Who Went Before Us. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Feb 14, 2017 at 5:16 PM, Michael Paquier wrote: > On Wed, Feb 15, 2017 at 2:43 AM, Robert Haas wrote: >> On Thu, Feb 9, 2017 at 8:17 PM, Michael Paquier >> wrote: >>> Please find attached a patch with those fixes. >> >> Committed, but I changed the copyright dates to 2016-2017 rather than >> just 2017 since surely some of the code was originally written before >> 2017. Even that might not really be going back far enough, but it >> doesn't matter too much. > > Just for curiosity: does the moment when the code has been written or > committed counts? It's no big deal seeing how liberal the Postgres > license is, but this makes me wonder... IANAL, but I think if you ask one, he or she will tell you that what matters is the date the work was created. In the case of code, that means when the code was written. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Feb 15, 2017 at 2:43 AM, Robert Haas wrote: > On Thu, Feb 9, 2017 at 8:17 PM, Michael Paquier > wrote: >> Please find attached a patch with those fixes. > > Committed, but I changed the copyright dates to 2016-2017 rather than > just 2017 since surely some of the code was originally written before > 2017. Even that might not really be going back far enough, but it > doesn't matter too much. Just for curiosity: does the moment when the code has been written or committed counts? It's no big deal seeing how liberal the Postgres license is, but this makes me wonder... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Feb 9, 2017 at 8:17 PM, Michael Paquier wrote: > Please find attached a patch with those fixes. Committed, but I changed the copyright dates to 2016-2017 rather than just 2017 since surely some of the code was originally written before 2017. Even that might not really be going back far enough, but it doesn't matter too much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Feb 9, 2017 at 5:56 AM, Robert Haas wrote: > On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh > wrote: >> Thank you Robert for the review. Please find the updated patch in the >> attachment. > > I have committed this patch after fairly extensive revisions: Cool. I had finally a look at what has been committed in a507b869. Running regression tests with all RMGRs enabled, a single installcheck generates 7GB of WAL. Woah. > * Rewrote the documentation to give some idea what the underlying > mechanism of operation of the feature is, so that users who choose to > enable this will hopefully have some understanding of what they've > turned on. Thanks, those look good to me. > * Renamed "char *page" arguments to "char *pagedata" and "Page page", > because tempPage doesn't seem to be to be any better a name than > page_norm. > * Removed assertion in checkXLogConsistency that consistency checking > is enabled for the RM; that's trivially false if > wal_consistency_checking is not the same on the master and the > standby. (Note that quite apart from the issue of whether this > function should exist at all, adding it to a header file after the > closing #endif guard is certainly not right.) I recall doing those two things the same way as in the commit. Not sure at which point they have been re-introduced. > * Changed checkXLogConsistency to skip pages whose LSN is newer than > that of the record. Without this, if you shut down recovery and > restart it, it complains of inconsistent pages and dies. (I'm not > sure this is the only scenario that needs to be covered; it would be > smart to do more testing of restarting the standby.) Good point. > -- a WAL consistency failure causes your standby to die a hard death. > (Maybe there should be a way to suppress consistency checking on the > standby -- but I think not just by requiring wal_consistency_checking > on both ends. Or maybe we should just downgrade the FATAL to WARNING; > blowing up the standby irrevocably seems like poor behavior.) Having a FATAL is useful for buildfarm members, that would show up in red. Having a switch to generate a warning would be useful for live deployments I agree. Now I think that we need as well two things: - A recovery test to run regression tests with a standby behind. - Extend the TAP tests so as it is possible to fill in postgresql.conf with custom variables. - have the buildfarm client run recovery tests! I am fine to write those patches. > I also bumped XLOG_PAGE_MAGIC (which is normally done by the > committer, not the patch author, so I wasn't expecting that to be in > the patch as submitted). Here are a couple of things I have noticed while looking at the code. + * Portions Copyright (c) 2016, PostgreSQL Global Development Group s/2016/2017/ in bufmask.c and bufmask.h. + if (ItemIdIsNormal(iid)) + { + + HeapTupleHeader page_htup = (HeapTupleHeader) page_item; Unnecessary newline here. +* Read the contents from the backup copy, stored in WAL record and +* store it in a temporary page. There is not need to allocate a new +* page here, a local buffer is fine to hold its contents and a mask +* can be directly applied on it. s/not need/no need/. In checkXLogConsistency(), FPWs that have the flag BKPIMAGE_APPLY set will still be checked, resulting in a FPW being compared to itself. I think that those had better be bypassed. Please find attached a patch with those fixes. -- Michael consistency-checks-fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Feb 9, 2017 at 2:26 AM, Robert Haas wrote: > On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh > wrote: >> Thank you Robert for the review. Please find the updated patch in the >> attachment. > > I have committed this patch after fairly extensive revisions: > Thank you, Robert, for the above corrections and commit. Thanks to Michael Paquier, Peter Eisentraut, Amit Kapila, Álvaro Herrera, and Simon Riggs for taking their time to complete the patch. It was a great learning experience for me. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh wrote: > Thank you Robert for the review. Please find the updated patch in the > attachment. I have committed this patch after fairly extensive revisions: * Rewrote the documentation to give some idea what the underlying mechanism of operation of the feature is, so that users who choose to enable this will hopefully have some understanding of what they've turned on. * Renamed "char *page" arguments to "char *pagedata" and "Page page", because tempPage doesn't seem to be to be any better a name than page_norm. * Moved bufmask.c to src/backend/access/common, because there's no code in src/backend/storage/buffer that knows anything about the format of pages; that is the job of AMs, hence src/backend/access. * Improved some comments in bufmask.c * Removed consistencyCheck_is_enabled in favor of determining which RMs support masking by the presence of absence of an rm_mask function. * Removed assertion in checkXLogConsistency that consistency checking is enabled for the RM; that's trivially false if wal_consistency_checking is not the same on the master and the standby. (Note that quite apart from the issue of whether this function should exist at all, adding it to a header file after the closing #endif guard is certainly not right.) * Changed checkXLogConsistency to use RBM_NORMAL_NO_LOG instead of RBM_NORMAL. I'm not sure if there are any cases where this makes a difference, but it seems safer. * Changed checkXLogConsistency to skip pages whose LSN is newer than that of the record. Without this, if you shut down recovery and restart it, it complains of inconsistent pages and dies. (I'm not sure this is the only scenario that needs to be covered; it would be smart to do more testing of restarting the standby.) * Made wal_consistency_checking a developer option instead of a WAL option. Even though it CAN be used in production, we don't particularly want to encourage that; enabling WAL consistency checking has a big performance cost and makes your system more fragile not less -- a WAL consistency failure causes your standby to die a hard death. (Maybe there should be a way to suppress consistency checking on the standby -- but I think not just by requiring wal_consistency_checking on both ends. Or maybe we should just downgrade the FATAL to WARNING; blowing up the standby irrevocably seems like poor behavior.) * Coding style improvement in check_wal_consistency_checking. * Removed commas in messages added to pg_xlogdump; those didn't look good to me, on further review. * Comment improvements in xlog_internal.h and xlogreader.h I also bumped XLOG_PAGE_MAGIC (which is normally done by the committer, not the patch author, so I wasn't expecting that to be in the patch as submitted). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas wrote: > On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh > wrote: >> I've attached the patch with the modified changes. PFA. > > Can this patch check contrib/bloom? > Added support for generic resource manager type. > +/* > + * Mask some line pointer bits, particularly those marked as > + * used on a master and unused on a standby. > + */ > > Comment doesn't explain why we need to do this. > Added comments. > +/* > + * For GIN_DELETED page, the page is initialized to empty. > + * Hence mask everything. > + */ > +if (opaque->flags & GIN_DELETED) > +memset(page_norm, MASK_MARKER, BLCKSZ); > +else > +mask_unused_space(page_norm); > > If the page is initialized to empty, why do we need to mask > anything/everything? Anyway, it doesn't seem right to mask the > GinPageOpaque structure itself. Or at least it doesn't seem right to > mask the flags word. > Modified accordingly. > > +if (!HeapTupleHeaderXminFrozen(page_htup)) > +page_htup->t_infomask |= HEAP_XACT_MASK; > +else > +page_htup->t_infomask |= HEAP_XMAX_COMMITTED | > HEAP_XMAX_INVALID; > > Comment doesn't address this logic. Also, the "else" case shouldn't > exist at all, I think. Added comments. I think that "Else" part is needed for xmax. > > +/* > + * For a speculative tuple, the content of t_ctid is conflicting > + * between the backup page and current page. Hence, we set it > + * to the current block number and current offset. > + */ > > Why does it differ? Is that a bug? > Added comments. > > +/* > + * During replay of a btree page split, we don't set the BTP_SPLIT_END > + * flag of the right sibling and initialize th cycle_id to 0 for the same > + * page. > + */ > > A reference to the name of the redo function might be helpful here > (and in some other places), unless it's just ${AMNAME}_redo. "th" is > a typo for "the". Corrected. > +appendStringInfoString(buf, " (full page > image, apply)"); > +else > +appendStringInfoString(buf, " (full page image)"); > > How about "(full page image)" and "(full page image, for WAL verification)"? > > Similarly in XLogDumpDisplayRecord, I think we should assume that > "FPW" means what it has always meant, and leave that output alone. > Instead, distinguish the WAL-consistency-checking case when it > happens, e.g. "(FPW for consistency check)". > Corrected. > +checkConsistency(XLogReaderState *record) > > How about CheckXLogConsistency()? > > + * If needs_backup is true or wal consistency check is enabled for > > ...or WAL checking is enabled... > > + * If WAL consistency is enabled for the resource manager of > > If WAL consistency checking is enabled... > > + * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag > > with the BKPIMAGE_APPLY flag Modified accordingly. > > - * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended > - * with all-zeroes pages up to the referenced block number. In > - * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value > + * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag > + * is not set for the backup block, the relation is extended with all-zeroes > + * pages up to the referenced block number. > > OK, I'm puzzled by this. Surely we don't want the WAL consistency > checking facility to cause the relation to be extended like this. And > I don't see why it would, because the WAL consistency checking happens > after the page changes have already been applied. I wonder if this > hunk is in error and should be dropped. > Dropped comments. > +PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX); > +phdr->pd_prune_xid = PG_UINT32_MAX; > +phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES; > +phdr->pd_flags |= PD_ALL_VISIBLE; > +#define MASK_MARKER0xFF > (and many others) > > Why do we mask by setting bits rather than clearing bits? My > intuition would have been to zero things we want to mask, rather than > setting them to one. > Modified accordingly so that we can zero things during masking instead of setting them to one. > +{ > +newwalconsistency[i] = true; > +} > > Superfluous braces. > Corrected. > +/* > + * Leave if no masking functions defined, this is possible in the case > + * resource managers generating just full page writes, comparing an > + * image to itself has no meaning in those cases. > + */ > +if (RmgrTable[rmid].rm_mask == NULL) > +return; > > ...and also... > > +/* > + * This feature is enabled only for the resource managers where > + * a masking function is defined.
Re: [HACKERS] WAL consistency check facility
On Tue, Feb 7, 2017 at 6:32 AM, Amit Kapila wrote: > On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas wrote: >> >> +if (!HeapTupleHeaderXminFrozen(page_htup)) >> +page_htup->t_infomask |= HEAP_XACT_MASK; >> +else >> +page_htup->t_infomask |= HEAP_XMAX_COMMITTED | >> HEAP_XMAX_INVALID; >> >> Comment doesn't address this logic. Also, the "else" case shouldn't >> exist at all, I think. >> > > In the *if* check, it just checks frozen status of xmin, so I think > you need to mask xmax related bits in else check. Can you explain > what makes you think that the else case shouldn't exist? Oh, you're right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas wrote: > > +if (!HeapTupleHeaderXminFrozen(page_htup)) > +page_htup->t_infomask |= HEAP_XACT_MASK; > +else > +page_htup->t_infomask |= HEAP_XMAX_COMMITTED | > HEAP_XMAX_INVALID; > > Comment doesn't address this logic. Also, the "else" case shouldn't > exist at all, I think. > In the *if* check, it just checks frozen status of xmin, so I think you need to mask xmax related bits in else check. Can you explain what makes you think that the else case shouldn't exist? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Jan 31, 2017 at 9:31 PM, Michael Paquier wrote: > On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas wrote: >> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh >> wrote: >>> I've attached the patch with the modified changes. PFA. >> >> Can this patch check contrib/bloom? > > Only full pages are applied at redo by the generic WAL facility. So > you would finish by comparing a page with itself in generic_redo. generic_redo is more complicated than just restoring an FPI. I admit that generic_redo *probably* has no bugs, but I don't see why it would hurt to allow it to be checked. It's not IMPOSSIBLE that restoring the page delta could go wrong somehow. >> +/* >> + * For a speculative tuple, the content of t_ctid is conflicting >> + * between the backup page and current page. Hence, we set it >> + * to the current block number and current offset. >> + */ >> >> Why does it differ? Is that a bug? > > This has been discussed twice in this thread, once by me, once by Alvaro: > https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com > https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com Sorry, I missed/forgot about that. I think this is evidence that the comment isn't really good enough. It says "hey, this might differ" ... with no real explanation of why it might differ or why that's OK. If there were an explanation there, I wouldn't have flagged it. >> +/* >> + * Leave if no masking functions defined, this is possible in the case >> + * resource managers generating just full page writes, comparing an >> + * image to itself has no meaning in those cases. >> + */ >> +if (RmgrTable[rmid].rm_mask == NULL) >> +return; >> >> ...and also... >> >> +/* >> + * This feature is enabled only for the resource managers where >> + * a masking function is defined. >> + */ >> +for (i = 0; i <= RM_MAX_ID; i++) >> +{ >> +if (RmgrTable[i].rm_mask != NULL) >> >> Why do we assume that the feature is only enabled for RMs that have a >> mask function? Why not instead assume that if there's no masking >> function, no masking is required? > > Not all RMs work on full pages. Tracking in smgr.h the list of RMs > that are no-ops when masking things is easier than having empty > routines declared all over the code base. Don't you think so> Sure, but I don't think that's what the code is doing. If an RM is a no-op when masking things, it must define an empty function. If it defines no function, checking is disabled. I think we want to be able to check any RM. No? >> +void(*rm_mask) (char *page, BlockNumber blkno); >> >> Could the page be passed as type "Page" rather than a "char *" to make >> things more convenient for the masking functions? If not, could those >> functions at least do something like "Page page = (Page) pagebytes;" >> rather than "Page page_norm = (Page) page;"? > > xlog_internal.h is used as well by frontends, this makes the header > dependencies cleaner. (I have looked at using Page when hacking this > stuff, but the header dependencies are not worth it, I don't recall > all the details though this was a couple of months back). OK. I still think page_norm is a lame variable name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Feb 1, 2017 at 8:01 AM, Michael Paquier wrote: > On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas wrote: >> +/* >> + * Leave if no masking functions defined, this is possible in the case >> + * resource managers generating just full page writes, comparing an >> + * image to itself has no meaning in those cases. >> + */ >> +if (RmgrTable[rmid].rm_mask == NULL) >> +return; >> >> ...and also... >> >> +/* >> + * This feature is enabled only for the resource managers where >> + * a masking function is defined. >> + */ >> +for (i = 0; i <= RM_MAX_ID; i++) >> +{ >> +if (RmgrTable[i].rm_mask != NULL) >> >> Why do we assume that the feature is only enabled for RMs that have a >> mask function? Why not instead assume that if there's no masking >> function, no masking is required? > > Not all RMs work on full pages. Tracking in smgr.h the list of RMs > that are no-ops when masking things is easier than having empty > routines declared all over the code base. Don't you think so> > Robert's suggestion surely makes the approach more general. But, the existing approach makes it easier to decide the RM's for which we support the consistency check facility. Surely, we can use a list to track the RMs which should (/not) be masked. But, I think we always have to mask the lsn of the pages at least. Isn't it? >> +void(*rm_mask) (char *page, BlockNumber blkno); >> >> Could the page be passed as type "Page" rather than a "char *" to make >> things more convenient for the masking functions? If not, could those >> functions at least do something like "Page page = (Page) pagebytes;" >> rather than "Page page_norm = (Page) page;"? > > xlog_internal.h is used as well by frontends, this makes the header > dependencies cleaner. (I have looked at using Page when hacking this > stuff, but the header dependencies are not worth it, I don't recall > all the details though this was a couple of months back). + 1 -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas wrote: > On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh > wrote: >> I've attached the patch with the modified changes. PFA. Thanks Robert for taking your time for the review. I'll update the patch with the changes suggested by you. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas wrote: > On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh > wrote: >> I've attached the patch with the modified changes. PFA. > > Can this patch check contrib/bloom? Only full pages are applied at redo by the generic WAL facility. So you would finish by comparing a page with itself in generic_redo. > +/* > + * For a speculative tuple, the content of t_ctid is conflicting > + * between the backup page and current page. Hence, we set it > + * to the current block number and current offset. > + */ > > Why does it differ? Is that a bug? This has been discussed twice in this thread, once by me, once by Alvaro: https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com > +/* > + * Leave if no masking functions defined, this is possible in the case > + * resource managers generating just full page writes, comparing an > + * image to itself has no meaning in those cases. > + */ > +if (RmgrTable[rmid].rm_mask == NULL) > +return; > > ...and also... > > +/* > + * This feature is enabled only for the resource managers where > + * a masking function is defined. > + */ > +for (i = 0; i <= RM_MAX_ID; i++) > +{ > +if (RmgrTable[i].rm_mask != NULL) > > Why do we assume that the feature is only enabled for RMs that have a > mask function? Why not instead assume that if there's no masking > function, no masking is required? Not all RMs work on full pages. Tracking in smgr.h the list of RMs that are no-ops when masking things is easier than having empty routines declared all over the code base. Don't you think so> > +void(*rm_mask) (char *page, BlockNumber blkno); > > Could the page be passed as type "Page" rather than a "char *" to make > things more convenient for the masking functions? If not, could those > functions at least do something like "Page page = (Page) pagebytes;" > rather than "Page page_norm = (Page) page;"? xlog_internal.h is used as well by frontends, this makes the header dependencies cleaner. (I have looked at using Page when hacking this stuff, but the header dependencies are not worth it, I don't recall all the details though this was a couple of months back). -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh wrote: > I've attached the patch with the modified changes. PFA. Can this patch check contrib/bloom? +/* + * Mask some line pointer bits, particularly those marked as + * used on a master and unused on a standby. + */ Comment doesn't explain why we need to do this. +/* + * For GIN_DELETED page, the page is initialized to empty. + * Hence mask everything. + */ +if (opaque->flags & GIN_DELETED) +memset(page_norm, MASK_MARKER, BLCKSZ); +else +mask_unused_space(page_norm); If the page is initialized to empty, why do we need to mask anything/everything? Anyway, it doesn't seem right to mask the GinPageOpaque structure itself. Or at least it doesn't seem right to mask the flags word. +/* + * For gist leaf pages, mask some line pointer bits, particularly + * those marked as used on a master and unused on a standby. + */ Comment doesn't explain why we need to do this. +if (!HeapTupleHeaderXminFrozen(page_htup)) +page_htup->t_infomask |= HEAP_XACT_MASK; +else +page_htup->t_infomask |= HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID; Comment doesn't address this logic. Also, the "else" case shouldn't exist at all, I think. +/* + * For a speculative tuple, the content of t_ctid is conflicting + * between the backup page and current page. Hence, we set it + * to the current block number and current offset. + */ Why does it differ? Is that a bug? +/* + * Mask everything on a DELETED page since it will be re-initialized + * during replay. + */ +if ((maskopaq->btpo_flags & BTP_DELETED) != 0) +{ +/* Mask Page Content */ +memset(page_norm + SizeOfPageHeaderData, MASK_MARKER, + BLCKSZ - SizeOfPageHeaderData); + +/* Mask pd_lower and pd_upper */ +memset(&((PageHeader) page_norm)->pd_lower, MASK_MARKER, + sizeof(uint16)); +memset(&((PageHeader) page_norm)->pd_upper, MASK_MARKER, + sizeof(uint16)); This isn't consistent with the GIN_DELETE case - it is more selective about what it masks. Probably that logic should be adapted to look more like this. +/* + * Mask some line pointer bits, particularly those marked as + * used on a master and unused on a standby. + */ Comment (still) doesn't explain why we need to do this. +/* + * During replay of a btree page split, we don't set the BTP_SPLIT_END + * flag of the right sibling and initialize th cycle_id to 0 for the same + * page. + */ A reference to the name of the redo function might be helpful here (and in some other places), unless it's just ${AMNAME}_redo. "th" is a typo for "the". +appendStringInfoString(buf, " (full page image, apply)"); +else +appendStringInfoString(buf, " (full page image)"); How about "(full page image)" and "(full page image, for WAL verification)"? Similarly in XLogDumpDisplayRecord, I think we should assume that "FPW" means what it has always meant, and leave that output alone. Instead, distinguish the WAL-consistency-checking case when it happens, e.g. "(FPW for consistency check)". +checkConsistency(XLogReaderState *record) How about CheckXLogConsistency()? + * If needs_backup is true or wal consistency check is enabled for ...or WAL checking is enabled... + * If WAL consistency is enabled for the resource manager of If WAL consistency checking is enabled... + * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag with the BKPIMAGE_APPLY flag - * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended - * with all-zeroes pages up to the referenced block number. In - * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value + * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag + * is not set for the backup block, the relation is extended with all-zeroes + * pages up to the referenced block number. OK, I'm puzzled by this. Surely we don't want the WAL consistency checking facility to cause the relation to be extended like this. And I don't see why it would, because the WAL consistency checking happens after the page changes have already been applied. I wonder if this hunk is in error and should be dropped. +PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX); +phdr->pd_prune_xid = PG_UINT32_MAX; +phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES; +phdr->pd_flags |= PD_ALL_VISIBLE; +#define MASK_MARKER0xFF (and many others) Why do we mask by setting bits rather than clearing bits? My intuition would have been to zero things we want to mask, rather than setting them to one. +{ +
Re: [HACKERS] WAL consistency check facility
On Thu, Jan 5, 2017 at 2:54 PM, Kuntal Ghosh wrote: > On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas wrote: > >> On a first read-through of this patch -- I have not studied it in >> detail yet -- this looks pretty good to me. One concern is that this >> patch adds a bit of code to XLogInsert(), which is a very hot piece of >> code. Conceivably, that might produce a regression even when this is >> disabled; if so, we'd probably need to make it a build-time option. I >> hope that's not necessary, because I think it would be great to >> compile this into the server by default, but we better make sure it's >> not a problem. A bulk load into an existing table might be a good >> test case. >> > I've done some bulk load testing with 16,32,64 clients. I didn't > notice any regression > in the results. > >> Aside from that, I think the biggest issue here is that the masking >> functions are virtually free of comments, whereas I think they should >> have extensive and detailed comments. For example, in heap_mask, you >> have this: >> >> +page_htup->t_infomask = >> +HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID | >> +HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID; >> >> For something like this, you could write "We want to ignore >> differences in hint bits, since they can be set by SetHintBits without >> emitting WAL. Force them all to be set so that we don't notice >> discrepancies." Actually, though, I think that you could be a bit >> more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID = >> HEAP_XMIN_FROZEN, so maybe what you should do is clear >> HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if >> one is set but not both. >> > I've modified it as follows: > + > + if (!HeapTupleHeaderXminFrozen(page_htup)) > + page_htup->t_infomask |= HEAP_XACT_MASK; > + else > + page_htup->t_infomask |= > HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID; > >> Anyway, leaving that aside, I think every single change that gets >> masked in every single masking routine needs a similar comment, >> explaining why that change can happen on the master without also >> happening on the standby and hopefully referring to the code that >> makes that unlogged change. >> > I've added comments for all the masking routines. > >> I think wal_consistency_checking, as proposed by Peter, is better than >> wal_consistency_check, as implemented. >> > Modified to wal_consistency_checking. > >> Having StartupXLOG() call pfree() on the masking buffers is a waste of >> code. The process is going to exit anyway. >> >> + "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno >> %u", >> > Done. > >> Primary error messages aren't capitalized. >> >> +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno)) >> +{ >> +/* Caller specified a bogus block_id. Do nothing. */ >> +continue; >> +} >> >> Why would the caller do something so dastardly? >> > Modified to following comment: > + /* > +* WAL record doesn't contain a block reference > +* with the given id. Do nothing. > +*/ > > I've attached the patch with the modified changes. PFA. Moved to CF 2017-03 with same status, "ready for committer". -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas wrote: > On a first read-through of this patch -- I have not studied it in > detail yet -- this looks pretty good to me. One concern is that this > patch adds a bit of code to XLogInsert(), which is a very hot piece of > code. Conceivably, that might produce a regression even when this is > disabled; if so, we'd probably need to make it a build-time option. I > hope that's not necessary, because I think it would be great to > compile this into the server by default, but we better make sure it's > not a problem. A bulk load into an existing table might be a good > test case. > I've done some bulk load testing with 16,32,64 clients. I didn't notice any regression in the results. > Aside from that, I think the biggest issue here is that the masking > functions are virtually free of comments, whereas I think they should > have extensive and detailed comments. For example, in heap_mask, you > have this: > > +page_htup->t_infomask = > +HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID | > +HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID; > > For something like this, you could write "We want to ignore > differences in hint bits, since they can be set by SetHintBits without > emitting WAL. Force them all to be set so that we don't notice > discrepancies." Actually, though, I think that you could be a bit > more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID = > HEAP_XMIN_FROZEN, so maybe what you should do is clear > HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if > one is set but not both. > I've modified it as follows: + + if (!HeapTupleHeaderXminFrozen(page_htup)) + page_htup->t_infomask |= HEAP_XACT_MASK; + else + page_htup->t_infomask |= HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID; > Anyway, leaving that aside, I think every single change that gets > masked in every single masking routine needs a similar comment, > explaining why that change can happen on the master without also > happening on the standby and hopefully referring to the code that > makes that unlogged change. > I've added comments for all the masking routines. > I think wal_consistency_checking, as proposed by Peter, is better than > wal_consistency_check, as implemented. > Modified to wal_consistency_checking. > Having StartupXLOG() call pfree() on the masking buffers is a waste of > code. The process is going to exit anyway. > > + "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno > %u", > Done. > Primary error messages aren't capitalized. > > +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno)) > +{ > +/* Caller specified a bogus block_id. Do nothing. */ > +continue; > +} > > Why would the caller do something so dastardly? > Modified to following comment: + /* +* WAL record doesn't contain a block reference +* with the given id. Do nothing. +*/ I've attached the patch with the modified changes. PFA. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com walconsistency_v16.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas wrote: > On Mon, Nov 28, 2016 at 11:31 PM, Michael Paquier > wrote: >> Moved to CF 2017-01, as no committers have showed up yet :( > > Seeing no other volunteers, here I am. > Thanks Robert for looking into the patch. > On a first read-through of this patch -- I have not studied it in > detail yet -- this looks pretty good to me. One concern is that this > patch adds a bit of code to XLogInsert(), which is a very hot piece of > code. Conceivably, that might produce a regression even when this is > disabled; if so, we'd probably need to make it a build-time option. I > hope that's not necessary, because I think it would be great to > compile this into the server by default, but we better make sure it's > not a problem. A bulk load into an existing table might be a good > test case. > I'll do this test and post the results. > +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno)) > +{ > +/* Caller specified a bogus block_id. Do nothing. */ > +continue; > +} > > Why would the caller do something so dastardly? > Sorry, it's my bad. I've copied the code from somewhere else, but forgot to modify the comment. It should be something like /* block_id is not used. */ I'll modify the above along with other suggested changes. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Nov 28, 2016 at 11:31 PM, Michael Paquier wrote: > Moved to CF 2017-01, as no committers have showed up yet :( Seeing no other volunteers, here I am. On a first read-through of this patch -- I have not studied it in detail yet -- this looks pretty good to me. One concern is that this patch adds a bit of code to XLogInsert(), which is a very hot piece of code. Conceivably, that might produce a regression even when this is disabled; if so, we'd probably need to make it a build-time option. I hope that's not necessary, because I think it would be great to compile this into the server by default, but we better make sure it's not a problem. A bulk load into an existing table might be a good test case. Aside from that, I think the biggest issue here is that the masking functions are virtually free of comments, whereas I think they should have extensive and detailed comments. For example, in heap_mask, you have this: +page_htup->t_infomask = +HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID | +HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID; For something like this, you could write "We want to ignore differences in hint bits, since they can be set by SetHintBits without emitting WAL. Force them all to be set so that we don't notice discrepancies." Actually, though, I think that you could be a bit more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID = HEAP_XMIN_FROZEN, so maybe what you should do is clear HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if one is set but not both. Anyway, leaving that aside, I think every single change that gets masked in every single masking routine needs a similar comment, explaining why that change can happen on the master without also happening on the standby and hopefully referring to the code that makes that unlogged change. I think wal_consistency_checking, as proposed by Peter, is better than wal_consistency_check, as implemented. Having StartupXLOG() call pfree() on the masking buffers is a waste of code. The process is going to exit anyway. + "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u", Primary error messages aren't capitalized. +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno)) +{ +/* Caller specified a bogus block_id. Do nothing. */ +continue; +} Why would the caller do something so dastardly? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 16, 2016 at 2:15 AM, Michael Paquier wrote: > On Tue, Nov 15, 2016 at 7:50 AM, Kuntal Ghosh > wrote: >> I've modified the guc parameter name as wal_consistency_check (little >> hesitant for a participle in suffix :) ). Also, updated the sgml and >> variable name accordingly. > > The changes look good to me. Moved to CF 2017-01, as no committers have showed up yet :( -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Nov 15, 2016 at 7:50 AM, Kuntal Ghosh wrote: > I've modified the guc parameter name as wal_consistency_check (little > hesitant for a participle in suffix :) ). Also, updated the sgml and > variable name accordingly. The changes look good to me. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Nov 15, 2016 at 7:23 PM, Robert Haas wrote: > On Sat, Nov 12, 2016 at 10:06 PM, Peter Eisentraut > wrote: >> On 11/9/16 11:55 PM, Michael Paquier wrote: >>> + >>> + wal_consistency (string) >>> + >>> + wal_consistency configuration >>> parameter >>> + >>> + >>> + >>> + >>> +This parameter is used to check the consistency of WAL records, >>> i.e, >>> +whether the WAL records are inserted and applied correctly. When >>> +wal_consistency is enabled for a WAL record, it >>> +stores a full-page image along with the record. When a full-page >>> image >>> +arrives during redo, it compares against the current page to check >>> whether >>> +both are consistent. >>> + >> >> Could we name this something like wal_consistency_checking? >> >> Otherwise it sounds like you use this to select the amount of >> consistency in the WAL (similar to, say, wal_level). > > +1. I like that name. I've modified the guc parameter name as wal_consistency_check (little hesitant for a participle in suffix :) ). Also, updated the sgml and variable name accordingly. FYI, regression test will fail because of an inconsistency in brin page. I've already submitted a patch for that. Following is the thread for the same: https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com walconsistency_v15.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Sat, Nov 12, 2016 at 10:06 PM, Peter Eisentraut wrote: > On 11/9/16 11:55 PM, Michael Paquier wrote: >> + >> + wal_consistency (string) >> + >> + wal_consistency configuration >> parameter >> + >> + >> + >> + >> +This parameter is used to check the consistency of WAL records, i.e, >> +whether the WAL records are inserted and applied correctly. When >> +wal_consistency is enabled for a WAL record, it >> +stores a full-page image along with the record. When a full-page >> image >> +arrives during redo, it compares against the current page to check >> whether >> +both are consistent. >> + > > Could we name this something like wal_consistency_checking? > > Otherwise it sounds like you use this to select the amount of > consistency in the WAL (similar to, say, wal_level). +1. I like that name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Sun, Nov 13, 2016 at 12:06 PM, Peter Eisentraut wrote: > Could we name this something like wal_consistency_checking? > > Otherwise it sounds like you use this to select the amount of > consistency in the WAL (similar to, say, wal_level). Or wal_check? Or wal_consistency_check? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On 11/9/16 11:55 PM, Michael Paquier wrote: > + > + wal_consistency (string) > + > + wal_consistency configuration parameter > + > + > + > + > +This parameter is used to check the consistency of WAL records, i.e, > +whether the WAL records are inserted and applied correctly. When > +wal_consistency is enabled for a WAL record, it > +stores a full-page image along with the record. When a full-page > image > +arrives during redo, it compares against the current page to check > whether > +both are consistent. > + Could we name this something like wal_consistency_checking? Otherwise it sounds like you use this to select the amount of consistency in the WAL (similar to, say, wal_level). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Nov 11, 2016 at 3:36 AM, Michael Paquier wrote: > On Fri, Nov 11, 2016 at 1:36 AM, Robert Haas wrote: >> So, who are all of the people involved in the effort to produce this >> patch, and what's the right way to attribute credit? > > The original idea was from Heikki as he has introduced the idea of > doing such checks if you look at the original thread. I added on top > of it a couple of things like the concept of page masking, and hacked > an early version of the versoin we have now > (https://www.postgresql.org/message-id/cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com). > So it seems to me that marking Heikki as an author would be fair as > the original idea is from him. I don't mind being marked only as a > reviewer of the feature, or as someone from which has written an > earlier version of the patch, but I let that up to your judgement. > Kuntai is definitely an author. Although lot of changes have been done later, but I've started with the patch attached in the above thread. Hence, I feel the author of that patch should also get the credit. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Nov 11, 2016 at 1:36 AM, Robert Haas wrote: > So, who are all of the people involved in the effort to produce this > patch, and what's the right way to attribute credit? The original idea was from Heikki as he has introduced the idea of doing such checks if you look at the original thread. I added on top of it a couple of things like the concept of page masking, and hacked an early version of the versoin we have now (https://www.postgresql.org/message-id/cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com). So it seems to me that marking Heikki as an author would be fair as the original idea is from him. I don't mind being marked only as a reviewer of the feature, or as someone from which has written an earlier version of the patch, but I let that up to your judgement. Kuntai is definitely an author. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 10, 2016 at 10:02 AM, Kuntal Ghosh wrote: >> With the patch for BRIN applied, I am able to get installcheck-world >> working with wal_consistency = all and a standby doing the consistency >> checks behind. Adding wal_consistency = all in PostgresNode.pm, the >> recovery tests are passing. This patch is switched as "Ready for >> Committer". Thanks for completing this effort begun 3 years ago! > Thanks to you for reviewing all the patches in so much detail. Amit, Robert > and Dilip also helped me a lot in developing the feature. Thanks to them > as well. So, who should be credited as co-authors of this patch and in what order, if and when it gets committed? If X started this patch and then Kuntal did a little more work on it, I would credit it as: X and Kuntal Ghosh If Kuntal did major work on it, though, then I would think of something more like: Kuntal Ghosh, based on an earlier patch from X If he didn't use any of the old code but just the idea, then I would do something like this: Kuntal Ghosh, inspired by a previous patch from X So, who are all of the people involved in the effort to produce this patch, and what's the right way to attribute credit? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 10, 2016 at 10:25 AM, Michael Paquier wrote: > On Wed, Nov 9, 2016 at 11:32 PM, Kuntal Ghosh >> Thanks a lot for reviewing the patch. Based on your review, I've attached the >> I've done a fair amount of testing which includes regression tests >> and manual creation of inconsistencies in the page. I've also found the >> reason behind inconsistency in brin revmap page. >> Brin revmap page doesn't have standard page layout. Besides, It doesn't >> update >> pd_upper and pd_lower in its operations as well. But, during WAL >> insertions, it uses >> REGBUF_STANDARD to register a reference in the WAL record. Hence, when we >> restore image before consistency check, RestoreBlockImage fills the space >> between pd_upper and pd_lower(page hole) with zero. I've posted this as a >> separate thread. >> https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com > > Nice to have spotted the inconsistency. This tool is going to be useful.. > > I have spent a couple of hours playing with the patch, and worked on > it a bit more with a couple of minor changes: > - In gindesc.c, the if blocks are more readable with brackets. > - Addition of a comment when info is set, to mention that this is done > at the beginning of the routine so as callers of XLogInsert() can pass > the flag for consistency checks. > - apply_image should be reset in ResetDecoder(). > - The BRIN problem is here: > 2016-11-10 12:24:10 JST [65776]: [23-1] db=,user=,app=,client= FATAL: > Inconsistent page found, rel 1663/16385/30625, forknum 0, blkno 1 > 2016-11-10 12:24:10 JST [65776]: [24-1] db=,user=,app=,client= > CONTEXT: xlog redo at 0/9BD31148 for BRIN/UPDATE+INIT: heapBlk 100 > pagesPerRange 1 old offnum 11, new offnum 1 > 2016-11-10 12:24:10 JST [65776]: [25-1] db=,user=,app=,client= > WARNING: buffer refcount leak: [4540] (rel=base/16385/30625, > blockNum=1, flags=0x9380, refcount=1 1) > TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506) > Now the buffer refcount leak is not normal! The safest thing to do > here is to release the buffer once a copy of it has been taken, and > the leaks goes away when calling FATAL to report the inconsistency. > - When checking for XLR_CHECK_CONSISTENCY, better to add a != 0 to get > a boolean out of it. > - guc_malloc() should be done as late as possible, this simplifies the > code and prevents any memory leaks which is what your patch is doing > when there is an error. (I have finally put my finger on what was > itching me here). > - In btree_mask, the lookup of BTP_DELETED can be deadly simplified. > - I wondered also about putting assign_wal_consistency and > check_wal_consistency in a separate file, say xlogparams.c, concluding > that the current code does nothing bad either even if it adds rmgr.h > in the list of headers in guc.c. > - Some comment blocks are longer than 72~80 characters. > - Typos! All the changes make perfect sense to me. > With the patch for BRIN applied, I am able to get installcheck-world > working with wal_consistency = all and a standby doing the consistency > checks behind. Adding wal_consistency = all in PostgresNode.pm, the > recovery tests are passing. This patch is switched as "Ready for > Committer". Thanks for completing this effort begun 3 years ago! Thanks to you for reviewing all the patches in so much detail. Amit, Robert and Dilip also helped me a lot in developing the feature. Thanks to them as well. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 9, 2016 at 11:32 PM, Kuntal Ghosh wrote: > On Fri, Nov 4, 2016 at 1:32 PM, Michael Paquier > wrote: >> Thank you for the new patch. This will be hopefully the last round of >> reviews, we are getting close to something that has an acceptable >> shape. > > Thanks a lot for reviewing the patch. Based on your review, I've attached the > updated patch along with few comments. Thanks for the new version. pg_xlogdump is really helpful now for debugging. >> I haven't performed any tests with the patch, and that's all I have >> regarding the code. With that done we should be in good shape >> code-speaking I think... > > I've done a fair amount of testing which includes regression tests > and manual creation of inconsistencies in the page. I've also found the > reason behind inconsistency in brin revmap page. > Brin revmap page doesn't have standard page layout. Besides, It doesn't update > pd_upper and pd_lower in its operations as well. But, during WAL > insertions, it uses > REGBUF_STANDARD to register a reference in the WAL record. Hence, when we > restore image before consistency check, RestoreBlockImage fills the space > between pd_upper and pd_lower(page hole) with zero. I've posted this as a > separate thread. > https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com Nice to have spotted the inconsistency. This tool is going to be useful.. I have spent a couple of hours playing with the patch, and worked on it a bit more with a couple of minor changes: - In gindesc.c, the if blocks are more readable with brackets. - Addition of a comment when info is set, to mention that this is done at the beginning of the routine so as callers of XLogInsert() can pass the flag for consistency checks. - apply_image should be reset in ResetDecoder(). - The BRIN problem is here: 2016-11-10 12:24:10 JST [65776]: [23-1] db=,user=,app=,client= FATAL: Inconsistent page found, rel 1663/16385/30625, forknum 0, blkno 1 2016-11-10 12:24:10 JST [65776]: [24-1] db=,user=,app=,client= CONTEXT: xlog redo at 0/9BD31148 for BRIN/UPDATE+INIT: heapBlk 100 pagesPerRange 1 old offnum 11, new offnum 1 2016-11-10 12:24:10 JST [65776]: [25-1] db=,user=,app=,client= WARNING: buffer refcount leak: [4540] (rel=base/16385/30625, blockNum=1, flags=0x9380, refcount=1 1) TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506) Now the buffer refcount leak is not normal! The safest thing to do here is to release the buffer once a copy of it has been taken, and the leaks goes away when calling FATAL to report the inconsistency. - When checking for XLR_CHECK_CONSISTENCY, better to add a != 0 to get a boolean out of it. - guc_malloc() should be done as late as possible, this simplifies the code and prevents any memory leaks which is what your patch is doing when there is an error. (I have finally put my finger on what was itching me here). - In btree_mask, the lookup of BTP_DELETED can be deadly simplified. - I wondered also about putting assign_wal_consistency and check_wal_consistency in a separate file, say xlogparams.c, concluding that the current code does nothing bad either even if it adds rmgr.h in the list of headers in guc.c. - Some comment blocks are longer than 72~80 characters. - Typos! With the patch for BRIN applied, I am able to get installcheck-world working with wal_consistency = all and a standby doing the consistency checks behind. Adding wal_consistency = all in PostgresNode.pm, the recovery tests are passing. This patch is switched as "Ready for Committer". Thanks for completing this effort begun 3 years ago! -- Michael diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index adab2f8..57660d3 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2476,6 +2476,35 @@ include_dir 'conf.d' + + wal_consistency (string) + + wal_consistency configuration parameter + + + + +This parameter is used to check the consistency of WAL records, i.e, +whether the WAL records are inserted and applied correctly. When +wal_consistency is enabled for a WAL record, it +stores a full-page image along with the record. When a full-page image +arrives during redo, it compares against the current page to check whether +both are consistent. + + + +By default, this setting does not contain any value. To check +all records written to the write-ahead log, set this parameter to +all. To check only some records, specify a +comma-separated list of resource managers. The resource managers +which are currently supported are heap2, heap, +btree, gin, gist, +spgist, sequence and brin. Only +superusers can change this setting. + + + + wal_buffers (int
Re: [HACKERS] WAL consistency check facility
On Fri, Nov 4, 2016 at 1:32 PM, Michael Paquier wrote: > Thank you for the new patch. This will be hopefully the last round of > reviews, we are getting close to something that has an acceptable > shape. Thanks a lot for reviewing the patch. Based on your review, I've attached the updated patch along with few comments. > In DecodeXLogRecord@xlogreader.c, please add a boolean flag "apply" > and then please could you do some error checks on it. Only one is > needed: if "apply" is true and has_image is false, xlogreader.c should > complain about an inconsistency! Added a flag named apply_image in DecodedBkpBlock and used it to check whether image apply is required or not. > I would still for the removal of blkno in the list of arguments of the > masking functions. This is used just for speculative inserts, where we > could just enforce the page number to 0 because this does not matter, > as Peter has mentioned upthread. It just doesn't feel right to me to enforce the number manually when I can use the blkno without any harm. > I haven't performed any tests with the patch, and that's all I have > regarding the code. With that done we should be in good shape > code-speaking I think... I've done a fair amount of testing which includes regression tests and manual creation of inconsistencies in the page. I've also found the reason behind inconsistency in brin revmap page. Brin revmap page doesn't have standard page layout. Besides, It doesn't update pd_upper and pd_lower in its operations as well. But, during WAL insertions, it uses REGBUF_STANDARD to register a reference in the WAL record. Hence, when we restore image before consistency check, RestoreBlockImage fills the space between pd_upper and pd_lower(page hole) with zero. I've posted this as a separate thread. https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com walconsistency_v13.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Nov 4, 2016 at 6:02 PM, Michael Paquier wrote: > On Fri, Nov 4, 2016 at 5:02 PM, Michael Paquier > wrote: >> On Thu, Nov 3, 2016 at 11:17 PM, Kuntal Ghosh >> wrote: >>> I've updated the patch for review. >> >> Thank you for the new patch. This will be hopefully the last round of >> reviews, we are getting close to something that has an acceptable >> shape. > > One last thing: in XLogRecordAssemble(), could you enforce the value > of info at the beginning of the routine when wal_consistency[rmid] is > true? And then use the value of info to decide if include_image is > true or not? The idea here is to allow callers of XLogInsert() to pass > by themselves XLR_CHECK_CONSISTENCY and still have consistency checks > enabled for a given record even if wal_consistency is false for the > rmgr of the record happening. This would be potentially useful for > extension and feature developers when debugging some stuff, for some > builds compiled with a DEBUG flag, or whatever. And you need to rebase the patch, d5f6f13 has fixed the handling of xl_info with XLR_INFO_MASK. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Nov 4, 2016 at 5:02 PM, Michael Paquier wrote: > On Thu, Nov 3, 2016 at 11:17 PM, Kuntal Ghosh > wrote: >> I've updated the patch for review. > > Thank you for the new patch. This will be hopefully the last round of > reviews, we are getting close to something that has an acceptable > shape. One last thing: in XLogRecordAssemble(), could you enforce the value of info at the beginning of the routine when wal_consistency[rmid] is true? And then use the value of info to decide if include_image is true or not? The idea here is to allow callers of XLogInsert() to pass by themselves XLR_CHECK_CONSISTENCY and still have consistency checks enabled for a given record even if wal_consistency is false for the rmgr of the record happening. This would be potentially useful for extension and feature developers when debugging some stuff, for some builds compiled with a DEBUG flag, or whatever. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 11:17 PM, Kuntal Ghosh wrote: > I've updated the patch for review. Thank you for the new patch. This will be hopefully the last round of reviews, we are getting close to something that has an acceptable shape. + + + + + + Did you try to compile the docs? Because that will break. (Likely my fault). What needs to be done is removing one and one markup. +/*- + * + * bufmask.h + * Buffer masking definitions. + * + * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/storage/bufmask.h + */ We could likely survive here with just a copyright mention as 2016, PGDG... Same remark for bufmask.c. --- a/src/backend/access/hash/hash.c +++ b/src/backend/access/hash/hash.c @@ -25,6 +25,7 @@ #include "commands/vacuum.h" #include "miscadmin.h" #include "optimizer/plancat.h" +#include "storage/bufmask.h" #include "utils/index_selfuncs.h" #include "utils/rel.h" This header declaration is not necessary. + /* +* Mask the Page LSN. Because, we store the page before updating the LSN. +* Hence, LSNs of both pages will always be different. +*/ + mask_page_lsn(page_norm); I don't fully understand this comment if phrased this way. Well, I do understand it, but people who would read this code for the first time may have a hard time understanding it. So I would suggest removing it, but add a comment on top of mask_page_lsn() to mention that in consistency checks the LSN of the two pages compared will likely be different because of concurrent operations when the WAL is generated and the state of the page where WAL is applied. + maskopaq = (BTPageOpaque) + (((char *) page_norm) + ((PageHeader) page_norm)->pd_special); + /* +* Mask everything on a DELETED page. +*/ Let's make the code breath and add a space here. +/* Aligned Buffers dedicated to consistency checks of size BLCKSZ */ +static char *new_page_masked = NULL; +static char *old_page_masked = NULL; palloc'd buffers are aligned, so you could just remove the work "Aligned" in this comment? + /* If we've just restored the block from backup image, skip consistency check. */ + if (XLogRecHasBlockImage(record, block_id) && + XLogRecBlockImageApply(record, block_id)) + continue; Here you could just check for Apply() to decide if continue should be called or not, and Assert afterwards on HasBlockImage(). The assertion would help checking for inconsistency errors. @@ -7810,6 +7929,7 @@ ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, } record = ReadRecord(xlogreader, RecPtr, LOG, true); + info = record->xl_info & ~XLR_INFO_MASK; if (record == NULL) { @@ -7852,8 +7972,8 @@ ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, } return NULL; } - if (record->xl_info != XLOG_CHECKPOINT_SHUTDOWN && - record->xl_info != XLOG_CHECKPOINT_ONLINE) + if (info != XLOG_CHECKPOINT_SHUTDOWN && + info != XLOG_CHECKPOINT_ONLINE) Those changes are not directly related to this patch, but make sure that record checks are done correctly or this patch would just fail. It may be better to apply those changes independently first per the patch on this thread: https://www.postgresql.org/message-id/CAB7nPqSWVyaZJg7_amRKVqRpEP=_=54e+762+2pf9u3q3+z...@mail.gmail.com My recommendation is to do so. + /* +* Remember that, if WAL consistency check is enabled for the current rmid, +* we always include backup image with the WAL record. But, during redo we +* restore the backup block only if needs_backup is set. +*/ This could be rewritten a bit: "if WAL consistency is enabled for the resource manager of this WAL record, a full-page image is included in the record for the block modified. During redo, the full-page is replayed only of BKPIMAGE_APPLY is set." - * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended - * with all-zeroes pages up to the referenced block number. In - * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value + * In RBM_ZERO_* modes, if BKPIMAGE_APPLY flag is not set for the backup block, + * the relation is extended with all-zeroes pages up to the referenced block number. + * In RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value * is always BLK_NEEDS_REDO You are forgetting to mention "if the page does not exist" in the new comment block. + /* If it has a full-page image and it should be restored, do it. */ + if (XLogRecHasBlockImage(record, block_id) && XLogRecBlockImageApply(record, block_id)) { Perhaps on two lines? The headers of the functions in bufmask.c could be more descriptive, there should be explanations regarding in which aspect they are useful t
Re: [HACKERS] WAL consistency check facility
On Fri, Nov 4, 2016 at 4:16 AM, Kuntal Ghosh wrote: > On Thu, Nov 3, 2016 at 8:24 PM, Robert Haas wrote: >> On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh >> wrote: >>> - Another suggestion was to remove wal_consistency from PostgresNode.pm >>> because small buildfarm machines may suffer on it. Although I've no >>> experience in this matter, I would like to be certain that nothings breaks >>> in recovery tests after some modifications. >> >> I think running the whole test suite with this enabled is going to >> provoke complaints from buildfarm owners. That's too bad, because I >> agree with you that it would be nice to have the test coverage, but it >> seems that many of the buildfarm machines are VMs with very minimal >> resource allocations -- or very old physical machines -- or running >> with settings like CLOBBER_CACHE_ALWAYS that make runs very slow. If >> you blow on them too hard, they fall over. Count me in. My RPIs won't like that! Actually I have a couple of things internally mimicking the buildfarm client code on machines with far higher capacity. And FWIW I am definitely going to enable this option in the test suite, finishing with reports here. > Thanks Robert. I got your point. Then, as Michael has suggested, it is nice to > have some environment variable to pass optional conf parameters during > tap-tests. > Implementing this feature actually solves the problem. We just need make PostgresNode.pm aware of something like PGTAPOPTIONS to enforce a server started by TAP tests to append options to it. There is already PGCTLTIMEOUT that behaves similarly. Even if this brings extra load to buildfarm owners, that will limit complaints. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 8:24 PM, Robert Haas wrote: > On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh > wrote: >> - Another suggestion was to remove wal_consistency from PostgresNode.pm >> because small buildfarm machines may suffer on it. Although I've no >> experience in this matter, I would like to be certain that nothings breaks >> in recovery tests after some modifications. > > I think running the whole test suite with this enabled is going to > provoke complaints from buildfarm owners. That's too bad, because I > agree with you that it would be nice to have the test coverage, but it > seems that many of the buildfarm machines are VMs with very minimal > resource allocations -- or very old physical machines -- or running > with settings like CLOBBER_CACHE_ALWAYS that make runs very slow. If > you blow on them too hard, they fall over. > Thanks Robert. I got your point. Then, as Michael has suggested, it is nice to have some environment variable to pass optional conf parameters during tap-tests. Implementing this feature actually solves the problem. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 10:30 AM, Kuntal Ghosh wrote: > - Another suggestion was to remove wal_consistency from PostgresNode.pm > because small buildfarm machines may suffer on it. Although I've no > experience in this matter, I would like to be certain that nothings breaks > in recovery tests after some modifications. I think running the whole test suite with this enabled is going to provoke complaints from buildfarm owners. That's too bad, because I agree with you that it would be nice to have the test coverage, but it seems that many of the buildfarm machines are VMs with very minimal resource allocations -- or very old physical machines -- or running with settings like CLOBBER_CACHE_ALWAYS that make runs very slow. If you blow on them too hard, they fall over. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 7:47 PM, Kuntal Ghosh wrote: > I've updated the patch for review. > If an inconsistency is found, it'll just log it for now. Once, the patch is finalized, we can change it to FATAL as before. I was making sure that all regression tests should pass with the patch. It seems that there is some inconsistency in regression tests for BRIN index. LOG: Inconsistent page found, rel 1663/16384/30607, forknum 0, blkno 1 CONTEXT: xlog redo at 0/9BAE08C8 for BRIN/UPDATE+INIT: heapBlk 100 pagesPerRange 1 old offnum 11, new offnum 1 -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
I've updated the patch for review. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com walconsistency_v12.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 6:48 PM, Kuntal Ghosh wrote: > So, whenever we are required to use bimg_info flag, we should make > sure that has_image > is set. OK, we are taking past each other here. There are two possible patterns: - has_image is set, not apply, meaning that the image block is used for consistency checks. - has_image is set, as well as apply, meaning that the block needs to be applied at redo. So I mean exactly the same thing as you do. The point I am trying to raise is that it would be meaningful to put in some code paths checks of the type (apply && !has_image) and ERROR on them. Perhaps we could just do that in xlogreader.c though. If having those checks external to xlogreader.c makes sense, then using separate macros is more portable. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 2:52 PM, Michael Paquier wrote: > On Thu, Nov 3, 2016 at 6:15 PM, Kuntal Ghosh > wrote: >> Actually, I just verified that bimg_info is not even valid if >> has_image is not set. >> In DecodeXLogRecord, we initialize bimg_info only when has_image flag >> is set. So, keeping them >> separate doesn't look a good approach to me. If we keep them separate, >> the output >> of the following assert is undefined: >> Assert(XLogRecHasBlockImage(record, block_id) || >> !XLogRecBlockImageApply(record, block_id)). >> >> Thoughts?? > > Yes, that's exactly the reason why we should keep both macros as > checking for separate things: apply implies that has_image is set and > that's normal, hence we could use sanity checks by just using those > macros and not propagating xlogreader.h. > No, apply doesn't mean has_image is set. If has_image is not set, apply/bimg_info is invalid(/undefined) and we should not use that. For example, in XLogDumpDisplayRecord we use bimg_info as following, if (XLogRecHasBlockImage(record, block_id)) { if (record->blocks[block_id].bimg_info & BKPIMAGE_IS_COMPRESSED) { } } So, whenever we are required to use bimg_info flag, we should make sure that has_image is set. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 6:15 PM, Kuntal Ghosh wrote: > Actually, I just verified that bimg_info is not even valid if > has_image is not set. > In DecodeXLogRecord, we initialize bimg_info only when has_image flag > is set. So, keeping them > separate doesn't look a good approach to me. If we keep them separate, > the output > of the following assert is undefined: > Assert(XLogRecHasBlockImage(record, block_id) || > !XLogRecBlockImageApply(record, block_id)). > > Thoughts?? Yes, that's exactly the reason why we should keep both macros as checking for separate things: apply implies that has_image is set and that's normal, hence we could use sanity checks by just using those macros and not propagating xlogreader.h. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 5:56 PM, Kuntal Ghosh wrote: > I'm not getting why we should introduce a new redo action and return > from the function beforehand. Per my last email, same conclusion from here :) Sorry if I am picky and noisy on many points, I am trying to think about the value of each change introduced in this patch, particularly if they are meaningful, can be improved in some way, or can be simplified and make the code more simple. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 2:27 PM, Michael Paquier wrote: > On Thu, Nov 3, 2016 at 4:04 PM, Michael Paquier > wrote: >> Wouldn't the definition of a new redo action make sense then? Say >> SKIPPED. None of the existing actions match the non-apply case. > > I just took 5 minutes to look at the code and reason about it, and > something like what your patch is doing would be actually fine. Still > I don't think that checking for the apply flag in the macro routine > should look for has_image. Let's keep things separate. Actually, I just verified that bimg_info is not even valid if has_image is not set. In DecodeXLogRecord, we initialize bimg_info only when has_image flag is set. So, keeping them separate doesn't look a good approach to me. If we keep them separate, the output of the following assert is undefined: Assert(XLogRecHasBlockImage(record, block_id) || !XLogRecBlockImageApply(record, block_id)). Thoughts?? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 4:04 PM, Michael Paquier wrote: > Wouldn't the definition of a new redo action make sense then? Say > SKIPPED. None of the existing actions match the non-apply case. I just took 5 minutes to look at the code and reason about it, and something like what your patch is doing would be actually fine. Still I don't think that checking for the apply flag in the macro routine should look for has_image. Let's keep things separate. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 12:34 PM, Michael Paquier wrote: > On Thu, Nov 3, 2016 at 3:24 PM, Kuntal Ghosh > wrote: >> On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier >>> -/* If it's a full-page image, restore it. */ >>> -if (XLogRecHasBlockImage(record, block_id)) >>> +/* If full-page image should be restored, do it. */ >>> +if (XLogRecBlockImageApply(record, block_id)) >>> Hm. It seems to me that this modification is incorrect. If the page >>> does not need to be applied, aka if it needs to be used for >>> consistency checks, what should be done is more something like the >>> following in XLogReadBufferForRedoExtended: >>> if (Apply(record, block_id)) >>> return; >>> if (HasImage) >>> { >>> //do what needs to be done with an image >>> } >>> else >>> { >>> //do default things >>> } >>> >> XLogReadBufferForRedoExtended should return a redo action >> (block restored, done, block needs redo or block not found). So, we >> can't just return >> from the function if BLKIMAGE_APPLY flag is not set. It still has to >> check whether a >> redo is required or not. > > Wouldn't the definition of a new redo action make sense then? Say > SKIPPED. None of the existing actions match the non-apply case. As per my understanding, XLogReadBufferForRedoExtended works as follows: 1. If wal record has backup block 2. { 3. restore the backup block; 4. return BLK_RESTORED; 5. } 6. else 7. { 8. If block found in buffer 10. If lsn of block is less than last replayed record 11. return BLK_DONE; 12. else 13. return BLK_NEEDS_REDO; 14. else 15. return BLK_NOT_FOUND; 16. } Now, we can just change step 1 as follows: 1. If wal record has backup block and it needs to be restored. I'm not getting why we should introduce a new redo action and return from the function beforehand. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 3:24 PM, Kuntal Ghosh wrote: > On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier > wrote: >>> - Another suggestion was to remove wal_consistency from PostgresNode.pm >>> because small buildfarm machines may suffer on it. Although I've no >>> experience in this matter, I would like to be certain that nothings breaks >>> in recovery tests after some modifications. >> >> An extra idea that I have here would be to extend the TAP tests to >> accept an environment variable that would be used to specify extra >> options when starting Postgres instances. Buildfarm machines could use >> it. >> > It can be added as a separate feature. > >> >> -/* If it's a full-page image, restore it. */ >> -if (XLogRecHasBlockImage(record, block_id)) >> +/* If full-page image should be restored, do it. */ >> +if (XLogRecBlockImageApply(record, block_id)) >> Hm. It seems to me that this modification is incorrect. If the page >> does not need to be applied, aka if it needs to be used for >> consistency checks, what should be done is more something like the >> following in XLogReadBufferForRedoExtended: >> if (Apply(record, block_id)) >> return; >> if (HasImage) >> { >> //do what needs to be done with an image >> } >> else >> { >> //do default things >> } >> > XLogReadBufferForRedoExtended should return a redo action > (block restored, done, block needs redo or block not found). So, we > can't just return > from the function if BLKIMAGE_APPLY flag is not set. It still has to > check whether a > redo is required or not. Wouldn't the definition of a new redo action make sense then? Say SKIPPED. None of the existing actions match the non-apply case. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier wrote: >> - Another suggestion was to remove wal_consistency from PostgresNode.pm >> because small buildfarm machines may suffer on it. Although I've no >> experience in this matter, I would like to be certain that nothings breaks >> in recovery tests after some modifications. > > An extra idea that I have here would be to extend the TAP tests to > accept an environment variable that would be used to specify extra > options when starting Postgres instances. Buildfarm machines could use > it. > It can be added as a separate feature. > > -/* If it's a full-page image, restore it. */ > -if (XLogRecHasBlockImage(record, block_id)) > +/* If full-page image should be restored, do it. */ > +if (XLogRecBlockImageApply(record, block_id)) > Hm. It seems to me that this modification is incorrect. If the page > does not need to be applied, aka if it needs to be used for > consistency checks, what should be done is more something like the > following in XLogReadBufferForRedoExtended: > if (Apply(record, block_id)) > return; > if (HasImage) > { > //do what needs to be done with an image > } > else > { > //do default things > } > XLogReadBufferForRedoExtended should return a redo action (block restored, done, block needs redo or block not found). So, we can't just return from the function if BLKIMAGE_APPLY flag is not set. It still has to check whether a redo is required or not. > XLogRecBlockImageApply() should only check for BKP_APPLY and not imply > HasImage(). This will be more flexible when for example using it for > assertions. seems reasonable. > It would be more consistent to have a boolean flag to treat > BKPIMAGE_APPLY in xlogreader.c. Have a look at has_image for example. For flags in bimg_info, we directly check if the mask bit is set in bimg_info (ex: hole, compressed). Besides, we use this flag only at XLogReadBufferForRedoExtended. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 11:30 PM, Kuntal Ghosh wrote: > On Wed, Nov 2, 2016 at 1:11 PM, Kuntal Ghosh > wrote: >> Rest of the suggestions are well-taken. I'll update the patch accordingly. > I've updated the last submitted patch(v10) with the following changes: > - added a block level flag BKPIMAGE_APPLY to distinguish backup image > blocks which needs to be restored during replay. > - at present, hash index operations are not WAL-logged. Hence, I've removed > the consistency check option for hash indices. It can be added later. Both make sense. > - Another suggestion was to remove wal_consistency from PostgresNode.pm > because small buildfarm machines may suffer on it. Although I've no > experience in this matter, I would like to be certain that nothings breaks > in recovery tests after some modifications. An extra idea that I have here would be to extend the TAP tests to accept an environment variable that would be used to specify extra options when starting Postgres instances. Buildfarm machines could use it. +/* + * Remember that, if WAL consistency check is enabled for the current rmid, + * we always include backup image with the WAL record. But, during redo we + * restore the backup block only if needs_backup is set. + */ +if (needs_backup) +bimg.bimg_info |= BKPIMAGE_APPLY; + + You should be careful about extra newlines and noise in the code. -/* If it's a full-page image, restore it. */ -if (XLogRecHasBlockImage(record, block_id)) +/* If full-page image should be restored, do it. */ +if (XLogRecBlockImageApply(record, block_id)) Hm. It seems to me that this modification is incorrect. If the page does not need to be applied, aka if it needs to be used for consistency checks, what should be done is more something like the following in XLogReadBufferForRedoExtended: if (Apply(record, block_id)) return; if (HasImage) { //do what needs to be done with an image } else { //do default things } XLogRecBlockImageApply() should only check for BKP_APPLY and not imply HasImage(). This will be more flexible when for example using it for assertions. With this patch the comments on top of XLogReadBufferForRedo are wrong. A block is not unconditionally applied. +#define XLogRecBlockImageApply(decoder, block_id) \ +(XLogRecHasBlockImage(decoder, block_id) && \ +(((decoder)->blocks[block_id].bimg_info & BKPIMAGE_APPLY) > 0)) Maybe != 0? That's the most common practice in the code. It would be more consistent to have a boolean flag to treat BKPIMAGE_APPLY in xlogreader.c. Have a look at has_image for example. This will as well reduce dependencies on the header xlog_record.h +/* + * For a speculative tuple, the content of t_ctid is conflicting + * between the backup page and current page. Hence, we set it + * to the current block number and current offset. + */ +if (HeapTupleHeaderIsSpeculative(page_htup)) +ItemPointerSet(&page_htup->t_ctid, blkno, off); In the set of masking functions this is the only portion of the code depending on blkno. But isn't that actually a bug in speculative inserts? Andres (added now in CC), Peter, could you provide some input regarding that? The masking functions should not prevent the detection of future errors, and this code is making me uneasy. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 1:11 PM, Kuntal Ghosh wrote: > Rest of the suggestions are well-taken. I'll update the patch accordingly. I've updated the last submitted patch(v10) with the following changes: - added a block level flag BKPIMAGE_APPLY to distinguish backup image blocks which needs to be restored during replay. - at present, hash index operations are not WAL-logged. Hence, I've removed the consistency check option for hash indices. It can be added later. Few comments: - Michael suggested to use an integer variable and bitwise-shift operation to store the RMGR values instead of using a boolean array. But, boolean array implementation looks cleaner to me. For example, +if (wal_consistency[rmid]) + rechdr->xl_info |= XLR_CHECK_CONSISTENCY; +include_image = needs_backup || wal_consistency[rmid]; - Another suggestion was to remove wal_consistency from PostgresNode.pm because small buildfarm machines may suffer on it. Although I've no experience in this matter, I would like to be certain that nothings breaks in recovery tests after some modifications. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com walconsistency_v11.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 1:34 PM, Michael Paquier wrote: > Hm... Right. That was broken. And actually, while the record-level > flag is useful so as you don't need to rely on checking > wal_consistency when doing WAL redo, the block-level flag is useful to > make a distinction between blocks that have to be replayed and the > ones that are used only for consistency, and both types could be mixed > in a record. Using it in bimg_info would be fine... Perhaps a better > name for the flag would be something like BKPIMAGE_APPLY, to mean that > the FPW needs to be applied at redo. Or BKPIMAGE_IGNORE, to bypass it > when replaying it. IS_REQUIRED_FOR_REDO is quite confusing. BKPIMAGE_APPLY seems reasonable. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 4:41 PM, Kuntal Ghosh wrote: > On Wed, Nov 2, 2016 at 10:23 AM, Michael Paquier > wrote: >> On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas wrote: >>> IMHO, your rewrite of this patch was a bit heavy-handed. >> >> OK... Sorry for that. >> >>> I haven't >>> scrutinized the code here so maybe it was a big improvement, and if so >>> fine, but if not it's better to collaborate with the author than to >>> take over. >> >> While reviewing the code, that has finished by being a large rewrite, >> and that was more understandable than a review looking at all the >> small tweaks and things I have been through while reading it. I have >> also experimented a couple of ideas with the patch that I added, so at >> the end it proves to be a gain for everybody. I think that the last >> patch is an improvement, if you want to make your own opinion on the >> matter looking at the differences between both patches would be the >> most direct way to go. >> > If my understanding is correct regarding this feature, last two patches > completely break the fundamental idea of wal consistency check feature. > I mentioned this in my last reply as well that we've to use some flag > to indicate > whether an image should be restored during replay or not. Otherwise, > XLogReadBufferForRedoExtended will always restore the image skipping the usual > redo operation. What's happening now is the following: > 1. If wal_consistency is on, include backup block image with the wal record. > 2. During replay, XLogReadBufferForRedoExtended always restores the backup > block > image in local buffer since XLogRecHasBlockImage is true for each block. > 3. In checkConsistency, you compare the local buffer with the backup block > image > from the wal record. It'll always be consistent. > This feature aims to validate whether wal replay operation is > happening correctly or not. > To achieve that aim, we should not alter the wal replay operation itself. Hm... Right. That was broken. And actually, while the record-level flag is useful so as you don't need to rely on checking wal_consistency when doing WAL redo, the block-level flag is useful to make a distinction between blocks that have to be replayed and the ones that are used only for consistency, and both types could be mixed in a record. Using it in bimg_info would be fine... Perhaps a better name for the flag would be something like BKPIMAGE_APPLY, to mean that the FPW needs to be applied at redo. Or BKPIMAGE_IGNORE, to bypass it when replaying it. IS_REQUIRED_FOR_REDO is quite confusing. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 10:23 AM, Michael Paquier wrote: > On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas wrote: >> IMHO, your rewrite of this patch was a bit heavy-handed. > > OK... Sorry for that. > >> I haven't >> scrutinized the code here so maybe it was a big improvement, and if so >> fine, but if not it's better to collaborate with the author than to >> take over. > > While reviewing the code, that has finished by being a large rewrite, > and that was more understandable than a review looking at all the > small tweaks and things I have been through while reading it. I have > also experimented a couple of ideas with the patch that I added, so at > the end it proves to be a gain for everybody. I think that the last > patch is an improvement, if you want to make your own opinion on the > matter looking at the differences between both patches would be the > most direct way to go. > If my understanding is correct regarding this feature, last two patches completely break the fundamental idea of wal consistency check feature. I mentioned this in my last reply as well that we've to use some flag to indicate whether an image should be restored during replay or not. Otherwise, XLogReadBufferForRedoExtended will always restore the image skipping the usual redo operation. What's happening now is the following: 1. If wal_consistency is on, include backup block image with the wal record. 2. During replay, XLogReadBufferForRedoExtended always restores the backup block image in local buffer since XLogRecHasBlockImage is true for each block. 3. In checkConsistency, you compare the local buffer with the backup block image from the wal record. It'll always be consistent. This feature aims to validate whether wal replay operation is happening correctly or not. To achieve that aim, we should not alter the wal replay operation itself. Rest of the suggestions are well-taken. I'll update the patch accordingly. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas wrote: > IMHO, your rewrite of this patch was a bit heavy-handed. OK... Sorry for that. > I haven't > scrutinized the code here so maybe it was a big improvement, and if so > fine, but if not it's better to collaborate with the author than to > take over. While reviewing the code, that has finished by being a large rewrite, and that was more understandable than a review looking at all the small tweaks and things I have been through while reading it. I have also experimented a couple of ideas with the patch that I added, so at the end it proves to be a gain for everybody. I think that the last patch is an improvement, if you want to make your own opinion on the matter looking at the differences between both patches would be the most direct way to go. > In any case, yeah, I think you should put that back. Here you go with this parameter back and the allocation of the masked buffers done beforehand, close to the moment the XLogReader is allocated actually. I have also removed wal_consistency from PostgresNode.pm, small buildfarm machines would really suffer on it, and hamster is very good to track race conditions when running TAP tests. On top of that I have replaced a bunch of 0xF thingies by their PG_UINT_MAX equivalents to keep things cleaner. Now, I have put back the GUC-related code exactly to the same shape as it was originally. Here are a couple of comments regarding it after review: - Let's drop 'none' as a magic keyword. Users are going to use an empty string, and the default should be defined as such IMO. - Using an allocated array of booleans to store the values of each RMGRs could be replaced by an integer using bitwise shifts. Your option looks better and makes the code cleaner. A more nitpick remark: code comments don't refer much to RMIDs, but they use the term "resource managers" more generally. I'd suggest to do the same. -- Michael walconsistency_v10.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Oct 31, 2016 at 5:31 AM, Robert Haas wrote: > On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier > wrote: >> And here we go. Here is a review as well as a large brush-up for this >> patch. A couple of things: >> - wal_consistency is using a list of RMGRs, at the cost of being >> PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I >> have been thinking hard about that, and still I don't see the point). >> It is rather easy to for example default it to false, and enable it to >> true to check if a certain code path is correctly exercised or not for >> WAL consistency. Note that this simplification reduces the patch size >> by 100~150 lines. I know, I know, I'd expect some complains about >> that > > I don't understand how you can fail to see the point of that. As you > yourself said, this facility generates a ton of WAL. If you're > focusing on one AM, why would you want to be forced to incur the > overhead for every other AM? A good deal has been written about this > upthread already, and just saying "I don't see the point" seems to be > ignoring the explanations already given. +1. I strongly agree. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Oct 31, 2016 at 5:51 PM, Michael Paquier wrote: > Hehe, I was expecting you to jump on those lines. While looking at the > patch I have simplified it first to focus on the core engine of the > thing. Adding back this code sounds fine to me as there is a wall of > contestation. I offer to do it myself if the effort is the problem. IMHO, your rewrite of this patch was a bit heavy-handed. I haven't scrutinized the code here so maybe it was a big improvement, and if so fine, but if not it's better to collaborate with the author than to take over. In any case, yeah, I think you should put that back. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Oct 31, 2016 at 9:31 PM, Robert Haas wrote: > On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier >> - Instead of palloc'ing the old and new pages to compare, it would be >> more performant to keep around two static buffers worth of BLCKSZ and >> just use that. This way there is no need as well to perform any palloc >> calls in the masking functions, limiting the risk of errors (those >> code paths had better avoid errors IMO). It would be also less costly >> to just pass to the masking function a pointer to a buffer of size >> BLCKSZ and just do the masking on it. > > We always palloc buffers like this so that they will be aligned. But > we could arrange not to repeat the palloc every time (see, e.g., > BootstrapXLOG()). Yeah, we could go with that and there is clearly no reason to not do so. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Oct 31, 2016 at 9:31 PM, Robert Haas wrote: > On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier > wrote: >> And here we go. Here is a review as well as a large brush-up for this >> patch. A couple of things: >> - wal_consistency is using a list of RMGRs, at the cost of being >> PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I >> have been thinking hard about that, and still I don't see the point). >> It is rather easy to for example default it to false, and enable it to >> true to check if a certain code path is correctly exercised or not for >> WAL consistency. Note that this simplification reduces the patch size >> by 100~150 lines. I know, I know, I'd expect some complains about >> that > > I don't understand how you can fail to see the point of that. As you > yourself said, this facility generates a ton of WAL. If you're > focusing on one AM, why would you want to be forced to incur the > overhead for every other AM? A good deal has been written about this > upthread already, and just saying "I don't see the point" seems to be > ignoring the explanations already given. Hehe, I was expecting you to jump on those lines. While looking at the patch I have simplified it first to focus on the core engine of the thing. Adding back this code sounds fine to me as there is a wall of contestation. I offer to do it myself if the effort is the problem. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Oct 28, 2016 at 11:35 AM, Michael Paquier wrote: > And here we go. Here is a review as well as a large brush-up for this > patch. A couple of things: Thanks for reviewing the patch in detail. > - Speaking of which using BKPIMAGE_IS_REQUIRED_FOR_REDO stored in the > block definition is sort of weird because we want to know if > consistency should be checked at a higher level. A full page image can be included in the WAL record because of the following reasons: 1. It needs to be restored during replay. 2. WAL consistency should be checked for the record. 3. Both of above. In your patch, you've included a full page image whenever wal_consistency is true. So, XLogReadBufferForRedoExtended always restores the image and returns BLK_RESTORED, which is unacceptable. We can't change the default WAL replay behaviour. A full image should only be restored if it is necessary to do so. Although, I agree that BKPIMAGE_IS_REQUIRED_FOR_REDO doesn't look a clean way to implement this feature. > - wal_consistency is using a list of RMGRs, at the cost of being > PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I > have been thinking hard about that, and still I don't see the point). > It is rather easy to for example default it to false, and enable it to > true to check if a certain code path is correctly exercised or not for > WAL consistency. Note that this simplification reduces the patch size > by 100~150 lines. I know, I know, I'd expect some complains about > that As Robert also told, if I'm focusing on a single AM, I really don't want to store full images and perform consistency check for other AMs. > On top of that, I have done a fair amount of testing, creating > manually some inconsistencies in the REDO routines to trigger failures > on standbys. And that was sort of fun to break things intentionally. I know the feeling. :) -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Oct 28, 2016 at 2:05 AM, Michael Paquier wrote: > And here we go. Here is a review as well as a large brush-up for this > patch. A couple of things: > - wal_consistency is using a list of RMGRs, at the cost of being > PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I > have been thinking hard about that, and still I don't see the point). > It is rather easy to for example default it to false, and enable it to > true to check if a certain code path is correctly exercised or not for > WAL consistency. Note that this simplification reduces the patch size > by 100~150 lines. I know, I know, I'd expect some complains about > that I don't understand how you can fail to see the point of that. As you yourself said, this facility generates a ton of WAL. If you're focusing on one AM, why would you want to be forced to incur the overhead for every other AM? A good deal has been written about this upthread already, and just saying "I don't see the point" seems to be ignoring the explanations already given. > - Looking for wal_consistency at replay has no real value. What if on > a standby the parameter value is inconsistent than the one on the > master? This logic adds a whole new level of complications and > potential bugs. So instead my suggestion is to add a marker at WAL > record level to check if this record should be checked for consistency > at replay or not. Agreed. > This is also quite flexible if you think about it, > the standby is made independent of the WAL generated on the master and > just applies, or checks what it sees is fit checking for. +1. > The best > match here is to add a flag for XLogRecord->xl_info and make use of > one of the low 4 bits and only one is used now for > XLR_SPECIAL_REL_UPDATE. Seems reasonable. > - in maskPage, the new rmgr routine, there is no need for the info and > blkno arguments. info is not used at all to begin with. blkno is used > for gin pages to detect meta pages but this can be guessed using the > opaque pointer. For heap pages and speculative inserts, masking the > blkno would be fine. That's not worth it. Passing the blkno doesn't cost anything. If it avoids guessing, that's entirely worth it. > - Instead of palloc'ing the old and new pages to compare, it would be > more performant to keep around two static buffers worth of BLCKSZ and > just use that. This way there is no need as well to perform any palloc > calls in the masking functions, limiting the risk of errors (those > code paths had better avoid errors IMO). It would be also less costly > to just pass to the masking function a pointer to a buffer of size > BLCKSZ and just do the masking on it. We always palloc buffers like this so that they will be aligned. But we could arrange not to repeat the palloc every time (see, e.g., BootstrapXLOG()). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Oct 27, 2016 at 5:08 PM, Michael Paquier wrote: > On Thu, Sep 29, 2016 at 12:49 PM, Michael Paquier > wrote: >> Seeing nothing happening, I have moved the patch to next CF as there >> is a new version, but no reviews for it. > > Just a note for anybody potentially looking at this patch. I am > currently looking at it in depth, and will post a new version of the > patch in a couple of days with review comments. Thanks. And here we go. Here is a review as well as a large brush-up for this patch. A couple of things: - wal_consistency is using a list of RMGRs, at the cost of being PGC_POSTMASTER. I'd suggest making it PGC_SUSER, and use a boolean (I have been thinking hard about that, and still I don't see the point). It is rather easy to for example default it to false, and enable it to true to check if a certain code path is correctly exercised or not for WAL consistency. Note that this simplification reduces the patch size by 100~150 lines. I know, I know, I'd expect some complains about that - Looking for wal_consistency at replay has no real value. What if on a standby the parameter value is inconsistent than the one on the master? This logic adds a whole new level of complications and potential bugs. So instead my suggestion is to add a marker at WAL record level to check if this record should be checked for consistency at replay or not. This is also quite flexible if you think about it, the standby is made independent of the WAL generated on the master and just applies, or checks what it sees is fit checking for. The best match here is to add a flag for XLogRecord->xl_info and make use of one of the low 4 bits and only one is used now for XLR_SPECIAL_REL_UPDATE. An interesting side effect of this approach is that callers of XLogInsert can set XLR_CHECK_CONSISTENCY to enforce a consistency check even if wal_consistency is off. It is true that we could register such a data via XLogRegisterBuffer() instead, though the 4 bits with the BKPBLOCK_* flags are already occupied so that would induce a record penalty length and I have a hard time believing that one would like to check the consistency of a record in particular. - Speaking of which using BKPIMAGE_IS_REQUIRED_FOR_REDO stored in the block definition is sort of weird because we want to know if consistency should be checked at a higher level. - in maskPage, the new rmgr routine, there is no need for the info and blkno arguments. info is not used at all to begin with. blkno is used for gin pages to detect meta pages but this can be guessed using the opaque pointer. For heap pages and speculative inserts, masking the blkno would be fine. That's not worth it. - Instead of palloc'ing the old and new pages to compare, it would be more performant to keep around two static buffers worth of BLCKSZ and just use that. This way there is no need as well to perform any palloc calls in the masking functions, limiting the risk of errors (those code paths had better avoid errors IMO). It would be also less costly to just pass to the masking function a pointer to a buffer of size BLCKSZ and just do the masking on it. - The masking routine names can be more generic, like XXX_mask(char *page). No need to say page, we already know they work on it via the argument provided. - mask_xlog_page and mask_generic_page are useless as the block restored comes directly from a FPW, so you are comparing basically a FPW with itself. - In checkConsistency, there is no need to allocate the old page. As RestorebackupImage stores the data in an already allocated buffer, you can reuse the same location as the buffer masked afterwards. - Removed comparePages(), using memcmp instead for simplicity(). This does not show up the exact location of the inconsistency, still that won't be a win as there could be more than one inconsistency across a page. So this gives an invitation to user to look at the exact context. memcmp can be used anyway to understand where is the inconsistency if need be. - I have noticed that mask_common_page is meaningfull just for the sequence RMGR, and just that does not justify its existence so I ripped it off. - PostgresNode.pm enables wal_consistency. Seeing the high amount of WAL this produces, I feel cold about doing that, the patch does include it btw... - Standbys now stop with FATAL when an inconsistency is found. This makes error detection easier on buildfarm machines. - A couple of masking functions still use 0xFF or similar marks. Those should be replaced by MASK_MARKING. Not done that yet. - Some of the masking routines should be refined, particularly the heap and GIn functions. I did not spend time yet to do it. On top of that, I have done a fair amount of testing, creating manually some inconsistencies in the REDO routines to trigger failures on standbys. And that was sort of fun to break things intentionally. Another fun thing is the large amount of WAL that this generates (!), so anyone willing to enable that in production would be cra
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 29, 2016 at 12:49 PM, Michael Paquier wrote: > Seeing nothing happening, I have moved the patch to next CF as there > is a new version, but no reviews for it. Just a note for anybody potentially looking at this patch. I am currently looking at it in depth, and will post a new version of the patch in a couple of days with review comments. Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Sep 16, 2016 at 10:36 PM, Michael Paquier wrote: > On Fri, Sep 16, 2016 at 10:30 PM, Robert Haas wrote: >> I don't think you have the right to tell Kuntal that he has to move >> the patch to the next CommitFest because there are unspecified things >> about the current version you don't like. If you don't have time to >> review further, that's your call, but he can leave the patch as Needs >> Review and see if someone else has time. > > No complain from here if done this way. I don't mean any offense :) Seeing nothing happening, I have moved the patch to next CF as there is a new version, but no reviews for it. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Sep 16, 2016 at 10:30 PM, Robert Haas wrote: > I don't think you have the right to tell Kuntal that he has to move > the patch to the next CommitFest because there are unspecified things > about the current version you don't like. If you don't have time to > review further, that's your call, but he can leave the patch as Needs > Review and see if someone else has time. No complain from here if done this way. I don't mean any offense :) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 15, 2016 at 9:23 PM, Michael Paquier wrote: > On Thu, Sep 15, 2016 at 7:30 PM, Kuntal Ghosh > wrote: >> Thoughts? > > There are still a couple of things that this patch makes me unhappy, > particularly the handling of the GUC with the xlogreader flags. I am > not sure if I'll be able to look at that again within the next couple > of weeks, but please be sure that this is registered in the next > commit fest. You could for example do that by changing the patch from > "Returned with Feedback" to "Moved to next CF" in the commit fest app. > Be sure as well to spend a couple of cycles in reviewing patches. > Usually for one patch sent, that's one patch of equal difficulty to > review, and there are many patch still waiting for feedback. I don't think you have the right to tell Kuntal that he has to move the patch to the next CommitFest because there are unspecified things about the current version you don't like. If you don't have time to review further, that's your call, but he can leave the patch as Needs Review and see if someone else has time. You are right that he should review some other people's patches, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 15, 2016 at 7:30 PM, Kuntal Ghosh wrote: > Thoughts? There are still a couple of things that this patch makes me unhappy, particularly the handling of the GUC with the xlogreader flags. I am not sure if I'll be able to look at that again within the next couple of weeks, but please be sure that this is registered in the next commit fest. You could for example do that by changing the patch from "Returned with Feedback" to "Moved to next CF" in the commit fest app. Be sure as well to spend a couple of cycles in reviewing patches. Usually for one patch sent, that's one patch of equal difficulty to review, and there are many patch still waiting for feedback. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
Hello, I've added the updated the patch with the necessary documentation and comments. I've referenced Robert's reply in this thread and Simon's reply in Production block comparison facility thread to write the documentation. This feature is used to check the consistency of WAL records, i.e, whether the WAL records are inserted and applied correctly. A guc parameter named wal_consistency is added to enable this feature. When wal_consistency is enabled for a WAL record, it stores a full-page image along with the record. When a full-page image arrives during redo, it compares against the current page to check whether both are consistent. The default value for this setting is none. To check all records written to the write-ahead log, set this parameter to all. To check only some records, specify a comma-separated list of resource managers. The resource managers which are currently supported are xlog, heap2, heap, btree, hash, gin, gist, spgist, sequence, brin and generic. If any inconsistency is detected, it throws a WARNING. But, as per discussions in the earlier threads, it can be changed to ERROR./FATAL(just a one word change). I've kept this as warning because of some inconsistency in BRIN VACUUM during gmake check. In recovery tests, I've enabled this feature in PostgresNode.pm. Thanks to Amit, Dilip, Michael, Simon and Robert for their valuable feedbacks. Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index cd66abc..8b251b7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2470,6 +2470,35 @@ include_dir 'conf.d' + + wal_consistency (string) + + wal_consistency configuration parameter + + + + +This parameter is used to check the consistency of WAL records, i.e, +whether the WAL records are inserted and applied correctly. When +wal_consistency is enabled for a WAL record, it +stores a full-page image along with the record. When a full-page image +arrives during redo, it compares against the current page to check whether +both are consistent. + + + +The default value for this setting is none. To check +all records written to the write-ahead log, set this parameter to +all. To check only some records, specify a +comma-separated list of resource managers. The resource managers +which are currently supported are xlog, heap2, +heap, btree, hash, gin, +gist, spgist, sequence, brin +and generic. + + + + wal_buffers (integer) diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c index 5a6b728..f4feb88 100644 --- a/src/backend/access/brin/brin_xlog.c +++ b/src/backend/access/brin/brin_xlog.c @@ -14,6 +14,7 @@ #include "access/brin_pageops.h" #include "access/brin_xlog.h" #include "access/xlogutils.h" +#include "storage/bufmask.h" /* @@ -279,3 +280,45 @@ brin_redo(XLogReaderState *record) elog(PANIC, "brin_redo: unknown op code %u", info); } } + +/* + * Mask a BRIN page + */ +char * +mask_brin_page(uint8 info, BlockNumber blkno, const char *page) +{ + Page page_norm; + OffsetNumber offnum, +maxoff; + + page_norm = (Page) palloc(BLCKSZ); + memcpy(page_norm, page, BLCKSZ); + + /* + * Mask the Page LSN. Because, we store the page before updating the LSN. + * Hence, LSNs of both pages will always be different. + */ + mask_page_lsn(page_norm); + + mask_page_hint_bits(page_norm); + + if (BRIN_IS_REGULAR_PAGE(page_norm)) + { + mask_unused_space(page_norm); + + maxoff = PageGetMaxOffsetNumber(page_norm); + for (offnum = FirstOffsetNumber; + offnum <= maxoff; + offnum = OffsetNumberNext(offnum)) + { + ItemId itemId = PageGetItemId(page_norm, offnum); + + if (ItemIdIsUsed(itemId)) +itemId->lp_flags = LP_UNUSED; + } + } + + /* We need to handle brin pages of type Meta and Revmap if needed */ + + return (char *)page_norm; +} diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c index a40f168..a247807 100644 --- a/src/backend/access/gin/ginxlog.c +++ b/src/backend/access/gin/ginxlog.c @@ -15,6 +15,7 @@ #include "access/gin_private.h" #include "access/xlogutils.h" +#include "storage/bufmask.h" #include "utils/memutils.h" static MemoryContext opCtx; /* working memory for operations */ @@ -758,3 +759,40 @@ gin_xlog_cleanup(void) MemoryContextDelete(opCtx); opCtx = NULL; } + +/* + * Mask a Gin page + */ +char * +mask_gin_page(uint8 info, BlockNumber blkno, const char *page) +{ + Page page_norm; + GinPageOpaque opaque; + + page_norm = (Page) palloc(BLCKSZ); + memcpy(page_norm, page, BLCKSZ); + + /* + * Mask the Page LSN. Because, we store the page before updating the LSN. + * Hence, LSNs of both pages will always be different. + */ + mask_page_lsn(page_norm
Re: [HACKERS] WAL consistency check facility
On Tue, Sep 13, 2016 at 9:04 PM, Michael Paquier wrote: > It seems to me that you need to think about the way to document things > properly first, with for example: > - Have a first documentation patch that explains what is a resource > manager for WAL, and what are the types available with a nice table. > - Add in your patch documentation to explain what are the benefits of > using this facility, the main purpose is testing, but there are also > mention upthread about users that would like to get that into > production, assuming that the overhead is minimal. So, I don't think that this patch should be required to document all of the currently-undocumented stuff that somebody might want to know that it is related to this patch. It should be enough to documented the patch itself. One paragraph in config.sgml in the usual format should be fine. Maybe two paragraphs. We do need to list the resource managers, but that can just be something like this: The default value of for this setting is off. To check all records written to the write-ahead log, set this parameter to all. To check only same records, specify a comma-separated list of resource managers. The resource managers which are currently supported are heap, btree, hash, BLAH, and BLAH. If somebody wants to write some user-facing documentation of the write-ahead log format, great. That could certainly be very helpful for people who are running pg_xlogdump. But I don't think that stuff goes in this patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 8, 2016 at 1:20 PM, Michael Paquier wrote: > >> 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in >> REVMAP page. This happens only for two cases. I'm not sure what the >> reason can be. > > Hm? This smells like a block reference bug. What are the cases you are > referring to? > Following is the only case where the backup page stored in the wal record and the current page after redo are not consistent. test:BRIN using gmake-check Master - STATEMENT: VACUUM brintest; LOG: INSERT @ 0/59E1E0F8: - BRIN/UPDATE+INIT: heapBlk 100 pagesPerRange 1 old offnum 11, new offnum 1 Standby -- LOG: REDO @ 0/59E1B500; LSN 0/59E1E0F8: prev 0/59E17578; xid 0; len 14; blkref #0: rel 1663/16384/30556, blk 12; blkref #1: rel 1663/16384/30556, blk 1; blkref #2: rel 1663/16384/30556, blk 2 - BRIN/UPDATE+INIT: heapBlk 100 pagesPerRange 1 old offnum 11, new offnum 1 WARNING: Inconsistent page (at byte 26) found, rel 1663/16384/30556, forknum 0, blkno 1 CONTEXT: xlog redo at 0/59E1B500 for BRIN/UPDATE+INIT: heapBlk 100 pagesPerRange 1 old offnum 11, new offnum 1 thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 14, 2016 at 11:31 AM, Michael Paquier wrote: > On Wed, Sep 14, 2016 at 2:56 PM, Kuntal Ghosh > wrote: >> Master >> --- >> - If wal_consistency check is enabled or needs_backup is set in >> XLogRecordAssemble(), we do a fpw. >> - If a fpw is to be done, then fork_flags is set with BKPBLOCK_HAS_IMAGE, >> which in turns set has_image flag while decoding the record. >> - If a fpw needs to be restored during redo, i.e., needs_backup is true, >> then bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO. > > Here that should be if wal_consistency is true, no? Nope. I'll try to explain using some pseudo-code: XLogRecordAssemble() { include_image = needs_backup || wal_consistency[rmid]; if (include_image) { set XLogRecordBlockHeader.fork_flags |= BKPBLOCK_HAS_IMAGE; if (needs_backup) set XLogRecordBlockImageHeader.bimg_info |=BKPIMAGE_IS_REQUIRED_FOR_REDO; } . } XLogReadBufferForRedoExtended() { .. if (XLogRecHasBlockImage() && XLogRecHasBlockImageForRedo()) { RestoreBlockImage(); return BLK_RESTORED; } .. } checkConsistency() { if (wal_consistency[rmid] && !XLogRecHasBlockImage()) throw error; . } *XLogRecHasBlockImageForRedo() checks whether bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO. For a backup image any of the followings is possible: 1. consistency should be checked. 2. page should restored. 3. both 1 and 2. Consistency check can be controlled by a guc parameter. But, standby should be conveyed whether an image should be restored. For that, we have used the new flag. Suggestions? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 14, 2016 at 2:56 PM, Kuntal Ghosh wrote: > Master > --- > - If wal_consistency check is enabled or needs_backup is set in > XLogRecordAssemble(), we do a fpw. > - If a fpw is to be done, then fork_flags is set with BKPBLOCK_HAS_IMAGE, > which in turns set has_image flag while decoding the record. > - If a fpw needs to be restored during redo, i.e., needs_backup is true, > then bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO. Here that should be if wal_consistency is true, no? > Standby > --- > - In XLogReadBufferForRedoExtended(), if both XLogRecHasBlockImage() and > XLogRecHasBlockImageForRedo()(added by me*) return true, we restore the > backup image. > - In checkConsistency, we only check if XLogRecHasBlockImage() returns true > when wal_consistency check is enabled for this rmid. My guess would have been that you do not need to check anymore for wal_consistency in checkConsistency, making the GUC value only used on master node. > *XLogRecHasBlockImageForRedo() checks whether bimg_info is set with > BKPIMAGE_IS_REQUIRED_FOR_REDO. Yes, that's more or less what you should have. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 14, 2016 at 6:34 AM, Michael Paquier wrote: >>>2) Regarding page comparison: >>>We could just use memcpy() here. compareImages was useful to get a >>>clear image of what the inconsistencies were, but you don't do that >>>anymore. >> memcmp(), right? > >Yep :) If I use memcmp(), I won't get the byte location where the first mismatch has occurred. It will be helpful to display the byte location which causes an inconsistency. >>>6) >>>+ /* >>>+* Remember that, if WAL consistency check is enabled for >>>the current rmid, >>>+* we always include backup image with the WAL record. >>>But, during redo we >>>+* restore the backup block only if needs_backup is set. >>>+*/ >>>+ if (needs_backup) >>>+ bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO; >>>This should use wal_consistency[rmid]? >> >> needs_backup is set when XLogRecordAssemble decides that backup image >> should be included in the record for redo purpose. This image will be >> restored during >> redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included >> image should be restored during redo(or has_image should be set or not). > >When decoding a record, I think that you had better not use has_image >to assume that a FPW has to be double-checked. This has better be a >different boolean flag, say check_page or similar. This way after >decoding a record it is possible to know if there is a PFW, and if a >check on it is needed or not. I've done some modifications which discards the necessity of adding anything in DecodeXLogRecord(). Master --- - If wal_consistency check is enabled or needs_backup is set in XLogRecordAssemble(), we do a fpw. - If a fpw is to be done, then fork_flags is set with BKPBLOCK_HAS_IMAGE, which in turns set has_image flag while decoding the record. - If a fpw needs to be restored during redo, i.e., needs_backup is true, then bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO. Standby --- - In XLogReadBufferForRedoExtended(), if both XLogRecHasBlockImage() and XLogRecHasBlockImageForRedo()(added by me*) return true, we restore the backup image. - In checkConsistency, we only check if XLogRecHasBlockImage() returns true when wal_consistency check is enabled for this rmid. *XLogRecHasBlockImageForRedo() checks whether bimg_info is set with BKPIMAGE_IS_REQUIRED_FOR_REDO. Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Tue, Sep 13, 2016 at 6:07 PM, Kuntal Ghosh wrote: >> For now, I've kept this as a WARNING message to detect all inconsistencies >> at once. Once, the patch is finalized, I'll modify it as an ERROR message. > > Or say FATAL. This way the server is taken down. What I'd really like to see here is a way to quickly identify in the buildfarm the moment an inconsistent WAL has been introduced by a new feature, some bug fix, or perhaps a deficiency in the masking routines. We could definitely tune that later on, by controlling with a GUC if this generates a WARNING instead of a FATAL, the former being more useful for production environments, and the latter for tests. It would be good to think as well about a set of tests, one rough thing would be to modify an on-disk page for a table, and work on that to force an inconsistency to be triggered.. >>A couple of extra thoughts: >>1) The routines of each rmgr are located in a dedicated file, for >>example GIN stuff is in ginxlog.c, etc. It seems to me that it would >>be better to move each masking function where it should be instead >>being centralized. A couple of routines need to be centralized, so I'd >>suggest putting them in a new file, like xlogmask.c, xlog because now >>this is part of WAL replay completely, including the lsn, the hint >>bint and the other common routines. > Sounds good. But, I think that the file name for common masking routines > should be as bufmask.c since we are masking the buffers only. That makes sense as well. No objections to that. >>2) Regarding page comparison: >>We could just use memcpy() here. compareImages was useful to get a >>clear image of what the inconsistencies were, but you don't do that >>anymore. > memcmp(), right? Yep :) >>6) >>+ /* >>+* Remember that, if WAL consistency check is enabled for >>the current rmid, >>+* we always include backup image with the WAL record. >>But, during redo we >>+* restore the backup block only if needs_backup is set. >>+*/ >>+ if (needs_backup) >>+ bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO; >>This should use wal_consistency[rmid]? > > needs_backup is set when XLogRecordAssemble decides that backup image > should be included in the record for redo purpose. This image will be > restored during > redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included > image should be restored during redo(or has_image should be set or not). When decoding a record, I think that you had better not use has_image to assume that a FPW has to be double-checked. This has better be a different boolean flag, say check_page or similar. This way after decoding a record it is possible to know if there is a PFW, and if a check on it is needed or not. >>7) This patch has zero documentation. Please add some. Any human being >>on this list other than those who worked on the first versions >>(Heikki, Simon and I?) is going to have a hard time to review this >>patch in details moving on if there is no reference to tell what this >>feature does for the user... >> >>This patch is going to the good direction, but I don't think it's far >>from being ready for commit yet. So I am going to mark it as returned >>with feedback if there are no objections > > I think only major change that this patch needs a proper and detailed > documentation. Other than that there are very minor changes which can > be done quickly. Right? It seems to me that you need to think about the way to document things properly first, with for example: - Have a first documentation patch that explains what is a resource manager for WAL, and what are the types available with a nice table. - Add in your patch documentation to explain what are the benefits of using this facility, the main purpose is testing, but there are also mention upthread about users that would like to get that into production, assuming that the overhead is minimal. - Add more comments in your code to finish. One example is checkConsistency() that is here, but explains nothing. Well, if you'd simply use an on/off switch to control the feature, the documentation load for rmgrs would be zero, but as I am visibly outnumbered in this fight... We could also have an off/on switch implemented first, and extend that later on depending on the feedback from other users. We discussed rmgr-level or relation-level tuning of FPW compression at some point, but we've finished with the most simple approach, and we still stick with it. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
- If WAL consistency check is enabled for a rmgrID, we always include the backup image in the WAL record. >>> >>>What happens if wal_consistency has different settings on a standby >>>and its master? If for example it is set to 'all' on the standby, and >>>'none' on the master, or vice-versa, how do things react? An update of >>>this parameter should be WAL-logged, no? >> >> If wal_consistency is enabled for a rmid, standby will always check whether >> backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not. >> (I guess Amit and Robert also suggested the same in the thread) >> Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and >> BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup >> image is required during redo. When we decode a wal record, has_image >> flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO. > >Ah I see. But do we actually store the status in the record itself, >then at replay we don't care of the value of wal_consistency at >replay. That's the same concept used by wal_compression. So shouldn't >you have more specific checks related to that in checkConsistency? You >actually don't need to check for anything in xlogreader.c, just check >for the consistency if there is a need to do so, or do nothing. I'm sorry, but I don't quite follow you here. If a wal record contains an image, has_image should be set since it helps decoding the record. But, during redo if XLogRecHasBlockImage() returns true, i.e., has_image is set, then it always restore the block. But, in our case, a record can have a backup image which should not be restored. So, we need to decide two things: 1. Does a record contain backup image? (required for decoding the record) 2. If it has an image, should we restore it during redo? I think we sould decide these in DecodeXLogRecord() only. BKPBLOCK_HAS_IMAGE answers the first question whereas BKPIMAGE_IS_REQUIRED_FOR_REDO answers the second one. In DecodeXLogRecord(), we check that BKPBLOCK_HAS_IMAGE should be set if wal_consistency is enabled for this record. The flag has_image is set to BKPIMAGE_IS_REQUIRED_FOR_REDO which is later used to decide whether we want to restore a block or not. > For now, I've kept this as a WARNING message to detect all inconsistencies > at once. Once, the patch is finalized, I'll modify it as an ERROR message. Or say FATAL. This way the server is taken down. > Thoughts? +1. I'll do that. >A couple of extra thoughts: >1) The routines of each rmgr are located in a dedicated file, for >example GIN stuff is in ginxlog.c, etc. It seems to me that it would >be better to move each masking function where it should be instead >being centralized. A couple of routines need to be centralized, so I'd >suggest putting them in a new file, like xlogmask.c, xlog because now >this is part of WAL replay completely, including the lsn, the hint >bint and the other common routines. Sounds good. But, I think that the file name for common masking routines should be as bufmask.c since we are masking the buffers only. >2) Regarding page comparison: >+/* >+ * Compare the contents of two pages. >+ * If the two pages are exactly same, it returns BLCKSZ. Otherwise, >+ * it returns the location where the first mismatch has occurred. >+ */ >+int >+comparePages(char *page1, char *page2) >We could just use memcpy() here. compareImages was useful to get a >clear image of what the inconsistencies were, but you don't do that >anymore. memcmp(), right? >5) >+ has_image = record->blocks[block_id].has_image; >+ record->blocks[block_id].has_image = true; >+ if (!RestoreBlockImage(record, block_id, old_page)) >+ elog(ERROR, "failed to restore block image"); >+ record->blocks[block_id].has_image = has_image; >Er, what? And BKPIMAGE_IS_REQUIRED_FOR_REDO? Sorry, I completely missed this. >6) >+ /* >+* Remember that, if WAL consistency check is enabled for >the current rmid, >+* we always include backup image with the WAL record. >But, during redo we >+* restore the backup block only if needs_backup is set. >+*/ >+ if (needs_backup) >+ bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO; >This should use wal_consistency[rmid]? needs_backup is set when XLogRecordAssemble decides that backup image should be included in the record for redo purpose. This image will be restored during redo. BKPIMAGE_IS_REQUIRED_FOR_REDO indicates whether the included image should be restored during redo(or has_image should be set or not). >7) This patch has zero documentation. Please add some. Any human being >on this list other than those who worked on the first versions >(Heikki, Simon and I?) is going to have a hard time to review this >patch in details moving on if there is no reference to tell what this >feature does for the user... > >This patch is going to the good direction, but I don't think it's far >fro
Re: [HACKERS] WAL consistency check facility
On Sun, Sep 11, 2016 at 12:03 AM, Robert Haas wrote: > It seems entirely unnecessary for the master and the standby to agree > here. I think what we need is two GUCs. One of them, which affects > only the master, controls whether the validation information is > including in the WAL, and the other, which affects only the standby, > affects whether validation is performed when the necessary information > is present. Or maybe skip the second one and just decree that > standbys will always validate if the necessary information is present. > Using the same GUC on both the master and the standby but making it > mean different things in each of those places (whether to log the > validation info in one case, whether to perform validation in the > other case) is another option that also avoids needing to enforce that > the setting is the same in both places, but probably an inferior one. Thinking more about that, there is no actual need to do anything complicated here. We could just track at the record level if a consistency check is needs to be done at replay and do it. If nothing is set, just do nothing. That would allow us to promote this parameter to SIGHUP. wal_compression does something similar. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Sep 12, 2016 at 5:06 AM, Kuntal Ghosh wrote: >>+ void(*rm_checkConsistency) (XLogReaderState *record); >>All your _checkConsistency functions share the same pattern, in short >>they all use a for loop for each block, call each time >>XLogReadBufferExtended, etc. And this leads to a *lot* of duplication. >>You would get a reduction by a couple of hundreds of lines by having a >>smarter refactoring. And to be honest, if I look at your patch what I >>think is the correct way of doing things is to add to the rmgr not >>this check consistency function, but just a pointer to the masking >>function. > > +1. In rmgrlist, I've added a pointer to the masking function for each rmid. > A common function named checkConsistency calls these masking functions > based on their rmid and does comparison for each block. The patch size is down from 79kB to 38kB. That gets better :) >>> - If WAL consistency check is enabled for a rmgrID, we always include >>> the backup image in the WAL record. >> >>What happens if wal_consistency has different settings on a standby >>and its master? If for example it is set to 'all' on the standby, and >>'none' on the master, or vice-versa, how do things react? An update of >>this parameter should be WAL-logged, no? > > If wal_consistency is enabled for a rmid, standby will always check whether > backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not. > (I guess Amit and Robert also suggested the same in the thread) > Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and > BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup > image is required during redo. When we decode a wal record, has_image > flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO. Ah I see. But do we actually store the status in the record itself, then at replay we don't care of the value of wal_consistency at replay. That's the same concept used by wal_compression. So shouldn't you have more specific checks related to that in checkConsistency? You actually don't need to check for anything in xlogreader.c, just check for the consistency if there is a need to do so, or do nothing. > For now, I've kept this as a WARNING message to detect all inconsistencies > at once. Once, the patch is finalized, I'll modify it as an ERROR message. Or say FATAL. This way the server is taken down. > Thoughts? A couple of extra thoughts: 1) The routines of each rmgr are located in a dedicated file, for example GIN stuff is in ginxlog.c, etc. It seems to me that it would be better to move each masking function where it should be instead being centralized. A couple of routines need to be centralized, so I'd suggest putting them in a new file, like xlogmask.c, xlog because now this is part of WAL replay completely, including the lsn, the hint bint and the other common routines. 2) Regarding page comparison: +/* + * Compare the contents of two pages. + * If the two pages are exactly same, it returns BLCKSZ. Otherwise, + * it returns the location where the first mismatch has occurred. + */ +int +comparePages(char *page1, char *page2) We could just use memcpy() here. compareImages was useful to get a clear image of what the inconsistencies were, but you don't do that anymore. 3) +static void checkConsistency(RmgrId rmid, XLogReaderState *record); The RMGR if is part of the record decoded, so you could just remove RmgrId from the list of arguments and simplify this interface. 4) If this patch still goes with the possibility to set up a list of RMGRs, documentation is needed for that. I'd suggest writing first a patch to explain what are RMGRs for WAL, then apply the WAL consistency facility on top of it and link wal_consistency to it. 5) + has_image = record->blocks[block_id].has_image; + record->blocks[block_id].has_image = true; + if (!RestoreBlockImage(record, block_id, old_page)) + elog(ERROR, "failed to restore block image"); + record->blocks[block_id].has_image = has_image; Er, what? And BKPIMAGE_IS_REQUIRED_FOR_REDO? 6) + /* +* Remember that, if WAL consistency check is enabled for the current rmid, +* we always include backup image with the WAL record. But, during redo we +* restore the backup block only if needs_backup is set. +*/ + if (needs_backup) + bimg.bimg_info |= BKPIMAGE_IS_REQUIRED_FOR_REDO; This should use wal_consistency[rmid]? 7) This patch has zero documentation. Please add some. Any human being on this list other than those who worked on the first versions (Heikki, Simon and I?) is going to have a hard time to review this patch in details moving on if there is no reference to tell what this feature does for the user... This patch is going to the good direction, but I don't think it's far from being ready for commit yet. So I am going to mark it as returned with feedback if there are no objections. -- Michae
Re: [HACKERS] WAL consistency check facility
Hello, Based on the previous discussions, I've modified the existing patch. >+ void(*rm_checkConsistency) (XLogReaderState *record); >All your _checkConsistency functions share the same pattern, in short >they all use a for loop for each block, call each time >XLogReadBufferExtended, etc. And this leads to a *lot* of duplication. >You would get a reduction by a couple of hundreds of lines by having a >smarter refactoring. And to be honest, if I look at your patch what I >think is the correct way of doing things is to add to the rmgr not >this check consistency function, but just a pointer to the masking >function. +1. In rmgrlist, I've added a pointer to the masking function for each rmid. A common function named checkConsistency calls these masking functions based on their rmid and does comparison for each block. >> - If WAL consistency check is enabled for a rmgrID, we always include >> the backup image in the WAL record. > >What happens if wal_consistency has different settings on a standby >and its master? If for example it is set to 'all' on the standby, and >'none' on the master, or vice-versa, how do things react? An update of >this parameter should be WAL-logged, no? If wal_consistency is enabled for a rmid, standby will always check whether backup image exists or not i.e. BKPBLOCK_HAS_IMAGE is set or not. (I guess Amit and Robert also suggested the same in the thread) Basically, BKPBLOCK_HAS_IMAGE is set if a block contains image and BKPIMAGE_IS_REQUIRED_FOR_REDO (I've added this one) is set if that backup image is required during redo. When we decode a wal record, has_image flag of DecodedBkpBlock is set to BKPIMAGE_IS_REQUIRED_FOR_REDO. >+ if (pg_strcasecmp(tok, "Heap2") == 0) >+ { >+ newwalconsistency[RM_HEAP2_ID] = true; >+ } >Thinking more about it, I guess that we had better change the >definition list of rmgrs in rmgr.h and get something closer to >RmgrDescData that pg_xlogdump has to avoid all this stanza by >completing it with the name of the rmgr. The only special cases that >this code path would need to take care of would be then 'none' and >'all'. You could do this refactoring on top of the main patch to >simplify it as it is rather big (1.7k lines). I've modified it exactly like pg_xlogdump does. Additionally, it checks whether masking function is defined for the rmid or not. Hence, in future, if we want to include any other rmid for wal consistency check, we just need to define its masking function. >> - In recovery tests (src/test/recovery/t), I've added wal_consistency >> parameter in the existing scripts. This feature doesn't change the >> expected output. If there is any inconsistency, it can be verified in >> corresponding log file. > >I am afraid that just generating a WARNING message is going to be >useless for the buildfarm. If we want to detect errors, we could for >example have an additional GUC to trigger an ERROR or a FATAL, taking >down the cluster, and allowing things to show in red on a platform. For now, I've kept this as a WARNING message to detect all inconsistencies at once. Once, the patch is finalized, I'll modify it as an ERROR message. Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c index 9bb1362..3ca64d1 100644 --- a/src/backend/access/transam/rmgr.c +++ b/src/backend/access/transam/rmgr.c @@ -26,12 +26,13 @@ #include "commands/tablespace.h" #include "replication/message.h" #include "replication/origin.h" +#include "storage/bufmask.h" #include "storage/standby.h" #include "utils/relmapper.h" /* must be kept in sync with RmgrData definition in xlog_internal.h */ -#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup) \ - { name, redo, desc, identify, startup, cleanup }, +#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,maskPage) \ + { name, redo, desc, identify, startup, cleanup, maskPage }, const RmgrData RmgrTable[RM_MAX_ID + 1] = { #include "access/rmgrlist.h" diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2189c22..77e79f5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -25,6 +25,7 @@ #include "access/commit_ts.h" #include "access/multixact.h" #include "access/rewriteheap.h" +#include "access/rmgr.h" #include "access/subtrans.h" #include "access/timeline.h" #include "access/transam.h" @@ -95,6 +96,8 @@ bool EnableHotStandby = false; bool fullPageWrites = true; bool wal_log_hints = false; bool wal_compression = false; +char *wal_consistency_string = NULL; +bool *wal_consistency = NULL; bool log_checkpoints = false; int sync_method = DEFAULT_SYNC_METHOD; int wal_level = WAL_LEVEL_MINIMAL; @@ -870,6 +873,7 @@ static void WALInsertLockAcquireExclusive(void); static void WALInsertLockRelease(void); static void WALInsertLockUpdateInsertingAt(XLogRecPtr ins
Re: [HACKERS] WAL consistency check facility
On Sat, Sep 10, 2016 at 8:33 PM, Robert Haas wrote: > On Sat, Sep 10, 2016 at 3:19 AM, Michael Paquier > wrote: >> On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh >> wrote: > - If WAL consistency check is enabled for a rmgrID, we always include > the backup image in the WAL record. What happens if wal_consistency has different settings on a standby and its master? If for example it is set to 'all' on the standby, and 'none' on the master, or vice-versa, how do things react? An update of this parameter should be WAL-logged, no? >>> It is possible to set wal_consistency to 'All' in master and any other >>> values in standby. But, the scenario you mentioned will cause error in >>> standby since it may not get the required backup image for wal >>> consistency check. I think that user should be responsible to set >>> this value correctly. We can improve the error message to make the >>> user aware of the situation. >> >> Let's be careful here. You should as well consider things from the >> angle that some parameter updates are WAL-logged as well, like >> wal_level with the WAL record XLOG_PARAMETER_CHANGE. > > It seems entirely unnecessary for the master and the standby to agree > here. I think what we need is two GUCs. One of them, which affects > only the master, controls whether the validation information is > including in the WAL, and the other, which affects only the standby, > affects whether validation is performed when the necessary information > is present. > I think from the clarity perspective, this option sounds good, but I am slightly afraid that it might be inconvenient for users to set the different values for these two parameters. > Or maybe skip the second one and just decree that > standbys will always validate if the necessary information is present. Sounds like a better alternative and probably easier to configure for users. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Sat, Sep 10, 2016 at 12:49 PM, Michael Paquier wrote: > On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh > wrote: > - In recovery tests (src/test/recovery/t), I've added wal_consistency parameter in the existing scripts. This feature doesn't change the expected output. If there is any inconsistency, it can be verified in corresponding log file. >>> >>> I am afraid that just generating a WARNING message is going to be >>> useless for the buildfarm. If we want to detect errors, we could for >>> example have an additional GUC to trigger an ERROR or a FATAL, taking >>> down the cluster, and allowing things to show in red on a platform. >>> >> Yes, we can include an additional GUC to trigger an ERROR for any >> inconsistency. > > I'd like to hear extra opinions about that, but IMO just having an > ERROR would be fine for the first implementation. Once you've bumped > into an ERROR, you are likely going to fix it first. > +1 for just an ERROR to detect the inconsistency. I think adding additional GUC just to raise error level doesn't seem to be advisable. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Sat, Sep 10, 2016 at 3:19 AM, Michael Paquier wrote: > On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh > wrote: - If WAL consistency check is enabled for a rmgrID, we always include the backup image in the WAL record. >>> >>> What happens if wal_consistency has different settings on a standby >>> and its master? If for example it is set to 'all' on the standby, and >>> 'none' on the master, or vice-versa, how do things react? An update of >>> this parameter should be WAL-logged, no? >>> >> It is possible to set wal_consistency to 'All' in master and any other >> values in standby. But, the scenario you mentioned will cause error in >> standby since it may not get the required backup image for wal >> consistency check. I think that user should be responsible to set >> this value correctly. We can improve the error message to make the >> user aware of the situation. > > Let's be careful here. You should as well consider things from the > angle that some parameter updates are WAL-logged as well, like > wal_level with the WAL record XLOG_PARAMETER_CHANGE. It seems entirely unnecessary for the master and the standby to agree here. I think what we need is two GUCs. One of them, which affects only the master, controls whether the validation information is including in the WAL, and the other, which affects only the standby, affects whether validation is performed when the necessary information is present. Or maybe skip the second one and just decree that standbys will always validate if the necessary information is present. Using the same GUC on both the master and the standby but making it mean different things in each of those places (whether to log the validation info in one case, whether to perform validation in the other case) is another option that also avoids needing to enforce that the setting is the same in both places, but probably an inferior one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Fri, Sep 9, 2016 at 4:01 PM, Kuntal Ghosh wrote: >>> - If WAL consistency check is enabled for a rmgrID, we always include >>> the backup image in the WAL record. >> >> What happens if wal_consistency has different settings on a standby >> and its master? If for example it is set to 'all' on the standby, and >> 'none' on the master, or vice-versa, how do things react? An update of >> this parameter should be WAL-logged, no? >> > It is possible to set wal_consistency to 'All' in master and any other > values in standby. But, the scenario you mentioned will cause error in > standby since it may not get the required backup image for wal > consistency check. I think that user should be responsible to set > this value correctly. We can improve the error message to make the > user aware of the situation. Let's be careful here. You should as well consider things from the angle that some parameter updates are WAL-logged as well, like wal_level with the WAL record XLOG_PARAMETER_CHANGE. >>> - In recovery tests (src/test/recovery/t), I've added wal_consistency >>> parameter in the existing scripts. This feature doesn't change the >>> expected output. If there is any inconsistency, it can be verified in >>> corresponding log file. >> >> I am afraid that just generating a WARNING message is going to be >> useless for the buildfarm. If we want to detect errors, we could for >> example have an additional GUC to trigger an ERROR or a FATAL, taking >> down the cluster, and allowing things to show in red on a platform. >> > Yes, we can include an additional GUC to trigger an ERROR for any > inconsistency. I'd like to hear extra opinions about that, but IMO just having an ERROR would be fine for the first implementation. Once you've bumped into an ERROR, you are likely going to fix it first. >> + /* >> +* Followings are the rmids which can have backup blocks. >> +* We'll enable this feature only for these rmids. >> +*/ >> + newwalconsistency[RM_HEAP2_ID] = true; >> + newwalconsistency[RM_HEAP_ID] = true; >> + newwalconsistency[RM_BTREE_ID] = true; >> + newwalconsistency[RM_HASH_ID] = true; >> + newwalconsistency[RM_GIN_ID] = true; >> + newwalconsistency[RM_GIST_ID] = true; >> + newwalconsistency[RM_SEQ_ID] = true; >> + newwalconsistency[RM_SPGIST_ID] = true; >> + newwalconsistency[RM_BRIN_ID] = true; >> + newwalconsistency[RM_GENERIC_ID] = true; >> + newwalconsistency[RM_XLOG_ID] = true; >> Here you can just use MemSet with RM_MAX_ID and simplify this code >> maintenance. >> > Not all rmids can have backup blocks. So, for wal_consistency = 'all', > I've enabled only those rmids which can have backup blocks. Even if some rmgrs do not support FPWs, I don't think that it is safe to assume that the existing ones would never support it. Imagine for example that feature X is implemented. Feature X adds rmgs Y, but rmgr Y does not use FPWs. At a later point a new feature is added, which makes rmgr Y using FPWs. We'd increase the number of places to update with your patch, increasing the likelyness to introduce bugs. It would be better to use a safe implementation from the maintenance point of view to be honest (maintenance load of masking functions is somewhat leveraged by the fact that on-disk format is kept compatible). >> >> + if (pg_strcasecmp(tok, "Heap2") == 0) >> + { >> + newwalconsistency[RM_HEAP2_ID] = true; >> + } >> Thinking more about it, I guess that we had better change the >> definition list of rmgrs in rmgr.h and get something closer to >> RmgrDescData that pg_xlogdump has to avoid all this stanza by >> completing it with the name of the rmgr. The only special cases that >> this code path would need to take care of would be then 'none' and >> 'all'. You could do this refactoring on top of the main patch to >> simplify it as it is rather big (1.7k lines). >> > I'm not sure about this point. wal_consistency doesn't support all > the rmids. We should have some way to check this. I'd rather see this code done in such a way that all the rmgrs can be handled, this approach being particularly attractive for the fact that there is no need to change it if new rmgrs are added in the future. (This was a reason as well why I still think that a simple on/off switch would be plain enough, users have mostly control of the SQLs triggering WAL. And if you run tests, you'll likely have the mind to turn autovacuum to off to avoid it to generate FPWs and pollute the logs at least at the second run of your tests). And if you move forward with the approach of making this parameter a list, I think that it would be better to add a section in the WAL documentation about resource managers, like what they are, and list them in this section of the docs. Then your parameter could link to this documentation part and users would be able to see what k
Re: [HACKERS] WAL consistency check facility
Hello Michael, Thanks for your detailed review. >> - If WAL consistency check is enabled for a rmgrID, we always include >> the backup image in the WAL record. > > What happens if wal_consistency has different settings on a standby > and its master? If for example it is set to 'all' on the standby, and > 'none' on the master, or vice-versa, how do things react? An update of > this parameter should be WAL-logged, no? > It is possible to set wal_consistency to 'All' in master and any other values in standby. But, the scenario you mentioned will cause error in standby since it may not get the required backup image for wal consistency check. I think that user should be responsible to set this value correctly. We can improve the error message to make the user aware of the situation. >> - I've extended the RmgrTable with a new function pointer >> rm_checkConsistency, which is called after rm_redo. (only when WAL >> consistency check is enabled for this rmgrID) >> - In each rm_checkConsistency, both backup pages and buffer pages are >> masked accordingly before any comparison. > > This leads to heavy code duplication... > > + void(*rm_checkConsistency) (XLogReaderState *record); > All your _checkConsistency functions share the same pattern, in short > they all use a for loop for each block, call each time > XLogReadBufferExtended, etc. And this leads to a *lot* of duplication. > You would get a reduction by a couple of hundreds of lines by having a > smarter refactoring. And to be honest, if I look at your patch what I > think is the correct way of doing things is to add to the rmgr not > this check consistency function, but just a pointer to the masking > function. Pointer to the masking function will certainly reduce a lot of redundant code. I'll modify it accordingly. >> - In recovery tests (src/test/recovery/t), I've added wal_consistency >> parameter in the existing scripts. This feature doesn't change the >> expected output. If there is any inconsistency, it can be verified in >> corresponding log file. > > I am afraid that just generating a WARNING message is going to be > useless for the buildfarm. If we want to detect errors, we could for > example have an additional GUC to trigger an ERROR or a FATAL, taking > down the cluster, and allowing things to show in red on a platform. > Yes, we can include an additional GUC to trigger an ERROR for any inconsistency. >> Results >> >> >> I've tested with installcheck and installcheck-world in master-standby >> set-up. Followings are the configuration parameters. > > So you tested as well the recovery tests, right? > Yes, I've done the recovery tests after enabling tap-test. > > + /* > +* Followings are the rmids which can have backup blocks. > +* We'll enable this feature only for these rmids. > +*/ > + newwalconsistency[RM_HEAP2_ID] = true; > + newwalconsistency[RM_HEAP_ID] = true; > + newwalconsistency[RM_BTREE_ID] = true; > + newwalconsistency[RM_HASH_ID] = true; > + newwalconsistency[RM_GIN_ID] = true; > + newwalconsistency[RM_GIST_ID] = true; > + newwalconsistency[RM_SEQ_ID] = true; > + newwalconsistency[RM_SPGIST_ID] = true; > + newwalconsistency[RM_BRIN_ID] = true; > + newwalconsistency[RM_GENERIC_ID] = true; > + newwalconsistency[RM_XLOG_ID] = true; > Here you can just use MemSet with RM_MAX_ID and simplify this code > maintenance. > Not all rmids can have backup blocks. So, for wal_consistency = 'all', I've enabled only those rmids which can have backup blocks. > > + if (pg_strcasecmp(tok, "Heap2") == 0) > + { > + newwalconsistency[RM_HEAP2_ID] = true; > + } > Thinking more about it, I guess that we had better change the > definition list of rmgrs in rmgr.h and get something closer to > RmgrDescData that pg_xlogdump has to avoid all this stanza by > completing it with the name of the rmgr. The only special cases that > this code path would need to take care of would be then 'none' and > 'all'. You could do this refactoring on top of the main patch to > simplify it as it is rather big (1.7k lines). > I'm not sure about this point. wal_consistency doesn't support all the rmids. We should have some way to check this. I'll update rest of the things as mentioned by you accordingly. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 7, 2016 at 7:22 PM, Kuntal Ghosh wrote: > Hello, Could you avoid top-posting please? More reference here: http://www.idallen.com/topposting.html > - If WAL consistency check is enabled for a rmgrID, we always include > the backup image in the WAL record. What happens if wal_consistency has different settings on a standby and its master? If for example it is set to 'all' on the standby, and 'none' on the master, or vice-versa, how do things react? An update of this parameter should be WAL-logged, no? > - I've extended the RmgrTable with a new function pointer > rm_checkConsistency, which is called after rm_redo. (only when WAL > consistency check is enabled for this rmgrID) > - In each rm_checkConsistency, both backup pages and buffer pages are > masked accordingly before any comparison. This leads to heavy code duplication... > - In postgresql.conf, a new guc variable named 'wal_consistency' is > added. Default value of this variable is 'None'. Valid values are > combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist, > BRIN, Generic and XLOG. It can also be set to 'All' to enable all the > values. Lower-case is the usual policy for parameter values for GUC parameters. > - In recovery tests (src/test/recovery/t), I've added wal_consistency > parameter in the existing scripts. This feature doesn't change the > expected output. If there is any inconsistency, it can be verified in > corresponding log file. I am afraid that just generating a WARNING message is going to be useless for the buildfarm. If we want to detect errors, we could for example have an additional GUC to trigger an ERROR or a FATAL, taking down the cluster, and allowing things to show in red on a platform. > Results > > > I've tested with installcheck and installcheck-world in master-standby > set-up. Followings are the configuration parameters. So you tested as well the recovery tests, right? > I got two types of inconsistencies as following: > > 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup > page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after > redo, only BTP_DELETED flag is set in buffer page. I assume that we > should clear all btpo_flags before setting BTP_DELETED in > _bt_unlink_halfdead_page(). The page is deleted, it does not matter, so you could just mask all the flags for a deleted page... [...] + /* +* Mask everything on a DELETED page. +*/ + if (((BTPageOpaque) PageGetSpecialPointer(page_norm))->btpo_flags & BTP_DELETED) And that's what is happening. > 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in > REVMAP page. This happens only for two cases. I'm not sure what the > reason can be. Hm? This smells like a block reference bug. What are the cases you are referring to? > I haven't done sufficient tests yet to measure the overhead of this > modification. I'll do that next. I did a first pass on your patch, and I think that things could be really reduced. There is much code duplication, but see below for the details.. #include "access/xlogutils.h" - +#include "storage/bufmask.h" I know that I am a noisy one on the matter, but please double-check for such useless noise in your patch. And there is not only one. + newwalconsistency = (bool *) guc_malloc(ERROR,(RM_MAX_ID + 1)*sizeof(bool)); This spacing is not project-style. You may want to go through that: https://www.postgresql.org/docs/devel/static/source.html +$node_master->append_conf( + 'postgresql.conf', qq( +wal_consistency = 'All' +)); Instead of duplicating that 7 times, you could just do it once in the init() method of PostgresNode.pm. This really has meaning if enabled by default. + /* +* Followings are the rmids which can have backup blocks. +* We'll enable this feature only for these rmids. +*/ + newwalconsistency[RM_HEAP2_ID] = true; + newwalconsistency[RM_HEAP_ID] = true; + newwalconsistency[RM_BTREE_ID] = true; + newwalconsistency[RM_HASH_ID] = true; + newwalconsistency[RM_GIN_ID] = true; + newwalconsistency[RM_GIST_ID] = true; + newwalconsistency[RM_SEQ_ID] = true; + newwalconsistency[RM_SPGIST_ID] = true; + newwalconsistency[RM_BRIN_ID] = true; + newwalconsistency[RM_GENERIC_ID] = true; + newwalconsistency[RM_XLOG_ID] = true; Here you can just use MemSet with RM_MAX_ID and simplify this code maintenance. + for(i = 0; i < RM_MAX_ID + 1 ; i++) + newwalconsistency[i] = false; + break; Same here you can just use MemSet. + else if (pg_strcasecmp(tok, "NONE") == 0) [...] + else if (pg_strcasecmp(tok, "ALL") == 0) It seems to me that using NONE or ALL with any other keywords should not be allowed. + if (pg_strcasecmp(tok, "Heap2") == 0) + { + newwalconsistency[RM_HEAP2_ID] = true; + } Thinking mo
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 7, 2016 at 3:52 PM, Kuntal Ghosh wrote: > > I got two types of inconsistencies as following: > > 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup > page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after > redo, only BTP_DELETED flag is set in buffer page. > I see that inconsistency in code as well. I think this is harmless, because after the page is marked as deleted, it is not used for any purpose other than to recycle it for re-use. After re-using it, the caller always suppose to initialize the flags based on it's usage and I see that is happening in the code unless I am missing something. > I assume that we > should clear all btpo_flags before setting BTP_DELETED in > _bt_unlink_halfdead_page(). > Yeah, we can do that for consistency. If we see any problem in doing so, then I think we can log the flags and set them during replay. Note - Please post your replies inline rather than top posting them. It breaks the discussion link, if you top post it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 7, 2016 at 3:52 PM, Kuntal Ghosh wrote: > Hello, > > As per the earlier discussions, I've attached the updated patch for > WAL consistency check feature. This is how the patch works: > The earlier patch (wal_consistency_v6.patch) was based on the commit id 67e1e2aaff. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
Hello, As per the earlier discussions, I've attached the updated patch for WAL consistency check feature. This is how the patch works: - If WAL consistency check is enabled for a rmgrID, we always include the backup image in the WAL record. - I've extended the RmgrTable with a new function pointer rm_checkConsistency, which is called after rm_redo. (only when WAL consistency check is enabled for this rmgrID) - In each rm_checkConsistency, both backup pages and buffer pages are masked accordingly before any comparison. - In postgresql.conf, a new guc variable named 'wal_consistency' is added. Default value of this variable is 'None'. Valid values are combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist, BRIN, Generic and XLOG. It can also be set to 'All' to enable all the values. - In recovery tests (src/test/recovery/t), I've added wal_consistency parameter in the existing scripts. This feature doesn't change the expected output. If there is any inconsistency, it can be verified in corresponding log file. Results I've tested with installcheck and installcheck-world in master-standby set-up. Followings are the configuration parameters. Master: wal_level = replica max_wal_senders = 3 wal_keep_segments = 4000 hot_standby = on wal_consistency = 'All' Standby: wal_consistency = 'All' I got two types of inconsistencies as following: 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after redo, only BTP_DELETED flag is set in buffer page. I assume that we should clear all btpo_flags before setting BTP_DELETED in _bt_unlink_halfdead_page(). 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in REVMAP page. This happens only for two cases. I'm not sure what the reason can be. I haven't done sufficient tests yet to measure the overhead of this modification. I'll do that next. Thanks to Amit Kapila, Dilip Kumar and Robert Haas for their off-line suggestions. Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com On Thu, Sep 1, 2016 at 11:34 PM, Peter Geoghegan wrote: > On Thu, Sep 1, 2016 at 9:23 AM, Robert Haas wrote: >> Indeed, it had occurred to me that we might not even want to compile >> this code into the server unless WAL_DEBUG is defined; after all, how >> does it help a regular user to detect that the server has a bug? Bug >> or no bug, that's the code they've got. But on further reflection, it >> seems like it could be useful: if we suspect a bug in the redo code >> but we can't reproduce it here, we could ask the customer to turn this >> option on to see whether it produces logging indicating the nature of >> the problem. However, because of the likely expensive of enabling the >> feature, it seems like it would be quite desirable to limit the >> expense of generating many extra FPWs to the affected rmgr. For >> example, if a user has a table with a btree index and a gin index, and >> we suspect a bug in GIN, it would be nice for the user to be able to >> enable the feature *only for GIN* rather than paying the cost of >> enabling it for btree and heap as well.[2] > > Yes, that would be rather a large advantage. > > I think that there really is no hard distinction between users and > hackers. Some people will want to run this in production, and it would > be a lot better if performance was at least not atrocious. If amcheck > couldn't do the majority of its verification with only an > AccessShareLock, then users probably just couldn't use it. Heroku > wouldn't have been able to use it on all production databases. It > wouldn't have mattered that the verification was no less effective, > since the bugs it found would simply never have been observed in > practice. > > -- > Peter Geoghegan -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c index 27ba0a9..4c63ded 100644 --- a/src/backend/access/brin/brin_xlog.c +++ b/src/backend/access/brin/brin_xlog.c @@ -14,7 +14,7 @@ #include "access/brin_pageops.h" #include "access/brin_xlog.h" #include "access/xlogutils.h" - +#include "storage/bufmask.h" /* * xlog replay routines @@ -286,3 +286,84 @@ brin_redo(XLogReaderState *record) elog(PANIC, "brin_redo: unknown op code %u", info); } } + +/* + * It checks whether the current buffer page and backup page stored in the + * WAL record are consistent or not. Before comparing the two pages, it applies + * appropiate masking to the pages to ignore certain areas like hint bits, + * unused space between pd_lower and pd_upper etc. For more information about + * masking, see the masking function. + * This function should be called once WAL replay has been completed. + */ +void +brin_checkConsistency(XLogReaderState *record) +{ + uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; + int block_id; + RelFileNode
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 9:23 AM, Robert Haas wrote: > Indeed, it had occurred to me that we might not even want to compile > this code into the server unless WAL_DEBUG is defined; after all, how > does it help a regular user to detect that the server has a bug? Bug > or no bug, that's the code they've got. But on further reflection, it > seems like it could be useful: if we suspect a bug in the redo code > but we can't reproduce it here, we could ask the customer to turn this > option on to see whether it produces logging indicating the nature of > the problem. However, because of the likely expensive of enabling the > feature, it seems like it would be quite desirable to limit the > expense of generating many extra FPWs to the affected rmgr. For > example, if a user has a table with a btree index and a gin index, and > we suspect a bug in GIN, it would be nice for the user to be able to > enable the feature *only for GIN* rather than paying the cost of > enabling it for btree and heap as well.[2] Yes, that would be rather a large advantage. I think that there really is no hard distinction between users and hackers. Some people will want to run this in production, and it would be a lot better if performance was at least not atrocious. If amcheck couldn't do the majority of its verification with only an AccessShareLock, then users probably just couldn't use it. Heroku wouldn't have been able to use it on all production databases. It wouldn't have mattered that the verification was no less effective, since the bugs it found would simply never have been observed in practice. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On 1 September 2016 at 17:23, Robert Haas wrote: > The primary audience of this feature is PostgreSQL developers I have spoken to users who are waiting for this feature to run in production, which is why I suggested it. Some people care more about correctness than they do about loss of performance. Obviously, this would be expensive and those with a super high performance requirement may not be able to take advantage of this. I'm sure many people will turn it off once if they hit a performance issue, but running it in production for the first few months will give people a very safe feeling. I think the primary use for an rmgr filter might well be PostgreSQL developers. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 4:12 PM, Simon Riggs wrote: > So in my current understanding, a hinted change has by definition no > WAL record, so we just ship a FPW. Hmm. An FPW would have to be contained in a WAL record, so it can't be right to say that we ship an FPW for lack of a WAL record. I think what we ship is nothing at all when wal_log_hints is disabled, and when wal_log_hints is enabled we log an FDW once per checkpoint. > A non-hint change has a WAL record > and it is my (possibly naive) hope that all changes to a page are > reflected in the WAL record/replay, I hope for this, too. > so we can just make a simple > comparison without caring what is the rmgr of the WAL record. Sure, that is 100% possible, and likely a good idea as far as the behavior on the standby is concerned. What's not so clear is whether a simple on/off switch is a wise plan on the master. The purpose of this code, as I understand it, is to check for discrepancies between "do" and "redo"; that is, to verify that the changes made to the buffer at the time the WAL record is generated produce the same result as replay of that WAL record on the standby. To accomplish this purpose, a post-image of the affected buffers is included in each and every WAL record. On replay, that post-image can be compared with the result of replay. If they differ, PostgreSQL has a bug. I would not expect many users to run this in production, because it will presumably be wicked expensive. If I recall correctly, doing full page writes once per buffer per checkpoint, the space taken up by FPWs is >75% of WAL volume. Doing it for every record will be exponentially more expensive. The primary audience of this feature is PostgreSQL developers, who might want to use it to try to verify that, for example, Amit's patch to add write-ahead logging for hash indexes does not have bugs.[1] Indeed, it had occurred to me that we might not even want to compile this code into the server unless WAL_DEBUG is defined; after all, how does it help a regular user to detect that the server has a bug? Bug or no bug, that's the code they've got. But on further reflection, it seems like it could be useful: if we suspect a bug in the redo code but we can't reproduce it here, we could ask the customer to turn this option on to see whether it produces logging indicating the nature of the problem. However, because of the likely expensive of enabling the feature, it seems like it would be quite desirable to limit the expense of generating many extra FPWs to the affected rmgr. For example, if a user has a table with a btree index and a gin index, and we suspect a bug in GIN, it would be nice for the user to be able to enable the feature *only for GIN* rather than paying the cost of enabling it for btree and heap as well.[2] Similarly, when we imagine a developer using this feature to test for bugs, it may at times be useful to enable it across-the-board to look for bugs in any aspect of the write-ahead logging system. However, at other times, when the goal is to find bugs in a particular AM, it might be useful to enable it only for the corresponding rmgr. It is altogether likely that this feature will slow the system down quite a lot. If enabling this feature for hash indexes also means enabling it for the heap, the incremental performance hit might be sufficient to mask concurrency-related bugs in the hash index code that would otherwise have been found. So, I think having at least some basic filtering is probably a pretty smart idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] It probably has bugs. [2] One could of course add filtering considerably more complex than per-rmgr - e.g. enabling it for only one particular relfilenode on a busy production system might be rather desirable. But I'm not sure we really want to go there. It adds a fair amount of complexity to a feature that many people are obviously hoping will be quite simple to use. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On 1 September 2016 at 11:16, Robert Haas wrote: > On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >> I'd prefer a solution that was not dependent upon RmgrID at all. >> >> If there are various special cases that we need to cater for, ISTM >> they would be flaws in the existing WAL implementation rather than >> anything we would want to perpetuate. I hope we'll spend time fixing >> them rather than add loads of weird code to work around the >> imperfections. >> >> Underdocumented special case code is going to be unbelievably >> difficult to get right in the long term. > > It seems to me that you may be conflating the issue of which changes > should be masked out as hints (which is, indeed, special case code, > whether underdocumented or not) with the issue of which rmgrs the user > may want to verify (which is just a case of matching the rmgr ID in > the WAL record against a list provided by the user, and is not special > case code at all). Yep, it seems entirely likely that I am misunderstanding what is happening here. I'd like to see an analysis/discussion before we write code. As you might expect, I'm excited by this feature and the discoveries it appears likely to bring. We've got wal_log_hints and that causes lots of extra traffic. I'm happy with assuming that is switched on in this case also. (Perhaps we might have a wal_log_level with various levels of logging.) So in my current understanding, a hinted change has by definition no WAL record, so we just ship a FPW. A non-hint change has a WAL record and it is my (possibly naive) hope that all changes to a page are reflected in the WAL record/replay, so we can just make a simple comparison without caring what is the rmgr of the WAL record. If we can start by discussing which special cases we know about that require extra code, that will help. We can then decide whether to fix the WAL record/replay or fix the comparison logic, possibly on a case by case basis. My current preference is to generate lots of small fixes to existing WAL code and then have a very, very simple patch for this actual feature, but am willing to discuss alternate approaches. IMV this would be a feature certain users would want turned on all the time for everything. So I'm not bothered much about making this feature settable by rmgr. I might question why this particular feature would be settable by rmgr, when features like wal_log_hints and wal_compression are not, but such discussion is a minor point in comparison to discussing the main feature. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: > I'd prefer a solution that was not dependent upon RmgrID at all. > > If there are various special cases that we need to cater for, ISTM > they would be flaws in the existing WAL implementation rather than > anything we would want to perpetuate. I hope we'll spend time fixing > them rather than add loads of weird code to work around the > imperfections. > > Underdocumented special case code is going to be unbelievably > difficult to get right in the long term. It seems to me that you may be conflating the issue of which changes should be masked out as hints (which is, indeed, special case code, whether underdocumented or not) with the issue of which rmgrs the user may want to verify (which is just a case of matching the rmgr ID in the WAL record against a list provided by the user, and is not special case code at all). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 9:43 AM, Peter Geoghegan wrote: > On Wed, Aug 31, 2016 at 8:26 PM, Amit Kapila wrote: >>> As far as I am understanding things, we are aiming at something that >>> could be used on production systems. >>> >> >> I don't think you can enable it by default in production systems. >> Enabling it will lead to significant performance drop as it writes the >> whole page after each record for most type of RMGR ID's. >> >>> And, honestly, any people >>> enabling it would just do it for all RMGRs because that's a >>> no-brainer. >> >> Agreed, but remember enabling it for all is not free. > > I have sympathy for the idea that this should be as low overhead as > possible, even if that means adding complexity to the interface -- > within reason. I would like to hear a practical example of where this > RMGR id interface could be put to good use, when starting with little > initial information about a problem. > One example that comes to mind is for the cases where the problem reproduces only under high concurrency or some stress test. Now assume the problem is with index, enabling it for all rmgr's could reduce the probability of problem due to it's performance impact. The second advantage which I have already listed is it helps in future development like the one I am doing now for hash indexes (making them logged). > And, ideally, we'd also have some > indication of how big a difference that would make, it terms of > measurable performance impact. > Yes, that's a valid point. I think we can do some tests to see the difference. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 8:26 PM, Amit Kapila wrote: >> As far as I am understanding things, we are aiming at something that >> could be used on production systems. >> > > I don't think you can enable it by default in production systems. > Enabling it will lead to significant performance drop as it writes the > whole page after each record for most type of RMGR ID's. > >> And, honestly, any people >> enabling it would just do it for all RMGRs because that's a >> no-brainer. > > Agreed, but remember enabling it for all is not free. I have sympathy for the idea that this should be as low overhead as possible, even if that means adding complexity to the interface -- within reason. I would like to hear a practical example of where this RMGR id interface could be put to good use, when starting with little initial information about a problem. And, ideally, we'd also have some indication of how big a difference that would make, it terms of measurable performance impact. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 8:02 AM, Amit Kapila wrote: > On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >> On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >> > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ What does this mean? (No docs) >>> >>> I was using this parameter as a masking integer to indicate the >>> operations(rmgr list) for which we need this feature to be enabled. >>> Since, this could be confusing, I've changed it accordingly so that it >>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) >> >> Why would we want that? >> > > It would be easier to test and develop the various modules separately. > As an example, if we develop a new AM which needs WAL facility or > adding WAL capability to an existing system (say Hash Index), we can > just test that module, rather than whole system. I think it can help > us in narrowing down the problem, if we have facility to enable it at > RMGR ID level. Having said that, I think this must have the facility > to enable it for all the RMGR ID's (say ALL) and probably that should > be default. > oops, I think having an option of specifying 'ALL' is good, but that shouldn't be default, because it could have serious performance implications. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 8:30 AM, Michael Paquier wrote: > On Thu, Sep 1, 2016 at 11:32 AM, Amit Kapila wrote: >> On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >>> On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >>> >> * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ > > What does this mean? (No docs) I was using this parameter as a masking integer to indicate the operations(rmgr list) for which we need this feature to be enabled. Since, this could be confusing, I've changed it accordingly so that it accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) >>> >>> Why would we want that? >> >> It would be easier to test and develop the various modules separately. >> As an example, if we develop a new AM which needs WAL facility or >> adding WAL capability to an existing system (say Hash Index), we can >> just test that module, rather than whole system. I think it can help >> us in narrowing down the problem, if we have facility to enable it at >> RMGR ID level. Having said that, I think this must have the facility >> to enable it for all the RMGR ID's (say ALL) and probably that should >> be default. > > As far as I am understanding things, we are aiming at something that > could be used on production systems. > I don't think you can enable it by default in production systems. Enabling it will lead to significant performance drop as it writes the whole page after each record for most type of RMGR ID's. > And, honestly, any people > enabling it would just do it for all RMGRs because that's a > no-brainer. Agreed, but remember enabling it for all is not free. > If we are designing something for testing purposes > instead, something is wrong with this patch then. > What is wrong? > Doing filtering at RMGR level for testing and development purposes > will be done by somebody who has the skills to filter out which > records he should look at. > Right, but in that way, if you see many of our guc parameters needs a good level of understanding to set the correct values for them. For example, do you think it is easy for user to set value for "replacement_sort_tuples" without reading the description or understanding the meaning of same. This example might not be the best example, but I think there are other parameters which do require some deeper understanding of system. The main thing is default values for such parameters should be chosen carefully such that it represents most common usage. > Or he'll bump into an existing bump. So I'd > rather keep this thing simple. > It seems to me that having an option of 'ALL' would make it easier for users to set it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Thu, Sep 1, 2016 at 11:32 AM, Amit Kapila wrote: > On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: >> On 27 August 2016 at 12:09, Kuntal Ghosh wrote: >> > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ What does this mean? (No docs) >>> >>> I was using this parameter as a masking integer to indicate the >>> operations(rmgr list) for which we need this feature to be enabled. >>> Since, this could be confusing, I've changed it accordingly so that it >>> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) >> >> Why would we want that? > > It would be easier to test and develop the various modules separately. > As an example, if we develop a new AM which needs WAL facility or > adding WAL capability to an existing system (say Hash Index), we can > just test that module, rather than whole system. I think it can help > us in narrowing down the problem, if we have facility to enable it at > RMGR ID level. Having said that, I think this must have the facility > to enable it for all the RMGR ID's (say ALL) and probably that should > be default. As far as I am understanding things, we are aiming at something that could be used on production systems. And, honestly, any people enabling it would just do it for all RMGRs because that's a no-brainer. If we are designing something for testing purposes instead, something is wrong with this patch then. Doing filtering at RMGR level for testing and development purposes will be done by somebody who has the skills to filter out which records he should look at. Or he'll bump into an existing bump. So I'd rather keep this thing simple. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 7:02 PM, Simon Riggs wrote: > On 27 August 2016 at 12:09, Kuntal Ghosh wrote: > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ >>> >>> What does this mean? (No docs) >> >> I was using this parameter as a masking integer to indicate the >> operations(rmgr list) for which we need this feature to be enabled. >> Since, this could be confusing, I've changed it accordingly so that it >> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) > > Why would we want that? > It would be easier to test and develop the various modules separately. As an example, if we develop a new AM which needs WAL facility or adding WAL capability to an existing system (say Hash Index), we can just test that module, rather than whole system. I think it can help us in narrowing down the problem, if we have facility to enable it at RMGR ID level. Having said that, I think this must have the facility to enable it for all the RMGR ID's (say ALL) and probably that should be default. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Aug 31, 2016 at 10:32 PM, Simon Riggs wrote: > On 27 August 2016 at 12:09, Kuntal Ghosh wrote: > * wal_consistency_mask = 511 /* Enable consistency check mask bit*/ >>> >>> What does this mean? (No docs) >> >> I was using this parameter as a masking integer to indicate the >> operations(rmgr list) for which we need this feature to be enabled. >> Since, this could be confusing, I've changed it accordingly so that it >> accepts a list of rmgrIDs. (suggested by Michael, Amit and Robert) > > Why would we want that? I am still in for just an on/off switch instead of this complication. An all-or-nothing feature is what we are looking at here. Still a list is an improvement compared to a bitmap. 1. Add support for other Resource Managers. >>> >>> We probably need to have a discussion as to why you think this should >>> be Rmgr dependent? >>> Code comments would help there. >>> >>> If it does, then you should probably do this by extending RmgrTable >>> with an rm_check, so you can call it like this... >>> >>> RmgrTable[record->xl_rmid].rm_check >> >> +1. >> I'm modifying it accordingly. I'm calling this function after >> RmgrTable[record->xl_rmid].rm_redo. >> 5. Generalize the page type identification technique. >>> >>> Why not do this first? >>> >> >> At present, I'm using special page size and page ID to identify page >> type. But, I've noticed some cases where the entire page is >> initialized to zero (Ex: hash_xlog_squeeze_page). RmgrID and info bit >> can help us to identify those pages. > > I'd prefer a solution that was not dependent upon RmgrID at all. So you'd rather identify the page types by looking at pd_special? That seems worse to me but.. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers