Re: pg_waldump
Ok thanks for all these precisions Regards Fabrice On Tue, Dec 19, 2023 at 2:00 PM Matthias van de Meent < boekewurm+postg...@gmail.com> wrote: > On Tue, 19 Dec 2023, 12:27 Fabrice Chapuis, > wrote: > > > > Hi, > > Is it possible to visualize the DDL with the pg_waldump tool. I created > a postgres user but I cannot find the creation command in the wals > > Not really, no. PostgreSQL does not log DDL or DML as such in WAL. > Essentially all catalog updates are logged only as changes on a > certain page in some file: a new user getting inserted would be > approximately "Insert tuple [user's pg_role row data] on page X in > file [the file corresponding to the pg_role table]". > > You could likely derive most DDL commands from Heap/Insert, > Heap/Delete, and Heap/Update records (after cross-referencing the > database's relfilemap), as most DDL is "just" a lot of in-memory > operations plus some record insertions/updates/deletes in catalog > tables. You'd also need to keep track of any relfilemap changes while > processing the WAL, as VACUUM FULL on the catalog tables would change > the file numbering of catalog tables... > > Kind regards, > > Matthias van de Meent > Neon (https://neon.tech) >
Re: pg_waldump
On Tue, 19 Dec 2023, 12:27 Fabrice Chapuis, wrote: > > Hi, > Is it possible to visualize the DDL with the pg_waldump tool. I created a > postgres user but I cannot find the creation command in the wals Not really, no. PostgreSQL does not log DDL or DML as such in WAL. Essentially all catalog updates are logged only as changes on a certain page in some file: a new user getting inserted would be approximately "Insert tuple [user's pg_role row data] on page X in file [the file corresponding to the pg_role table]". You could likely derive most DDL commands from Heap/Insert, Heap/Delete, and Heap/Update records (after cross-referencing the database's relfilemap), as most DDL is "just" a lot of in-memory operations plus some record insertions/updates/deletes in catalog tables. You'd also need to keep track of any relfilemap changes while processing the WAL, as VACUUM FULL on the catalog tables would change the file numbering of catalog tables... Kind regards, Matthias van de Meent Neon (https://neon.tech)
pg_waldump
Hi, Is it possible to visualize the DDL with the pg_waldump tool. I created a postgres user but I cannot find the creation command in the wals Thanks for help Fabrice
Re: pg_waldump vs. all-zeros WAL files; server creation of such files
On Sat, Aug 12, 2023 at 08:15:31PM -0700, Noah Misch wrote: > The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails > at today's master (c36b636) and v15 (1bc19df). It passes at v14 (5a32af3). > Command "pg_waldump --start 0/0100 --end 0/01000100" fails as follows: > > pg_waldump: error: WAL segment size must be a power of two between > 1 MB and 1 GB, but the WAL file "00010002" header > specifies 0 bytes So this depends on the ordering of the entries retrieved by readdir() as much as the segments produced by the backend. > Where it fails, the server has created an all-zeros WAL file under that name. > Where it succeeds, that file doesn't exist at all. Two decisions to make: > > - Should a clean server shutdown ever leave an all-zeros WAL file? I think > yes, it's okay to let that happen. It doesn't hurt to leave that around. On the contrary, it makes any follow-up startup cheaper the bigger the segment size. > - Should "pg_waldump --start $X --end $Y" open files not needed for the > requested range? I think no. So this is a case where identify_target_directory() is called with a fname of NULL. Agreed that search_directory could be smarter with the files it should scan, especially if we have start and/or end LSNs at hand to filter out what we'd like to be in the data folder. -- Michael signature.asc Description: PGP signature
pg_waldump vs. all-zeros WAL files; server creation of such files
The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails at today's master (c36b636) and v15 (1bc19df). It passes at v14 (5a32af3). Command "pg_waldump --start 0/0100 --end 0/01000100" fails as follows: pg_waldump: error: WAL segment size must be a power of two between 1 MB and 1 GB, but the WAL file "00010002" header specifies 0 bytes Where it fails, the server has created an all-zeros WAL file under that name. Where it succeeds, that file doesn't exist at all. Two decisions to make: - Should a clean server shutdown ever leave an all-zeros WAL file? I think yes, it's okay to let that happen. - Should "pg_waldump --start $X --end $Y" open files not needed for the requested range? I think no. Bisect of master got: 30a53b7 Wed Mar 8 16:56:37 2023 +0100 Allow tailoring of ICU locales with custom rules Doesn't fail at $(git merge-base REL_15_STABLE master). Bisect of v15 got: 811203d Sat Aug 6 11:50:23 2022 -0400 Fix data-corruption hazard in WAL-logged CREATE DATABASE. I suspect those are innocent. They changed the exact WAL content, which I expect somehow caused creation of segment 2. Oddly, I find only one other report of this: https://www.postgresql.org/message-id/CAJ6DU3HiJ5FHbqPua19jAD%3DwLgiXBTjuHfbmv1jCOaNOpB3cCQ%40mail.gmail.com Thanks, nm 010_zero.pl Description: Perl program
Re: pg_waldump: add test for coverage
On 29.06.23 21:16, Tristen Raab wrote: I've reviewed your latest v3 patches on Ubuntu 23.04. Both patches apply correctly and all the tests run and pass as they should. Execution time was normal for me, I didn't notice any significant latency when compared to other tests. The only other feedback I can provide would be to add test coverage to some of the other options that aren't currently covered (ie. --bkp-details, --end, --follow, --path, etc.) for completeness. Other than that, this looks like a great patch. Committed. I added a test for the --quiet option. --end and --path are covered. The only options not covered now are -b, --bkp-details output detailed information about backup blocks -f, --follow keep retrying after reaching end of WAL -t, --timeline=TLI timeline from which to read WAL records -x, --xid=XID only show records with transaction ID XID --follow is a bit tricky to test because you need to leave pg_waldump running in the background for a while, or something like that. --timeline and --xid can be tested but would need some work on the underlying test data (such as creating more than one timeline). I don't know much about --bkp-details, so I don't have a good idea how to test it. So I'll leave those as projects for the future.
Re: pg_waldump: add test for coverage
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested Hello, I've reviewed your latest v3 patches on Ubuntu 23.04. Both patches apply correctly and all the tests run and pass as they should. Execution time was normal for me, I didn't notice any significant latency when compared to other tests. The only other feedback I can provide would be to add test coverage to some of the other options that aren't currently covered (ie. --bkp-details, --end, --follow, --path, etc.) for completeness. Other than that, this looks like a great patch. Kind regards, Tristen
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
On Wed, Jun 28, 2023 at 09:20:27AM +0900, Kyotaro Horiguchi wrote: > At Tue, 27 Jun 2023 18:58:39 -0500, David Christensen > wrote in >>> Adjusted as per the v2 attached. >> >> +1 > > +1 Okay, cool. Both of you seem happy with it, so I have applied it. Thanks for the quick checks. -- Michael signature.asc Description: PGP signature
Re: pg_waldump: add test for coverage
On 14.06.23 09:16, Peter Eisentraut wrote: On 06.09.22 07:57, Peter Eisentraut wrote: I wrote a test for coverage. Unfortunately, it seems to take quite a while to run the test. I want to improve these execution times, but I don't know exactly what to do. Therefore, I want to hear feedback from many people. I think having some more test coverage for pg_waldump would be good, so I encourage you to continue working on this. I made an updated patch that incorporates many of your ideas and code, just made it a bit more compact, and added more tests for various command-line options. This moves the test coverage of pg_waldump from "bloodbath" to "mixed fruit salad", which I think is pretty good progress. And now there is room for additional patches if someone wants to figure out, e.g., how to get more complete coverage in gindesc.c or whatever. Here is an updated patch set. I added a test case for the "first record is after" message. Also, I think this message should really go to stderr, since it's more of a notice or warning, so I changed it to use pg_log_info. From e5d13b7891a9c0fcc3d28f9bc2756c5372b2fa40 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 14 Jun 2023 08:46:47 +0200 Subject: [PATCH v3 1/2] Add more pg_waldump tests This adds tests for most command-line options and tests for most rmgrdesc routines. Discussion: https://www.postgresql.org/message-id/flat/CAAcByaKM7zyJSDkPWv6_%2BapupY4fXXM3q3SRXas9MMNVPUGcsQ%40mail.gmail.com --- src/bin/pg_waldump/t/001_basic.pl | 191 ++ 1 file changed, 191 insertions(+) diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl index 492a8cadd8..d4056e1b95 100644 --- a/src/bin/pg_waldump/t/001_basic.pl +++ b/src/bin/pg_waldump/t/001_basic.pl @@ -3,6 +3,7 @@ use strict; use warnings; +use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; @@ -10,4 +11,194 @@ program_version_ok('pg_waldump'); program_options_handling_ok('pg_waldump'); +# wrong number of arguments +command_fails_like([ 'pg_waldump', ], qr/error: no arguments/, 'no arguments'); +command_fails_like([ 'pg_waldump', 'foo', 'bar', 'baz' ], qr/error: too many command-line arguments/, 'too many arguments'); + +# invalid option arguments +command_fails_like([ 'pg_waldump', '--block', 'bad' ], qr/error: invalid block number/, 'invalid block number'); +command_fails_like([ 'pg_waldump', '--fork', 'bad' ], qr/error: invalid fork name/, 'invalid fork name'); +command_fails_like([ 'pg_waldump', '--limit', 'bad' ], qr/error: invalid value/, 'invalid limit'); +command_fails_like([ 'pg_waldump', '--relation', 'bad' ], qr/error: invalid relation/, 'invalid relation specification'); +command_fails_like([ 'pg_waldump', '--rmgr', 'bad' ], qr/error: resource manager .* does not exist/, 'invalid rmgr name'); +command_fails_like([ 'pg_waldump', '--start', 'bad' ], qr/error: invalid WAL location/, 'invalid start LSN'); +command_fails_like([ 'pg_waldump', '--end', 'bad' ], qr/error: invalid WAL location/, 'invalid end LSN'); + +# rmgr list: If you add one to the list, consider also adding a test +# case exercising the new rmgr below. +command_like([ 'pg_waldump', '--rmgr=list'], qr/^XLOG +Transaction +Storage +CLOG +Database +Tablespace +MultiXact +RelMap +Standby +Heap2 +Heap +Btree +Hash +Gin +Gist +Sequence +SPGist +BRIN +CommitTs +ReplicationOrigin +Generic +LogicalMessage$/, + 'rmgr list'); + + +my $node = PostgreSQL::Test::Cluster->new('main'); +$node->init; +$node->append_conf('postgresql.conf', q{ +autovacuum = off +checkpoint_timeout = 1h + +# for standbydesc +archive_mode=on +archive_command='' + +# for XLOG_HEAP_TRUNCATE +wal_level=logical +}); +$node->start; + +my ($start_lsn, $start_walfile) = split /\|/, $node->safe_psql('postgres', q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}); + +$node->safe_psql('postgres', q{ +-- heap, btree, hash, sequence +CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text); +CREATE INDEX i1a ON t1 USING btree (a); +CREATE INDEX i1b ON t1 USING hash (b); +INSERT INTO t1 VALUES (default, 'one'), (default, 'two'); +DELETE FROM t1 WHERE b = 'one'; +TRUNCATE t1; + +-- abort +START TRANSACTION; +INSERT INTO t1 VALUES (default, 'three'); +ROLLBACK; + +-- unlogged/init fork +CREATE UNLOGGED TABLE t2 (x int); +CREATE INDEX i2 ON t2 USING btree (x); +INSERT INTO t2 SELEC
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
At Tue, 27 Jun 2023 18:58:39 -0500, David Christensen wrote in > > Adjusted as per the v2 attached. > > +1 +1 regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
> Adjusted as per the v2 attached. +1
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
On Tue, Jun 27, 2023 at 04:39:52PM +0900, Kyotaro Horiguchi wrote: > I meant that the name is structured as > TLIh-TLIl._, which > appears to be inconsistent with the comment. (And I'm not sure what > "TLOID" is..) Well, to be clear, it should not be TLIh-TLIl but LSNh-LSNl :) I'm OK with these terms for the comments. This is very internal anyway so anybody using this feature should know what that means. And yes, the order of the items is wrong, and I agree that TLOID is a bit confusing once the TLI is added in the set. I have just used TBLSPCOID as term in the comment, and adjusted the XXX to be about the LSN numbers. Adjusted as per the v2 attached. -- Michael From f0936a95651f767e0f56ff507db5d5a02c7629b2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 28 Jun 2023 08:46:26 +0900 Subject: [PATCH v2] Add timeline to file names generated with pg_waldump --save-fullpage Reported-by: Fujii Masao --- src/bin/pg_waldump/pg_waldump.c | 3 ++- src/bin/pg_waldump/t/002_save_fullpage.pl | 9 + doc/src/sgml/ref/pg_waldump.sgml | 9 - 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index c6d3ae6344..96845e1a1a 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -518,7 +518,8 @@ XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath) else pg_fatal("invalid fork number: %u", fork); - snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath, + snprintf(filename, MAXPGPATH, "%s/%08X-%08X-%08X.%u.%u.%u.%u%s", savepath, + record->seg.ws_tli, LSN_FORMAT_ARGS(record->ReadRecPtr), rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname); diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl index 831ffdefef..f0725805f2 100644 --- a/src/bin/pg_waldump/t/002_save_fullpage.pl +++ b/src/bin/pg_waldump/t/002_save_fullpage.pl @@ -79,15 +79,16 @@ $node->command_ok( 'pg_waldump with --save-fullpage runs'); # This regexp will match filenames formatted as: -# -.DBOID.TLOID.NODEOID.dd_fork with the components being: -# - WAL LSN in hex format, -# - Tablespace OID (0 for global) +# TLI-LSNh-LSNl.TBLSPCOID.DBOID.NODEOID.dd_fork with the components being: +# - Timeline ID in hex format. +# - WAL LSN in hex format, as two 8-character numbers. +# - Tablespace OID (0 for global). # - Database OID. # - Relfilenode. # - Block number. # - Fork this block came from (vm, init, fsm, or main). my $file_re = - qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/; + qr/^[0-9A-F]{8}-([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/; my $file_count = 0; diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index e5f9637847..4592d6016a 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -283,7 +283,7 @@ PostgreSQL documentation The full page images are saved with the following file name format: -LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK +TIMELINE-LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK The file names are composed of the following parts: @@ -296,6 +296,13 @@ PostgreSQL documentation + +TIMELINE +The timeline of the WAL segment file where the record + is located formatted as one 8-character hexadecimal number + %08X + + LSN The LSN of the record with this image, -- 2.40.1 signature.asc Description: PGP signature
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
On Tue, Jun 27, 2023 at 11:53:10AM -0500, David Christensen wrote: > Patch looks good, but agreed that that comment should also be fixed. Okay, thanks for checking! -- Michael signature.asc Description: PGP signature
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
On Tue, Jun 27, 2023 at 1:12 AM Michael Paquier wrote: > > Hi all, > (Fujii-san and David in CC.) > > Fujii-san has reported on Twitter that we had better add the TLI > number to what pg_waldump --save-fullpage generates for the file names > of the blocks, as it could be possible that we overwrite some blocks. > This information can be added thanks to ws_tli, that tracks the TLI of > the opened segment. > > Attached is a patch to fix this issue, adding an open item assigned to > me. The file format is documented in the TAP test and the docs, the > two only places that would need a refresh. > > Thoughts or comments? Patch looks good, but agreed that that comment should also be fixed. Thanks! David
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
Of course, it's wrong. At Tue, 27 Jun 2023 16:39:52 +0900 (JST), Kyotaro Horiguchi wrote in > I meant that the name is structured as - TLIh-TLIl._, which + LSNh-LSNl._, which > appears to be inconsistent with the comment. (And I'm not sure what > "TLOID" is..) > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
At Tue, 27 Jun 2023 15:58:38 +0900, Michael Paquier wrote in > On Tue, Jun 27, 2023 at 03:44:04PM +0900, Kyotaro Horiguchi wrote: > > The file name was "LSNh-LSNl.spcOid.dbOid.relNumber.blk_forkname", but > > the comment in the TAP script read as: > > > > -# -.DBOID.TLOID.NODEOID.dd_fork with the components being: > > > > which looks wrong. I'm not sure it is a business of this patch, though.. > > This part is getting changed here anyway, so improving it is fine by > me with the terms you are suggesting for these two 4-byte values in > this comment. I meant that the name is structured as TLIh-TLIl._, which appears to be inconsistent with the comment. (And I'm not sure what "TLOID" is..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
On Tue, Jun 27, 2023 at 03:44:04PM +0900, Kyotaro Horiguchi wrote: > +# - Timeline number in hex format. > > Arn't we reffering to it as "Timeline ID"? (I remember there was a > discussion about redefining the "timeline ID" to use non-orderable > IDs. That is, making it non-numbers.) Using ID is fine by me. > The file name was "LSNh-LSNl.spcOid.dbOid.relNumber.blk_forkname", but > the comment in the TAP script read as: > > -# -.DBOID.TLOID.NODEOID.dd_fork with the components being: > > which looks wrong. I'm not sure it is a business of this patch, though.. This part is getting changed here anyway, so improving it is fine by me with the terms you are suggesting for these two 4-byte values in this comment. -- Michael signature.asc Description: PGP signature
Re: Add TLI number to name of files generated by pg_waldump --save-fullpage
At Tue, 27 Jun 2023 15:12:43 +0900, Michael Paquier wrote in > Hi all, > (Fujii-san and David in CC.) > > Fujii-san has reported on Twitter that we had better add the TLI > number to what pg_waldump --save-fullpage generates for the file names > of the blocks, as it could be possible that we overwrite some blocks. > This information can be added thanks to ws_tli, that tracks the TLI of > the opened segment. > > Attached is a patch to fix this issue, adding an open item assigned to > me. The file format is documented in the TAP test and the docs, the > two only places that would need a refresh. > > Thoughts or comments? It's sensible to add TLI to the file name. So +1 from me. +# - Timeline number in hex format. Arn't we reffering to it as "Timeline ID"? (I remember there was a discussion about redefining the "timeline ID" to use non-orderable IDs. That is, making it non-numbers.) Otherwise it looks fine to me. By the way, somewhat irrelevant to this patch, regading the the file name for the output. The file name was "LSNh-LSNl.spcOid.dbOid.relNumber.blk_forkname", but the comment in the TAP script read as: -# -.DBOID.TLOID.NODEOID.dd_fork with the components being: which looks wrong. I'm not sure it is a business of this patch, though.. # Documentation looks coorect. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Add TLI number to name of files generated by pg_waldump --save-fullpage
Hi all, (Fujii-san and David in CC.) Fujii-san has reported on Twitter that we had better add the TLI number to what pg_waldump --save-fullpage generates for the file names of the blocks, as it could be possible that we overwrite some blocks. This information can be added thanks to ws_tli, that tracks the TLI of the opened segment. Attached is a patch to fix this issue, adding an open item assigned to me. The file format is documented in the TAP test and the docs, the two only places that would need a refresh. Thoughts or comments? -- Michael From 35dc549115e8bf054a5f756c652b07f3f49d0bce Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 27 Jun 2023 15:04:49 +0900 Subject: [PATCH v1] Add timeline to file names generated with pg_waldump --save-fullpage Reported-by: Fujii Masao --- src/bin/pg_waldump/pg_waldump.c | 3 ++- src/bin/pg_waldump/t/002_save_fullpage.pl | 9 + doc/src/sgml/ref/pg_waldump.sgml | 9 - 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index c6d3ae6344..96845e1a1a 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -518,7 +518,8 @@ XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath) else pg_fatal("invalid fork number: %u", fork); - snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath, + snprintf(filename, MAXPGPATH, "%s/%08X-%08X-%08X.%u.%u.%u.%u%s", savepath, + record->seg.ws_tli, LSN_FORMAT_ARGS(record->ReadRecPtr), rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname); diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl index 831ffdefef..dc53a6f5b4 100644 --- a/src/bin/pg_waldump/t/002_save_fullpage.pl +++ b/src/bin/pg_waldump/t/002_save_fullpage.pl @@ -79,15 +79,16 @@ $node->command_ok( 'pg_waldump with --save-fullpage runs'); # This regexp will match filenames formatted as: -# -.DBOID.TLOID.NODEOID.dd_fork with the components being: -# - WAL LSN in hex format, -# - Tablespace OID (0 for global) +# TLI--.DBOID.TLOID.NODEOID.dd_fork with the components being: +# - Timeline number in hex format. +# - WAL LSN in hex format. +# - Tablespace OID (0 for global). # - Database OID. # - Relfilenode. # - Block number. # - Fork this block came from (vm, init, fsm, or main). my $file_re = - qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/; + qr/^[0-9A-F]{8}-([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/; my $file_count = 0; diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index e5f9637847..4592d6016a 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -283,7 +283,7 @@ PostgreSQL documentation The full page images are saved with the following file name format: -LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK +TIMELINE-LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK The file names are composed of the following parts: @@ -296,6 +296,13 @@ PostgreSQL documentation + +TIMELINE +The timeline of the WAL segment file where the record + is located formatted as one 8-character hexadecimal number + %08X + + LSN The LSN of the record with this image, -- 2.40.1 signature.asc Description: PGP signature
Re: pg_waldump: add test for coverage
On 06.09.22 07:57, Peter Eisentraut wrote: I wrote a test for coverage. Unfortunately, it seems to take quite a while to run the test. I want to improve these execution times, but I don't know exactly what to do. Therefore, I want to hear feedback from many people. I think having some more test coverage for pg_waldump would be good, so I encourage you to continue working on this. I made an updated patch that incorporates many of your ideas and code, just made it a bit more compact, and added more tests for various command-line options. This moves the test coverage of pg_waldump from "bloodbath" to "mixed fruit salad", which I think is pretty good progress. And now there is room for additional patches if someone wants to figure out, e.g., how to get more complete coverage in gindesc.c or whatever. From 67d63a87cf9ea8de69801127607101b3a515fb42 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 14 Jun 2023 08:46:47 +0200 Subject: [PATCH v2] Add more pg_waldump tests This adds tests for most command-line options and tests for most rmgrdesc routines. Discussion: https://www.postgresql.org/message-id/flat/CAAcByaKM7zyJSDkPWv6_%2BapupY4fXXM3q3SRXas9MMNVPUGcsQ%40mail.gmail.com --- src/bin/pg_waldump/t/001_basic.pl | 191 ++ 1 file changed, 191 insertions(+) diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl index 492a8cadd8..d4056e1b95 100644 --- a/src/bin/pg_waldump/t/001_basic.pl +++ b/src/bin/pg_waldump/t/001_basic.pl @@ -3,6 +3,7 @@ use strict; use warnings; +use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; @@ -10,4 +11,194 @@ program_version_ok('pg_waldump'); program_options_handling_ok('pg_waldump'); +# wrong number of arguments +command_fails_like([ 'pg_waldump', ], qr/error: no arguments/, 'no arguments'); +command_fails_like([ 'pg_waldump', 'foo', 'bar', 'baz' ], qr/error: too many command-line arguments/, 'too many arguments'); + +# invalid option arguments +command_fails_like([ 'pg_waldump', '--block', 'bad' ], qr/error: invalid block number/, 'invalid block number'); +command_fails_like([ 'pg_waldump', '--fork', 'bad' ], qr/error: invalid fork name/, 'invalid fork name'); +command_fails_like([ 'pg_waldump', '--limit', 'bad' ], qr/error: invalid value/, 'invalid limit'); +command_fails_like([ 'pg_waldump', '--relation', 'bad' ], qr/error: invalid relation/, 'invalid relation specification'); +command_fails_like([ 'pg_waldump', '--rmgr', 'bad' ], qr/error: resource manager .* does not exist/, 'invalid rmgr name'); +command_fails_like([ 'pg_waldump', '--start', 'bad' ], qr/error: invalid WAL location/, 'invalid start LSN'); +command_fails_like([ 'pg_waldump', '--end', 'bad' ], qr/error: invalid WAL location/, 'invalid end LSN'); + +# rmgr list: If you add one to the list, consider also adding a test +# case exercising the new rmgr below. +command_like([ 'pg_waldump', '--rmgr=list'], qr/^XLOG +Transaction +Storage +CLOG +Database +Tablespace +MultiXact +RelMap +Standby +Heap2 +Heap +Btree +Hash +Gin +Gist +Sequence +SPGist +BRIN +CommitTs +ReplicationOrigin +Generic +LogicalMessage$/, + 'rmgr list'); + + +my $node = PostgreSQL::Test::Cluster->new('main'); +$node->init; +$node->append_conf('postgresql.conf', q{ +autovacuum = off +checkpoint_timeout = 1h + +# for standbydesc +archive_mode=on +archive_command='' + +# for XLOG_HEAP_TRUNCATE +wal_level=logical +}); +$node->start; + +my ($start_lsn, $start_walfile) = split /\|/, $node->safe_psql('postgres', q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}); + +$node->safe_psql('postgres', q{ +-- heap, btree, hash, sequence +CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text); +CREATE INDEX i1a ON t1 USING btree (a); +CREATE INDEX i1b ON t1 USING hash (b); +INSERT INTO t1 VALUES (default, 'one'), (default, 'two'); +DELETE FROM t1 WHERE b = 'one'; +TRUNCATE t1; + +-- abort +START TRANSACTION; +INSERT INTO t1 VALUES (default, 'three'); +ROLLBACK; + +-- unlogged/init fork +CREATE UNLOGGED TABLE t2 (x int); +CREATE INDEX i2 ON t2 USING btree (x); +INSERT INTO t2 SELECT generate_series(1, 10); + +-- gin +CREATE TABLE gin_idx_tbl (id bigserial PRIMARY KEY, data jsonb); +CREATE INDEX gin_idx ON gin_idx_tbl USING gin (data); +INSERT INTO gin_idx_tbl +WITH random_json AS ( +SELECT json_object_agg(key, trunc(random() * 10)) as json_data +
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Dec 26, 2022 at 4:18 PM Bharath Rupireddy wrote: > > On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier wrote: > > +1. I think this feature will also be useful in pg_walinspect. Just for the record - here's the pg_walinspect function to extract FPIs from WAL records - https://www.postgresql.org/message-id/CALj2ACVCcvzd7WiWvD%3D6_7NBvVB_r6G0EGSxL4F8vosAi6Se4g%40mail.gmail.com. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Dec 26, 2022 at 04:18:18PM +0530, Bharath Rupireddy wrote: > +1. I think this feature will also be useful in pg_walinspect. > However, I'm a bit concerned that it can flood the running database > disk - if someone generates a lot of FPI files. pg_read_file() and pg_waldump can be fed absolute paths outside the data folder. So, well, just don't do that then :) -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Dec 26, 2022 at 02:39:03PM -0600, Justin Pryzby wrote: > fclose() should be tested, too: Sure. Done that too, and applied the change after a last lookup. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Dec 26, 2022 at 04:28:52PM +0900, Michael Paquier wrote: > Comments? > + file = fopen(filename, PG_BINARY_W); > + if (!file) > + pg_fatal("could not open file \"%s\": %m", filename); > + > + if (fwrite(page, BLCKSZ, 1, file) != 1) > + pg_fatal("could not write file \"%s\": %m", filename); > + > + fclose(file); fclose() should be tested, too: > + if (fwrite(page, BLCKSZ, 1, file) != 1 || fclose(file) != 0) > + pg_fatal("could not write file \"%s\": %m", filename); -- Justin
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
> On Dec 26, 2022, at 1:29 AM, Michael Paquier wrote: > > On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote: >> Thanks for the patch. I've made the above change as well as renamed >> the test file name to be save_fpi.pl, everything else remains the same >> as v11. Here's the v12 patch which LGTM. I'll mark it as RfC - >> https://commitfest.postgresql.org/41/3628/. > > I have done a review of that, and here are my notes: > - The variable names were a bit inconsistent, so I have switched most > of the new code to use "fullpage". > - The code was not able to handle the case of a target directory > existing but empty, so I have added a wrapper on pg_check_dir(). > - XLogRecordHasFPW() could be checked directly in the function saving > the blocks. Still, there is no need for it as we apply the same > checks again in the inner loop of the routine. > - The new test has been renamed. > - RestoreBlockImage() would report a failure and the code would just > skip it and continue its work. This could point out to a compression > failure for example, so like any code paths calling this routine I > think that we'd better do a pg_fatal() and fail hard. > - I did not understand why there is a reason to make this option > conditional on the record prints or even the stats, so I have moved > the FPW save routine into a separate code path. The other two could > be silenced (or not) using --quiet for example, for the same result as > v12 without impacting the usability of this feature. > - Few tweaks to the docs, the --help output, the comments and the > tests. > - Indentation applied. > > Being able to filter the blocks saved using start/end LSNs or just > --relation is really cool, especially as the file names use the same > order as what's needed for this option. Sounds good, definitely along the ideas of what I’d originally envisioned. Thanks, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier wrote: > > I have done a review of that, and here are my notes: > - The variable names were a bit inconsistent, so I have switched most > of the new code to use "fullpage". > > - The new test has been renamed. > > - RestoreBlockImage() would report a failure and the code would just > skip it and continue its work. This could point out to a compression > failure for example, so like any code paths calling this routine I > think that we'd better do a pg_fatal() and fail hard. > > - XLogRecordHasFPW() could be checked directly in the function saving > the blocks. Still, there is no need for it as we apply the same > checks again in the inner loop of the routine. > > - Few tweaks to the docs, the --help output, the comments and the > tests. > - Indentation applied. > > - I did not understand why there is a reason to make this option > conditional on the record prints or even the stats, so I have moved > the FPW save routine into a separate code path. The other two could > be silenced (or not) using --quiet for example, for the same result as > v12 without impacting the usability of this feature. Looks good. > - The code was not able to handle the case of a target directory > existing but empty, so I have added a wrapper on pg_check_dir(). Looks okay and with the following, we impose the user-provided target directory must be empty. +case 4: +/* Exists and not empty */ +pg_fatal("directory \"%s\" exists but is not empty", path); > Being able to filter the blocks saved using start/end LSNs or just > --relation is really cool, especially as the file names use the same > order as what's needed for this option. > > Comments? +1. I think this feature will also be useful in pg_walinspect. However, I'm a bit concerned that it can flood the running database disk - if someone generates a lot of FPI files. Overall, the v13 patch LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote: > Thanks for the patch. I've made the above change as well as renamed > the test file name to be save_fpi.pl, everything else remains the same > as v11. Here's the v12 patch which LGTM. I'll mark it as RfC - > https://commitfest.postgresql.org/41/3628/. I have done a review of that, and here are my notes: - The variable names were a bit inconsistent, so I have switched most of the new code to use "fullpage". - The code was not able to handle the case of a target directory existing but empty, so I have added a wrapper on pg_check_dir(). - XLogRecordHasFPW() could be checked directly in the function saving the blocks. Still, there is no need for it as we apply the same checks again in the inner loop of the routine. - The new test has been renamed. - RestoreBlockImage() would report a failure and the code would just skip it and continue its work. This could point out to a compression failure for example, so like any code paths calling this routine I think that we'd better do a pg_fatal() and fail hard. - I did not understand why there is a reason to make this option conditional on the record prints or even the stats, so I have moved the FPW save routine into a separate code path. The other two could be silenced (or not) using --quiet for example, for the same result as v12 without impacting the usability of this feature. - Few tweaks to the docs, the --help output, the comments and the tests. - Indentation applied. Being able to filter the blocks saved using start/end LSNs or just --relation is really cool, especially as the file names use the same order as what's needed for this option. Comments? -- Michael From 66eedbb96858a4f6804509900b116d8a63b39925 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 26 Dec 2022 16:28:00 +0900 Subject: [PATCH v13] Teach pg_waldump to extract FPIs from the WAL stream Extracts full-page images from the WAL stream into a given target directory. These images are subject to the same filtering rules as normal display in pg_waldump, which means that you can isolate the full page writes to a target relation, among other things. Files are saved with the filename: _ with formatting to make things somewhat sortable; for instance: -01C0.1663.1.6117.0_main -01000150.1664.0.6115.0_main -010001E0.1664.0.6114.0_main -01000270.1663.1.6116.0_main -01000300.1663.1.6113.0_main -01000390.1663.1.6112.0_main -01000420.1663.1.8903.0_main -010004B0.1663.1.8902.0_main -01000540.1663.1.6111.0_main -010005D0.1663.1.6110.0_main It's noteworthy that the raw block images do not have the current LSN stored with them in the WAL stream (as would be true for on-heap versions of the blocks), nor would the checksum be updated in them (though WAL itself has checksums, so there is some protection there). These images could be loaded/inspected via `pg_read_binary_file()` and used in the `pageinspect` suite of tools to perform detailed analysis on the pages in question, based on historical information, and may come in handy for forensics work. --- src/bin/pg_waldump/meson.build| 1 + src/bin/pg_waldump/pg_waldump.c | 107 + src/bin/pg_waldump/t/002_save_fullpage.pl | 111 ++ doc/src/sgml/ref/pg_waldump.sgml | 66 + 4 files changed, 285 insertions(+) create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build index 3fa1b53e71..0428998350 100644 --- a/src/bin/pg_waldump/meson.build +++ b/src/bin/pg_waldump/meson.build @@ -31,6 +31,7 @@ tests += { 'tap': { 'tests': [ 't/001_basic.pl', + 't/002_save_fullpage.pl', ], }, } diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index 9993378ca5..d90a142c68 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -23,9 +23,13 @@ #include "access/xlogrecord.h" #include "access/xlogstats.h" #include "common/fe_memutils.h" +#include "common/file_perm.h" +#include "common/file_utils.h" #include "common/logging.h" +#include "common/relpath.h" #include "getopt_long.h" #include "rmgrdesc.h" +#include "storage/bufpage.h" /* * NOTE: For any code change or issue fix here, it is highly recommended to @@ -70,6 +74,9 @@ typedef struct XLogDumpConfig bool filter_by_relation_block_enabled; ForkNumber filter_by_relation_forknum; bool filter_by_fpw; + + /* save options */ + char *save_fullpage_path; } XLogDumpConfig; @@ -112,6 +119,37 @@ verify_directory(const char *directory) return true; } +/* + * Create if necessary the directory storing the full-
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, Dec 24, 2022 at 12:58 AM David Christensen wrote: > > On Fri, Dec 23, 2022 at 12:57 PM David Christensen > wrote: > > > > On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy > > wrote: > > > > 2. +$node->init(extra => ['-k'], allows_streaming => 1); > > > When enabled with allows_streaming, there are a bunch of things that > > > happen to the node while initializing, I don't think we need all of > > > them for this. > > > > I think the "allows_streaming" was required to ensure the WAL files > > were preserved properly, and was the approach we ended up taking > > rather than trying to fail the archive_command or other approaches I'd > > taken earlier. I'd rather keep this if we can, unless you can propose > > a different approach that would continue to work in the same way. > > Confirmed that we needed this in order to create the replication slot, > so this /is/ required for the test to work. The added test needs wal_level to be replica, but the TAP tests set it to minimal if allows_streaming isn't passed. However, if passed allows_streaming, it sets a bunch of other parameters which are not required for this test (see note->init function in cluster.pm), hence we could just set the required parameters wal_level = replica and max_wal_senders for the replication slot to be created. > Enclosing v11 with yours and Michael's latest feedback. Thanks for the patch. I've made the above change as well as renamed the test file name to be save_fpi.pl, everything else remains the same as v11. Here's the v12 patch which LGTM. I'll mark it as RfC - https://commitfest.postgresql.org/41/3628/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Dec 23, 2022 at 12:57 PM David Christensen wrote: > > On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy > wrote: [snip] > > 2. +$node->init(extra => ['-k'], allows_streaming => 1); > > When enabled with allows_streaming, there are a bunch of things that > > happen to the node while initializing, I don't think we need all of > > them for this. > > I think the "allows_streaming" was required to ensure the WAL files > were preserved properly, and was the approach we ended up taking > rather than trying to fail the archive_command or other approaches I'd > taken earlier. I'd rather keep this if we can, unless you can propose > a different approach that would continue to work in the same way. Confirmed that we needed this in order to create the replication slot, so this /is/ required for the test to work. Enclosing v11 with yours and Michael's latest feedback. Best, David v11-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Dec 19, 2022 at 12:23 AM Michael Paquier wrote: > > On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote: > > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier > > wrote: > > This v10 should incorporate your feedback as well as Bharath's. > > Thanks for the new version. I have minor comments. > > >> It seems to me that you could allow things to work even if the > >> directory exists and is empty. See for example > >> verify_dir_is_empty_or_create() in pg_basebackup.c. > > > > The `pg_mkdir_p()` supports an existing directory (and I don't think > > we want to require it to be empty first), so this only errors when it > > can't create a directory for some reason. > > Sure, but things can also be made so as we don't fail if the directory > exists and is empty? This would be more consistent with the base > directories created by pg_basebackup and initdb. I guess I'm feeling a little dense here; how is this failing if there is an existing empty directory? > >> +$node->safe_psql('postgres', < >> +SELECT 'init' FROM > >> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, > >> false); > >> +CREATE TABLE test_table AS SELECT generate_series(1,100) a; > >> +CHECKPOINT; -- required to force FPI for next writes > >> +UPDATE test_table SET a = a + 1; > >> Using an EOF to execute a multi-line query would be a first. Couldn't > >> you use the same thing as anywhere else? 009_twophase.pl just to > >> mention one. (Mentioned by Bharath upthread, where he asked for an > >> extra opinion so here it is.) > > > > Fair enough, while idiomatic perl to me, not a hill to die on; > > converted to a standard multiline string. > > By the way, knowing that we have an option called --fullpage, could be > be better to use --save-fullpage for the option name? Works for me. I think I'd just wanted to avoid reformatting the entire usage message which is why I'd gone with the shorter version. > + OPF = fopen(filename, PG_BINARY_W); > + if (!OPF) > + pg_fatal("couldn't open file for output: %s", filename); > [..] > + if (fwrite(page, BLCKSZ, 1, OPF) != 1) > + pg_fatal("couldn't write out complete full page image to file: > %s", filename); > These should more more generic, as of "could not open file \"%s\"" and > "could not write file \"%s\"" as the file name provides all the > information about what this writes. Sure, will update. Best, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy wrote: > > On Fri, Dec 16, 2022 at 4:47 AM David Christensen > wrote: > > > > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier > > wrote: > > > > > > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote: > > > > I can get one sent in tomorrow. > > > > This v10 should incorporate your feedback as well as Bharath's. > > Thanks for the patch. Here're some minor comments: > > 1. +my $node = PostgreSQL::Test::Cluster->new('primary'); > Can the name be other than 'primary' because we don't create a standby > for this test? Something like - 'node_a' or 'node_extract_fpi' or some > other. Sure, no issues. > 2. +$node->init(extra => ['-k'], allows_streaming => 1); > When enabled with allows_streaming, there are a bunch of things that > happen to the node while initializing, I don't think we need all of > them for this. I think the "allows_streaming" was required to ensure the WAL files were preserved properly, and was the approach we ended up taking rather than trying to fail the archive_command or other approaches I'd taken earlier. I'd rather keep this if we can, unless you can propose a different approach that would continue to work in the same way. > 3. +$node->init(extra => ['-k'], allows_streaming => 1); > Can we use --data-checksums instead of -k for more readability? > Perhaps, a comment on why we need that option helps greatly. Yeah, can spell out; don't recall exactly why we needed it offhand, but will confirm or remove if insignificant. > 4. > +page = (Page) buf.data; > + > +if (!XLogRecHasBlockRef(record, block_id)) > +continue; > + > +if (!XLogRecHasBlockImage(record, block_id)) > +continue; > + > +if (!RestoreBlockImage(record, block_id, page)) > +continue; > Can you shift page = (Page) buf.data; just before the last if > condition RestoreBlockImage() so that it doesn't get executed for the > other two continue statements? Sure; since it was just setting a pointer value I didn't consider it to be a hotspot for optimization. Best, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Dec 16, 2022 at 4:47 AM David Christensen wrote: > > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier wrote: > > > > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote: > > > I can get one sent in tomorrow. > > This v10 should incorporate your feedback as well as Bharath's. Thanks for the patch. Here're some minor comments: 1. +my $node = PostgreSQL::Test::Cluster->new('primary'); Can the name be other than 'primary' because we don't create a standby for this test? Something like - 'node_a' or 'node_extract_fpi' or some other. 2. +$node->init(extra => ['-k'], allows_streaming => 1); When enabled with allows_streaming, there are a bunch of things that happen to the node while initializing, I don't think we need all of them for this. 3. +$node->init(extra => ['-k'], allows_streaming => 1); Can we use --data-checksums instead of -k for more readability? Perhaps, a comment on why we need that option helps greatly. 4. +page = (Page) buf.data; + +if (!XLogRecHasBlockRef(record, block_id)) +continue; + +if (!XLogRecHasBlockImage(record, block_id)) +continue; + +if (!RestoreBlockImage(record, block_id, page)) +continue; Can you shift page = (Page) buf.data; just before the last if condition RestoreBlockImage() so that it doesn't get executed for the other two continue statements? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote: > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier wrote: > This v10 should incorporate your feedback as well as Bharath's. Thanks for the new version. I have minor comments. >> It seems to me that you could allow things to work even if the >> directory exists and is empty. See for example >> verify_dir_is_empty_or_create() in pg_basebackup.c. > > The `pg_mkdir_p()` supports an existing directory (and I don't think > we want to require it to be empty first), so this only errors when it > can't create a directory for some reason. Sure, but things can also be made so as we don't fail if the directory exists and is empty? This would be more consistent with the base directories created by pg_basebackup and initdb. >> +$node->safe_psql('postgres', <> +SELECT 'init' FROM >> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false); >> +CREATE TABLE test_table AS SELECT generate_series(1,100) a; >> +CHECKPOINT; -- required to force FPI for next writes >> +UPDATE test_table SET a = a + 1; >> Using an EOF to execute a multi-line query would be a first. Couldn't >> you use the same thing as anywhere else? 009_twophase.pl just to >> mention one. (Mentioned by Bharath upthread, where he asked for an >> extra opinion so here it is.) > > Fair enough, while idiomatic perl to me, not a hill to die on; > converted to a standard multiline string. By the way, knowing that we have an option called --fullpage, could be be better to use --save-fullpage for the option name? + OPF = fopen(filename, PG_BINARY_W); + if (!OPF) + pg_fatal("couldn't open file for output: %s", filename); [..] + if (fwrite(page, BLCKSZ, 1, OPF) != 1) + pg_fatal("couldn't write out complete full page image to file: %s", filename); These should more more generic, as of "could not open file \"%s\"" and "could not write file \"%s\"" as the file name provides all the information about what this writes. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier wrote: > > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote: > > I can get one sent in tomorrow. This v10 should incorporate your feedback as well as Bharath's. > -XLogRecordHasFPW(XLogReaderState *record) > +XLogRecordHasFPI(XLogReaderState *record) > This still refers to a FPW, so let's leave that out as well as any > renamings of this kind.. Reverted those changes. > + if (config.save_fpi_path != NULL) > + { > + /* Create the dir if it doesn't exist */ > + if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0) > + { > + pg_log_error("could not create output directory \"%s\": %m", > +config.save_fpi_path); > + goto bad_argument; > + } > + } > It seems to me that you could allow things to work even if the > directory exists and is empty. See for example > verify_dir_is_empty_or_create() in pg_basebackup.c. The `pg_mkdir_p()` supports an existing directory (and I don't think we want to require it to be empty first), so this only errors when it can't create a directory for some reason. > +my $file_re = > + > qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/; > This is artistic to parse for people not used to regexps (I do, a > little). Perhaps this could use a small comment with an example of > name or a reference describing this format? Added a description of what this is looking for. > +# Set umask so test directories and files are created with default > permissions > +umask(0077); > Incorrect copy-paste coming from elsewhere like the TAP tests of > pg_basebackup with group permissions? Doesn't > PostgreSQL::Test::Utils::tempdir give already enough protection in > terms of umask() and permissions? I'd expect that's where that came from. Removed. > + if (config.save_fpi_path != NULL) > + { > + /* We fsync our output directory only; since these files are not part > +* of the production database we do not require the performance hit > +* that fsyncing every FPI would entail, so are doing this as a > +* compromise. */ > + > + fsync_fname(config.save_fpi_path, true); > + } > Why is it necessary to flush anything at all, then? I personally don't think it is, but added it per Bharath's request. Removed in this revision. > +my $relation = $node->safe_psql('postgres', > + q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN > dattablespace ELSE reltablespace END, pg_database.oid, > pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE > relname = 'test_table' AND datname = current_database()} > Could you rewrite that to be on multiple lines? Sure, reformated. > +diag "using walfile: $walfile"; > You should avoid the use of "diag", as this would cause extra output > noise with a simple make check. Had been using it for debugging and didn't realize it'd cause issues. Removed both instances. > +$node->safe_psql('postgres', "SELECT > pg_drop_replication_slot('regress_pg_waldump_slot')") > That's not really necessary, the nodes are wiped out at the end of the > test. Removed. > +$node->safe_psql('postgres', < +SELECT 'init' FROM > pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false); > +CREATE TABLE test_table AS SELECT generate_series(1,100) a; > +CHECKPOINT; -- required to force FPI for next writes > +UPDATE test_table SET a = a + 1; > Using an EOF to execute a multi-line query would be a first. Couldn't > you use the same thing as anywhere else? 009_twophase.pl just to > mention one. (Mentioned by Bharath upthread, where he asked for an > extra opinion so here it is.) Fair enough, while idiomatic perl to me, not a hill to die on; converted to a standard multiline string. Best, David v10-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote: > I can get one sent in tomorrow. -XLogRecordHasFPW(XLogReaderState *record) +XLogRecordHasFPI(XLogReaderState *record) This still refers to a FPW, so let's leave that out as well as any renamings of this kind.. + if (config.save_fpi_path != NULL) + { + /* Create the dir if it doesn't exist */ + if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0) + { + pg_log_error("could not create output directory \"%s\": %m", +config.save_fpi_path); + goto bad_argument; + } + } It seems to me that you could allow things to work even if the directory exists and is empty. See for example verify_dir_is_empty_or_create() in pg_basebackup.c. +my $file_re = + qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/; This is artistic to parse for people not used to regexps (I do, a little). Perhaps this could use a small comment with an example of name or a reference describing this format? +# Set umask so test directories and files are created with default permissions +umask(0077); Incorrect copy-paste coming from elsewhere like the TAP tests of pg_basebackup with group permissions? Doesn't PostgreSQL::Test::Utils::tempdir give already enough protection in terms of umask() and permissions? + if (config.save_fpi_path != NULL) + { + /* We fsync our output directory only; since these files are not part +* of the production database we do not require the performance hit +* that fsyncing every FPI would entail, so are doing this as a +* compromise. */ + + fsync_fname(config.save_fpi_path, true); + } Why is it necessary to flush anything at all, then? +my $relation = $node->safe_psql('postgres', + q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN dattablespace ELSE reltablespace END, pg_database.oid, pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE relname = 'test_table' AND datname = current_database()} Could you rewrite that to be on multiple lines? +diag "using walfile: $walfile"; You should avoid the use of "diag", as this would cause extra output noise with a simple make check. +$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regress_pg_waldump_slot')") That's not really necessary, the nodes are wiped out at the end of the test. +$node->safe_psql('postgres', < signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Hi Bharath, I can get one sent in tomorrow. Thanks, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
lication_slot('regress_pg_waldump_slot', true, > > false); > > +CREATE TABLE test_table AS SELECT generate_series(1,100) a; > > +CHECKPOINT; -- required to force FPI for next writes > > +UPDATE test_table SET a = a + 1; > > +EOF > > The EOF with append_conf() is being used in 4 files and elsewhere in > > the TAP test files (more than 100?) qq[] or quotes is being used. I > > have no strong opinion here, I'll leave it to the other reviewers or > > committer. > > I'm inclined to leave it just for (personal) readability, but can > change if there's a strong consensus against. > > > > > 11. > > > > +# verify filename formats matches w/--save-fpi > > > > +for my $fullpath (glob "$tmp_folder/raw/*") > > > > Do we need to look for the exact match of the file that gets created > > > > in the save-fpi path? While checking for this is great, it makes the > > > > test code non-portable (may not work on Windows or other platforms, > > > > no?) and complex? This way, you can get rid of get_block_info() as > > > > well? And +for my $fullpath (glob "$tmp_folder/raw/*") > > > > will also get simplified. > > > > > > > > I think you can further simplify the tests by: > > > > create the node > > > > generate an FPI > > > > call pg_waldump with save-fpi option > > > > check the target directory for a file that contains the relid, > > > > something like '%relid%'. > > > > > > > > The above would still serve the purpose, tests the code without much > > > > complexity. > > > > > > I disagree; I think there is utility in keeping the validation of the > > > expected output. Since we have the code that works for it (and does > > > work on Windows, per passing the CI tests) I'm not seeing why we > > > wouldn't want to continue to validate as much as possible. > > > > My intention is to simplify the tests further and I still stick to it. > > It looks like the majority of test code is to form the file name in > > the format that pg_waldump outputs and match with the file name in the > > target directory - for instance, in get_block_info(), and in the loop > > for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests > > need to aim for file format checks, it's enough to look for the > > written file with '%relid%' by pg_waldump, if needed, the contents of > > the files written/FPI can also be verified with, say, pg_checksum > > tool. Others may have different opinions though. > > I would like to get broader feedback before changing this. David, is there a plan to provide an updated patch in this commitfest? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Thu, Nov 17, 2022 at 9:02 AM Peter Geoghegan wrote: > Plan is to commit this later on today, barring objections. Pushed, thanks. -- Peter Geoghegan
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Wed, Nov 16, 2022 at 4:34 PM Peter Geoghegan wrote: > WFM. Attached is v3. Plan is to commit this later on today, barring objections. Thanks -- Peter Geoghegan v3-0001-Standardize-rmgrdesc-recovery-conflict-XID-output.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
ity, but can change if there's a strong consensus against. > > > 11. > > > +# verify filename formats matches w/--save-fpi > > > +for my $fullpath (glob "$tmp_folder/raw/*") > > > Do we need to look for the exact match of the file that gets created > > > in the save-fpi path? While checking for this is great, it makes the > > > test code non-portable (may not work on Windows or other platforms, > > > no?) and complex? This way, you can get rid of get_block_info() as > > > well? And +for my $fullpath (glob "$tmp_folder/raw/*") > > > will also get simplified. > > > > > > I think you can further simplify the tests by: > > > create the node > > > generate an FPI > > > call pg_waldump with save-fpi option > > > check the target directory for a file that contains the relid, > > > something like '%relid%'. > > > > > > The above would still serve the purpose, tests the code without much > > > complexity. > > > > I disagree; I think there is utility in keeping the validation of the > > expected output. Since we have the code that works for it (and does > > work on Windows, per passing the CI tests) I'm not seeing why we > > wouldn't want to continue to validate as much as possible. > > My intention is to simplify the tests further and I still stick to it. > It looks like the majority of test code is to form the file name in > the format that pg_waldump outputs and match with the file name in the > target directory - for instance, in get_block_info(), and in the loop > for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests > need to aim for file format checks, it's enough to look for the > written file with '%relid%' by pg_waldump, if needed, the contents of > the files written/FPI can also be verified with, say, pg_checksum > tool. Others may have different opinions though. I would like to get broader feedback before changing this. David
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Wed, Nov 16, 2022 at 4:25 PM Andres Freund wrote: > > Anyway, worth calling this out directly in these comments IMV. We're > > addressing two closely related things that assign opposite meanings to > > InvalidTransactionId, which is rather confusing. > > It makes sense to call this out, but I'd > s/snapshotConflictHorizon format XIDs/cutoff with snapshotConflictHorizon > semantics/ > > or such? WFM. -- Peter Geoghegan
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
Hi, On 2022-11-16 15:37:40 -0800, Peter Geoghegan wrote: > On Wed, Nov 16, 2022 at 3:27 PM Andres Freund wrote: > > What are "snapshotConflictHorizon format XIDs"? I guess you mean format in > > the > > sense of having the semantics of snapshotConflictHorizon? > > Yes. That is the only possible way that any recovery conflict ever > works on the REDO side, with the exception of a few > not-very-interesting cases such as DROP TABLESPACE. > > GetConflictingVirtualXIDs() assigns a special meaning to > InvalidTransactionId which is the *opposite* of the special meaning > that snapshotConflictHorizon-based values assign to > InvalidTransactionId. At one point they actually did the same > definition for InvalidTransactionId, but that was changed soon after > hot standby first went in (when we taught btree delete records to not > use ludicrously conservative cutoffs that caused needless conflicts). > > Anyway, worth calling this out directly in these comments IMV. We're > addressing two closely related things that assign opposite meanings to > InvalidTransactionId, which is rather confusing. It makes sense to call this out, but I'd s/snapshotConflictHorizon format XIDs/cutoff with snapshotConflictHorizon semantics/ or such? Greetings, Andres Freund
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Wed, Nov 16, 2022 at 3:27 PM Andres Freund wrote: > The "(also...) formulation seems a bit odd. How about "an obsolescent heap > tuple that the caller is physically removing, e.g. via HOT pruning or index > deletion." or such? Okay, WFM. > > + * snapshotConflictHorizon format values are how all hot Standby conflicts > > are > > + * generated by REDO routines (at least wherever a granular cutoff is > > used). > > Not quite parsing for me. I meant something like: this is a cutoff that works in the same way as any other cutoff involved with recovery conflicts, in general, with the exception of those cases that have very coarse grained conflicts, such as DROP TABLESPACE. I suppose it would be better to just say the first part. Will fix. > > + /* > > + * It's quite possible that final snapshotConflictHorizon value will > > be > > + * invalid in final WAL record, indicating that we definitely don't > > need to > > + * generate a conflict > > + */ > > *the final > > Isn't this already described in the header? Sort of, but arguably it makes sense to call it out specifically. Though on second thought, yeah, lets just get rid of it. > What are "snapshotConflictHorizon format XIDs"? I guess you mean format in the > sense of having the semantics of snapshotConflictHorizon? Yes. That is the only possible way that any recovery conflict ever works on the REDO side, with the exception of a few not-very-interesting cases such as DROP TABLESPACE. GetConflictingVirtualXIDs() assigns a special meaning to InvalidTransactionId which is the *opposite* of the special meaning that snapshotConflictHorizon-based values assign to InvalidTransactionId. At one point they actually did the same definition for InvalidTransactionId, but that was changed soon after hot standby first went in (when we taught btree delete records to not use ludicrously conservative cutoffs that caused needless conflicts). Anyway, worth calling this out directly in these comments IMV. We're addressing two closely related things that assign opposite meanings to InvalidTransactionId, which is rather confusing. -- Peter Geoghegan
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
Hi, On 2022-11-16 14:14:30 -0800, Peter Geoghegan wrote: > /* > - * If 'tuple' contains any visible XID greater than latestRemovedXid, > - * ratchet forwards latestRemovedXid to the greatest one found. > - * This is used as the basis for generating Hot Standby conflicts, so > - * if a tuple was never visible then removing it should not conflict > - * with queries. > + * Maintain snapshotConflictHorizon for caller by ratcheting forward its > value > + * using any committed XIDs contained in 'tuple', an obsolescent heap tuple > + * that caller is in the process of physically removing via pruning. > + * (Also supports generating index deletion snapshotConflictHorizon values.) The "(also...) formulation seems a bit odd. How about "an obsolescent heap tuple that the caller is physically removing, e.g. via HOT pruning or index deletion." or such? > + * snapshotConflictHorizon format values are how all hot Standby conflicts > are > + * generated by REDO routines (at least wherever a granular cutoff is used). Not quite parsing for me. > + * Caller must initialize its value to InvalidTransactionId, which is > generally > + * interpreted as "definitely no need for a recovery conflict". > + * > + * Final value must reflect all heap tuples that caller will physically > remove > + * via the ongoing pruning operation. ResolveRecoveryConflictWithSnapshot() > is > + * passed the final value (taken from caller's WAL record) by a REDO routine. > + /* > + * It's quite possible that final snapshotConflictHorizon value will be > + * invalid in final WAL record, indicating that we definitely don't > need to > + * generate a conflict > + */ *the final Isn't this already described in the header? > @@ -3337,12 +3337,17 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool > excludeXmin0, > * GetConflictingVirtualXIDs -- returns an array of currently active VXIDs. > * > * Usage is limited to conflict resolution during recovery on standby > servers. > - * limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId > - * in cases where we cannot accurately determine a value for > latestRemovedXid. > + * limitXmin is supplied as either a snapshotConflictHorizon format XID, or > as > + * InvalidTransactionId in cases where caller cannot accurately determine a > + * safe snapshotConflictHorizon value. > * > * If limitXmin is InvalidTransactionId then we want to kill everybody, > * so we're not worried if they have a snapshot or not, nor does it really > - * matter what type of lock we hold. > + * matter what type of lock we hold. Caller must avoid calling here with > + * snapshotConflictHorizon format XIDs that were set to InvalidTransactionId What are "snapshotConflictHorizon format XIDs"? I guess you mean format in the sense of having the semantics of snapshotConflictHorizon? Greetings, Andres Freund
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Tue, Nov 15, 2022 at 8:48 PM Peter Geoghegan wrote: > Okay, let's go with snapshotConflictHorizon. I'll use that name in the > next revision, which I should be able to post tomorrow. Attached is a somewhat cleaned up version that uses that symbol name for everything. -- Peter Geoghegan v2-0001-Standardize-rmgrdesc-recovery-conflict-XID-output.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Nov 16, 2022 at 1:20 AM David Christensen wrote: > > Enclosed is v9. > > - code style consistency (FPI instead of FPW) internally. > - cleanup of no-longer needed checksum-related pieces from code and tests. > - test cleanup/simplification. > - other comment cleanup. > > Passes all CI checks. Thanks for the updated patch. 1. -if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state)) +if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state)) These changes are not related to this feature, hence renaming those variables/function names must be dealt with separately. If required, proposing another patch can be submitted to change filter_by_fpw to filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI(). 2. +/* We fsync our output directory only; since these files are not part + * of the production database we do not require the performance hit + * that fsyncing every FPI would entail, so are doing this as a + * compromise. */ The commenting style doesn't match the standard that we follow elsewhere in postgres, please refer to other multi-line comments. 3. +fsync_fname(config.save_fpi_path, true); +} It looks like fsync_fname()/fsync() in general isn't recursive, in the sense that it doesn't fsync the files under the directory, but the directory only. So, the idea of directory fsync doesn't seem worth it. We either 1) get rid of fsync entirely or 2) fsync all the files after they are created and the directory at the end or 3) do option (2) with --no-sync option similar to its friends. Since option (2) is a no go, we can either choose option (1) or option (2). My vote at this point is for option (1). 4. +($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()), current_setting('block_size')"); +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name; I think there's something wrong with this, no? pg_switch_wal() can, at times, return end+1 of the prior segment (see below snippet) and I'm not sure if such a case can happen here. * The return value is either the end+1 address of the switch record, * or the end+1 address of the prior segment if we did not need to * write a switch record because we are already at segment start. */ XLogRecPtr RequestXLogSwitch(bool mark_unimportant) 5. +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name; +ok(-f $walfile, "Got a WAL file"); Is this checking if the WAL file is present or not in PGDATA/pg_wal? If yes, I think this isn't required as pg_switch_wal() ensures that the WAL is written and flushed to disk. 6. +my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name; Isn't "pgdata" hardcoded here? I think you might need to do the following: $node->data_dir . '/pg_wal/' . $walfile_name;; 7. +# save filename for later verification +$files{$file}++; +# validate that we ended up with some FPIs saved +ok(keys %files > 0, 'verify we processed some files'); Why do we need to store filenames in an array when we later just check the size of the array? Can't we use a boolean (file_found) or an int variable (file_count) to verify that we found the file. 8. +$node->safe_psql('postgres', < > 11. > > +# verify filename formats matches w/--save-fpi > > +for my $fullpath (glob "$tmp_folder/raw/*") > > Do we need to look for the exact match of the file that gets created > > in the save-fpi path? While checking for this is great, it makes the > > test code non-portable (may not work on Windows or other platforms, > > no?) and complex? This way, you can get rid of get_block_info() as > > well? And +for my $fullpath (glob "$tmp_folder/raw/*") > > will also get simplified. > > > > I think you can further simplify the tests by: > > create the node > > generate an FPI > > call pg_waldump with save-fpi option > > check the target directory for a file that contains the relid, > > something like '%relid%'. > > > > The above would still serve the purpose, tests the code without much > > complexity. > > I disagree; I think there is utility in keeping the validation of the > expected output. Since we have the code that works for it (and does > work on Windows, per passing the CI tests) I'm not seeing why we > wouldn't want to continue to validate as much as possible. My intention is to simplify the tests further and I still stick to it. It looks like the majority of test code is to form the file name in the format that pg_waldump outputs and match with the file name in the target directory - for instance, in get_b
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On 2022-11-15 20:48:56 -0800, Peter Geoghegan wrote: > On Tue, Nov 15, 2022 at 5:29 PM Andres Freund wrote: > > If we want to focus on the mvcc affects we could just go for something like > > snapshotConflictHorizon or such. > > Okay, let's go with snapshotConflictHorizon. I'll use that name in the > next revision, which I should be able to post tomorrow. Cool!
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Tue, Nov 15, 2022 at 5:29 PM Andres Freund wrote: > If we want to focus on the mvcc affects we could just go for something like > snapshotConflictHorizon or such. Okay, let's go with snapshotConflictHorizon. I'll use that name in the next revision, which I should be able to post tomorrow. -- Peter Geoghegan
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
Hi, On 2022-11-15 13:54:24 -0800, Peter Geoghegan wrote: > On Tue, Nov 15, 2022 at 12:29 PM Andres Freund wrote: > > ... I strongly dislike latestCommittedXid. That seems at least as misleading > > as latestRemovedXid and has the danger of confusion with latestCompletedXid > > as you mention. > > > How about latestAffectedXid? > > I get why you don't care for latestCommittedXid, of course, but the > name does have some advantages. Namely: > > 1. Most conflicts come from PRUNE records (less often index deletion > records) where the XID is some heap tuple's xmax, a > committed-to-everybody XID on the primary (at the point of the > original execution of the prune). It makes sense to emphasize the idea > that snapshots running on a replica need to agree that this XID is > definitely committed -- we need to kill any snapshots that don't > definitely agree that this one particular XID is committed by now. I don't agree that it makes sense there - to me it sounds like the record is just carrying the globally latest committed xid rather than something just describing the record. I also just don't think "agreeing that a particular XID is committed" is a good description of latestRemovedXID, there's just too many ways that "agreeing xid has committed" can be understood. To me it's not obvious that it's about mvcc snapshots. If we want to focus on the mvcc affects we could just go for something like snapshotConflictHorizon or such. > Perhaps something like "mustBeCommittedCutoff" would work better? What > do you think of that? The emphasis on how things need to work on the > REDO side seems useful. I don't think "committed" should be part of the name. Greetings, Andres Freund
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
On Tue, Nov 15, 2022 at 12:29 PM Andres Freund wrote: > ... I strongly dislike latestCommittedXid. That seems at least as misleading > as latestRemovedXid and has the danger of confusion with latestCompletedXid > as you mention. > How about latestAffectedXid? I get why you don't care for latestCommittedXid, of course, but the name does have some advantages. Namely: 1. Most conflicts come from PRUNE records (less often index deletion records) where the XID is some heap tuple's xmax, a committed-to-everybody XID on the primary (at the point of the original execution of the prune). It makes sense to emphasize the idea that snapshots running on a replica need to agree that this XID is definitely committed -- we need to kill any snapshots that don't definitely agree that this one particular XID is committed by now. 2. It hints at the idea that we don't need to set any XID to do cleanup for aborted transactions, per the optimization in HeapTupleHeaderAdvanceLatestRemovedXid(). Perhaps something like "mustBeCommittedCutoff" would work better? What do you think of that? The emphasis on how things need to work on the REDO side seems useful. -- Peter Geoghegan
Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs
Hi, I like the idea of this, but: On 2022-11-15 10:24:05 -0800, Peter Geoghegan wrote: > I'm not necessarily that attached to the name latestCommittedXid. It > is more accurate, but it's also a little bit too similar to another > common XID symbol name, latestCompletedXid. Can anyone suggest an > alternative? ... I strongly dislike latestCommittedXid. That seems at least as misleading as latestRemovedXid and has the danger of confusion with latestCompletedXid as you mention. How about latestAffectedXid? Based on a quick scroll through the changed structures it seems like it'd be reasonably discriptive for most? Greetings, Andres Freund
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Enclosed is v9. - code style consistency (FPI instead of FPW) internally. - cleanup of no-longer needed checksum-related pieces from code and tests. - test cleanup/simplification. - other comment cleanup. Passes all CI checks. Best, David v9-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch Description: Binary data
Standardizing how pg_waldump presents recovery conflict XID cutoffs
Most recovery conflicts are generated in REDO routines using a standard approach these days: they all call ResolveRecoveryConflictWithSnapshot() with a latestRemovedXid argument taken directly from the WAL record. Right now we don't quite present this information in a uniform way, even though REDO routines apply the cutoffs in a uniform way. ISTM that there is value consistently using the same symbol names for these cutoffs in every WAL record that has such a cutoff. The REDO routine doesn't care about how the cutoff was generated during original execution anyway -- it is always the responsibility of code that runs during original execution (details of which will vary by record type). Consistency makes all of this fairly explicit, and makes it easier to use tools like pg_waldump to debug recovery conflicts -- the user can grep for the same generic symbol name and see everything. Attached WIP patch brings heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type in line with this convention. It also changes the symbol name from latestRemovedXid to something more general: latestCommittedXid (since many of these WAL records don't actually remove anything). The patch also documents how these cutoffs are supposed to work at a high level. We got the details slightly wrong (resulting in false conflicts) for several years with FREEZE_PAGE records (see bugfix commit 66fbcb0d2e for details), which seems like a relatively easy mistake to make -- so we should try to avoid similar mistakes in the future. I'm not necessarily that attached to the name latestCommittedXid. It is more accurate, but it's also a little bit too similar to another common XID symbol name, latestCompletedXid. Can anyone suggest an alternative? -- Peter Geoghegan v1-0001-Standardize-rmgrdesc-recovery-conflict-XID-output.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Tue, Nov 15, 2022 at 4:41 AM Bharath Rupireddy wrote: > > On Tue, Nov 15, 2022 at 1:29 AM David Christensen > wrote: > > > > Enclosed is v8, which uses the replication slot method to retain WAL > > as well as fsync'ing the output directory when everything is done. > > Thanks. It mostly is in good shape. However, few more comments: > > 1. > +if it does not exist. The images saved will be subject to the same > +filtering and limiting criteria as display records, but in this > +mode pg_waldump will not output any other > +information. > May I know what's the intention of the statement 'The images saved > '? If it's not necessary and convey anything useful to the user, > can we remove it? Basically I mean if you're limiting to a specific relation or rmgr type, etc, it only saves those FPIs. (So filtering is applied first before considering whether to save the FPI or not.) > 2. > +#include "storage/checksum.h" > +#include "storage/checksum_impl.h" > I think we don't need the above includes as we got rid of verifying > page checksums. The patch compiles without them for me. Good catch. > 3. > +char *save_fpw_path; > Can we rename the above variable to save_fpi_path, just to be in sync > with what we expose to the user, the option name 'save-fpi'? Sure. > 4. > +if (config.save_fpw_path != NULL) > +{ > +/* Fsync our output directory */ > +fsync_fname(config.save_fpw_path, true); > +} > I guess adding a comment there as to why we aren't fsyncing for every > file that gets created, but once per the directory at the end. That'd > help clarify doubts that other members might get while looking at the > code. Can do. > 5. > +if (config.save_fpw_path != NULL) > +{ > +/* Fsync our output directory */ > +fsync_fname(config.save_fpw_path, true); > +} > So, are we sure that we don't want to fsync for time_to_stop exit(0) > cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely > meaning exiting with return code 0, shouldn't we fsync the directory? We can. Like I've said before, since these aren't production parts of the cluster I don't personally have much of an opinion if fsync() is appropriate at all, so don't have strong feelings here. > 6. > +else if (config.save_fpw_path) > Let's use the same convention to check non-NULLness, > config.save_fpw_path != NULL. Good catch. > 7. > +CHECKPOINT; > +SELECT pg_switch_wal(); > +UPDATE test_table SET a = a + 1; > +SELECT pg_switch_wal(); > I don't think switching WAL after checkpoint is necessary here, > because the checkpoint ensures all the WAL gets flushed to disk. > Please remove it. The point is to ensure we have a clean WAL segment that we know will contain the relation we are filtering by. Will test if this still holds without the extra pg_switch_wal(), but that's the rationale. > PS: I've seen the following code: > +my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we > want the second WAL file, which will be a complete WAL file with > full-page writes for our specific relation. I don't understand the question. > 8. > +$node->safe_psql('postgres', < +EOF > Why EOF is used here? Can't we do something like below to execute > multiple statements? > $node->safe_psql( > 'postgres', qq[ > SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > ]); > > Same here: > +$node->safe_psql('postgres', < +SELECT pg_drop_replication_slot('regress_pg_waldump_slot'); > +EOQ As a long-time perl programmer, heredocs seem more natural and easier to read rather than a string that accomplishes the same function. If there is an established project style I'll stick with it, but it just rolled out that way. :-) > 9. > +my $walfile = [sort { $a <=> $b } glob("
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Tue, Nov 15, 2022 at 1:29 AM David Christensen wrote: > > Enclosed is v8, which uses the replication slot method to retain WAL > as well as fsync'ing the output directory when everything is done. Thanks. It mostly is in good shape. However, few more comments: 1. +if it does not exist. The images saved will be subject to the same +filtering and limiting criteria as display records, but in this +mode pg_waldump will not output any other +information. May I know what's the intention of the statement 'The images saved '? If it's not necessary and convey anything useful to the user, can we remove it? 2. +#include "storage/checksum.h" +#include "storage/checksum_impl.h" I think we don't need the above includes as we got rid of verifying page checksums. The patch compiles without them for me. 3. +char *save_fpw_path; Can we rename the above variable to save_fpi_path, just to be in sync with what we expose to the user, the option name 'save-fpi'? 4. +if (config.save_fpw_path != NULL) +{ +/* Fsync our output directory */ +fsync_fname(config.save_fpw_path, true); +} I guess adding a comment there as to why we aren't fsyncing for every file that gets created, but once per the directory at the end. That'd help clarify doubts that other members might get while looking at the code. 5. +if (config.save_fpw_path != NULL) +{ +/* Fsync our output directory */ +fsync_fname(config.save_fpw_path, true); +} So, are we sure that we don't want to fsync for time_to_stop exit(0) cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely meaning exiting with return code 0, shouldn't we fsync the directory? 6. +else if (config.save_fpw_path) Let's use the same convention to check non-NULLness, config.save_fpw_path != NULL. 7. +CHECKPOINT; +SELECT pg_switch_wal(); +UPDATE test_table SET a = a + 1; +SELECT pg_switch_wal(); I don't think switching WAL after checkpoint is necessary here, because the checkpoint ensures all the WAL gets flushed to disk. Please remove it. PS: I've seen the following code: +my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we want the second WAL file, which will be a complete WAL file with full-page writes for our specific relation. 8. +$node->safe_psql('postgres', <safe_psql( 'postgres', qq[ SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); ]); Same here: +$node->safe_psql('postgres', < $b } glob("$waldir/00*")]->[1]; # we want the second WAL file, which will be a complete WAL file with full-page writes for our specific relation. Is it guaranteed that just looking at the second WAL file in the pg_wal directory assures WAL file with FPIs? I think we have to save the WAL file that contains FPIs, that is the file after, CHECKPOINT, UPDATE and pg_switch_wal. I think you can store output LSN of pg_switch_wal 10. +$node->safe_psql('postgres', <https://aws.amazon.com
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Enclosed is v8, which uses the replication slot method to retain WAL as well as fsync'ing the output directory when everything is done. v8-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Nov 11, 2022 at 4:57 AM Bharath Rupireddy wrote: > > Well if it's not the same output then I guess you're right and there's > > not a use for the `--fixup` mode. By the same token, I'd say > > calculating/setting the checksum also wouldn't need to be done, we > > should just include the page as included in the WAL stream. > > Let's hear from others, we may be missing something here. I recommend > keeping the --fixup patch as 0002, in case if we decide to discard > it's easier, however I'll leave that to you. I've whacked in `git` for now; I can resurrect if people consider it useful. > > > > > 5. > > > > > +if (!RestoreBlockImage(record, block_id, page)) > > > > > +continue; > > > > > + > > > > > +/* we have our extracted FPI, let's save it now */ > > > > > After extracting the page from the WAL record, do we need to perform a > > > > > checksum on it? > > > > > > I think you just need to do the following, this will ensure the > > > authenticity of the page that pg_waldump returns. > > > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, > > > blk)) > > > { > > > pg_fatal("page checksum failed"); > > > } > > > > The WAL already has a checksum, so not certain this makes sense on its > > own. Also I'm inclined to make it a warning if it doesn't match rather > > than a fatal. (I'd also have to verify that the checksum is properly > > set on the page prior to copying the FPI into WAL, which I'm pretty > > sure it is but not certain.) > > How about having it as an Assert()? Based on empirical testing, the checksums don't match, so asserting/alerting on each block extracted seems next to useless, so going to just remove that. > > > 5. > > > +fsync(fileno(OPF)); > > > +fclose(OPF); > > > I think just the fsync() isn't enough, you still need fsync_fname() > > > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id > > > <= XLogRecMaxBlockId(record); block_id++) loop. > > > > I'm not sure I get the value of the fsyncs here; if you are using this > > tool at this capacity you're by definition doing some sort of > > transient investigative steps. Since the WAL was fsync'd, you could > > always rerun/recreate as needed in the unlikely event of an OS crash > > in the middle of this investigation. Since this is outside the > > purview of the database operations proper (unlike, say, initdb) seems > > like it's unnecessary (or definitely shouldn't need to be selectable). > > My thoughts are that if we're going to fsync, just do the fsyncs > > unconditionally rather than complicate the interface further. > > -1 for fysnc() per file created as it can create a lot of sync load on > production servers impacting performance. How about just syncing the > directory at the end assuming it doesn't cost as much as fsync() per > FPI file created would? I can fsync the dir if that's a useful compromise. > > > 4. > > > +$primary->append_conf('postgresql.conf', "wal_level='replica'"); > > > +$primary->append_conf('postgresql.conf', 'archive_mode=on'); > > > +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'"); > > > Why do you need to set wal_level to replica, out of the box your > > > cluster comes with replica only no? > > > And why do you need archive_mode on and set the command to do nothing? > > > Why archiving is needed for testing your feature firstly? > > > > I think it had shown "minimal" in my testing; I was purposefully > > failing archives so the WAL would stick around. Maybe a custom > > archive command that just copied a single WAL file into a known > > location so I could use that instead of the current approach would > > work, though not sure how Windows support would work with that. Open > > to other ideas to more cleanly get a single WAL file that isn't the > > last one. (Earlier versions of this test were using /all/ of the > > generated WAL files rather than a single one, so maybe I am > > overcomplicating things for a single WAL file case.) > > Typically we create a physical replication slot at the beginning so > that the server keeps the WAL required for you in pg_wal itself, for > instance, please see pg_walinspect: > > -- Make sure checkpoints don't interfere with the test. > SELECT 'init' FROM > pg_create_physical_replication_slot('regress_pg_walinspect_slot', > true, false); Will see if I can get something like this to work; I'm currently stopping the server before running the file-based tests, but I suppose there's no reason to do so, so a temporary slot that holds it around until the test is complete is probably fine. David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Nov 11, 2022 at 8:15 AM Justin Pryzby wrote: > > On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote: > > On Wed, Nov 9, 2022 at 2:08 PM David Christensen > > wrote: > > > Justin sez: > > > > I was wondering if there's any reason to do "CREATE DATABASE". The vast > > > > majority of TAP tests don't. > > > > > > > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head > > > >1435 safe_psql('postgres', > > > > 335 safe_psql( > > > > 23 safe_psql($connect_db, > > > > > > If there was a reason, I don't recall offhand; I will test removing it > > > and if things still work will consider it good enough. > > > > Things blew up when I did that; rather than hunt it down, I just left it > > in. :-) > > > +$primary->safe_psql('db1', < > It worked for me when I removed the 3 references to db1. > That's good for efficiency of the test. I did figure that out later; fixed in git. > > +my $blocksize = 8192; > > I think this should be just "my $blocksize;" rather than setting a value > which is later overwriten. Yep. Fixed in git. David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote: > On Wed, Nov 9, 2022 at 2:08 PM David Christensen > wrote: > > Justin sez: > > > I was wondering if there's any reason to do "CREATE DATABASE". The vast > > > majority of TAP tests don't. > > > > > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head > > >1435 safe_psql('postgres', > > > 335 safe_psql( > > > 23 safe_psql($connect_db, > > > > If there was a reason, I don't recall offhand; I will test removing it > > and if things still work will consider it good enough. > > Things blew up when I did that; rather than hunt it down, I just left it in. > :-) > +$primary->safe_psql('db1', < +my $blocksize = 8192; I think this should be just "my $blocksize;" rather than setting a value which is later overwriten. -- Justin
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Thu, Nov 10, 2022 at 9:52 PM David Christensen wrote: > > > > > 2. I'm unable to understand the use-case for --fixup-fpi option. > > > > pg_waldump is supposed to be just WAL reader, and must not return any > > > > modified information, with --fixup-fpi option, the patch violates this > > > > principle i.e. it sets page LSN and returns. Without actually > > > > replaying the WAL record on the page, how is it correct to just set > > > > the LSN? How will it be useful? ISTM, we must ignore this option > > > > unless there's a strong use-case. > > > > > > How I was envisioning this was for cases like extreme surgery for > > > corrupted pages, where you extract the page from WAL but it has lsn > > > and checksum set so you could do something like `dd if=fixup-block > > > of=relation ...`, so it *simulates* the replay of said fullpage blocks > > > in cases where for some reason you can't play the intermediate > > > records; since this is always a fullpage block, it's capturing what > > > would be the snapshot so you could manually insert somewhere as needed > > > without needing to replay (say if dealing with an incomplete or > > > corrupted WAL stream). > > > > Recovery sets the page LSN after it replayed the WAL record on the > > page right? Recovery does this - base_page/FPI + > > apply_WAL_record_and_then_set_applied_WAL_record's_LSN = > > new_version_of_page. Essentially, in your patch, you are just setting > > the WAL record LSN with the page contents being the base page's. I'm > > still not sure what's the real use-case here. We don't have an > > independent function in postgres, given a base page and a WAL record > > that just replays the WAL record and output's the new version of the > > page, so I think what you do in the patch with fixup option seems > > wrong to me. > > Well if it's not the same output then I guess you're right and there's > not a use for the `--fixup` mode. By the same token, I'd say > calculating/setting the checksum also wouldn't need to be done, we > should just include the page as included in the WAL stream. Let's hear from others, we may be missing something here. I recommend keeping the --fixup patch as 0002, in case if we decide to discard it's easier, however I'll leave that to you. > > > > 5. > > > > + if (!RestoreBlockImage(record, block_id, page)) > > > > +continue; > > > > + > > > > +/* we have our extracted FPI, let's save it now */ > > > > After extracting the page from the WAL record, do we need to perform a > > > > checksum on it? > > > > I think you just need to do the following, this will ensure the > > authenticity of the page that pg_waldump returns. > > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk)) > > { > > pg_fatal("page checksum failed"); > > } > > The WAL already has a checksum, so not certain this makes sense on its > own. Also I'm inclined to make it a warning if it doesn't match rather > than a fatal. (I'd also have to verify that the checksum is properly > set on the page prior to copying the FPI into WAL, which I'm pretty > sure it is but not certain.) How about having it as an Assert()? > > 5. > > +fsync(fileno(OPF)); > > +fclose(OPF); > > I think just the fsync() isn't enough, you still need fsync_fname() > > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id > > <= XLogRecMaxBlockId(record); block_id++) loop. > > I'm not sure I get the value of the fsyncs here; if you are using this > tool at this capacity you're by definition doing some sort of > transient investigative steps. Since the WAL was fsync'd, you could > always rerun/recreate as needed in the unlikely event of an OS crash > in the middle of this investigation. Since this is outside the > purview of the database operations proper (unlike, say, initdb) seems > like it's unnecessary (or definitely shouldn't need to be selectable). > My thoughts are that if we're going to fsync, just do the fsyncs > unconditionally rather than complicate the interface further. -1 for fysnc() per file created as it can create a lot of sync load on production servers impacting performance. How about just syncing the directory at the end assuming it doesn't cost as much as fsync() per FPI file created would? > > 4. > > +$primary->append_conf('postgresql.co
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Thu, Nov 10, 2022 at 2:33 AM Bharath Rupireddy wrote: > > On Thu, Nov 10, 2022 at 1:31 AM David Christensen > wrote: > > > > Thanks for providing the v7 patch, please see my comments and responses below. Hi Bharath, Thanks for the feedback. > > > 2. I'm unable to understand the use-case for --fixup-fpi option. > > > pg_waldump is supposed to be just WAL reader, and must not return any > > > modified information, with --fixup-fpi option, the patch violates this > > > principle i.e. it sets page LSN and returns. Without actually > > > replaying the WAL record on the page, how is it correct to just set > > > the LSN? How will it be useful? ISTM, we must ignore this option > > > unless there's a strong use-case. > > > > How I was envisioning this was for cases like extreme surgery for > > corrupted pages, where you extract the page from WAL but it has lsn > > and checksum set so you could do something like `dd if=fixup-block > > of=relation ...`, so it *simulates* the replay of said fullpage blocks > > in cases where for some reason you can't play the intermediate > > records; since this is always a fullpage block, it's capturing what > > would be the snapshot so you could manually insert somewhere as needed > > without needing to replay (say if dealing with an incomplete or > > corrupted WAL stream). > > Recovery sets the page LSN after it replayed the WAL record on the > page right? Recovery does this - base_page/FPI + > apply_WAL_record_and_then_set_applied_WAL_record's_LSN = > new_version_of_page. Essentially, in your patch, you are just setting > the WAL record LSN with the page contents being the base page's. I'm > still not sure what's the real use-case here. We don't have an > independent function in postgres, given a base page and a WAL record > that just replays the WAL record and output's the new version of the > page, so I think what you do in the patch with fixup option seems > wrong to me. Well if it's not the same output then I guess you're right and there's not a use for the `--fixup` mode. By the same token, I'd say calculating/setting the checksum also wouldn't need to be done, we should just include the page as included in the WAL stream. > > > 5. > > > +if (!RestoreBlockImage(record, block_id, page)) > > > +continue; > > > + > > > +/* we have our extracted FPI, let's save it now */ > > > After extracting the page from the WAL record, do we need to perform a > > > checksum on it? > > I think you just need to do the following, this will ensure the > authenticity of the page that pg_waldump returns. > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk)) > { > pg_fatal("page checksum failed"); > } The WAL already has a checksum, so not certain this makes sense on its own. Also I'm inclined to make it a warning if it doesn't match rather than a fatal. (I'd also have to verify that the checksum is properly set on the page prior to copying the FPI into WAL, which I'm pretty sure it is but not certain.) > > > case 'W': > > > config.save_fpw_path = pg_strdup(optarg); > > > case 'X': > > >config.fixup_fpw = true; > > >config.save_fpw_path = pg_strdup(optarg); > > > > Like separate opt processing with their own `break` statement? > > Probably a bit more readable/conventional. > > Yes. Moot with the removal of the --fixup mode. > Some more comments: > > 1. > +PGAlignedBlockzerobuf; > Essentially, it's not a zero buffer, please rename the variable to > something like 'buf' or 'page_buf' or someother? Sure. > 2. > +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ) > Replace pg_pwrite with fwrite() and avoid fileno() system calls that > should suffice here, AFICS, we don't need pg_pwrite. Sure. > 3. > +if (config.save_fpw_path != NULL) > +{ > +/* Create the dir if it doesn't exist */ > +if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) > I think you still need pg_check_dir() here, how about something like below? I was assuming pg_mkdir_p() acted just like mkdir -p, where it's just an idempotent action, so an existing dir is just treated the same. What's the benefit here? Would assume if a non-dir file existed at that path or other permissions issues arose we'd just get an error from pg_mkdir_p(). (Will review the code there and confirm.) > 4. > +/* Create the dir if it doesn
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Thu, Nov 10, 2022 at 1:31 AM David Christensen wrote: > Thanks for providing the v7 patch, please see my comments and responses below. > > 2. I'm unable to understand the use-case for --fixup-fpi option. > > pg_waldump is supposed to be just WAL reader, and must not return any > > modified information, with --fixup-fpi option, the patch violates this > > principle i.e. it sets page LSN and returns. Without actually > > replaying the WAL record on the page, how is it correct to just set > > the LSN? How will it be useful? ISTM, we must ignore this option > > unless there's a strong use-case. > > How I was envisioning this was for cases like extreme surgery for > corrupted pages, where you extract the page from WAL but it has lsn > and checksum set so you could do something like `dd if=fixup-block > of=relation ...`, so it *simulates* the replay of said fullpage blocks > in cases where for some reason you can't play the intermediate > records; since this is always a fullpage block, it's capturing what > would be the snapshot so you could manually insert somewhere as needed > without needing to replay (say if dealing with an incomplete or > corrupted WAL stream). Recovery sets the page LSN after it replayed the WAL record on the page right? Recovery does this - base_page/FPI + apply_WAL_record_and_then_set_applied_WAL_record's_LSN = new_version_of_page. Essentially, in your patch, you are just setting the WAL record LSN with the page contents being the base page's. I'm still not sure what's the real use-case here. We don't have an independent function in postgres, given a base page and a WAL record that just replays the WAL record and output's the new version of the page, so I think what you do in the patch with fixup option seems wrong to me. > > 5. > > +if (!RestoreBlockImage(record, block_id, page)) > > +continue; > > + > > +/* we have our extracted FPI, let's save it now */ > > After extracting the page from the WAL record, do we need to perform a > > checksum on it? I think you just need to do the following, this will ensure the authenticity of the page that pg_waldump returns. if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk)) { pg_fatal("page checksum failed"); } > > case 'W': > > config.save_fpw_path = pg_strdup(optarg); > > case 'X': > >config.fixup_fpw = true; > >config.save_fpw_path = pg_strdup(optarg); > > Like separate opt processing with their own `break` statement? > Probably a bit more readable/conventional. Yes. Some more comments: 1. +PGAlignedBlockzerobuf; Essentially, it's not a zero buffer, please rename the variable to something like 'buf' or 'page_buf' or someother? 2. +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ) Replace pg_pwrite with fwrite() and avoid fileno() system calls that should suffice here, AFICS, we don't need pg_pwrite. 3. +if (config.save_fpw_path != NULL) +{ +/* Create the dir if it doesn't exist */ +if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) I think you still need pg_check_dir() here, how about something like below? if (pg_check_dir(config.save_fpw_path) == 0) { if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) { /* error */ } } 4. +/* Create the dir if it doesn't exist */ +if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) +{ +pg_log_error("could not create output directory \"%s\": %m", + config.save_fpw_path); +goto bad_argument; Why is the directory creation error a bad_argument? I think you need just pg_fatal() here. 5. +fsync(fileno(OPF)); +fclose(OPF); I think just the fsync() isn't enough, you still need fsync_fname() and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) loop. 6. Speaking of which, do we need to do fsync()'s optionally? If we were to write many such FPI files, aren't there going to be more fsync() calls and imagine this feature being used on a production server and a lot of fsync() will definitely make running server fsync() ops slower. I think we need a new option whether pg_waldump ever do fsync() or not, something similar to --no-sync of pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can make use of it. 7. +pg_fatal("couldn't write out complete fullpage image to file: %s", filename); We typically use "full page image" in the output strings, pl
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Nov 9, 2022 at 2:08 PM David Christensen wrote: > Justin sez: > > I was wondering if there's any reason to do "CREATE DATABASE". The vast > > majority of TAP tests don't. > > > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head > >1435 safe_psql('postgres', > > 335 safe_psql( > > 23 safe_psql($connect_db, > > If there was a reason, I don't recall offhand; I will test removing it > and if things still work will consider it good enough. Things blew up when I did that; rather than hunt it down, I just left it in. :-) Enclosed is v7, with changes thus suggested thus far. v7-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
> > 6. > > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) > > Use pg_dir_create_mode instead of hard-coded 0007? > > I think I thought of that when I first looked at the patch ... but, I'm > not sure, since it says: > > src/include/common/file_perm.h-/* Modes for creating directories and files IN > THE DATA DIRECTORY */ > src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode; Looks like it's pretty evenly split in src/bin: $ git grep -o -E -w '(pg_mkdir_p|mkdir)' '**.c' | sort | uniq -c 3 initdb/initdb.c:mkdir 3 initdb/initdb.c:pg_mkdir_p 1 pg_basebackup/bbstreamer_file.c:mkdir 2 pg_basebackup/pg_basebackup.c:pg_mkdir_p 1 pg_dump/pg_backup_directory.c:mkdir 1 pg_rewind/file_ops.c:mkdir 4 pg_upgrade/pg_upgrade.c:mkdir So if that is the preferred approach I'll go ahead and use it. > I was wondering if there's any reason to do "CREATE DATABASE". The vast > majority of TAP tests don't. > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head >1435 safe_psql('postgres', > 335 safe_psql( > 23 safe_psql($connect_db, If there was a reason, I don't recall offhand; I will test removing it and if things still work will consider it good enough. David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Nov 9, 2022 at 6:30 AM Bharath Rupireddy wrote: > > On Wed, Nov 9, 2022 at 5:08 AM David Christensen > wrote: > > > > Enclosed is v6, which squashes your refactor and adds the additional > > recent suggestions. > > Thanks for working on this feature. Here are some comments for now. I > haven't looked at the tests, I will take another look at the code and > tests after these and all other comments are addressed. > > 1. For ease of review, please split the test patch to 0002. Per later discussion seems like new feature tests are fine in the same patch, yes? > 2. I'm unable to understand the use-case for --fixup-fpi option. > pg_waldump is supposed to be just WAL reader, and must not return any > modified information, with --fixup-fpi option, the patch violates this > principle i.e. it sets page LSN and returns. Without actually > replaying the WAL record on the page, how is it correct to just set > the LSN? How will it be useful? ISTM, we must ignore this option > unless there's a strong use-case. How I was envisioning this was for cases like extreme surgery for corrupted pages, where you extract the page from WAL but it has lsn and checksum set so you could do something like `dd if=fixup-block of=relation ...`, so it *simulates* the replay of said fullpage blocks in cases where for some reason you can't play the intermediate records; since this is always a fullpage block, it's capturing what would be the snapshot so you could manually insert somewhere as needed without needing to replay (say if dealing with an incomplete or corrupted WAL stream). > 3. > +if (fork >= 0 && fork <= MAX_FORKNUM) > +{ > +if (fork) > +sprintf(forkname, "_%s", forkNames[fork]); > +else > +forkname[0] = 0; > +} > +else > +pg_fatal("found invalid fork number: %u", fork); > > Isn't the above complex? What's the problem with something like below? > Why do we need if (fork) - else block? > > if (fork >= 0 && fork <= MAX_FORKNUM) > sprintf(forkname, "_%s", forkNames[fork]); > else > pg_fatal("found invalid fork number: %u", fork); This was to suppress any suffix for main forks, but yes, could simplify and include the `_` in the suffix name. Will include such a change. > 3. > +charpage[BLCKSZ] = {0}; > I think when writing to a file, we need PGAlignedBlock rather than a > simple char array of bytes, see the description around PGAlignedBlock > for why it is so. Easy enough change, and makes sense. > 4. > +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ) > Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can > avoid fileno() system calls, no? Do you need the file position to > remain the same after writing, hence pg_pwrite()? I don't recall the original motivation, TBH. > 5. > +if (!RestoreBlockImage(record, block_id, page)) > +continue; > + > +/* we have our extracted FPI, let's save it now */ > After extracting the page from the WAL record, do we need to perform a > checksum on it? That is there in fixup mode (or should be). Are you thinking this should also be set if not in fixup mode? That defeats the purpose of the raw page extract, which is to see *exactly* what the WAL stream has. > 6. > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) > Use pg_dir_create_mode instead of hard-coded 0007? Sure. > 7. > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) > +fsync(fileno(OPF)); > +fclose(OPF); > Since you're creating the directory in case it's not available, you > need to fsync the directory too? Sure. > 8. > +case 'W': > +case 'X': > +config.fixup_fpw = (option == 'X'); > +config.save_fpw_path = pg_strdup(optarg); > +break; > Just set config.fixup_fpw = false before the switch block starts, > like the other variables, and then perhaps doing like below is more > readable: > case 'W': > config.save_fpw_path = pg_strdup(optarg); > case 'X': >config.fixup_fpw = true; >config.save_fpw_path = pg_strdup(optarg); Like separate opt processing with their own `break` statement? Probably a bit more readable/conventional. > 9. > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) > Should we use pg_mkdir_p() instead of mkdir()? Sure. Thanks, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On 2022-Nov-09, Justin Pryzby wrote: > On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote: > > 1. For ease of review, please split the test patch to 0002. > > This is just my opinion, but .. why ? Since it's easy to > filter/skip/display a file, I don't think it's usually useful to have > separate patches for tests or docs. I concur with Justin. When a patch is bugfix and a test is added that verifies it, I like to keep the test in a separate commit (for submit purposes and in my personal repo -- not for the official push!) so that I can git-checkout to just the test and make sure it fails ahead of pushing the fix commit. But for a new feature, there's no reason to do so. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote: > On Wed, Nov 9, 2022 at 5:08 AM David Christensen > wrote: > > > > Enclosed is v6, which squashes your refactor and adds the additional > > recent suggestions. > > Thanks for working on this feature. Here are some comments for now. I > haven't looked at the tests, I will take another look at the code and > tests after these and all other comments are addressed. > > 1. For ease of review, please split the test patch to 0002. This is just my opinion, but .. why ? Since it's easy to filter/skip/display a file, I don't think it's usually useful to have separate patches for tests or docs. > 6. > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) > Use pg_dir_create_mode instead of hard-coded 0007? I think I thought of that when I first looked at the patch ... but, I'm not sure, since it says: src/include/common/file_perm.h-/* Modes for creating directories and files IN THE DATA DIRECTORY */ src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode; I was wondering if there's any reason to do "CREATE DATABASE". The vast majority of TAP tests don't. $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head 1435 safe_psql('postgres', 335 safe_psql( 23 safe_psql($connect_db, -- Justin
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Wed, Nov 9, 2022 at 5:08 AM David Christensen wrote: > > Enclosed is v6, which squashes your refactor and adds the additional > recent suggestions. Thanks for working on this feature. Here are some comments for now. I haven't looked at the tests, I will take another look at the code and tests after these and all other comments are addressed. 1. For ease of review, please split the test patch to 0002. 2. I'm unable to understand the use-case for --fixup-fpi option. pg_waldump is supposed to be just WAL reader, and must not return any modified information, with --fixup-fpi option, the patch violates this principle i.e. it sets page LSN and returns. Without actually replaying the WAL record on the page, how is it correct to just set the LSN? How will it be useful? ISTM, we must ignore this option unless there's a strong use-case. 3. +if (fork >= 0 && fork <= MAX_FORKNUM) +{ +if (fork) +sprintf(forkname, "_%s", forkNames[fork]); +else +forkname[0] = 0; +} +else +pg_fatal("found invalid fork number: %u", fork); Isn't the above complex? What's the problem with something like below? Why do we need if (fork) - else block? if (fork >= 0 && fork <= MAX_FORKNUM) sprintf(forkname, "_%s", forkNames[fork]); else pg_fatal("found invalid fork number: %u", fork); 3. +charpage[BLCKSZ] = {0}; I think when writing to a file, we need PGAlignedBlock rather than a simple char array of bytes, see the description around PGAlignedBlock for why it is so. 4. +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ) Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can avoid fileno() system calls, no? Do you need the file position to remain the same after writing, hence pg_pwrite()? 5. +if (!RestoreBlockImage(record, block_id, page)) +continue; + +/* we have our extracted FPI, let's save it now */ After extracting the page from the WAL record, do we need to perform a checksum on it? 6. +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) Use pg_dir_create_mode instead of hard-coded 0007? 7. +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) +fsync(fileno(OPF)); +fclose(OPF); Since you're creating the directory in case it's not available, you need to fsync the directory too? 8. +case 'W': +case 'X': +config.fixup_fpw = (option == 'X'); +config.save_fpw_path = pg_strdup(optarg); +break; Just set config.fixup_fpw = false before the switch block starts, like the other variables, and then perhaps doing like below is more readable: case 'W': config.save_fpw_path = pg_strdup(optarg); case 'X': config.fixup_fpw = true; config.save_fpw_path = pg_strdup(optarg); 9. +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0) Should we use pg_mkdir_p() instead of mkdir()? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation:tested, failed Hello, I tested this patch on Linux and there is no problem. Also, I reviewed this patch and commented below. @@ -439,6 +447,107 @@ XLogRecordHasFPW(XLogReaderState *record) + if (fork >= 0 && fork <= MAX_FORKNUM) + { + if (fork) + sprintf(forkname, "_%s", forkNames[fork]); + else + forkname[0] = 0; + } + else + pg_fatal("found invalid fork number: %u", fork); Should we add the comment if the main fork is saved without "_main" suffix for code readability? @@ -679,6 +788,9 @@ usage(void) " (default: 1 or the value used in STARTSEG)\n")); printf(_(" -V, --version output version information, then exit\n")); printf(_(" -w, --fullpage only show records with a full page write\n")); + printf(_(" -W, --save-fpi=pathsave full page images to given path as\n" +" LSN.T.D.R.B_F\n")); + printf(_(" -X, --fixup-fpi=path like --save-fpi but apply LSN fixups to saved page\n")); printf(_(" -x, --xid=XID only show records with transaction ID XID\n")); printf(_(" -z, --stats[=record] show statistics instead of records\n" " (optionally, show per-record statistics)\n")); Since lower-case options are displayed at the top, should we switch the order of -x and -X? @@ -972,6 +1093,25 @@ main(int argc, char **argv) } } + int dir_status = pg_check_dir(config.save_fpw_path); + + if (dir_status < 0) + { + pg_log_error("could not access output directory: %s", config.save_fpw_path); + goto bad_argument; + } Should we output %s enclosed with \"? Regards, Sho Kato
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Enclosed is v6, which squashes your refactor and adds the additional recent suggestions. Thanks! v6-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Tue, Nov 8, 2022 at 4:45 PM Justin Pryzby wrote: > > On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote: > > Hi Justin et al, > > > > Enclosed is v5 of this patch which now passes the CirrusCI checks for > > all supported OSes. I went ahead and reworked the test a bit so it's a > > little more amenable to the OS-agnostic approach for testing. > > Great, thanks. > > This includes the changes that I'd started a few months ago. > Plus adding the test which was missing for meson. Cool, will review, thanks. > +format: > LSN.TSOID.DBOID.RELNODE.BLKNO > > I'd prefer if the abbreviations were "reltablespace" and "datoid" Sure, no issues there. > Also, should the test case call pg_relation_filenode() rather than using > relfilenode directly ? Is it a problem that the test code assumes > pagesize=8192 ? Both good points. Is pagesize just exposed via `current_setting('block_size')` or is there a different approach? David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote: > Hi Justin et al, > > Enclosed is v5 of this patch which now passes the CirrusCI checks for > all supported OSes. I went ahead and reworked the test a bit so it's a > little more amenable to the OS-agnostic approach for testing. Great, thanks. This includes the changes that I'd started a few months ago. Plus adding the test which was missing for meson. +format: LSN.TSOID.DBOID.RELNODE.BLKNO I'd prefer if the abbreviations were "reltablespace" and "datoid" Also, should the test case call pg_relation_filenode() rather than using relfilenode directly ? Is it a problem that the test code assumes pagesize=8192 ? -- Justin >From d4cb23bc2f0edcc65b0bef75fd2e8d6e5aeda21f Mon Sep 17 00:00:00 2001 From: David Christensen Date: Wed, 20 Apr 2022 19:59:35 -0500 Subject: [PATCH 1/2] Teach pg_waldump to extract FPIs from the WAL stream Extracts full-page images from the WAL stream into a target directory, which must be empty or not exist. These images are subject to the same filtering rules as normal display in pg_waldump, which means that you can isolate the full page writes to a target relation, among other things. Files are saved with the filename: with formatting to make things somewhat sortable; for instance: -01C0.1663.1.6117.0 -01000150.1664.0.6115.0 -010001E0.1664.0.6114.0 -01000270.1663.1.6116.0 -01000300.1663.1.6113.0 -01000390.1663.1.6112.0 -01000420.1663.1.8903.0 -010004B0.1663.1.8902.0 -01000540.1663.1.6111.0 -010005D0.1663.1.6110.0 If the FPI comes from a fork other than the main fork, the fork name will be appended on the output file name; e.g.: -014A4758.1663.1.12864.0_vm It's noteworthy that the raw block images do not have the current LSN stored with them in the WAL stream (as would be true for on-heap versions of the blocks), nor would the checksum be updated in them (though WAL itself has checksums, so there is some protection there). As such there are two versions of this functionality, one which returns the raw page as it appears in the WAL (--save-fpi) and one which applies the updated pd_lsn and pd_checksum (--fixup-fpi). These images could be loaded/inspected via `pg_read_binary_file()` and used in the `pageinspect` suite of tools to perform detailed analysis on the pages in question, based on historical information, and may come in handy for forensics work. --- doc/src/sgml/ref/pg_waldump.sgml | 79 +++ src/bin/pg_waldump/pg_waldump.c | 151 +- src/bin/pg_waldump/t/002_save_fullpage.pl | 137 3 files changed, 366 insertions(+), 1 deletion(-) create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index d559f091e5f..cc3ce918905 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -240,6 +240,85 @@ PostgreSQL documentation + + -W save_path + --save-fpi=save_path + -X save_path + --fixup-fpi=save_path + + +Save full page images seen in the WAL stream to the +save_path directory, which will be created +if it does not exist. The images saved will be subject to the same +filtering and limiting criteria as display records, but in this +mode pg_waldump will not output any other +information. + + +If invoked using +the -X/--fixup-fpi +option, this page image will include the pd_lsn of +the replayed record rather than the raw page image; as well, +the pd_checksum field will be updated if it had +previously existed. + + +The page images will be saved with the file +format: LSN.TSOID.DBOID.RELNODE.BLKNOFORK + +The dot-separated components are (in order): + + + + + +Component +Description + + + + + +LSN +The LSN of the record with this block, formatted + as two 8-character hexadecimal numbers %08X-%08X + + + +TSOID +tablespace OID for the block + + + +DBOID +database OID for the block + + + +RELNODE +relnode id for the block + + + +BLKNO +the block number of this block + + + +FORK + + if coming from the main fork, will be empty, otherwise will be + one of _fsm, _vm, + or _init. + + + + + + + + +
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Hi Justin et al, Enclosed is v5 of this patch which now passes the CirrusCI checks for all supported OSes. I went ahead and reworked the test a bit so it's a little more amenable to the OS-agnostic approach for testing. Best, David v5-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Nov 4, 2022 at 1:38 PM Justin Pryzby wrote: > > > As I recall, that's due to relying on "cp". And "rsync", which > > > shouldn't be assumed to exist by regression tests). Will poke around other TAP tests to see if there's a more consistent interface, what perl version we can assume and available modules, etc. If there's not some trivial wrapper at this point so all TAP tests could use it regardless of OS, it would definitely be good to introduce such a method. > > I will work on supporting the windows compatibility here. Is there some > > list of guidelines for what you can and can’t use? I don’t have a windows > > machine available to develop on. > > I think a lot (most?) developers here don't have a windows environment > available, so now have been using cirrusci's tests to verify. If you > haven't used cirrusci directly (not via cfbot) before, start at: > src/tools/ci/README Thanks, good starting point. > > Was it failing on windows? I was attempting to skip it as I recall. > > I don't see anything about skipping, and cirrus's logs from 2 > commitfests ago were pruned. I looked at this patch earlier this year, > but never got around to replacing the calls to rsync and cp. Ah, it's skipped (not fixed) in my git repo, but never got around to submitting that version through email. That explains it. David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Nov 04, 2022 at 09:16:29AM -0500, David Christensen wrote: > On Nov 4, 2022, at 9:02 AM, Justin Pryzby wrote: > > On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote: > >> 2022年5月3日(火) 8:45 David Christensen : > >>> > >>> ...and pushing a couple fixups pointed out by cfbot, so here's v4. > >> > >> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is > >> currently underway, this would be an excellent time to update the patch. > > > > More important than needing to be rebased, the patch has never passed > > its current tests on windows. > > > > As I recall, that's due to relying on "cp". And "rsync", which > > shouldn't be assumed to exist by regression tests). > > I will work on supporting the windows compatibility here. Is there some list > of guidelines for what you can and can’t use? I don’t have a windows machine > available to develop on. I think a lot (most?) developers here don't have a windows environment available, so now have been using cirrusci's tests to verify. If you haven't used cirrusci directly (not via cfbot) before, start at: src/tools/ci/README There's not much assumed about the build environment, and not much more assumed about the test environment. Most of the portability is handled by using C and perl. I think there's even no assumption that "tar" is available (except maybe for building releases). This patch should avoid relying on tools that aren't already required. As a practical matter, cfbot needs to pass, not only to demonstrate that the patch consistently passes tests, but also because if the patch were merged while it failed tests in cfbot, it would cause every other patch to start to fail, too. > Was it failing on windows? I was attempting to skip it as I recall. I don't see anything about skipping, and cirrus's logs from 2 commitfests ago were pruned. I looked at this patch earlier this year, but never got around to replacing the calls to rsync and cp. -- Justin
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Nov 4, 2022, at 9:02 AM, Justin Pryzby wrote: > > On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote: >> 2022年5月3日(火) 8:45 David Christensen : >>> >>> ...and pushing a couple fixups pointed out by cfbot, so here's v4. >> >> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is >> currently underway, this would be an excellent time to update the patch. > > More important than needing to be rebased, the patch has never passed > its current tests on windows. > > As I recall, that's due to relying on "cp". And "rsync", which > shouldn't be assumed to exist by regression tests). I will work on supporting the windows compatibility here. Is there some list of guidelines for what you can and can’t use? I don’t have a windows machine available to develop on. Was it failing on windows? I was attempting to skip it as I recall. Best, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote: > 2022年5月3日(火) 8:45 David Christensen : > > > > ...and pushing a couple fixups pointed out by cfbot, so here's v4. > > cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is > currently underway, this would be an excellent time to update the patch. More important than needing to be rebased, the patch has never passed its current tests on windows. As I recall, that's due to relying on "cp". And "rsync", which shouldn't be assumed to exist by regression tests). https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3628
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
2022年5月3日(火) 8:45 David Christensen : > > ...and pushing a couple fixups pointed out by cfbot, so here's v4. Hi cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is currently underway, this would be an excellent time to update the patch. [1] http://cfbot.cputube.org/patch_40_3628.log Thanks Ian Barwick
Re: pg_waldump: add test for coverage
Hi, On 2022-08-23 10:50:08 +0900, Dong Wook Lee wrote: > I wrote a test for coverage. Unfortunately the test doesn't seem to pass on windows, and hasn't ever done so: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3834 Due to the merge of the meson patchset, you should also add 001_basic.pl to the list of tests in meson.build Greetings, Andres Freund
Re: pg_waldump: add test for coverage
On 23.08.22 03:50, Dong Wook Lee wrote: Hi Hackers, I wrote a test for coverage. Unfortunately, it seems to take quite a while to run the test. I want to improve these execution times, but I don't know exactly what to do. Therefore, I want to hear feedback from many people. I don't find these tests to be particularly slow. How long do they take for you to run? A couple of tips: - You should give each test a name. That's why each test function has a (usually) last argument that takes a string. - You could use command_like() to run a command and check that it exits successfully and check its standard out. For example, instead of # test pg_waldump with -F (main) IPC::Run::run [ 'pg_waldump', "$wal_dump_path", '-F', 'main' ], '>', \$stdout, '2>', \$stderr; isnt($stdout, '', ""); it is better to write command_like([ 'pg_waldump', "$wal_dump_path", '-F', 'main' ], qr/TODO/, 'test -F (main)'); - It would be useful to test the actual output (that is, fill in the TODO above). I don't know what the best way to do that is -- that is part of designing these tests. Also, - Your patch introduces a spurious blank line at the end of the test file. - For portability, options must be before non-option arguments. So instead of [ 'pg_waldump', "$wal_dump_path", '-F', 'main' ] it should be [ 'pg_waldump', '-F', 'main', "$wal_dump_path" ] I think having some more test coverage for pg_waldump would be good, so I encourage you to continue working on this.
pg_waldump: add test for coverage
Hi Hackers, I wrote a test for coverage. Unfortunately, it seems to take quite a while to run the test. I want to improve these execution times, but I don't know exactly what to do. Therefore, I want to hear feedback from many people. --- Regards, Dong Wook Lee v1_add_test_pg_waldump.patch Description: Binary data
Re: pg_waldump got an error with waldump file generated by initdb
On Sun, Jul 10, 2022 at 02:39:00PM -0700, Andres Freund wrote: > On 2022-07-10 21:51:04 +0900, Dong Wook Lee wrote: > > I don't know if this is an error. > > when I do ./initdb -D ../data and execute pg_waldump like this, In the last > > part I got an error. > > > > ``` > > ./pg_waldump ../data/pg_wal/00010001 > > ``` > > > > pg_waldump: error: error in WAL record at 0/140: invalid record length > > at 0/1499A08: wanted 24, got 0 > > > > my environment is `16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > > 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit` > > Is this normal? > > Yes, that's likely normal - i.e. pg_waldump has reached the point at which the > WAL ends. It's the issue that's fixed by this patch. 38/2490 Make message at end-of-recovery less scary Kyotaro Horiguchi https://commitfest.postgresql.org/38/2490/
Re: pg_waldump got an error with waldump file generated by initdb
Hi, On 2022-07-10 21:51:04 +0900, Dong Wook Lee wrote: > I don't know if this is an error. > when I do ./initdb -D ../data and execute pg_waldump like this, In the last > part I got an error. > > ``` > ./pg_waldump ../data/pg_wal/00010001 > ``` > > pg_waldump: error: error in WAL record at 0/140: invalid record length at > 0/1499A08: wanted 24, got 0 > > my environment is `16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit` > Is this normal? Yes, that's likely normal - i.e. pg_waldump has reached the point at which the WAL ends. Greetings, Andres Freund
pg_waldump got an error with waldump file generated by initdb
Hi, hackers I don't know if this is an error. when I do ./initdb -D ../data and execute pg_waldump like this, In the last part I got an error. ``` ./pg_waldump ../data/pg_wal/00010001 ``` pg_waldump: error: error in WAL record at 0/140: invalid record length at 0/1499A08: wanted 24, got 0 my environment is `16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit` Is this normal?
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Tue, May 03, 2022 at 10:34:41AM +0200, Alvaro Herrera wrote: > I remember Greg Mullane posted a tool that attempts to correct page CRC > mismatches[1]. This new tool might be useful to feed healing attempts, > too. (It's of course not in any way a *solution*, because the page > might have been modified by other WAL records since the last FPI, but it > could be a start towards building a solution that scoops page contents > from WAL.) > > [1] https://github.com/turnstep/pg_healer Fun. Thanks for mentioning that. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On 2022-Apr-27, Michael Paquier wrote: > On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote: > > True. :-) This does seem like a tool geared towards "expert mode", so > > maybe we just assume if you need it you know what you're doing? > > This is definitely an expert mode toy. I remember Greg Mullane posted a tool that attempts to correct page CRC mismatches[1]. This new tool might be useful to feed healing attempts, too. (It's of course not in any way a *solution*, because the page might have been modified by other WAL records since the last FPI, but it could be a start towards building a solution that scoops page contents from WAL.) [1] https://github.com/turnstep/pg_healer -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Subversion to GIT: the shortest path to happiness I've ever heard of (Alexey Klyukin)
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
...and pushing a couple fixups pointed out by cfbot, so here's v4. On Mon, May 2, 2022 at 8:42 AM David Christensen wrote: > > Enclosed is v3 of this patch; this adds two modes for this feature, > one with the raw page `--save-fullpage/-W` and one with the > LSN+checksum fixups `--save-fullpage-fixup/-X`. > > I've added at least some basic sanity-checking of the underlying > feature, as well as run the test file and the changes to pg_waldump.c > through pgindent/perltidy to make them adhere to project standards. > Threw in a rebase as well. > > Would appreciate any additional feedback here. > > Best, > > David v4-pg_waldump-save-fpi.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Enclosed is v3 of this patch; this adds two modes for this feature, one with the raw page `--save-fullpage/-W` and one with the LSN+checksum fixups `--save-fullpage-fixup/-X`. I've added at least some basic sanity-checking of the underlying feature, as well as run the test file and the changes to pg_waldump.c through pgindent/perltidy to make them adhere to project standards. Threw in a rebase as well. Would appreciate any additional feedback here. Best, David v3-pg_waldump-save-fpi.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote: > True. :-) This does seem like a tool geared towards "expert mode", so > maybe we just assume if you need it you know what you're doing? This is definitely an expert mode toy. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 9:42 PM Michael Paquier wrote: > On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote: > > On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy > > wrote: > >> Thanks for working on this. I'm just thinking if we can use these FPIs > >> to repair the corrupted pages? I would like to understand more > >> detailed usages of the FPIs other than inspecting with pageinspect. > > > > My main use case was for being able to look at potential corruption, > > either in the WAL stream, on heap, or in tools associated with the WAL > > stream. I suppose you could use the page images to replace corrupted > > on-disk pages (and in fact I think I've heard of a tool or two that > > try to do that), though don't know that I consider this the primary > > purpose (and having toast tables and the list, as well as clog would > > make it potentially hard to just drop-in a page version without > > issues). Might help in extreme situations though. > > You could do a bunch of things with those images, even make things > worse if you are not careful enough. True. :-) This does seem like a tool geared towards "expert mode", so maybe we just assume if you need it you know what you're doing? > >> 10) Along with pg_pwrite(), can we also fsync the files (of course > >> users can choose it optionally) so that the writes will be durable for > >> the OS crashes? > > > > Can add; you thinking a separate flag to disable this with default true? > > We expect data generated by tools like pg_dump, pg_receivewal > (depending on the use --synchronous) or pg_basebackup to be consistent > when we exit from the call. FWIW, flushing this data does not seem > like a strong requirement for something aimed at being used page-level > chirurgy or lookups, because the WAL segments should still be around > even if the host holding the archives is unplugged. I have added the fsync to the latest path (forthcoming), but I have no strong preferences here as to the correct/expected behavior. Best, David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 9:54 PM Michael Paquier wrote: > On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote: > > On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier wrote: > >> I don't think that there is any need to rely on a new logic if there > >> is already some code in place able to do the same work. See > >> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example, > >> that relies on pg_check_dir(). I think that you'd better rely at > >> least on what pgcheckdir.c offers. > > > > Yeah, though I am tending towards what another user had suggested and > > just accepting an existing directory rather than requiring it to be > > empty, so thinking I might just skip this one. (Will review the file > > you've pointed out to see if there is a relevant function though.) > > OK. FWIW, pg_check_dir() is used in initdb and pg_basebackup because > these care about the behavior to use when specifying a target path. > You could reuse it, but use a different policy depending on its > returned result for the needs of what you see fit in this case. I have a new version of the patch (pending tests) that uses pg_check_dir's return value to handle things appropriately, so at least some code reuse now. It did end up simplifying a lot. > >> + PageSetLSN(page, record->ReadRecPtr); > >> + /* if checksum field is non-zero then we have > >> checksums enabled, > >> +* so recalculate the checksum with new LSN (yes, this > >> is a hack) > >> +*/ > >> Yeah, that looks like a hack, but putting in place a page on a cluster > >> that has checksums enabled would be more annoying with > >> zero_damaged_pages enabled if we don't do that, so that's fine by me > >> as-is. Perhaps you should mention that FPWs don't have their > >> pd_checksum updated when written. > > > > Can make a mention; you thinking just a comment in the code is > > sufficient, or want there to be user-visible docs as well? > > I was thinking about a comment, at least. New patch version has significantly more comments. > > This was to make the page as extracted equivalent to the effect of > > applying the WAL record block on replay (the LSN and checksum both); > > since recovery calls this function to mark the LSN where the page came > > from this is why I did this as well. (I do see perhaps a case for > > --raw output that doesn't munge the page whatsoever, just made > > comparisons easier.) > > Hm. Okay. The argument goes both ways, I guess, depending on what we > want to do with those raw pages. Still you should not need pd_lsn if > the point is to be able to stick the pages back in place to attempt to > get back as much data as possible when loading it back to the shared > buffers? Yeah, I can see that too; I think there's at least enough of an argument for a flag to apply the fixups or just extract only the raw page pre-modification. Not sure which should be the "default" behavior; either `--raw` or `--fixup-metadata` or something could work. (Naming suggestions welcomed.) David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote: > On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier wrote: >> I don't think that there is any need to rely on a new logic if there >> is already some code in place able to do the same work. See >> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example, >> that relies on pg_check_dir(). I think that you'd better rely at >> least on what pgcheckdir.c offers. > > Yeah, though I am tending towards what another user had suggested and > just accepting an existing directory rather than requiring it to be > empty, so thinking I might just skip this one. (Will review the file > you've pointed out to see if there is a relevant function though.) OK. FWIW, pg_check_dir() is used in initdb and pg_basebackup because these care about the behavior to use when specifying a target path. You could reuse it, but use a different policy depending on its returned result for the needs of what you see fit in this case. >> + PageSetLSN(page, record->ReadRecPtr); >> + /* if checksum field is non-zero then we have checksums >> enabled, >> +* so recalculate the checksum with new LSN (yes, this >> is a hack) >> +*/ >> Yeah, that looks like a hack, but putting in place a page on a cluster >> that has checksums enabled would be more annoying with >> zero_damaged_pages enabled if we don't do that, so that's fine by me >> as-is. Perhaps you should mention that FPWs don't have their >> pd_checksum updated when written. > > Can make a mention; you thinking just a comment in the code is > sufficient, or want there to be user-visible docs as well? I was thinking about a comment, at least. > This was to make the page as extracted equivalent to the effect of > applying the WAL record block on replay (the LSN and checksum both); > since recovery calls this function to mark the LSN where the page came > from this is why I did this as well. (I do see perhaps a case for > --raw output that doesn't munge the page whatsoever, just made > comparisons easier.) Hm. Okay. The argument goes both ways, I guess, depending on what we want to do with those raw pages. Still you should not need pd_lsn if the point is to be able to stick the pages back in place to attempt to get back as much data as possible when loading it back to the shared buffers? -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote: > On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy > wrote: >> Thanks for working on this. I'm just thinking if we can use these FPIs >> to repair the corrupted pages? I would like to understand more >> detailed usages of the FPIs other than inspecting with pageinspect. > > My main use case was for being able to look at potential corruption, > either in the WAL stream, on heap, or in tools associated with the WAL > stream. I suppose you could use the page images to replace corrupted > on-disk pages (and in fact I think I've heard of a tool or two that > try to do that), though don't know that I consider this the primary > purpose (and having toast tables and the list, as well as clog would > make it potentially hard to just drop-in a page version without > issues). Might help in extreme situations though. You could do a bunch of things with those images, even make things worse if you are not careful enough. >> 10) Along with pg_pwrite(), can we also fsync the files (of course >> users can choose it optionally) so that the writes will be durable for >> the OS crashes? > > Can add; you thinking a separate flag to disable this with default true? We expect data generated by tools like pg_dump, pg_receivewal (depending on the use --synchronous) or pg_basebackup to be consistent when we exit from the call. FWIW, flushing this data does not seem like a strong requirement for something aimed at being used page-level chirurgy or lookups, because the WAL segments should still be around even if the host holding the archives is unplugged. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy wrote: > Thanks for working on this. I'm just thinking if we can use these FPIs > to repair the corrupted pages? I would like to understand more > detailed usages of the FPIs other than inspecting with pageinspect. My main use case was for being able to look at potential corruption, either in the WAL stream, on heap, or in tools associated with the WAL stream. I suppose you could use the page images to replace corrupted on-disk pages (and in fact I think I've heard of a tool or two that try to do that), though don't know that I consider this the primary purpose (and having toast tables and the list, as well as clog would make it potentially hard to just drop-in a page version without issues). Might help in extreme situations though. > Given that others have realistic use-cases (of course I would like to > know more about those), +1 for the idea. However, I would suggest > adding a function to extract raw FPI data to the pg_walinspect > extension that got recently committed in PG 15, the out of which can > directly be fed to pageinspect functions or Yeah, makes sense to have some overlap here; will review what is there and see if there is some shared code base we can utilize. (ISTR some work towards getting these two tools using more of the same code, and this seems like another such instance.) > Few comments: > 1) I think it's good to mention the stored file name format. > + printf(_(" -W, --raw-fpi=path save found full page images to > given path\n")); +1, though I've also thought there could be uses to have multiple possible output formats here (most immediately, there may be cases where we want *each* FPI for a block vs the *latest*, so files name with/without the LSN component seem the easiest way forward here). That would introduce some additional complexity though, so might need to see if others think that makes any sense. > 2) > + for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) > + { > + /* we will now extract the fullpage image from the XLogRecord and save > + * it to a calculated filename */ > + > + if (XLogRecHasBlockImage(record, block_id)) > > I think we need XLogRecHasBlockRef to be true to check > XLogRecHasBlockImage otherwise, we will see some build farms failing, > recently I've seen this failure for pg_walinspect.. > > for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) > { > if (!XLogRecHasBlockRef(record, block_id)) > continue; > > if (XLogRecHasBlockImage(record, block_id)) > *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len; > } Good point; my previous patch that got committed here (127aea2a65) probably also needed this treatment. > 3) Please correct the commenting format: > + /* we will now extract the fullpage image from the XLogRecord and save > + * it to a calculated filename */ Ack. > 4) Usually we start errors with lower case letters "could not ." > + pg_fatal("Couldn't open file for output: %s", filename); > + pg_fatal("Couldn't write out complete FPI to file: %s", filename); > And the variable name too: > + FILE *OPF; Ack. > 5) Not sure how the FPIs of TOASTed tables get stored, but it would be > good to check. What would be different here? Are there issues you can think of, or just more from the pageinspect side of things? > 6) Good to specify the known usages of FPIs in the documentation. Ack. Prob good to get additional info/use cases from others, as mine is fairly short. :-) > 7) Isn't it good to emit an error if RestoreBlockImage returns false? > + if (RestoreBlockImage(record, block_id, page)) > + { Ack. > 8) I think I don't mind if a non-empty directory is specified - IMO > better usability is this - if the directory is non-empty, just go add > the FPI files if FPI file exists just replace it, if the directory > isn't existing, create and write the FPI files. > + /* we accept an empty existing directory */ > + if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode)) > + { Agreed; was mainly trying to prevent accidental expansion inside `pg_wal` when an earlier version of the patch implied `.` as the current dir with an optional path, but I've since made the path non-optional and agree that this is unnecessarily restrictive. > 9) Instead of following: > + if (XLogRecordHasFPW(xlogreader_state)) > + XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path); > I will just do this in XLogRecordSaveFPWs: > for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) > { > if (!XLogRecHasBlockRef(record, block_id)) > continue; > > if (XLogRecHasBlockImage(record, block_id)) > { > > } > } Yeah, a little redundant. > 10) Along with pg_pwrite(), can we also fsync the files (of course > users can choose it optionally) so that the writes will be durable for > the OS crashes? Can add; you thinking a separate flag to disable thi
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 2:00 AM Drouvot, Bertrand wrote: > > Hi, > > On 4/25/22 8:11 AM, Michael Paquier wrote: > > On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote: > >> Hi Matthias, great point. Enclosed is a revised version of the patch > >> that adds the fork identifier to the end if it's a non-main fork. > > Like Alvaro, I have seen cases where this would have been really > > handy. So +1 from me, as well, to have more tooling like what you are > > proposing. > > +1 on the idea. > > FWIW, there is an extension doing this [1] but having the feature > included in pg_waldump would be great. Cool, glad to see there is some interest; definitely some overlap in forensics inside and outside the database both, as there are different use cases for both. David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier wrote: > > On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote: > > Hi Matthias, great point. Enclosed is a revised version of the patch > > that adds the fork identifier to the end if it's a non-main fork. > > Like Alvaro, I have seen cases where this would have been really > handy. So +1 from me, as well, to have more tooling like what you are > proposing. Fine for me to use one file for each block with a name > like what you are suggesting for each one of them. > > + /* we accept an empty existing directory */ > + if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode)) > + { > I don't think that there is any need to rely on a new logic if there > is already some code in place able to do the same work. See > verify_dir_is_empty_or_create() in pg_basebackup.c, as one example, > that relies on pg_check_dir(). I think that you'd better rely at > least on what pgcheckdir.c offers. Yeah, though I am tending towards what another user had suggested and just accepting an existing directory rather than requiring it to be empty, so thinking I might just skip this one. (Will review the file you've pointed out to see if there is a relevant function though.) > + {"raw-fpi", required_argument, NULL, 'W'}, > I think that we'd better rename this option. "fpi", that is not used > much in the user-facing docs, is additionally not adapted when we have > an other option called -w/--fullpage. I can think of > --save-fullpage. Makes sense. > + PageSetLSN(page, record->ReadRecPtr); > + /* if checksum field is non-zero then we have checksums > enabled, > +* so recalculate the checksum with new LSN (yes, this is > a hack) > +*/ > Yeah, that looks like a hack, but putting in place a page on a cluster > that has checksums enabled would be more annoying with > zero_damaged_pages enabled if we don't do that, so that's fine by me > as-is. Perhaps you should mention that FPWs don't have their > pd_checksum updated when written. Can make a mention; you thinking just a comment in the code is sufficient, or want there to be user-visible docs as well? > + /* we will now extract the fullpage image from the XLogRecord and save > +* it to a calculated filename */ > The format of this comment is incorrect. > > +The LSN of the record with this block, formatted > +as %08x-%08X instead of the > +conventional %X/%X due to filesystem naming > +limits > The last part of the sentence about %X/%X could just be removed. That > could be confusing, at worse. Makes sense. > + PageSetLSN(page, record->ReadRecPtr); > Why is pd_lsn set? This was to make the page as extracted equivalent to the effect of applying the WAL record block on replay (the LSN and checksum both); since recovery calls this function to mark the LSN where the page came from this is why I did this as well. (I do see perhaps a case for --raw output that doesn't munge the page whatsoever, just made comparisons easier.) > git diff --check complains a bit. Can look into this. > This stuff should include some tests. With --end, the tests can > be cheap. Yeah, makes sense, will include some in the next version. David
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, Apr 23, 2022 at 4:21 AM David Christensen wrote: > > Hi -hackers, > > Enclosed is a patch to allow extraction/saving of FPI from the WAL > stream via pg_waldump. > > Description from the commit: > > Extracts full-page images from the WAL stream into a target directory, > which must be empty or not > exist. These images are subject to the same filtering rules as normal > display in pg_waldump, which > means that you can isolate the full page writes to a target relation, > among other things. > > Files are saved with the filename: with > formatting to make things > somewhat sortable; for instance: > > -01C0.1663.1.6117.0 > -01000150.1664.0.6115.0 > -010001E0.1664.0.6114.0 > -01000270.1663.1.6116.0 > -01000300.1663.1.6113.0 > -01000390.1663.1.6112.0 > -01000420.1663.1.8903.0 > -010004B0.1663.1.8902.0 > -01000540.1663.1.6111.0 > -010005D0.1663.1.6110.0 > > It's noteworthy that the raw images do not have the current LSN stored > with them in the WAL > stream (as would be true for on-heap versions of the blocks), nor > would the checksum be valid in > them (though WAL itself has checksums, so there is some protection > there). This patch chooses to > place the LSN and calculate the proper checksum (if non-zero in the > source image) in the outputted > block. (This could perhaps be a targetted flag if we decide we don't > always want this.) > > These images could be loaded/inspected via `pg_read_binary_file()` and > used in the `pageinspect` > suite of tools to perform detailed analysis on the pages in question, > based on historical > information, and may come in handy for forensics work. Thanks for working on this. I'm just thinking if we can use these FPIs to repair the corrupted pages? I would like to understand more detailed usages of the FPIs other than inspecting with pageinspect. Given that others have realistic use-cases (of course I would like to know more about those), +1 for the idea. However, I would suggest adding a function to extract raw FPI data to the pg_walinspect extension that got recently committed in PG 15, the out of which can directly be fed to pageinspect functions or Few comments: 1) I think it's good to mention the stored file name format. + printf(_(" -W, --raw-fpi=path save found full page images to given path\n")); 2) + for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) + { + /* we will now extract the fullpage image from the XLogRecord and save + * it to a calculated filename */ + + if (XLogRecHasBlockImage(record, block_id)) I think we need XLogRecHasBlockRef to be true to check XLogRecHasBlockImage otherwise, we will see some build farms failing, recently I've seen this failure for pg_walinspect.. for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) { if (!XLogRecHasBlockRef(record, block_id)) continue; if (XLogRecHasBlockImage(record, block_id)) *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len; } 3) Please correct the commenting format: + /* we will now extract the fullpage image from the XLogRecord and save + * it to a calculated filename */ 4) Usually we start errors with lower case letters "could not ." + pg_fatal("Couldn't open file for output: %s", filename); + pg_fatal("Couldn't write out complete FPI to file: %s", filename); And the variable name too: + FILE *OPF; 5) Not sure how the FPIs of TOASTed tables get stored, but it would be good to check. 6) Good to specify the known usages of FPIs in the documentation. 7) Isn't it good to emit an error if RestoreBlockImage returns false? + if (RestoreBlockImage(record, block_id, page)) + { 8) I think I don't mind if a non-empty directory is specified - IMO better usability is this - if the directory is non-empty, just go add the FPI files if FPI file exists just replace it, if the directory isn't existing, create and write the FPI files. + /* we accept an empty existing directory */ + if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode)) + { 9) Instead of following: + if (XLogRecordHasFPW(xlogreader_state)) + XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path); I will just do this in XLogRecordSaveFPWs: for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) { if (!XLogRecHasBlockRef(record, block_id)) continue; if (XLogRecHasBlockImage(record, block_id)) { } } 10) Along with pg_pwrite(), can we also fsync the files (of course users can choose it optionally) so that the writes will be durable for the OS crashes? Regards, Bharath Rupireddy.
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote: > Hi Matthias, great point. Enclosed is a revised version of the patch > that adds the fork identifier to the end if it's a non-main fork. Like Alvaro, I have seen cases where this would have been really handy. So +1 from me, as well, to have more tooling like what you are proposing. Fine for me to use one file for each block with a name like what you are suggesting for each one of them. + /* we accept an empty existing directory */ + if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode)) + { I don't think that there is any need to rely on a new logic if there is already some code in place able to do the same work. See verify_dir_is_empty_or_create() in pg_basebackup.c, as one example, that relies on pg_check_dir(). I think that you'd better rely at least on what pgcheckdir.c offers. + {"raw-fpi", required_argument, NULL, 'W'}, I think that we'd better rename this option. "fpi", that is not used much in the user-facing docs, is additionally not adapted when we have an other option called -w/--fullpage. I can think of --save-fullpage. + PageSetLSN(page, record->ReadRecPtr); + /* if checksum field is non-zero then we have checksums enabled, +* so recalculate the checksum with new LSN (yes, this is a hack) +*/ Yeah, that looks like a hack, but putting in place a page on a cluster that has checksums enabled would be more annoying with zero_damaged_pages enabled if we don't do that, so that's fine by me as-is. Perhaps you should mention that FPWs don't have their pd_checksum updated when written. + /* we will now extract the fullpage image from the XLogRecord and save +* it to a calculated filename */ The format of this comment is incorrect. +The LSN of the record with this block, formatted +as %08x-%08X instead of the +conventional %X/%X due to filesystem naming +limits The last part of the sentence about %X/%X could just be removed. That could be confusing, at worse. + PageSetLSN(page, record->ReadRecPtr); Why is pd_lsn set? git diff --check complains a bit. This stuff should include some tests. With --end, the tests can be cheap. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, Apr 23, 2022 at 9:49 AM Matthias van de Meent wrote: > Regardless of my (lack of) opinion on the inclusion of this patch in > PG (I did not significantly review this patch); I noticed that you do > not yet identify the 'fork' of the FPI in the file name. > > A lack of fork identifier in the exported file names would make > debugging much more difficult due to the relatively difficult to > identify data contained in !main forks, so I think this oversight > should be fixed, be it through `_forkname` postfix like normal fork > segments, or be it through `.` numerical in- or postfix in > the filename. > > -Matthias Hi Matthias, great point. Enclosed is a revised version of the patch that adds the fork identifier to the end if it's a non-main fork. Best, David v2-pg_waldump-save-fpi.patch Description: Binary data
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, 23 Apr 2022 at 00:51, David Christensen wrote: > > Hi -hackers, > > Enclosed is a patch to allow extraction/saving of FPI from the WAL > stream via pg_waldump. > > Description from the commit: > > Extracts full-page images from the WAL stream into a target directory, > which must be empty or not > exist. These images are subject to the same filtering rules as normal > display in pg_waldump, which > means that you can isolate the full page writes to a target relation, > among other things. > > Files are saved with the filename: with > formatting to make things Regardless of my (lack of) opinion on the inclusion of this patch in PG (I did not significantly review this patch); I noticed that you do not yet identify the 'fork' of the FPI in the file name. A lack of fork identifier in the exported file names would make debugging much more difficult due to the relatively difficult to identify data contained in !main forks, so I think this oversight should be fixed, be it through `_forkname` postfix like normal fork segments, or be it through `.` numerical in- or postfix in the filename. -Matthias