subject:"pg_waldump"

Re: pg_waldump

2023-12-19 Thread Fabrice Chapuis

Ok thanks for all these precisions
Regards
Fabrice

On Tue, Dec 19, 2023 at 2:00 PM Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Tue, 19 Dec 2023, 12:27 Fabrice Chapuis, 
> wrote:
> >
> > Hi,
> > Is it possible to visualize the DDL with the pg_waldump tool. I created
> a postgres user but I cannot find the creation command in the wals
>
> Not really, no. PostgreSQL does not log DDL or DML as such in WAL.
> Essentially all catalog updates are logged only as changes on a
> certain page in some file: a new user getting inserted would be
> approximately "Insert tuple [user's pg_role row data] on page X in
> file [the file corresponding to the pg_role table]".
>
> You could likely derive most DDL commands from Heap/Insert,
> Heap/Delete, and Heap/Update records (after cross-referencing the
> database's relfilemap), as most DDL is "just" a lot of in-memory
> operations plus some record insertions/updates/deletes in catalog
> tables. You'd also need to keep track of any relfilemap changes while
> processing the WAL, as VACUUM FULL on the catalog tables would change
> the file numbering of catalog tables...
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>

Re: pg_waldump

2023-12-19 Thread Matthias van de Meent

On Tue, 19 Dec 2023, 12:27 Fabrice Chapuis,  wrote:
>
> Hi,
> Is it possible to visualize the DDL with the pg_waldump tool. I created a 
> postgres user but I cannot find the creation command in the wals

Not really, no. PostgreSQL does not log DDL or DML as such in WAL.
Essentially all catalog updates are logged only as changes on a
certain page in some file: a new user getting inserted would be
approximately "Insert tuple [user's pg_role row data] on page X in
file [the file corresponding to the pg_role table]".

You could likely derive most DDL commands from Heap/Insert,
Heap/Delete, and Heap/Update records (after cross-referencing the
database's relfilemap), as most DDL is "just" a lot of in-memory
operations plus some record insertions/updates/deletes in catalog
tables. You'd also need to keep track of any relfilemap changes while
processing the WAL, as VACUUM FULL on the catalog tables would change
the file numbering of catalog tables...

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

pg_waldump

2023-12-19 Thread Fabrice Chapuis

Hi,
Is it possible to visualize the DDL with the pg_waldump tool. I created a
postgres user but I cannot find the creation command in the wals

Thanks for help

Fabrice

Re: pg_waldump vs. all-zeros WAL files; server creation of such files

2023-08-13 Thread Michael Paquier

On Sat, Aug 12, 2023 at 08:15:31PM -0700, Noah Misch wrote:
> The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails
> at today's master (c36b636) and v15 (1bc19df).  It passes at v14 (5a32af3).
> Command "pg_waldump --start 0/0100 --end 0/01000100" fails as follows:
> 
>   pg_waldump: error: WAL segment size must be a power of two between
>   1 MB and 1 GB, but the WAL file "00010002" header
>   specifies 0 bytes

So this depends on the ordering of the entries retrieved by readdir()
as much as the segments produced by the backend.

> Where it fails, the server has created an all-zeros WAL file under that name.
> Where it succeeds, that file doesn't exist at all.  Two decisions to make:
> 
> - Should a clean server shutdown ever leave an all-zeros WAL file?  I think
>   yes, it's okay to let that happen.

It doesn't hurt to leave that around.  On the contrary, it makes any
follow-up startup cheaper the bigger the segment size.

> - Should "pg_waldump --start $X --end $Y" open files not needed for the
>   requested range?  I think no.

So this is a case where identify_target_directory() is called with a
fname of NULL.  Agreed that search_directory could be smarter with the
files it should scan, especially if we have start and/or end LSNs at
hand to filter out what we'd like to be in the data folder.
--
Michael


signature.asc
Description: PGP signature

pg_waldump vs. all-zeros WAL files; server creation of such files

2023-08-12 Thread Noah Misch

The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails
at today's master (c36b636) and v15 (1bc19df).  It passes at v14 (5a32af3).
Command "pg_waldump --start 0/0100 --end 0/01000100" fails as follows:

  pg_waldump: error: WAL segment size must be a power of two between 1 MB and 1 
GB, but the WAL file "00010002" header specifies 0 bytes

Where it fails, the server has created an all-zeros WAL file under that name.
Where it succeeds, that file doesn't exist at all.  Two decisions to make:

- Should a clean server shutdown ever leave an all-zeros WAL file?  I think
  yes, it's okay to let that happen.
- Should "pg_waldump --start $X --end $Y" open files not needed for the
  requested range?  I think no.

Bisect of master got:
30a53b7 Wed Mar 8 16:56:37 2023 +0100 Allow tailoring of ICU locales with 
custom rules
Doesn't fail at $(git merge-base REL_15_STABLE master).  Bisect of v15 got:
811203d Sat Aug 6 11:50:23 2022 -0400 Fix data-corruption hazard in WAL-logged 
CREATE DATABASE.

I suspect those are innocent.  They changed the exact WAL content, which I
expect somehow caused creation of segment 2.

Oddly, I find only one other report of this:
https://www.postgresql.org/message-id/CAJ6DU3HiJ5FHbqPua19jAD%3DwLgiXBTjuHfbmv1jCOaNOpB3cCQ%40mail.gmail.com

Thanks,
nm


010_zero.pl
Description: Perl program

Re: pg_waldump: add test for coverage

2023-07-05 Thread Peter Eisentraut


On 29.06.23 21:16, Tristen Raab wrote:

I've reviewed your latest v3 patches on Ubuntu 23.04. Both patches apply 
correctly and all the tests run and pass as they should. Execution time was 
normal for me, I didn't notice any significant latency when compared to other 
tests. The only other feedback I can provide would be to add test coverage to 
some of the other options that aren't currently covered (ie. --bkp-details, 
--end, --follow, --path, etc.) for completeness. Other than that, this looks 
like a great patch.


Committed.

I added a test for the --quiet option.  --end and --path are covered.

The only options not covered now are

  -b, --bkp-details  output detailed information about backup blocks
  -f, --follow   keep retrying after reaching end of WAL
  -t, --timeline=TLI timeline from which to read WAL records
  -x, --xid=XID  only show records with transaction ID XID

--follow is a bit tricky to test because you need to leave pg_waldump 
running in the background for a while, or something like that. 
--timeline and --xid can be tested but would need some work on the 
underlying test data (such as creating more than one timeline).  I don't 
know much about --bkp-details, so I don't have a good idea how to test 
it.  So I'll leave those as projects for the future.

Re: pg_waldump: add test for coverage

2023-06-29 Thread Tristen Raab

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hello,

I've reviewed your latest v3 patches on Ubuntu 23.04. Both patches apply 
correctly and all the tests run and pass as they should. Execution time was 
normal for me, I didn't notice any significant latency when compared to other 
tests. The only other feedback I can provide would be to add test coverage to 
some of the other options that aren't currently covered (ie. --bkp-details, 
--end, --follow, --path, etc.) for completeness. Other than that, this looks 
like a great patch.

Kind regards,

Tristen

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-28 Thread Michael Paquier

On Wed, Jun 28, 2023 at 09:20:27AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 27 Jun 2023 18:58:39 -0500, David Christensen 
>  wrote in 
>>> Adjusted as per the v2 attached.
>> 
>> +1
> 
> +1

Okay, cool.  Both of you seem happy with it, so I have applied it.
Thanks for the quick checks.
--
Michael


signature.asc
Description: PGP signature

Re: pg_waldump: add test for coverage

2023-06-27 Thread Peter Eisentraut


On 14.06.23 09:16, Peter Eisentraut wrote:

On 06.09.22 07:57, Peter Eisentraut wrote:

I wrote a test for coverage.
Unfortunately, it seems to take quite a while to run the test.
I want to improve these execution times, but I don't know exactly 
what to do.

Therefore, I want to hear feedback from many people.


I think having some more test coverage for pg_waldump would be good, 
so I encourage you to continue working on this.


I made an updated patch that incorporates many of your ideas and code, 
just made it a bit more compact, and added more tests for various 
command-line options.  This moves the test coverage of pg_waldump from 
"bloodbath" to "mixed fruit salad", which I think is pretty good 
progress.  And now there is room for additional patches if someone wants 
to figure out, e.g., how to get more complete coverage in gindesc.c or 
whatever.


Here is an updated patch set.  I added a test case for the "first record 
is after" message.  Also, I think this message should really go to 
stderr, since it's more of a notice or warning, so I changed it to use 
pg_log_info.
From e5d13b7891a9c0fcc3d28f9bc2756c5372b2fa40 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Jun 2023 08:46:47 +0200
Subject: [PATCH v3 1/2] Add more pg_waldump tests

This adds tests for most command-line options and tests for most
rmgrdesc routines.

Discussion: 
https://www.postgresql.org/message-id/flat/CAAcByaKM7zyJSDkPWv6_%2BapupY4fXXM3q3SRXas9MMNVPUGcsQ%40mail.gmail.com
---
 src/bin/pg_waldump/t/001_basic.pl | 191 ++
 1 file changed, 191 insertions(+)

diff --git a/src/bin/pg_waldump/t/001_basic.pl 
b/src/bin/pg_waldump/t/001_basic.pl
index 492a8cadd8..d4056e1b95 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,6 +3,7 @@
 
 use strict;
 use warnings;
+use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
@@ -10,4 +11,194 @@
 program_version_ok('pg_waldump');
 program_options_handling_ok('pg_waldump');
 
+# wrong number of arguments
+command_fails_like([ 'pg_waldump', ], qr/error: no arguments/, 'no arguments');
+command_fails_like([ 'pg_waldump', 'foo', 'bar', 'baz' ], qr/error: too many 
command-line arguments/, 'too many arguments');
+
+# invalid option arguments
+command_fails_like([ 'pg_waldump', '--block', 'bad' ], qr/error: invalid block 
number/, 'invalid block number');
+command_fails_like([ 'pg_waldump', '--fork', 'bad' ], qr/error: invalid fork 
name/, 'invalid fork name');
+command_fails_like([ 'pg_waldump', '--limit', 'bad' ], qr/error: invalid 
value/, 'invalid limit');
+command_fails_like([ 'pg_waldump', '--relation', 'bad' ], qr/error: invalid 
relation/, 'invalid relation specification');
+command_fails_like([ 'pg_waldump', '--rmgr', 'bad' ], qr/error: resource 
manager .* does not exist/, 'invalid rmgr name');
+command_fails_like([ 'pg_waldump', '--start', 'bad' ], qr/error: invalid WAL 
location/, 'invalid start LSN');
+command_fails_like([ 'pg_waldump', '--end', 'bad' ], qr/error: invalid WAL 
location/, 'invalid end LSN');
+
+# rmgr list: If you add one to the list, consider also adding a test
+# case exercising the new rmgr below.
+command_like([ 'pg_waldump', '--rmgr=list'], qr/^XLOG
+Transaction
+Storage
+CLOG
+Database
+Tablespace
+MultiXact
+RelMap
+Standby
+Heap2
+Heap
+Btree
+Hash
+Gin
+Gist
+Sequence
+SPGist
+BRIN
+CommitTs
+ReplicationOrigin
+Generic
+LogicalMessage$/,
+   'rmgr list');
+
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+
+# for standbydesc
+archive_mode=on
+archive_command=''
+
+# for XLOG_HEAP_TRUNCATE
+wal_level=logical
+});
+$node->start;
+
+my ($start_lsn, $start_walfile) = split /\|/, $node->safe_psql('postgres', 
q{SELECT pg_current_wal_insert_lsn(), 
pg_walfile_name(pg_current_wal_insert_lsn())});
+
+$node->safe_psql('postgres', q{
+-- heap, btree, hash, sequence
+CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text);
+CREATE INDEX i1a ON t1 USING btree (a);
+CREATE INDEX i1b ON t1 USING hash (b);
+INSERT INTO t1 VALUES (default, 'one'), (default, 'two');
+DELETE FROM t1 WHERE b = 'one';
+TRUNCATE t1;
+
+-- abort
+START TRANSACTION;
+INSERT INTO t1 VALUES (default, 'three');
+ROLLBACK;
+
+-- unlogged/init fork
+CREATE UNLOGGED TABLE t2 (x int);
+CREATE INDEX i2 ON t2 USING btree (x);
+INSERT INTO t2 SELEC

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread Kyotaro Horiguchi

At Tue, 27 Jun 2023 18:58:39 -0500, David Christensen 
 wrote in 
> > Adjusted as per the v2 attached.
> 
> +1

+1

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread David Christensen

> Adjusted as per the v2 attached.

+1

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread Michael Paquier

On Tue, Jun 27, 2023 at 04:39:52PM +0900, Kyotaro Horiguchi wrote:
> I meant that the name is structured as
> TLIh-TLIl._, which
> appears to be inconsistent with the comment. (And I'm not sure what
> "TLOID" is..)

Well, to be clear, it should not be TLIh-TLIl but LSNh-LSNl :)

I'm OK with these terms for the comments.  This is very internal
anyway so anybody using this feature should know what that means.

And yes, the order of the items is wrong, and I agree that TLOID is a
bit confusing once the TLI is added in the set.  I have just used
TBLSPCOID as term in the comment, and adjusted the XXX to be about the
LSN numbers.

Adjusted as per the v2 attached.
--
Michael
From f0936a95651f767e0f56ff507db5d5a02c7629b2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 28 Jun 2023 08:46:26 +0900
Subject: [PATCH v2] Add timeline to file names generated with pg_waldump
 --save-fullpage

Reported-by: Fujii Masao
---
 src/bin/pg_waldump/pg_waldump.c   | 3 ++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 9 +
 doc/src/sgml/ref/pg_waldump.sgml  | 9 -
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index c6d3ae6344..96845e1a1a 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -518,7 +518,8 @@ XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath)
 		else
 			pg_fatal("invalid fork number: %u", fork);
 
-		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X-%08X.%u.%u.%u.%u%s", savepath,
+ record->seg.ws_tli,
  LSN_FORMAT_ARGS(record->ReadRecPtr),
  rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
 
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
index 831ffdefef..f0725805f2 100644
--- a/src/bin/pg_waldump/t/002_save_fullpage.pl
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -79,15 +79,16 @@ $node->command_ok(
 	'pg_waldump with --save-fullpage runs');
 
 # This regexp will match filenames formatted as:
-# -.DBOID.TLOID.NODEOID.dd_fork with the components being:
-# - WAL LSN in hex format,
-# - Tablespace OID (0 for global)
+# TLI-LSNh-LSNl.TBLSPCOID.DBOID.NODEOID.dd_fork with the components being:
+# - Timeline ID in hex format.
+# - WAL LSN in hex format, as two 8-character numbers.
+# - Tablespace OID (0 for global).
 # - Database OID.
 # - Relfilenode.
 # - Block number.
 # - Fork this block came from (vm, init, fsm, or main).
 my $file_re =
-  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+  qr/^[0-9A-F]{8}-([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
 
 my $file_count = 0;
 
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index e5f9637847..4592d6016a 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -283,7 +283,7 @@ PostgreSQL documentation


 The full page images are saved with the following file name format:
-LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK
+TIMELINE-LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK
 
 The file names are composed of the following parts:
 
@@ -296,6 +296,13 @@ PostgreSQL documentation
   
 
   
+   
+TIMELINE
+The timeline of the WAL segment file where the record
+ is located formatted as one 8-character hexadecimal number
+ %08X
+   
+

 LSN
 The LSN of the record with this image,
-- 
2.40.1



signature.asc
Description: PGP signature

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread Michael Paquier

On Tue, Jun 27, 2023 at 11:53:10AM -0500, David Christensen wrote:
> Patch looks good, but agreed that that comment should also be fixed.

Okay, thanks for checking!
--
Michael


signature.asc
Description: PGP signature

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread David Christensen

On Tue, Jun 27, 2023 at 1:12 AM Michael Paquier  wrote:
>
> Hi all,
> (Fujii-san and David in CC.)
>
> Fujii-san has reported on Twitter that we had better add the TLI
> number to what pg_waldump --save-fullpage generates for the file names
> of the blocks, as it could be possible that we overwrite some blocks.
> This information can be added thanks to ws_tli, that tracks the TLI of
> the opened segment.
>
> Attached is a patch to fix this issue, adding an open item assigned to
> me.  The file format is documented in the TAP test and the docs, the
> two only places that would need a refresh.
>
> Thoughts or comments?

Patch looks good, but agreed that that comment should also be fixed.

Thanks!

David

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread Kyotaro Horiguchi

Of course, it's wrong.

At Tue, 27 Jun 2023 16:39:52 +0900 (JST), Kyotaro Horiguchi  wrote in 
> I meant that the name is structured as
- TLIh-TLIl._, which
+ LSNh-LSNl._, which

> appears to be inconsistent with the comment. (And I'm not sure what
> "TLOID" is..)
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-27 Thread Kyotaro Horiguchi

At Tue, 27 Jun 2023 15:58:38 +0900, Michael Paquier  wrote 
in 
> On Tue, Jun 27, 2023 at 03:44:04PM +0900, Kyotaro Horiguchi wrote:
> > The file name was "LSNh-LSNl.spcOid.dbOid.relNumber.blk_forkname", but
> > the comment in the TAP script read as:
> > 
> > -# -.DBOID.TLOID.NODEOID.dd_fork with the components being:
> > 
> > which looks wrong. I'm not sure it is a business of this patch, though..
> 
> This part is getting changed here anyway, so improving it is fine by
> me with the terms you are suggesting for these two 4-byte values in
> this comment.

I meant that the name is structured as
TLIh-TLIl._, which
appears to be inconsistent with the comment. (And I'm not sure what
"TLOID" is..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-26 Thread Michael Paquier

On Tue, Jun 27, 2023 at 03:44:04PM +0900, Kyotaro Horiguchi wrote:
> +# - Timeline number in hex format.
> 
> Arn't we reffering to it as "Timeline ID"? (I remember there was a
> discussion about redefining the "timeline ID" to use non-orderable
> IDs. That is, making it non-numbers.)

Using ID is fine by me.

> The file name was "LSNh-LSNl.spcOid.dbOid.relNumber.blk_forkname", but
> the comment in the TAP script read as:
> 
> -# -.DBOID.TLOID.NODEOID.dd_fork with the components being:
> 
> which looks wrong. I'm not sure it is a business of this patch, though..

This part is getting changed here anyway, so improving it is fine by
me with the terms you are suggesting for these two 4-byte values in
this comment.
--
Michael


signature.asc
Description: PGP signature

Re: Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-26 Thread Kyotaro Horiguchi

At Tue, 27 Jun 2023 15:12:43 +0900, Michael Paquier  wrote 
in 
> Hi all,
> (Fujii-san and David in CC.)
> 
> Fujii-san has reported on Twitter that we had better add the TLI
> number to what pg_waldump --save-fullpage generates for the file names
> of the blocks, as it could be possible that we overwrite some blocks.
> This information can be added thanks to ws_tli, that tracks the TLI of
> the opened segment.
> 
> Attached is a patch to fix this issue, adding an open item assigned to
> me.  The file format is documented in the TAP test and the docs, the
> two only places that would need a refresh.
> 
> Thoughts or comments?

It's sensible to add TLI to the file name. So +1 from me.

+# - Timeline number in hex format.

Arn't we reffering to it as "Timeline ID"? (I remember there was a
discussion about redefining the "timeline ID" to use non-orderable
IDs. That is, making it non-numbers.)

Otherwise it looks fine to me.


By the way, somewhat irrelevant to this patch, regading the the file
name for the output.


The file name was "LSNh-LSNl.spcOid.dbOid.relNumber.blk_forkname", but
the comment in the TAP script read as:

-# -.DBOID.TLOID.NODEOID.dd_fork with the components being:

which looks wrong. I'm not sure it is a business of this patch, though..

# Documentation looks coorect.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Add TLI number to name of files generated by pg_waldump --save-fullpage

2023-06-26 Thread Michael Paquier

Hi all,
(Fujii-san and David in CC.)

Fujii-san has reported on Twitter that we had better add the TLI
number to what pg_waldump --save-fullpage generates for the file names
of the blocks, as it could be possible that we overwrite some blocks.
This information can be added thanks to ws_tli, that tracks the TLI of
the opened segment.

Attached is a patch to fix this issue, adding an open item assigned to
me.  The file format is documented in the TAP test and the docs, the
two only places that would need a refresh.

Thoughts or comments?
--
Michael
From 35dc549115e8bf054a5f756c652b07f3f49d0bce Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 27 Jun 2023 15:04:49 +0900
Subject: [PATCH v1] Add timeline to file names generated with pg_waldump
 --save-fullpage

Reported-by: Fujii Masao
---
 src/bin/pg_waldump/pg_waldump.c   | 3 ++-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 9 +
 doc/src/sgml/ref/pg_waldump.sgml  | 9 -
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index c6d3ae6344..96845e1a1a 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -518,7 +518,8 @@ XLogRecordSaveFPWs(XLogReaderState *record, const char *savepath)
 		else
 			pg_fatal("invalid fork number: %u", fork);
 
-		snprintf(filename, MAXPGPATH, "%s/%08X-%08X.%u.%u.%u.%u%s", savepath,
+		snprintf(filename, MAXPGPATH, "%s/%08X-%08X-%08X.%u.%u.%u.%u%s", savepath,
+ record->seg.ws_tli,
  LSN_FORMAT_ARGS(record->ReadRecPtr),
  rnode.spcOid, rnode.dbOid, rnode.relNumber, blk, forkname);
 
diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
index 831ffdefef..dc53a6f5b4 100644
--- a/src/bin/pg_waldump/t/002_save_fullpage.pl
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -79,15 +79,16 @@ $node->command_ok(
 	'pg_waldump with --save-fullpage runs');
 
 # This regexp will match filenames formatted as:
-# -.DBOID.TLOID.NODEOID.dd_fork with the components being:
-# - WAL LSN in hex format,
-# - Tablespace OID (0 for global)
+# TLI--.DBOID.TLOID.NODEOID.dd_fork with the components being:
+# - Timeline number in hex format.
+# - WAL LSN in hex format.
+# - Tablespace OID (0 for global).
 # - Database OID.
 # - Relfilenode.
 # - Block number.
 # - Fork this block came from (vm, init, fsm, or main).
 my $file_re =
-  qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
+  qr/^[0-9A-F]{8}-([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
 
 my $file_count = 0;
 
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index e5f9637847..4592d6016a 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -283,7 +283,7 @@ PostgreSQL documentation


 The full page images are saved with the following file name format:
-LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK
+TIMELINE-LSN.RELTABLESPACE.DATOID.RELNODE.BLKNOFORK
 
 The file names are composed of the following parts:
 
@@ -296,6 +296,13 @@ PostgreSQL documentation
   
 
   
+   
+TIMELINE
+The timeline of the WAL segment file where the record
+ is located formatted as one 8-character hexadecimal number
+ %08X
+   
+

 LSN
 The LSN of the record with this image,
-- 
2.40.1



signature.asc
Description: PGP signature

Re: pg_waldump: add test for coverage

2023-06-14 Thread Peter Eisentraut


On 06.09.22 07:57, Peter Eisentraut wrote:

I wrote a test for coverage.
Unfortunately, it seems to take quite a while to run the test.
I want to improve these execution times, but I don't know exactly what 
to do.

Therefore, I want to hear feedback from many people.


I think having some more test coverage for pg_waldump would be good, so 
I encourage you to continue working on this.


I made an updated patch that incorporates many of your ideas and code, 
just made it a bit more compact, and added more tests for various 
command-line options.  This moves the test coverage of pg_waldump from 
"bloodbath" to "mixed fruit salad", which I think is pretty good 
progress.  And now there is room for additional patches if someone wants 
to figure out, e.g., how to get more complete coverage in gindesc.c or 
whatever.
From 67d63a87cf9ea8de69801127607101b3a515fb42 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Jun 2023 08:46:47 +0200
Subject: [PATCH v2] Add more pg_waldump tests

This adds tests for most command-line options and tests for most
rmgrdesc routines.

Discussion: 
https://www.postgresql.org/message-id/flat/CAAcByaKM7zyJSDkPWv6_%2BapupY4fXXM3q3SRXas9MMNVPUGcsQ%40mail.gmail.com
---
 src/bin/pg_waldump/t/001_basic.pl | 191 ++
 1 file changed, 191 insertions(+)

diff --git a/src/bin/pg_waldump/t/001_basic.pl 
b/src/bin/pg_waldump/t/001_basic.pl
index 492a8cadd8..d4056e1b95 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,6 +3,7 @@
 
 use strict;
 use warnings;
+use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
@@ -10,4 +11,194 @@
 program_version_ok('pg_waldump');
 program_options_handling_ok('pg_waldump');
 
+# wrong number of arguments
+command_fails_like([ 'pg_waldump', ], qr/error: no arguments/, 'no arguments');
+command_fails_like([ 'pg_waldump', 'foo', 'bar', 'baz' ], qr/error: too many 
command-line arguments/, 'too many arguments');
+
+# invalid option arguments
+command_fails_like([ 'pg_waldump', '--block', 'bad' ], qr/error: invalid block 
number/, 'invalid block number');
+command_fails_like([ 'pg_waldump', '--fork', 'bad' ], qr/error: invalid fork 
name/, 'invalid fork name');
+command_fails_like([ 'pg_waldump', '--limit', 'bad' ], qr/error: invalid 
value/, 'invalid limit');
+command_fails_like([ 'pg_waldump', '--relation', 'bad' ], qr/error: invalid 
relation/, 'invalid relation specification');
+command_fails_like([ 'pg_waldump', '--rmgr', 'bad' ], qr/error: resource 
manager .* does not exist/, 'invalid rmgr name');
+command_fails_like([ 'pg_waldump', '--start', 'bad' ], qr/error: invalid WAL 
location/, 'invalid start LSN');
+command_fails_like([ 'pg_waldump', '--end', 'bad' ], qr/error: invalid WAL 
location/, 'invalid end LSN');
+
+# rmgr list: If you add one to the list, consider also adding a test
+# case exercising the new rmgr below.
+command_like([ 'pg_waldump', '--rmgr=list'], qr/^XLOG
+Transaction
+Storage
+CLOG
+Database
+Tablespace
+MultiXact
+RelMap
+Standby
+Heap2
+Heap
+Btree
+Hash
+Gin
+Gist
+Sequence
+SPGist
+BRIN
+CommitTs
+ReplicationOrigin
+Generic
+LogicalMessage$/,
+   'rmgr list');
+
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+
+# for standbydesc
+archive_mode=on
+archive_command=''
+
+# for XLOG_HEAP_TRUNCATE
+wal_level=logical
+});
+$node->start;
+
+my ($start_lsn, $start_walfile) = split /\|/, $node->safe_psql('postgres', 
q{SELECT pg_current_wal_insert_lsn(), 
pg_walfile_name(pg_current_wal_insert_lsn())});
+
+$node->safe_psql('postgres', q{
+-- heap, btree, hash, sequence
+CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text);
+CREATE INDEX i1a ON t1 USING btree (a);
+CREATE INDEX i1b ON t1 USING hash (b);
+INSERT INTO t1 VALUES (default, 'one'), (default, 'two');
+DELETE FROM t1 WHERE b = 'one';
+TRUNCATE t1;
+
+-- abort
+START TRANSACTION;
+INSERT INTO t1 VALUES (default, 'three');
+ROLLBACK;
+
+-- unlogged/init fork
+CREATE UNLOGGED TABLE t2 (x int);
+CREATE INDEX i2 ON t2 USING btree (x);
+INSERT INTO t2 SELECT generate_series(1, 10);
+
+-- gin
+CREATE TABLE gin_idx_tbl (id bigserial PRIMARY KEY, data jsonb);
+CREATE INDEX gin_idx ON gin_idx_tbl USING gin (data);
+INSERT INTO gin_idx_tbl
+WITH random_json AS (
+SELECT json_object_agg(key, trunc(random() * 10)) as json_data
+

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-27 Thread Bharath Rupireddy

On Mon, Dec 26, 2022 at 4:18 PM Bharath Rupireddy
 wrote:
>
> On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier  wrote:
>
> +1. I think this feature will also be useful in pg_walinspect.

Just for the record - here's the pg_walinspect function to extract
FPIs from WAL records -
https://www.postgresql.org/message-id/CALj2ACVCcvzd7WiWvD%3D6_7NBvVB_r6G0EGSxL4F8vosAi6Se4g%40mail.gmail.com.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Michael Paquier

On Mon, Dec 26, 2022 at 04:18:18PM +0530, Bharath Rupireddy wrote:
> +1. I think this feature will also be useful in pg_walinspect.
> However, I'm a bit concerned that it can flood the running database
> disk - if someone generates a lot of FPI files.

pg_read_file() and pg_waldump can be fed absolute paths outside the
data folder.  So, well, just don't do that then :)
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Michael Paquier

On Mon, Dec 26, 2022 at 02:39:03PM -0600, Justin Pryzby wrote:
> fclose() should be tested, too:

Sure.  Done that too, and applied the change after a last lookup.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Justin Pryzby

On Mon, Dec 26, 2022 at 04:28:52PM +0900, Michael Paquier wrote:
> Comments?

> + file = fopen(filename, PG_BINARY_W);
> + if (!file)
> + pg_fatal("could not open file \"%s\": %m", filename);
> +
> + if (fwrite(page, BLCKSZ, 1, file) != 1)
> + pg_fatal("could not write file \"%s\": %m", filename);
> +
> + fclose(file);

fclose() should be tested, too:

> + if (fwrite(page, BLCKSZ, 1, file) != 1 || fclose(file) != 0)
> + pg_fatal("could not write file \"%s\": %m", filename);

-- 
Justin

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread David Christensen

> On Dec 26, 2022, at 1:29 AM, Michael Paquier  wrote:
> 
> On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote:
>> Thanks for the patch. I've made the above change as well as renamed
>> the test file name to be save_fpi.pl, everything else remains the same
>> as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
>> https://commitfest.postgresql.org/41/3628/.
> 
> I have done a review of that, and here are my notes:
> - The variable names were a bit inconsistent, so I have switched most
> of the new code to use "fullpage".
> - The code was not able to handle the case of a target directory
> existing but empty, so I have added a wrapper on pg_check_dir().
> - XLogRecordHasFPW() could be checked directly in the function saving
> the blocks.  Still, there is no need for it as we apply the same
> checks again in the inner loop of the routine.
> - The new test has been renamed.
> - RestoreBlockImage() would report a failure and the code would just
> skip it and continue its work.  This could point out to a compression
> failure for example, so like any code paths calling this routine I
> think that we'd better do a pg_fatal() and fail hard.
> - I did not understand why there is a reason to make this option
> conditional on the record prints or even the stats, so I have moved
> the FPW save routine into a separate code path.  The other two could
> be silenced (or not) using --quiet for example, for the same result as
> v12 without impacting the usability of this feature.
> - Few tweaks to the docs, the --help output, the comments and the
> tests.
> - Indentation applied.
> 
> Being able to filter the blocks saved using start/end LSNs or just
> --relation is really cool, especially as the file names use the same
> order as what's needed for this option.

Sounds good, definitely along the ideas of what I’d originally envisioned.

Thanks,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-26 Thread Bharath Rupireddy

On Mon, Dec 26, 2022 at 12:59 PM Michael Paquier  wrote:
>
> I have done a review of that, and here are my notes:
> - The variable names were a bit inconsistent, so I have switched most
> of the new code to use "fullpage".
>
> - The new test has been renamed.
>
> - RestoreBlockImage() would report a failure and the code would just
> skip it and continue its work.  This could point out to a compression
> failure for example, so like any code paths calling this routine I
> think that we'd better do a pg_fatal() and fail hard.
>
> - XLogRecordHasFPW() could be checked directly in the function saving
> the blocks.  Still, there is no need for it as we apply the same
> checks again in the inner loop of the routine.
>
> - Few tweaks to the docs, the --help output, the comments and the
> tests.
> - Indentation applied.
>
> - I did not understand why there is a reason to make this option
> conditional on the record prints or even the stats, so I have moved
> the FPW save routine into a separate code path.  The other two could
> be silenced (or not) using --quiet for example, for the same result as
> v12 without impacting the usability of this feature.

Looks good.

> - The code was not able to handle the case of a target directory
> existing but empty, so I have added a wrapper on pg_check_dir().

Looks okay and with the following, we impose the user-provided target
directory must be empty.

+case 4:
+/* Exists and not empty */
+pg_fatal("directory \"%s\" exists but is not empty", path);

> Being able to filter the blocks saved using start/end LSNs or just
> --relation is really cool, especially as the file names use the same
> order as what's needed for this option.
>
> Comments?

+1. I think this feature will also be useful in pg_walinspect.
However, I'm a bit concerned that it can flood the running database
disk - if someone generates a lot of FPI files.

Overall, the v13 patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-25 Thread Michael Paquier

On Sat, Dec 24, 2022 at 06:23:29PM +0530, Bharath Rupireddy wrote:
> Thanks for the patch. I've made the above change as well as renamed
> the test file name to be save_fpi.pl, everything else remains the same
> as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
> https://commitfest.postgresql.org/41/3628/.

I have done a review of that, and here are my notes:
- The variable names were a bit inconsistent, so I have switched most
of the new code to use "fullpage".
- The code was not able to handle the case of a target directory
existing but empty, so I have added a wrapper on pg_check_dir().
- XLogRecordHasFPW() could be checked directly in the function saving
the blocks.  Still, there is no need for it as we apply the same
checks again in the inner loop of the routine.
- The new test has been renamed.
- RestoreBlockImage() would report a failure and the code would just
skip it and continue its work.  This could point out to a compression
failure for example, so like any code paths calling this routine I
think that we'd better do a pg_fatal() and fail hard.
- I did not understand why there is a reason to make this option
conditional on the record prints or even the stats, so I have moved
the FPW save routine into a separate code path.  The other two could
be silenced (or not) using --quiet for example, for the same result as
v12 without impacting the usability of this feature.
- Few tweaks to the docs, the --help output, the comments and the
tests.
- Indentation applied.

Being able to filter the blocks saved using start/end LSNs or just
--relation is really cool, especially as the file names use the same
order as what's needed for this option.

Comments?
--
Michael
From 66eedbb96858a4f6804509900b116d8a63b39925 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 26 Dec 2022 16:28:00 +0900
Subject: [PATCH v13] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a given target directory.
These images are subject to the same filtering rules as normal display in
pg_waldump, which means that you can isolate the full page writes to a
target relation, among other things.

Files are saved with the filename: _ with
formatting to make things somewhat sortable; for instance:

-01C0.1663.1.6117.0_main
-01000150.1664.0.6115.0_main
-010001E0.1664.0.6114.0_main
-01000270.1663.1.6116.0_main
-01000300.1663.1.6113.0_main
-01000390.1663.1.6112.0_main
-01000420.1663.1.8903.0_main
-010004B0.1663.1.8902.0_main
-01000540.1663.1.6111.0_main
-010005D0.1663.1.6110.0_main

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 src/bin/pg_waldump/meson.build|   1 +
 src/bin/pg_waldump/pg_waldump.c   | 107 +
 src/bin/pg_waldump/t/002_save_fullpage.pl | 111 ++
 doc/src/sgml/ref/pg_waldump.sgml  |  66 +
 4 files changed, 285 insertions(+)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 3fa1b53e71..0428998350 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -31,6 +31,7 @@ tests += {
   'tap': {
 'tests': [
   't/001_basic.pl',
+  't/002_save_fullpage.pl',
 ],
   },
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 9993378ca5..d90a142c68 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -23,9 +23,13 @@
 #include "access/xlogrecord.h"
 #include "access/xlogstats.h"
 #include "common/fe_memutils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/logging.h"
+#include "common/relpath.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/bufpage.h"
 
 /*
  * NOTE: For any code change or issue fix here, it is highly recommended to
@@ -70,6 +74,9 @@ typedef struct XLogDumpConfig
 	bool		filter_by_relation_block_enabled;
 	ForkNumber	filter_by_relation_forknum;
 	bool		filter_by_fpw;
+
+	/* save options */
+	char	   *save_fullpage_path;
 } XLogDumpConfig;
 
 
@@ -112,6 +119,37 @@ verify_directory(const char *directory)
 	return true;
 }
 
+/*
+ * Create if necessary the directory storing the full-

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-24 Thread Bharath Rupireddy

On Sat, Dec 24, 2022 at 12:58 AM David Christensen
 wrote:
>
> On Fri, Dec 23, 2022 at 12:57 PM David Christensen
>  wrote:
> >
> > On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
> >  wrote:
>
> > > 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> > > When enabled with allows_streaming, there are a bunch of things that
> > > happen to the node while initializing, I don't think we need all of
> > > them for this.
> >
> > I think the "allows_streaming" was required to ensure the WAL files
> > were preserved properly, and was the approach we ended up taking
> > rather than trying to fail the archive_command or other approaches I'd
> > taken earlier. I'd rather keep this if we can, unless you can propose
> > a different approach that would continue to work in the same way.
>
> Confirmed that we needed this in order to create the replication slot,
> so this /is/ required for the test to work.

The added test needs wal_level to be replica, but the TAP tests set it
to minimal if allows_streaming isn't passed. However, if passed
allows_streaming, it sets a bunch of other parameters which are not
required for this test (see note->init function in cluster.pm), hence
we could just set the required parameters wal_level = replica and
max_wal_senders for the replication slot to be created.

> Enclosing v11 with yours and Michael's latest feedback.

Thanks for the patch. I've made the above change as well as renamed
the test file name to be save_fpi.pl, everything else remains the same
as v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-23 Thread David Christensen

On Fri, Dec 23, 2022 at 12:57 PM David Christensen
 wrote:
>
> On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
>  wrote:

[snip]

> > 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> > When enabled with allows_streaming, there are a bunch of things that
> > happen to the node while initializing, I don't think we need all of
> > them for this.
>
> I think the "allows_streaming" was required to ensure the WAL files
> were preserved properly, and was the approach we ended up taking
> rather than trying to fail the archive_command or other approaches I'd
> taken earlier. I'd rather keep this if we can, unless you can propose
> a different approach that would continue to work in the same way.

Confirmed that we needed this in order to create the replication slot,
so this /is/ required for the test to work.

Enclosing v11 with yours and Michael's latest feedback.

Best,

David


v11-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-23 Thread David Christensen

On Mon, Dec 19, 2022 at 12:23 AM Michael Paquier  wrote:
>
> On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote:
> > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  
> > wrote:
> > This v10 should incorporate your feedback as well as Bharath's.
>
> Thanks for the new version.  I have minor comments.
>
> >> It seems to me that you could allow things to work even if the
> >> directory exists and is empty.  See for example
> >> verify_dir_is_empty_or_create() in pg_basebackup.c.
> >
> > The `pg_mkdir_p()` supports an existing directory (and I don't think
> > we want to require it to be empty first), so this only errors when it
> > can't create a directory for some reason.
>
> Sure, but things can also be made so as we don't fail if the directory
> exists and is empty?  This would be more consistent with the base
> directories created by pg_basebackup and initdb.

I guess I'm feeling a little dense here; how is this failing if there
is an existing empty directory?

> >> +$node->safe_psql('postgres', < >> +SELECT 'init' FROM 
> >> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, 
> >> false);
> >> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> >> +CHECKPOINT; -- required to force FPI for next writes
> >> +UPDATE test_table SET a = a + 1;
> >> Using an EOF to execute a multi-line query would be a first.  Couldn't
> >> you use the same thing as anywhere else?  009_twophase.pl just to
> >> mention one.  (Mentioned by Bharath upthread, where he asked for an
> >> extra opinion so here it is.)
> >
> > Fair enough, while idiomatic perl to me, not a hill to die on;
> > converted to a standard multiline string.
>
> By the way, knowing that we have an option called --fullpage, could be
> be better to use --save-fullpage for the option name?

Works for me.  I think I'd just wanted to avoid reformatting the
entire usage message which is why I'd gone with the shorter version.

> +   OPF = fopen(filename, PG_BINARY_W);
> +   if (!OPF)
> +   pg_fatal("couldn't open file for output: %s", filename);
> [..]
> +   if (fwrite(page, BLCKSZ, 1, OPF) != 1)
> +   pg_fatal("couldn't write out complete full page image to file: 
> %s", filename);
> These should more more generic, as of "could not open file \"%s\"" and
> "could not write file \"%s\"" as the file name provides all the
> information about what this writes.

Sure, will update.

Best,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-23 Thread David Christensen

On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
 wrote:
>
> On Fri, Dec 16, 2022 at 4:47 AM David Christensen
>  wrote:
> >
> > On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  
> > wrote:
> > >
> > > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > > > I can get one sent in tomorrow.
> >
> > This v10 should incorporate your feedback as well as Bharath's.
>
> Thanks for the patch. Here're some minor comments:
>
> 1. +my $node =  PostgreSQL::Test::Cluster->new('primary');
> Can the name be other than 'primary' because we don't create a standby
> for this test? Something like - 'node_a' or 'node_extract_fpi' or some
> other.

Sure, no issues.

> 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> When enabled with allows_streaming, there are a bunch of things that
> happen to the node while initializing, I don't think we need all of
> them for this.

I think the "allows_streaming" was required to ensure the WAL files
were preserved properly, and was the approach we ended up taking
rather than trying to fail the archive_command or other approaches I'd
taken earlier. I'd rather keep this if we can, unless you can propose
a different approach that would continue to work in the same way.

> 3. +$node->init(extra => ['-k'], allows_streaming => 1);
> Can we use --data-checksums instead of -k for more readability?
> Perhaps, a comment on why we need that option helps greatly.

Yeah, can spell out; don't recall exactly why we needed it offhand,
but will confirm or remove if insignificant.

> 4.
> +page = (Page) buf.data;
> +
> +if (!XLogRecHasBlockRef(record, block_id))
> +continue;
> +
> +if (!XLogRecHasBlockImage(record, block_id))
> +continue;
> +
> +if (!RestoreBlockImage(record, block_id, page))
> +continue;
> Can you shift  page = (Page) buf.data; just before the last if
> condition RestoreBlockImage() so that it doesn't get executed for the
> other two continue statements?

Sure; since it was just setting a pointer value I didn't consider it
to be a hotspot for optimization.

Best,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-21 Thread Bharath Rupireddy

On Fri, Dec 16, 2022 at 4:47 AM David Christensen
 wrote:
>
> On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
> >
> > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > > I can get one sent in tomorrow.
>
> This v10 should incorporate your feedback as well as Bharath's.

Thanks for the patch. Here're some minor comments:

1. +my $node =  PostgreSQL::Test::Cluster->new('primary');
Can the name be other than 'primary' because we don't create a standby
for this test? Something like - 'node_a' or 'node_extract_fpi' or some
other.

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When enabled with allows_streaming, there are a bunch of things that
happen to the node while initializing, I don't think we need all of
them for this.

3. +$node->init(extra => ['-k'], allows_streaming => 1);
Can we use --data-checksums instead of -k for more readability?
Perhaps, a comment on why we need that option helps greatly.

4.
+page = (Page) buf.data;
+
+if (!XLogRecHasBlockRef(record, block_id))
+continue;
+
+if (!XLogRecHasBlockImage(record, block_id))
+continue;
+
+if (!RestoreBlockImage(record, block_id, page))
+continue;
Can you shift  page = (Page) buf.data; just before the last if
condition RestoreBlockImage() so that it doesn't get executed for the
other two continue statements?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-18 Thread Michael Paquier

On Thu, Dec 15, 2022 at 05:17:46PM -0600, David Christensen wrote:
> On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
> This v10 should incorporate your feedback as well as Bharath's.

Thanks for the new version.  I have minor comments.

>> It seems to me that you could allow things to work even if the
>> directory exists and is empty.  See for example
>> verify_dir_is_empty_or_create() in pg_basebackup.c.
> 
> The `pg_mkdir_p()` supports an existing directory (and I don't think
> we want to require it to be empty first), so this only errors when it
> can't create a directory for some reason.

Sure, but things can also be made so as we don't fail if the directory
exists and is empty?  This would be more consistent with the base
directories created by pg_basebackup and initdb.

>> +$node->safe_psql('postgres', <> +SELECT 'init' FROM 
>> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
>> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
>> +CHECKPOINT; -- required to force FPI for next writes
>> +UPDATE test_table SET a = a + 1;
>> Using an EOF to execute a multi-line query would be a first.  Couldn't
>> you use the same thing as anywhere else?  009_twophase.pl just to
>> mention one.  (Mentioned by Bharath upthread, where he asked for an
>> extra opinion so here it is.)
> 
> Fair enough, while idiomatic perl to me, not a hill to die on;
> converted to a standard multiline string.

By the way, knowing that we have an option called --fullpage, could be
be better to use --save-fullpage for the option name?

+   OPF = fopen(filename, PG_BINARY_W);
+   if (!OPF)
+   pg_fatal("couldn't open file for output: %s", filename);
[..]
+   if (fwrite(page, BLCKSZ, 1, OPF) != 1)
+   pg_fatal("couldn't write out complete full page image to file: %s", 
filename);
These should more more generic, as of "could not open file \"%s\"" and
"could not write file \"%s\"" as the file name provides all the
information about what this writes.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-15 Thread David Christensen

On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
>
> On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > I can get one sent in tomorrow.

This v10 should incorporate your feedback as well as Bharath's.

> -XLogRecordHasFPW(XLogReaderState *record)
> +XLogRecordHasFPI(XLogReaderState *record)
> This still refers to a FPW, so let's leave that out as well as any
> renamings of this kind..

Reverted those changes.

> +   if (config.save_fpi_path != NULL)
> +   {
> +   /* Create the dir if it doesn't exist */
> +   if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
> +   {
> +   pg_log_error("could not create output directory \"%s\": %m",
> +config.save_fpi_path);
> +   goto bad_argument;
> +   }
> +   }
> It seems to me that you could allow things to work even if the
> directory exists and is empty.  See for example
> verify_dir_is_empty_or_create() in pg_basebackup.c.

The `pg_mkdir_p()` supports an existing directory (and I don't think
we want to require it to be empty first), so this only errors when it
can't create a directory for some reason.

> +my $file_re =
> +  
> qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
> This is artistic to parse for people not used to regexps (I do, a
> little).  Perhaps this could use a small comment with an example of
> name or a reference describing this format?

Added a description of what this is looking for.

> +# Set umask so test directories and files are created with default 
> permissions
> +umask(0077);
> Incorrect copy-paste coming from elsewhere like the TAP tests of
> pg_basebackup with group permissions?  Doesn't
> PostgreSQL::Test::Utils::tempdir give already enough protection in
> terms of umask() and permissions?

I'd expect that's where that came from. Removed.

> +   if (config.save_fpi_path != NULL)
> +   {
> +   /* We fsync our output directory only; since these files are not part
> +* of the production database we do not require the performance hit
> +* that fsyncing every FPI would entail, so are doing this as a
> +* compromise. */
> +
> +   fsync_fname(config.save_fpi_path, true);
> +   }
> Why is it necessary to flush anything at all, then?

I personally don't think it is, but added it per Bharath's request.
Removed in this revision.

> +my $relation = $node->safe_psql('postgres',
> +   q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN
> dattablespace ELSE reltablespace END, pg_database.oid,
> pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE
> relname = 'test_table' AND datname = current_database()}
> Could you rewrite that to be on multiple lines?

Sure, reformated.

> +diag "using walfile: $walfile";
> You should avoid the use of "diag", as this would cause extra output
> noise with a simple make check.

Had been using it for debugging and didn't realize it'd cause issues.
Removed both instances.

> +$node->safe_psql('postgres', "SELECT 
> pg_drop_replication_slot('regress_pg_waldump_slot')")
> That's not really necessary, the nodes are wiped out at the end of the
> test.

Removed.

> +$node->safe_psql('postgres', < +SELECT 'init' FROM 
> pg_create_physical_replication_slot('regress_pg_waldump_slot', true, false);
> +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> +CHECKPOINT; -- required to force FPI for next writes
> +UPDATE test_table SET a = a + 1;
> Using an EOF to execute a multi-line query would be a first.  Couldn't
> you use the same thing as anywhere else?  009_twophase.pl just to
> mention one.  (Mentioned by Bharath upthread, where he asked for an
> extra opinion so here it is.)

Fair enough, while idiomatic perl to me, not a hill to die on;
converted to a standard multiline string.

Best,

David


v10-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-14 Thread Michael Paquier

On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> I can get one sent in tomorrow.

-XLogRecordHasFPW(XLogReaderState *record)
+XLogRecordHasFPI(XLogReaderState *record)
This still refers to a FPW, so let's leave that out as well as any
renamings of this kind..

+   if (config.save_fpi_path != NULL)
+   {
+   /* Create the dir if it doesn't exist */
+   if (pg_mkdir_p(config.save_fpi_path, pg_dir_create_mode) < 0)
+   {
+   pg_log_error("could not create output directory \"%s\": %m",
+config.save_fpi_path);
+   goto bad_argument;
+   }
+   }
It seems to me that you could allow things to work even if the
directory exists and is empty.  See for example
verify_dir_is_empty_or_create() in pg_basebackup.c.

+my $file_re =
+  
qr/^([0-9A-F]{8})-([0-9A-F]{8})[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+(?:_vm|_init|_fsm|_main)?$/;
This is artistic to parse for people not used to regexps (I do, a
little).  Perhaps this could use a small comment with an example of
name or a reference describing this format?

+# Set umask so test directories and files are created with default permissions
+umask(0077);
Incorrect copy-paste coming from elsewhere like the TAP tests of
pg_basebackup with group permissions?  Doesn't 
PostgreSQL::Test::Utils::tempdir give already enough protection in
terms of umask() and permissions?

+   if (config.save_fpi_path != NULL)
+   {
+   /* We fsync our output directory only; since these files are not part
+* of the production database we do not require the performance hit
+* that fsyncing every FPI would entail, so are doing this as a
+* compromise. */
+
+   fsync_fname(config.save_fpi_path, true);
+   }
Why is it necessary to flush anything at all, then?

+my $relation = $node->safe_psql('postgres',
+   q{SELECT format('%s/%s/%s', CASE WHEN reltablespace = 0 THEN
dattablespace ELSE reltablespace END, pg_database.oid,
pg_relation_filenode(pg_class.oid)) FROM pg_class, pg_database WHERE
relname = 'test_table' AND datname = current_database()}
Could you rewrite that to be on multiple lines?

+diag "using walfile: $walfile";
You should avoid the use of "diag", as this would cause extra output
noise with a simple make check.

+$node->safe_psql('postgres', "SELECT 
pg_drop_replication_slot('regress_pg_waldump_slot')")
That's not really necessary, the nodes are wiped out at the end of the
test. 

+$node->safe_psql('postgres', <

signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-14 Thread David Christensen

Hi Bharath,

I can get one sent in tomorrow.

Thanks,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-14 Thread Bharath Rupireddy

lication_slot('regress_pg_waldump_slot', true,
> > false);
> > +CREATE TABLE test_table AS SELECT generate_series(1,100) a;
> > +CHECKPOINT; -- required to force FPI for next writes
> > +UPDATE test_table SET a = a + 1;
> > +EOF
> > The EOF with append_conf() is being used in 4 files and elsewhere in
> > the TAP test files (more than 100?) qq[] or quotes is being used. I
> > have no strong opinion here, I'll leave it to the other reviewers or
> > committer.
>
> I'm inclined to leave it just for (personal) readability, but can
> change if there's a strong consensus against.
>
> > > > 11.
> > > > +# verify filename formats matches w/--save-fpi
> > > > +for my $fullpath (glob "$tmp_folder/raw/*")
> > > > Do we need to look for the exact match of the file that gets created
> > > > in the save-fpi path? While checking for this is great, it makes the
> > > > test code non-portable (may not work on Windows or other platforms,
> > > > no?) and complex? This way, you can get rid of get_block_info() as
> > > > well? And +for my $fullpath (glob "$tmp_folder/raw/*")
> > > > will also  get simplified.
> > > >
> > > > I think you can further simplify the tests by:
> > > > create the node
> > > > generate an FPI
> > > > call pg_waldump with save-fpi option
> > > > check the target directory for a file that contains the relid,
> > > > something like '%relid%'.
> > > >
> > > > The above would still serve the purpose, tests the code without much 
> > > > complexity.
> > >
> > > I disagree; I think there is utility in keeping the validation of the
> > > expected output.  Since we have the code that works for it (and does
> > > work on Windows, per passing the CI tests) I'm not seeing why we
> > > wouldn't want to continue to validate as much as possible.
> >
> > My intention is to simplify the tests further and I still stick to it.
> > It looks like the majority of test code is to form the file name in
> > the format that pg_waldump outputs and match with the file name in the
> > target directory - for instance, in get_block_info(), and in the loop
> > for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
> > need to aim for file format checks, it's enough to look for the
> > written file with '%relid%' by pg_waldump, if needed, the contents of
> > the files written/FPI can also be verified with, say, pg_checksum
> > tool. Others may have different opinions though.
>
> I would like to get broader feedback before changing this.

David, is there a plan to provide an updated patch in this commitfest?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-17 Thread Peter Geoghegan

On Thu, Nov 17, 2022 at 9:02 AM Peter Geoghegan  wrote:
> Plan is to commit this later on today, barring objections.

Pushed, thanks.

-- 
Peter Geoghegan

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-17 Thread Peter Geoghegan

On Wed, Nov 16, 2022 at 4:34 PM Peter Geoghegan  wrote:
> WFM.

Attached is v3.

Plan is to commit this later on today, barring objections.

Thanks
-- 
Peter Geoghegan


v3-0001-Standardize-rmgrdesc-recovery-conflict-XID-output.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-17 Thread David Christensen

ity, but can
change if there's a strong consensus against.

> > > 11.
> > > +# verify filename formats matches w/--save-fpi
> > > +for my $fullpath (glob "$tmp_folder/raw/*")
> > > Do we need to look for the exact match of the file that gets created
> > > in the save-fpi path? While checking for this is great, it makes the
> > > test code non-portable (may not work on Windows or other platforms,
> > > no?) and complex? This way, you can get rid of get_block_info() as
> > > well? And +for my $fullpath (glob "$tmp_folder/raw/*")
> > > will also  get simplified.
> > >
> > > I think you can further simplify the tests by:
> > > create the node
> > > generate an FPI
> > > call pg_waldump with save-fpi option
> > > check the target directory for a file that contains the relid,
> > > something like '%relid%'.
> > >
> > > The above would still serve the purpose, tests the code without much 
> > > complexity.
> >
> > I disagree; I think there is utility in keeping the validation of the
> > expected output.  Since we have the code that works for it (and does
> > work on Windows, per passing the CI tests) I'm not seeing why we
> > wouldn't want to continue to validate as much as possible.
>
> My intention is to simplify the tests further and I still stick to it.
> It looks like the majority of test code is to form the file name in
> the format that pg_waldump outputs and match with the file name in the
> target directory - for instance, in get_block_info(), and in the loop
> for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
> need to aim for file format checks, it's enough to look for the
> written file with '%relid%' by pg_waldump, if needed, the contents of
> the files written/FPI can also be verified with, say, pg_checksum
> tool. Others may have different opinions though.

I would like to get broader feedback before changing this.

David

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-16 Thread Peter Geoghegan

On Wed, Nov 16, 2022 at 4:25 PM Andres Freund  wrote:
> > Anyway, worth calling this out directly in these comments IMV. We're
> > addressing two closely related things that assign opposite meanings to
> > InvalidTransactionId, which is rather confusing.
>
> It makes sense to call this out, but I'd
> s/snapshotConflictHorizon format XIDs/cutoff with snapshotConflictHorizon 
> semantics/
>
> or such?

WFM.

-- 
Peter Geoghegan

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-16 Thread Andres Freund

Hi,

On 2022-11-16 15:37:40 -0800, Peter Geoghegan wrote:
> On Wed, Nov 16, 2022 at 3:27 PM Andres Freund  wrote:
> > What are "snapshotConflictHorizon format XIDs"? I guess you mean format in 
> > the
> > sense of having the semantics of snapshotConflictHorizon?
> 
> Yes. That is the only possible way that any recovery conflict ever
> works on the REDO side, with the exception of a few
> not-very-interesting cases such as DROP TABLESPACE.
> 
> GetConflictingVirtualXIDs() assigns a special meaning to
> InvalidTransactionId which is the *opposite* of the special meaning
> that snapshotConflictHorizon-based values assign to
> InvalidTransactionId. At one point they actually did the same
> definition for InvalidTransactionId, but that was changed soon after
> hot standby first went in (when we taught btree delete records to not
> use ludicrously conservative cutoffs that caused needless conflicts).
> 
> Anyway, worth calling this out directly in these comments IMV. We're
> addressing two closely related things that assign opposite meanings to
> InvalidTransactionId, which is rather confusing.

It makes sense to call this out, but I'd
s/snapshotConflictHorizon format XIDs/cutoff with snapshotConflictHorizon 
semantics/

or such?

Greetings,

Andres Freund

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-16 Thread Peter Geoghegan

On Wed, Nov 16, 2022 at 3:27 PM Andres Freund  wrote:
> The "(also...) formulation seems a bit odd. How about "an obsolescent heap
> tuple that the caller is physically removing, e.g. via HOT pruning or index
> deletion." or such?

Okay, WFM.

> > + * snapshotConflictHorizon format values are how all hot Standby conflicts 
> > are
> > + * generated by REDO routines (at least wherever a granular cutoff is 
> > used).
>
> Not quite parsing for me.

I meant something like: this is a cutoff that works in the same way as
any other cutoff involved with recovery conflicts, in general, with
the exception of those cases that have very coarse grained conflicts,
such as DROP TABLESPACE.

I suppose it would be better to just say the first part. Will fix.

> > + /*
> > +  * It's quite possible that final snapshotConflictHorizon value will 
> > be
> > +  * invalid in final WAL record, indicating that we definitely don't 
> > need to
> > +  * generate a conflict
> > +  */
>
> *the final
>
> Isn't this already described in the header?

Sort of, but arguably it makes sense to call it out specifically.
Though on second thought, yeah, lets just get rid of it.

> What are "snapshotConflictHorizon format XIDs"? I guess you mean format in the
> sense of having the semantics of snapshotConflictHorizon?

Yes. That is the only possible way that any recovery conflict ever
works on the REDO side, with the exception of a few
not-very-interesting cases such as DROP TABLESPACE.

GetConflictingVirtualXIDs() assigns a special meaning to
InvalidTransactionId which is the *opposite* of the special meaning
that snapshotConflictHorizon-based values assign to
InvalidTransactionId. At one point they actually did the same
definition for InvalidTransactionId, but that was changed soon after
hot standby first went in (when we taught btree delete records to not
use ludicrously conservative cutoffs that caused needless conflicts).

Anyway, worth calling this out directly in these comments IMV. We're
addressing two closely related things that assign opposite meanings to
InvalidTransactionId, which is rather confusing.

-- 
Peter Geoghegan

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-16 Thread Andres Freund

Hi,

On 2022-11-16 14:14:30 -0800, Peter Geoghegan wrote:
>  /*
> - * If 'tuple' contains any visible XID greater than latestRemovedXid,
> - * ratchet forwards latestRemovedXid to the greatest one found.
> - * This is used as the basis for generating Hot Standby conflicts, so
> - * if a tuple was never visible then removing it should not conflict
> - * with queries.
> + * Maintain snapshotConflictHorizon for caller by ratcheting forward its 
> value
> + * using any committed XIDs contained in 'tuple', an obsolescent heap tuple
> + * that caller is in the process of physically removing via pruning.
> + * (Also supports generating index deletion snapshotConflictHorizon values.)

The "(also...) formulation seems a bit odd. How about "an obsolescent heap
tuple that the caller is physically removing, e.g. via HOT pruning or index
deletion." or such?


> + * snapshotConflictHorizon format values are how all hot Standby conflicts 
> are
> + * generated by REDO routines (at least wherever a granular cutoff is used).

Not quite parsing for me.

> + * Caller must initialize its value to InvalidTransactionId, which is 
> generally
> + * interpreted as "definitely no need for a recovery conflict".
> + *
> + * Final value must reflect all heap tuples that caller will physically 
> remove
> + * via the ongoing pruning operation.  ResolveRecoveryConflictWithSnapshot() 
> is
> + * passed the final value (taken from caller's WAL record) by a REDO routine.

> + /*
> +  * It's quite possible that final snapshotConflictHorizon value will be
> +  * invalid in final WAL record, indicating that we definitely don't 
> need to
> +  * generate a conflict
> +  */

*the final

Isn't this already described in the header?


> @@ -3337,12 +3337,17 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool 
> excludeXmin0,
>   * GetConflictingVirtualXIDs -- returns an array of currently active VXIDs.
>   *
>   * Usage is limited to conflict resolution during recovery on standby 
> servers.
> - * limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId
> - * in cases where we cannot accurately determine a value for 
> latestRemovedXid.
> + * limitXmin is supplied as either a snapshotConflictHorizon format XID, or 
> as
> + * InvalidTransactionId in cases where caller cannot accurately determine a
> + * safe snapshotConflictHorizon value.
>   *
>   * If limitXmin is InvalidTransactionId then we want to kill everybody,
>   * so we're not worried if they have a snapshot or not, nor does it really
> - * matter what type of lock we hold.
> + * matter what type of lock we hold.  Caller must avoid calling here with
> + * snapshotConflictHorizon format XIDs that were set to InvalidTransactionId

What are "snapshotConflictHorizon format XIDs"? I guess you mean format in the
sense of having the semantics of snapshotConflictHorizon?



Greetings,

Andres Freund

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-16 Thread Peter Geoghegan

On Tue, Nov 15, 2022 at 8:48 PM Peter Geoghegan  wrote:
> Okay, let's go with snapshotConflictHorizon. I'll use that name in the
> next revision, which I should be able to post tomorrow.

Attached is a somewhat cleaned up version that uses that symbol name
for everything.

--
Peter Geoghegan


v2-0001-Standardize-rmgrdesc-recovery-conflict-XID-output.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-16 Thread Bharath Rupireddy

On Wed, Nov 16, 2022 at 1:20 AM David Christensen
 wrote:
>
> Enclosed is v9.
>
> - code style consistency (FPI instead of FPW) internally.
> - cleanup of no-longer needed checksum-related pieces from code and tests.
> - test cleanup/simplification.
> - other comment cleanup.
>
> Passes all CI checks.

Thanks for the updated patch.

1.
-if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
These changes are not related to this feature, hence renaming those
variables/function names must be dealt with separately. If required,
proposing another patch can be submitted to change filter_by_fpw to
filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().

2.
+/* We fsync our output directory only; since these files are not part
+ * of the production database we do not require the performance hit
+ * that fsyncing every FPI would entail, so are doing this as a
+ * compromise. */
The commenting style doesn't match the standard that we follow
elsewhere in postgres, please refer to other multi-line comments.

3.
+fsync_fname(config.save_fpi_path, true);
+}
It looks like fsync_fname()/fsync() in general isn't recursive, in the
sense that it doesn't fsync the files under the directory, but the
directory only. So, the idea of directory fsync doesn't seem worth it.
We either 1) get rid of fsync entirely or 2) fsync all the files after
they are created and the directory at the end or 3) do option (2) with
--no-sync option similar to its friends. Since option (2) is a no go,
we can either choose option (1) or option (2). My vote at this point
is for option (1).

4.
+($walfile_name, $blocksize) = split '\|' =>
$node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
current_setting('block_size')");
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
I think there's something wrong with this, no? pg_switch_wal() can, at
times, return end+1 of the prior segment (see below snippet) and I'm
not sure if such a case can happen here.

 * The return value is either the end+1 address of the switch record,
 * or the end+1 address of the prior segment if we did not need to
 * write a switch record because we are already at segment start.
 */
XLogRecPtr
RequestXLogSwitch(bool mark_unimportant)

5.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+ok(-f $walfile, "Got a WAL file");
Is this checking if the WAL file is present or not in PGDATA/pg_wal?
If yes, I think this isn't required as pg_switch_wal() ensures that
the WAL is written and flushed to disk.

6.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
Isn't "pgdata" hardcoded here? I think you might need to do the following:
$node->data_dir . '/pg_wal/' . $walfile_name;;

7.
+# save filename for later verification
+$files{$file}++;

+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
Why do we need to store filenames in an array when we later just check
the size of the array? Can't we use a boolean (file_found) or an int
variable (file_count) to verify that we found the file.

8.
+$node->safe_psql('postgres', < > 11.
> > +# verify filename formats matches w/--save-fpi
> > +for my $fullpath (glob "$tmp_folder/raw/*")
> > Do we need to look for the exact match of the file that gets created
> > in the save-fpi path? While checking for this is great, it makes the
> > test code non-portable (may not work on Windows or other platforms,
> > no?) and complex? This way, you can get rid of get_block_info() as
> > well? And +for my $fullpath (glob "$tmp_folder/raw/*")
> > will also  get simplified.
> >
> > I think you can further simplify the tests by:
> > create the node
> > generate an FPI
> > call pg_waldump with save-fpi option
> > check the target directory for a file that contains the relid,
> > something like '%relid%'.
> >
> > The above would still serve the purpose, tests the code without much 
> > complexity.
>
> I disagree; I think there is utility in keeping the validation of the
> expected output.  Since we have the code that works for it (and does
> work on Windows, per passing the CI tests) I'm not seeing why we
> wouldn't want to continue to validate as much as possible.

My intention is to simplify the tests further and I still stick to it.
It looks like the majority of test code is to form the file name in
the format that pg_waldump outputs and match with the file name in the
target directory - for instance, in get_b

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-15 Thread Andres Freund

On 2022-11-15 20:48:56 -0800, Peter Geoghegan wrote:
> On Tue, Nov 15, 2022 at 5:29 PM Andres Freund  wrote:
> > If we want to focus on the mvcc affects we could just go for something like
> > snapshotConflictHorizon or such.
> 
> Okay, let's go with snapshotConflictHorizon. I'll use that name in the
> next revision, which I should be able to post tomorrow.

Cool!

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-15 Thread Peter Geoghegan

On Tue, Nov 15, 2022 at 5:29 PM Andres Freund  wrote:
> If we want to focus on the mvcc affects we could just go for something like
> snapshotConflictHorizon or such.

Okay, let's go with snapshotConflictHorizon. I'll use that name in the
next revision, which I should be able to post tomorrow.

-- 
Peter Geoghegan

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-15 Thread Andres Freund

Hi,

On 2022-11-15 13:54:24 -0800, Peter Geoghegan wrote:
> On Tue, Nov 15, 2022 at 12:29 PM Andres Freund  wrote:
> > ... I strongly dislike latestCommittedXid. That seems at least as misleading
> > as latestRemovedXid and has the danger of confusion with latestCompletedXid
> > as you mention.
> 
> > How about latestAffectedXid?
> 
> I get why you don't care for latestCommittedXid, of course, but the
> name does have some advantages. Namely:
> 
> 1. Most conflicts come from PRUNE records (less often index deletion
> records) where the XID is some heap tuple's xmax, a
> committed-to-everybody XID on the primary (at the point of the
> original execution of the prune). It makes sense to emphasize the idea
> that snapshots running on a replica need to agree that this XID is
> definitely committed -- we need to kill any snapshots that don't
> definitely agree that this one particular XID is committed by now.

I don't agree that it makes sense there - to me it sounds like the record is
just carrying the globally latest committed xid rather than something just
describing the record.

I also just don't think "agreeing that a particular XID is committed" is a
good description of latestRemovedXID, there's just too many ways that
"agreeing xid has committed" can be understood. To me it's not obvious that
it's about mvcc snapshots.

If we want to focus on the mvcc affects we could just go for something like
snapshotConflictHorizon or such.

> Perhaps something like "mustBeCommittedCutoff" would work better? What
> do you think of that? The emphasis on how things need to work on the
> REDO side seems useful.

I don't think "committed" should be part of the name.

Greetings,

Andres Freund

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-15 Thread Peter Geoghegan

On Tue, Nov 15, 2022 at 12:29 PM Andres Freund  wrote:
> ... I strongly dislike latestCommittedXid. That seems at least as misleading
> as latestRemovedXid and has the danger of confusion with latestCompletedXid
> as you mention.

> How about latestAffectedXid?

I get why you don't care for latestCommittedXid, of course, but the
name does have some advantages. Namely:

1. Most conflicts come from PRUNE records (less often index deletion
records) where the XID is some heap tuple's xmax, a
committed-to-everybody XID on the primary (at the point of the
original execution of the prune). It makes sense to emphasize the idea
that snapshots running on a replica need to agree that this XID is
definitely committed -- we need to kill any snapshots that don't
definitely agree that this one particular XID is committed by now.

2. It hints at the idea that we don't need to set any XID to do
cleanup for aborted transactions, per the optimization in
HeapTupleHeaderAdvanceLatestRemovedXid().

Perhaps something like "mustBeCommittedCutoff" would work better? What
do you think of that? The emphasis on how things need to work on the
REDO side seems useful.

-- 
Peter Geoghegan

Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-15 Thread Andres Freund

Hi,

I like the idea of this, but:

On 2022-11-15 10:24:05 -0800, Peter Geoghegan wrote:
> I'm not necessarily that attached to the name latestCommittedXid. It
> is more accurate, but it's also a little bit too similar to another
> common XID symbol name, latestCompletedXid. Can anyone suggest an
> alternative?

... I strongly dislike latestCommittedXid. That seems at least as misleading
as latestRemovedXid and has the danger of confusion with latestCompletedXid
as you mention.

How about latestAffectedXid? Based on a quick scroll through the changed
structures it seems like it'd be reasonably discriptive for most?

Greetings,

Andres Freund

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-15 Thread David Christensen

Enclosed is v9.

- code style consistency (FPI instead of FPW) internally.
- cleanup of no-longer needed checksum-related pieces from code and tests.
- test cleanup/simplification.
- other comment cleanup.

Passes all CI checks.

Best,

David


v9-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data

Standardizing how pg_waldump presents recovery conflict XID cutoffs

2022-11-15 Thread Peter Geoghegan

Most recovery conflicts are generated in REDO routines using a
standard approach these days: they all call
ResolveRecoveryConflictWithSnapshot() with a latestRemovedXid argument
taken directly from the WAL record. Right now we don't quite present
this information in a uniform way, even though REDO routines apply the
cutoffs in a uniform way.

ISTM that there is value consistently using the same symbol names for
these cutoffs in every WAL record that has such a cutoff. The REDO
routine doesn't care about how the cutoff was generated during
original execution anyway -- it is always the responsibility of code
that runs during original execution (details of which will vary by
record type). Consistency makes all of this fairly explicit, and makes
it easier to use tools like pg_waldump to debug recovery conflicts --
the user can grep for the same generic symbol name and see everything.

Attached WIP patch brings heapam's VISIBLE record type and SP-GiST's
VACUUM_REDIRECT record type in line with this convention. It also
changes the symbol name from latestRemovedXid to something more
general: latestCommittedXid (since many of these WAL records don't
actually remove anything).

The patch also documents how these cutoffs are supposed to work at a
high level. We got the details slightly wrong (resulting in false
conflicts) for several years with FREEZE_PAGE records (see bugfix
commit 66fbcb0d2e for details), which seems like a relatively easy
mistake to make -- so we should try to avoid similar mistakes in the
future.

I'm not necessarily that attached to the name latestCommittedXid. It
is more accurate, but it's also a little bit too similar to another
common XID symbol name, latestCompletedXid. Can anyone suggest an
alternative?

-- 
Peter Geoghegan


v1-0001-Standardize-rmgrdesc-recovery-conflict-XID-output.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-15 Thread David Christensen

On Tue, Nov 15, 2022 at 4:41 AM Bharath Rupireddy
 wrote:
>
> On Tue, Nov 15, 2022 at 1:29 AM David Christensen
>  wrote:
> >
> > Enclosed is v8, which uses the replication slot method to retain WAL
> > as well as fsync'ing the output directory when everything is done.
>
> Thanks. It mostly is in good shape. However, few more comments:
>
> 1.
> +if it does not exist.  The images saved will be subject to the same
> +filtering and limiting criteria as display records, but in this
> +mode pg_waldump will not output any other
> +information.
> May I know what's the intention of the statement 'The images saved
> '? If it's not necessary and convey anything useful to the user,
> can we remove it?

Basically I mean if you're limiting to a specific relation or rmgr
type, etc, it only saves those FPIs.  (So filtering is applied first
before considering whether to save the FPI or not.)

> 2.
> +#include "storage/checksum.h"
> +#include "storage/checksum_impl.h"
> I think we don't need the above includes as we got rid of verifying
> page checksums. The patch compiles without them for me.

Good catch.

> 3.
> +char   *save_fpw_path;
> Can we rename the above variable to save_fpi_path, just to be in sync
> with what we expose to the user, the option name 'save-fpi'?

Sure.

> 4.
> +if (config.save_fpw_path != NULL)
> +{
> +/* Fsync our output directory */
> +fsync_fname(config.save_fpw_path, true);
> +}
> I guess adding a comment there as to why we aren't fsyncing for every
> file that gets created, but once per the directory at the end. That'd
> help clarify doubts that other members might get while looking at the
> code.

Can do.

> 5.
> +if (config.save_fpw_path != NULL)
> +{
> +/* Fsync our output directory */
> +fsync_fname(config.save_fpw_path, true);
> +}
> So, are we sure that we don't want to fsync for time_to_stop exit(0)
> cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
> meaning exiting with return code 0, shouldn't we fsync the directory?

We can. Like I've said before, since these aren't production parts of
the cluster I don't personally have much of an opinion if fsync() is
appropriate at all, so don't have strong feelings here.

> 6.
> +else if (config.save_fpw_path)
> Let's use the same convention to check non-NULLness,
> config.save_fpw_path != NULL.

Good catch.

> 7.
> +CHECKPOINT;
> +SELECT pg_switch_wal();
> +UPDATE test_table SET a = a + 1;
> +SELECT pg_switch_wal();
> I don't think switching WAL after checkpoint is necessary here,
> because the checkpoint ensures all the WAL gets flushed to disk.
> Please remove it.

The point is to ensure we have a clean WAL segment that we know will
contain the relation we are filtering by.  Will test if this still
holds without the extra pg_switch_wal(), but that's the rationale.

> PS: I've seen the following code:
> +my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
> want the second WAL file, which will be a complete WAL file with
> full-page writes for our specific relation.

I don't understand the question.

> 8.
> +$node->safe_psql('postgres', < +EOF
> Why EOF is used here? Can't we do something like below to execute
> multiple statements?
> $node->safe_psql(
> 'postgres', qq[
> SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> ]);
>
> Same here:
> +$node->safe_psql('postgres', < +SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
> +EOQ

As a long-time perl programmer, heredocs seem more natural and easier
to read rather than a string that accomplishes the same function.  If
there is an established project style I'll stick with it, but it just
rolled out that way. :-)

> 9.
> +my $walfile = [sort { $a <=> $b } glob("

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-15 Thread Bharath Rupireddy

On Tue, Nov 15, 2022 at 1:29 AM David Christensen
 wrote:
>
> Enclosed is v8, which uses the replication slot method to retain WAL
> as well as fsync'ing the output directory when everything is done.

Thanks. It mostly is in good shape. However, few more comments:

1.
+if it does not exist.  The images saved will be subject to the same
+filtering and limiting criteria as display records, but in this
+mode pg_waldump will not output any other
+information.
May I know what's the intention of the statement 'The images saved
'? If it's not necessary and convey anything useful to the user,
can we remove it?

2.
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
I think we don't need the above includes as we got rid of verifying
page checksums. The patch compiles without them for me.

3.
+char   *save_fpw_path;
Can we rename the above variable to save_fpi_path, just to be in sync
with what we expose to the user, the option name 'save-fpi'?

4.
+if (config.save_fpw_path != NULL)
+{
+/* Fsync our output directory */
+fsync_fname(config.save_fpw_path, true);
+}
I guess adding a comment there as to why we aren't fsyncing for every
file that gets created, but once per the directory at the end. That'd
help clarify doubts that other members might get while looking at the
code.

5.
+if (config.save_fpw_path != NULL)
+{
+/* Fsync our output directory */
+fsync_fname(config.save_fpw_path, true);
+}
So, are we sure that we don't want to fsync for time_to_stop exit(0)
cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
meaning exiting with return code 0, shouldn't we fsync the directory?

6.
+else if (config.save_fpw_path)
Let's use the same convention to check non-NULLness,
config.save_fpw_path != NULL.

7.
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+SELECT pg_switch_wal();
I don't think switching WAL after checkpoint is necessary here,
because the checkpoint ensures all the WAL gets flushed to disk.
Please remove it.

PS: I've seen the following code:
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.

8.
+$node->safe_psql('postgres', <safe_psql(
'postgres', qq[
SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
]);

Same here:
+$node->safe_psql('postgres', < $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.
Is it guaranteed that just looking at the second WAL file in the
pg_wal directory assures WAL file with FPIs? I think we have to save
the WAL file that contains FPIs, that is the file after, CHECKPOINT,
UPDATE and pg_switch_wal. I think you can store output LSN of
pg_switch_wal

10.
+$node->safe_psql('postgres', <https://aws.amazon.com

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-14 Thread David Christensen

Enclosed is v8, which uses the replication slot method to retain WAL
as well as fsync'ing the output directory when everything is done.


v8-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-14 Thread David Christensen

On Fri, Nov 11, 2022 at 4:57 AM Bharath Rupireddy
 wrote:
> > Well if it's not the same output then I guess you're right and there's
> > not a use for the `--fixup` mode.  By the same token, I'd say
> > calculating/setting the checksum also wouldn't need to be done, we
> > should just include the page as included in the WAL stream.
>
> Let's hear from others, we may be missing something here. I recommend
> keeping the --fixup patch as 0002, in case if we decide to discard
> it's easier, however I'll leave that to you.

I've whacked in `git` for now; I can resurrect if people consider it useful.

> > > > > 5.
> > > > > +if (!RestoreBlockImage(record, block_id, page))
> > > > > +continue;
> > > > > +
> > > > > +/* we have our extracted FPI, let's save it now */
> > > > > After extracting the page from the WAL record, do we need to perform a
> > > > > checksum on it?
> > >
> > > I think you just need to do the following, this will ensure the
> > > authenticity of the page that pg_waldump returns.
> > > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, 
> > > blk))
> > > {
> > > pg_fatal("page checksum failed");
> > > }
> >
> > The WAL already has a checksum, so not certain this makes sense on its
> > own. Also I'm inclined to make it a warning if it doesn't match rather
> > than a fatal. (I'd also have to verify that the checksum is properly
> > set on the page prior to copying the FPI into WAL, which I'm pretty
> > sure it is but not certain.)
>
> How about having it as an Assert()?

Based on empirical testing, the checksums don't match, so
asserting/alerting on each block extracted seems next to useless, so
going to just remove that.

> > > 5.
> > > +fsync(fileno(OPF));
> > > +fclose(OPF);
> > > I think just the fsync() isn't enough, you still need fsync_fname()
> > > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
> > > <= XLogRecMaxBlockId(record); block_id++) loop.
> >
> > I'm not sure I get the value of the fsyncs here; if you are using this
> > tool at this capacity you're by definition doing some sort of
> > transient investigative steps.  Since the WAL was fsync'd, you could
> > always rerun/recreate as needed in the unlikely event of an OS crash
> > in the middle of this investigation.  Since this is outside the
> > purview of the database operations proper (unlike, say, initdb) seems
> > like it's unnecessary (or definitely shouldn't need to be selectable).
> > My thoughts are that if we're going to fsync, just do the fsyncs
> > unconditionally rather than complicate the interface further.
>
> -1 for fysnc() per file created as it can create a lot of sync load on
> production servers impacting performance. How about just syncing the
> directory at the end assuming it doesn't cost as much as fsync() per
> FPI file created would?

I can fsync the dir if that's a useful compromise.

> > > 4.
> > > +$primary->append_conf('postgresql.conf', "wal_level='replica'");
> > > +$primary->append_conf('postgresql.conf', 'archive_mode=on');
> > > +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
> > > Why do you need to set wal_level to replica, out of the box your
> > > cluster comes with replica only no?
> > > And why do you need archive_mode on and set the command to do nothing?
> > > Why archiving is needed for testing your feature firstly?
> >
> > I think it had shown "minimal" in my testing; I was purposefully
> > failing archives so the WAL would stick around.  Maybe a custom
> > archive command that just copied a single WAL file into a known
> > location so I could use that instead of the current approach would
> > work, though not sure how Windows support would work with that.  Open
> > to other ideas to more cleanly get a single WAL file that isn't the
> > last one.  (Earlier versions of this test were using /all/ of the
> > generated WAL files rather than a single one, so maybe I am
> > overcomplicating things for a single WAL file case.)
>
> Typically we create a physical replication slot at the beginning so
> that the server keeps the WAL required for you in pg_wal itself, for
> instance, please see pg_walinspect:
>
> -- Make sure checkpoints don't interfere with the test.
> SELECT 'init' FROM
> pg_create_physical_replication_slot('regress_pg_walinspect_slot',
> true, false);

Will see if I can get something like this to work; I'm currently
stopping the server before running the file-based tests, but I suppose
there's no reason to do so, so a temporary slot that holds it around
until the test is complete is probably fine.

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-11 Thread David Christensen

On Fri, Nov 11, 2022 at 8:15 AM Justin Pryzby  wrote:
>
> On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote:
> > On Wed, Nov 9, 2022 at 2:08 PM David Christensen 
> >  wrote:
> > > Justin sez:
> > > > I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> > > > majority of TAP tests don't.
> > > >
> > > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
> > > >1435 safe_psql('postgres',
> > > > 335 safe_psql(
> > > >  23 safe_psql($connect_db,
> > >
> > > If there was a reason, I don't recall offhand; I will test removing it
> > > and if things still work will consider it good enough.
> >
> > Things blew up when I did that; rather than hunt it down, I just left it 
> > in. :-)
>
> > +$primary->safe_psql('db1',  <
> It worked for me when I removed the 3 references to db1.
> That's good for efficiency of the test.

I did figure that out later; fixed in git.

> > +my $blocksize = 8192;
>
> I think this should be just "my $blocksize;" rather than setting a value
> which is later overwriten.

Yep.  Fixed in git.

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-11 Thread Justin Pryzby

On Wed, Nov 09, 2022 at 02:37:29PM -0600, David Christensen wrote:
> On Wed, Nov 9, 2022 at 2:08 PM David Christensen 
>  wrote:
> > Justin sez:
> > > I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> > > majority of TAP tests don't.
> > >
> > > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
> > >1435 safe_psql('postgres',
> > > 335 safe_psql(
> > >  23 safe_psql($connect_db,
> >
> > If there was a reason, I don't recall offhand; I will test removing it
> > and if things still work will consider it good enough.
> 
> Things blew up when I did that; rather than hunt it down, I just left it in. 
> :-)

> +$primary->safe_psql('db1',  < +my $blocksize = 8192;

I think this should be just "my $blocksize;" rather than setting a value
which is later overwriten.

-- 
Justin

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-11 Thread Bharath Rupireddy

On Thu, Nov 10, 2022 at 9:52 PM David Christensen
 wrote:
>
> > > > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > > > pg_waldump is supposed to be just WAL reader, and must not return any
> > > > modified information, with --fixup-fpi option, the patch violates this
> > > > principle i.e. it sets page LSN and returns. Without actually
> > > > replaying the WAL record on the page, how is it correct to just set
> > > > the LSN? How will it be useful? ISTM, we must ignore this option
> > > > unless there's a strong use-case.
> > >
> > > How I was envisioning this was for cases like extreme surgery for
> > > corrupted pages, where you extract the page from WAL but it has lsn
> > > and checksum set so you could do something like `dd if=fixup-block
> > > of=relation ...`, so it *simulates* the replay of said fullpage blocks
> > > in cases where for some reason you can't play the intermediate
> > > records; since this is always a fullpage block, it's capturing what
> > > would be the snapshot so you could manually insert somewhere as needed
> > > without needing to replay (say if dealing with an incomplete or
> > > corrupted WAL stream).
> >
> > Recovery sets the page LSN after it replayed the WAL record on the
> > page right? Recovery does this - base_page/FPI +
> > apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
> > new_version_of_page. Essentially, in your patch, you are just setting
> > the WAL record LSN with the page contents being the base page's. I'm
> > still not sure what's the real use-case here. We don't have an
> > independent function in postgres, given a base page and a WAL record
> > that just replays the WAL record and output's the new version of the
> > page, so I think what you do in the patch with fixup option seems
> > wrong to me.
>
> Well if it's not the same output then I guess you're right and there's
> not a use for the `--fixup` mode.  By the same token, I'd say
> calculating/setting the checksum also wouldn't need to be done, we
> should just include the page as included in the WAL stream.

Let's hear from others, we may be missing something here. I recommend
keeping the --fixup patch as 0002, in case if we decide to discard
it's easier, however I'll leave that to you.

> > > > 5.
> > > > +    if (!RestoreBlockImage(record, block_id, page))
> > > > +continue;
> > > > +
> > > > +/* we have our extracted FPI, let's save it now */
> > > > After extracting the page from the WAL record, do we need to perform a
> > > > checksum on it?
> >
> > I think you just need to do the following, this will ensure the
> > authenticity of the page that pg_waldump returns.
> > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
> > {
> > pg_fatal("page checksum failed");
> > }
>
> The WAL already has a checksum, so not certain this makes sense on its
> own. Also I'm inclined to make it a warning if it doesn't match rather
> than a fatal. (I'd also have to verify that the checksum is properly
> set on the page prior to copying the FPI into WAL, which I'm pretty
> sure it is but not certain.)

How about having it as an Assert()?

> > 5.
> > +fsync(fileno(OPF));
> > +fclose(OPF);
> > I think just the fsync() isn't enough, you still need fsync_fname()
> > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
> > <= XLogRecMaxBlockId(record); block_id++) loop.
>
> I'm not sure I get the value of the fsyncs here; if you are using this
> tool at this capacity you're by definition doing some sort of
> transient investigative steps.  Since the WAL was fsync'd, you could
> always rerun/recreate as needed in the unlikely event of an OS crash
> in the middle of this investigation.  Since this is outside the
> purview of the database operations proper (unlike, say, initdb) seems
> like it's unnecessary (or definitely shouldn't need to be selectable).
> My thoughts are that if we're going to fsync, just do the fsyncs
> unconditionally rather than complicate the interface further.

-1 for fysnc() per file created as it can create a lot of sync load on
production servers impacting performance. How about just syncing the
directory at the end assuming it doesn't cost as much as fsync() per
FPI file created would?

> > 4.
> > +$primary->append_conf('postgresql.co

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-10 Thread David Christensen

On Thu, Nov 10, 2022 at 2:33 AM Bharath Rupireddy
 wrote:
>
> On Thu, Nov 10, 2022 at 1:31 AM David Christensen
>  wrote:
> >
>
> Thanks for providing the v7 patch, please see my comments and responses below.

Hi Bharath, Thanks for the feedback.

> > > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > > pg_waldump is supposed to be just WAL reader, and must not return any
> > > modified information, with --fixup-fpi option, the patch violates this
> > > principle i.e. it sets page LSN and returns. Without actually
> > > replaying the WAL record on the page, how is it correct to just set
> > > the LSN? How will it be useful? ISTM, we must ignore this option
> > > unless there's a strong use-case.
> >
> > How I was envisioning this was for cases like extreme surgery for
> > corrupted pages, where you extract the page from WAL but it has lsn
> > and checksum set so you could do something like `dd if=fixup-block
> > of=relation ...`, so it *simulates* the replay of said fullpage blocks
> > in cases where for some reason you can't play the intermediate
> > records; since this is always a fullpage block, it's capturing what
> > would be the snapshot so you could manually insert somewhere as needed
> > without needing to replay (say if dealing with an incomplete or
> > corrupted WAL stream).
>
> Recovery sets the page LSN after it replayed the WAL record on the
> page right? Recovery does this - base_page/FPI +
> apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
> new_version_of_page. Essentially, in your patch, you are just setting
> the WAL record LSN with the page contents being the base page's. I'm
> still not sure what's the real use-case here. We don't have an
> independent function in postgres, given a base page and a WAL record
> that just replays the WAL record and output's the new version of the
> page, so I think what you do in the patch with fixup option seems
> wrong to me.

Well if it's not the same output then I guess you're right and there's
not a use for the `--fixup` mode.  By the same token, I'd say
calculating/setting the checksum also wouldn't need to be done, we
should just include the page as included in the WAL stream.

> > > 5.
> > > +if (!RestoreBlockImage(record, block_id, page))
> > > +continue;
> > > +
> > > +/* we have our extracted FPI, let's save it now */
> > > After extracting the page from the WAL record, do we need to perform a
> > > checksum on it?
>
> I think you just need to do the following, this will ensure the
> authenticity of the page that pg_waldump returns.
> if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
> {
> pg_fatal("page checksum failed");
> }

The WAL already has a checksum, so not certain this makes sense on its
own. Also I'm inclined to make it a warning if it doesn't match rather
than a fatal. (I'd also have to verify that the checksum is properly
set on the page prior to copying the FPI into WAL, which I'm pretty
sure it is but not certain.)

> > > case 'W':
> > > config.save_fpw_path = pg_strdup(optarg);
> > > case 'X':
> > >config.fixup_fpw = true;
> > >config.save_fpw_path = pg_strdup(optarg);
> >
> > Like separate opt processing with their own `break` statement?
> > Probably a bit more readable/conventional.
>
> Yes.

Moot with the removal of the --fixup mode.

> Some more comments:
>
> 1.
> +PGAlignedBlockzerobuf;
> Essentially, it's not a zero buffer, please rename the variable to
> something like 'buf' or 'page_buf' or someother?

Sure.

> 2.
> +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
> Replace pg_pwrite with fwrite() and avoid fileno() system calls that
> should suffice here, AFICS, we don't need pg_pwrite.

Sure.

> 3.
> +if (config.save_fpw_path != NULL)
> +{
> +/* Create the dir if it doesn't exist */
> +if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
> I think  you still need pg_check_dir() here, how about something like below?

I was assuming pg_mkdir_p() acted just like mkdir -p, where it's just
an idempotent action, so an existing dir is just treated the same.
What's the benefit here? Would assume if a non-dir file existed at
that path or other permissions issues arose we'd just get an error
from pg_mkdir_p().  (Will review the code there and confirm.)

> 4.
> +/* Create the dir if it doesn

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-10 Thread Bharath Rupireddy

On Thu, Nov 10, 2022 at 1:31 AM David Christensen
 wrote:
>

Thanks for providing the v7 patch, please see my comments and responses below.

> > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > pg_waldump is supposed to be just WAL reader, and must not return any
> > modified information, with --fixup-fpi option, the patch violates this
> > principle i.e. it sets page LSN and returns. Without actually
> > replaying the WAL record on the page, how is it correct to just set
> > the LSN? How will it be useful? ISTM, we must ignore this option
> > unless there's a strong use-case.
>
> How I was envisioning this was for cases like extreme surgery for
> corrupted pages, where you extract the page from WAL but it has lsn
> and checksum set so you could do something like `dd if=fixup-block
> of=relation ...`, so it *simulates* the replay of said fullpage blocks
> in cases where for some reason you can't play the intermediate
> records; since this is always a fullpage block, it's capturing what
> would be the snapshot so you could manually insert somewhere as needed
> without needing to replay (say if dealing with an incomplete or
> corrupted WAL stream).

Recovery sets the page LSN after it replayed the WAL record on the
page right? Recovery does this - base_page/FPI +
apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
new_version_of_page. Essentially, in your patch, you are just setting
the WAL record LSN with the page contents being the base page's. I'm
still not sure what's the real use-case here. We don't have an
independent function in postgres, given a base page and a WAL record
that just replays the WAL record and output's the new version of the
page, so I think what you do in the patch with fixup option seems
wrong to me.

> > 5.
> > +if (!RestoreBlockImage(record, block_id, page))
> > +continue;
> > +
> > +/* we have our extracted FPI, let's save it now */
> > After extracting the page from the WAL record, do we need to perform a
> > checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

> > case 'W':
> > config.save_fpw_path = pg_strdup(optarg);
> > case 'X':
> >config.fixup_fpw = true;
> >config.save_fpw_path = pg_strdup(optarg);
>
> Like separate opt processing with their own `break` statement?
> Probably a bit more readable/conventional.

Yes.

Some more comments:

1.
+PGAlignedBlockzerobuf;
Essentially, it's not a zero buffer, please rename the variable to
something like 'buf' or 'page_buf' or someother?

2.
+if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Replace pg_pwrite with fwrite() and avoid fileno() system calls that
should suffice here, AFICS, we don't need pg_pwrite.

3.
+if (config.save_fpw_path != NULL)
+{
+/* Create the dir if it doesn't exist */
+if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
I think  you still need pg_check_dir() here, how about something like below?

if (pg_check_dir(config.save_fpw_path) == 0)
{
if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
{
  /* error */
}
}

4.
+/* Create the dir if it doesn't exist */
+if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+{
+pg_log_error("could not create output directory \"%s\": %m",
+ config.save_fpw_path);
+goto bad_argument;
Why is the directory creation error a bad_argument? I think you need
just pg_fatal() here.

5.
+fsync(fileno(OPF));
+fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

6. Speaking of which, do we need to do fsync()'s optionally? If we
were to write many such FPI files, aren't there going to be more
fsync() calls and imagine this feature being used on a production
server and a lot of fsync() will definitely make running server
fsync() ops slower.  I think we need a new option whether pg_waldump
ever do fsync() or not, something similar to --no-sync of
pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would
like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can
make use of it.

7.
+pg_fatal("couldn't write out complete fullpage image to
file: %s", filename);
We typically use "full page image" in the output strings, pl

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread David Christensen

On Wed, Nov 9, 2022 at 2:08 PM David Christensen
 wrote:
> Justin sez:
> > I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> > majority of TAP tests don't.
> >
> > $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
> >1435 safe_psql('postgres',
> > 335 safe_psql(
> >  23 safe_psql($connect_db,
>
> If there was a reason, I don't recall offhand; I will test removing it
> and if things still work will consider it good enough.

Things blew up when I did that; rather than hunt it down, I just left it in. :-)

Enclosed is v7, with changes thus suggested thus far.


v7-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread David Christensen

> > 6.
> > +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> > Use pg_dir_create_mode instead of hard-coded 0007?
>
> I think I thought of that when I first looked at the patch ...  but, I'm
> not sure, since it says:
>
> src/include/common/file_perm.h-/* Modes for creating directories and files IN 
> THE DATA DIRECTORY */
> src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode;

Looks like it's pretty evenly split in src/bin:

$ git grep -o -E -w '(pg_mkdir_p|mkdir)' '**.c' | sort | uniq -c
  3 initdb/initdb.c:mkdir
  3 initdb/initdb.c:pg_mkdir_p
  1 pg_basebackup/bbstreamer_file.c:mkdir
  2 pg_basebackup/pg_basebackup.c:pg_mkdir_p
  1 pg_dump/pg_backup_directory.c:mkdir
  1 pg_rewind/file_ops.c:mkdir
  4 pg_upgrade/pg_upgrade.c:mkdir

So if that is the preferred approach I'll go ahead and use it.

> I was wondering if there's any reason to do "CREATE DATABASE".  The vast
> majority of TAP tests don't.
>
> $ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
>1435 safe_psql('postgres',
> 335 safe_psql(
>  23 safe_psql($connect_db,

If there was a reason, I don't recall offhand; I will test removing it
and if things still work will consider it good enough.

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread David Christensen

On Wed, Nov 9, 2022 at 6:30 AM Bharath Rupireddy
 wrote:
>
> On Wed, Nov 9, 2022 at 5:08 AM David Christensen
>  wrote:
> >
> > Enclosed is v6, which squashes your refactor and adds the additional
> > recent suggestions.
>
> Thanks for working on this feature. Here are some comments for now. I
> haven't looked at the tests, I will take another look at the code and
> tests after these and all other comments are addressed.
>
> 1. For ease of review, please split the test patch to 0002.

Per later discussion seems like new feature tests are fine in the same
patch, yes?

> 2. I'm unable to understand the use-case for --fixup-fpi option.
> pg_waldump is supposed to be just WAL reader, and must not return any
> modified information, with --fixup-fpi option, the patch violates this
> principle i.e. it sets page LSN and returns. Without actually
> replaying the WAL record on the page, how is it correct to just set
> the LSN? How will it be useful? ISTM, we must ignore this option
> unless there's a strong use-case.

How I was envisioning this was for cases like extreme surgery for
corrupted pages, where you extract the page from WAL but it has lsn
and checksum set so you could do something like `dd if=fixup-block
of=relation ...`, so it *simulates* the replay of said fullpage blocks
in cases where for some reason you can't play the intermediate
records; since this is always a fullpage block, it's capturing what
would be the snapshot so you could manually insert somewhere as needed
without needing to replay (say if dealing with an incomplete or
corrupted WAL stream).

> 3.
> +if (fork >= 0 && fork <= MAX_FORKNUM)
> +{
> +if (fork)
> +sprintf(forkname, "_%s", forkNames[fork]);
> +else
> +forkname[0] = 0;
> +}
> +else
> +pg_fatal("found invalid fork number: %u", fork);
>
> Isn't the above complex? What's the problem with something like below?
> Why do we need if (fork) - else block?
>
> if (fork >= 0  && fork <= MAX_FORKNUM)
> sprintf(forkname, "_%s", forkNames[fork]);
> else
> pg_fatal("found invalid fork number: %u", fork);

This was to suppress any suffix for main forks, but yes, could
simplify and include the `_` in the suffix name.  Will include such a
change.

> 3.
> +charpage[BLCKSZ] = {0};
> I think when writing to a file, we need PGAlignedBlock rather than a
> simple char array of bytes, see the description around PGAlignedBlock
> for why it is so.

Easy enough change, and makes sense.

> 4.
> +if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
> Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can
> avoid fileno() system calls, no? Do you need the file position to
> remain the same after writing, hence pg_pwrite()?

I don't recall the original motivation, TBH.

> 5.
> +if (!RestoreBlockImage(record, block_id, page))
> +continue;
> +
> +/* we have our extracted FPI, let's save it now */
> After extracting the page from the WAL record, do we need to perform a
> checksum on it?

That is there in fixup mode (or should be).  Are you thinking this
should also be set if not in fixup mode?  That defeats the purpose of
the raw page extract, which is to see *exactly* what the WAL stream
has.

> 6.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> Use pg_dir_create_mode instead of hard-coded 0007?

Sure.

> 7.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> +fsync(fileno(OPF));
> +fclose(OPF);
> Since you're creating the directory in case it's not available, you
> need to fsync the directory too?

Sure.

> 8.
> +case 'W':
> +case 'X':
> +config.fixup_fpw = (option == 'X');
> +config.save_fpw_path = pg_strdup(optarg);
> +break;
> Just set  config.fixup_fpw = false before the switch block starts,
> like the other variables, and then perhaps doing like below is more
> readable:
> case 'W':
> config.save_fpw_path = pg_strdup(optarg);
> case 'X':
>config.fixup_fpw = true;
>config.save_fpw_path = pg_strdup(optarg);

Like separate opt processing with their own `break` statement?
Probably a bit more readable/conventional.

> 9.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> Should we use pg_mkdir_p() instead of mkdir()?

Sure.

Thanks,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread Alvaro Herrera

On 2022-Nov-09, Justin Pryzby wrote:

> On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote:

> > 1. For ease of review, please split the test patch to 0002.
> 
> This is just my opinion, but .. why ?  Since it's easy to
> filter/skip/display a file, I don't think it's usually useful to have
> separate patches for tests or docs.

I concur with Justin.  When a patch is bugfix and a test is added that
verifies it, I like to keep the test in a separate commit (for submit
purposes and in my personal repo -- not for the official push!) so that
I can git-checkout to just the test and make sure it fails ahead of
pushing the fix commit.  But for a new feature, there's no reason to do
so.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread Justin Pryzby

On Wed, Nov 09, 2022 at 06:00:40PM +0530, Bharath Rupireddy wrote:
> On Wed, Nov 9, 2022 at 5:08 AM David Christensen 
>  wrote:
> >
> > Enclosed is v6, which squashes your refactor and adds the additional
> > recent suggestions.
> 
> Thanks for working on this feature. Here are some comments for now. I
> haven't looked at the tests, I will take another look at the code and
> tests after these and all other comments are addressed.
> 
> 1. For ease of review, please split the test patch to 0002.

This is just my opinion, but .. why ?  Since it's easy to
filter/skip/display a file, I don't think it's usually useful to have
separate patches for tests or docs.

> 6.
> +if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
> Use pg_dir_create_mode instead of hard-coded 0007?

I think I thought of that when I first looked at the patch ...  but, I'm
not sure, since it says:

src/include/common/file_perm.h-/* Modes for creating directories and files IN 
THE DATA DIRECTORY */
src/include/common/file_perm.h:extern PGDLLIMPORT int pg_dir_create_mode;

I was wondering if there's any reason to do "CREATE DATABASE".  The vast
majority of TAP tests don't.

$ git grep -ho 'safe_psql[^ ]*' '*pl' |sort |uniq -c |sort -nr |head
   1435 safe_psql('postgres',
335 safe_psql(
 23 safe_psql($connect_db,

-- 
Justin

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-09 Thread Bharath Rupireddy

On Wed, Nov 9, 2022 at 5:08 AM David Christensen
 wrote:
>
> Enclosed is v6, which squashes your refactor and adds the additional
> recent suggestions.

Thanks for working on this feature. Here are some comments for now. I
haven't looked at the tests, I will take another look at the code and
tests after these and all other comments are addressed.

1. For ease of review, please split the test patch to 0002.

2. I'm unable to understand the use-case for --fixup-fpi option.
pg_waldump is supposed to be just WAL reader, and must not return any
modified information, with --fixup-fpi option, the patch violates this
principle i.e. it sets page LSN and returns. Without actually
replaying the WAL record on the page, how is it correct to just set
the LSN? How will it be useful? ISTM, we must ignore this option
unless there's a strong use-case.
3.
+if (fork >= 0 && fork <= MAX_FORKNUM)
+{
+if (fork)
+sprintf(forkname, "_%s", forkNames[fork]);
+else
+forkname[0] = 0;
+}
+else
+pg_fatal("found invalid fork number: %u", fork);

Isn't the above complex? What's the problem with something like below?
Why do we need if (fork) - else block?

if (fork >= 0  && fork <= MAX_FORKNUM)
sprintf(forkname, "_%s", forkNames[fork]);
else
pg_fatal("found invalid fork number: %u", fork);

3.
+charpage[BLCKSZ] = {0};
I think when writing to a file, we need PGAlignedBlock rather than a
simple char array of bytes, see the description around PGAlignedBlock
for why it is so.

4.
+if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Why pg_pwrite(), why not just fwrite()? If fwrite() is used, you can
avoid fileno() system calls, no? Do you need the file position to
remain the same after writing, hence pg_pwrite()?

5.
+if (!RestoreBlockImage(record, block_id, page))
+continue;
+
+/* we have our extracted FPI, let's save it now */
After extracting the page from the WAL record, do we need to perform a
checksum on it?

6.
+if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Use pg_dir_create_mode instead of hard-coded 0007?

7.
+if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
+fsync(fileno(OPF));
+fclose(OPF);
Since you're creating the directory in case it's not available, you
need to fsync the directory too?

8.
+case 'W':
+case 'X':
+config.fixup_fpw = (option == 'X');
+config.save_fpw_path = pg_strdup(optarg);
+break;
Just set  config.fixup_fpw = false before the switch block starts,
like the other variables, and then perhaps doing like below is more
readable:
case 'W':
config.save_fpw_path = pg_strdup(optarg);
case 'X':
   config.fixup_fpw = true;
   config.save_fpw_path = pg_strdup(optarg);

9.
+if (dir_status == 0 && mkdir(config.save_fpw_path, 0700) < 0)
Should we use pg_mkdir_p() instead of mkdir()?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread sho kato

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Hello,

I tested this patch on Linux and there is no problem.
Also, I reviewed this patch and commented below.

@@ -439,6 +447,107 @@ XLogRecordHasFPW(XLogReaderState *record)
+   if (fork >= 0 && fork <= MAX_FORKNUM)
+   {
+   if (fork)
+   sprintf(forkname, "_%s", forkNames[fork]);
+   else
+   forkname[0] = 0;
+   }
+   else
+   pg_fatal("found invalid fork number: %u", fork);

Should we add the comment if the main fork is saved without "_main" suffix for 
code readability?

@@ -679,6 +788,9 @@ usage(void)
 " (default: 1 or the value used in 
STARTSEG)\n"));
printf(_("  -V, --version  output version information, then 
exit\n"));
printf(_("  -w, --fullpage only show records with a full page 
write\n"));
+   printf(_("  -W, --save-fpi=pathsave full page images to given path as\n"
+" LSN.T.D.R.B_F\n"));
+   printf(_("  -X, --fixup-fpi=path   like --save-fpi but apply LSN fixups to 
saved page\n"));
printf(_("  -x, --xid=XID  only show records with transaction ID 
XID\n"));
printf(_("  -z, --stats[=record]   show statistics instead of records\n"
 " (optionally, show per-record 
statistics)\n"));

Since lower-case options are displayed at the top, should we switch the order 
of -x and -X?

@@ -972,6 +1093,25 @@ main(int argc, char **argv)
}
}

+   int dir_status = pg_check_dir(config.save_fpw_path);
+
+   if (dir_status < 0)
+   {
+   pg_log_error("could not access output directory: %s", 
config.save_fpw_path);
+   goto bad_argument;
+   }

Should we output %s enclosed with \"?

Regards,
Sho Kato

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread David Christensen

Enclosed is v6, which squashes your refactor and adds the additional
recent suggestions.

Thanks!


v6-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread David Christensen

On Tue, Nov 8, 2022 at 4:45 PM Justin Pryzby  wrote:
>
> On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote:
> > Hi Justin et al,
> >
> > Enclosed is v5 of this patch which now passes the CirrusCI checks for
> > all supported OSes. I went ahead and reworked the test a bit so it's a
> > little more amenable to the OS-agnostic approach for testing.
>
> Great, thanks.
>
> This includes the changes that I'd started a few months ago.
> Plus adding the test which was missing for meson.

Cool, will review, thanks.

> +format: 
> LSN.TSOID.DBOID.RELNODE.BLKNO
>
> I'd prefer if the abbreviations were "reltablespace" and "datoid"

Sure, no issues there.

> Also, should the test case call pg_relation_filenode() rather than using
> relfilenode directly ?  Is it a problem that the test code assumes
> pagesize=8192 ?

Both good points. Is pagesize just exposed via
`current_setting('block_size')` or is there a different approach?

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-08 Thread Justin Pryzby

On Mon, Nov 07, 2022 at 05:01:01PM -0600, David Christensen wrote:
> Hi Justin et al,
> 
> Enclosed is v5 of this patch which now passes the CirrusCI checks for
> all supported OSes. I went ahead and reworked the test a bit so it's a
> little more amenable to the OS-agnostic approach for testing.

Great, thanks.

This includes the changes that I'd started a few months ago.
Plus adding the test which was missing for meson.

+format: 
LSN.TSOID.DBOID.RELNODE.BLKNO

I'd prefer if the abbreviations were "reltablespace" and "datoid"

Also, should the test case call pg_relation_filenode() rather than using
relfilenode directly ?  Is it a problem that the test code assumes
pagesize=8192 ?

-- 
Justin
>From d4cb23bc2f0edcc65b0bef75fd2e8d6e5aeda21f Mon Sep 17 00:00:00 2001
From: David Christensen 
Date: Wed, 20 Apr 2022 19:59:35 -0500
Subject: [PATCH 1/2] Teach pg_waldump to extract FPIs from the WAL stream

Extracts full-page images from the WAL stream into a target directory,
which must be empty or not exist.  These images are subject to the same
filtering rules as normal display in pg_waldump, which means that you
can isolate the full page writes to a target relation, among other
things.

Files are saved with the filename:  with
formatting to make things somewhat sortable; for instance:

-01C0.1663.1.6117.0
-01000150.1664.0.6115.0
-010001E0.1664.0.6114.0
-01000270.1663.1.6116.0
-01000300.1663.1.6113.0
-01000390.1663.1.6112.0
-01000420.1663.1.8903.0
-010004B0.1663.1.8902.0
-01000540.1663.1.6111.0
-010005D0.1663.1.6110.0

If the FPI comes from a fork other than the main fork, the fork name
will be appended on the output file name; e.g.:

-014A4758.1663.1.12864.0_vm

It's noteworthy that the raw block images do not have the current LSN
stored with them in the WAL stream (as would be true for on-heap
versions of the blocks), nor would the checksum be updated in
them (though WAL itself has checksums, so there is some protection
there).  As such there are two versions of this functionality, one which
returns the raw page as it appears in the WAL (--save-fpi) and one
which applies the updated pd_lsn and pd_checksum (--fixup-fpi).

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect` suite of tools to perform detailed analysis on
the pages in question, based on historical information, and may come in
handy for forensics work.
---
 doc/src/sgml/ref/pg_waldump.sgml  |  79 +++
 src/bin/pg_waldump/pg_waldump.c   | 151 +-
 src/bin/pg_waldump/t/002_save_fullpage.pl | 137 
 3 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_waldump/t/002_save_fullpage.pl

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d559f091e5f..cc3ce918905 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -240,6 +240,85 @@ PostgreSQL documentation

  
 
+ 
+   -W save_path
+   --save-fpi=save_path
+   -X save_path
+   --fixup-fpi=save_path
+   
+   
+Save full page images seen in the WAL stream to the
+save_path directory, which will be created
+if it does not exist.  The images saved will be subject to the same
+filtering and limiting criteria as display records, but in this
+mode pg_waldump will not output any other
+information.
+   
+   
+If invoked using
+the -X/--fixup-fpi
+option, this page image will include the pd_lsn of
+the replayed record rather than the raw page image; as well,
+the pd_checksum field will be updated if it had
+previously existed.
+   
+   
+The page images will be saved with the file
+format: LSN.TSOID.DBOID.RELNODE.BLKNOFORK
+
+The dot-separated components are (in order):
+
+
+ 
+  
+   
+Component
+Description
+   
+  
+
+  
+   
+LSN
+The LSN of the record with this block, formatted
+ as two 8-character hexadecimal numbers %08X-%08X
+   
+
+   
+TSOID
+tablespace OID for the block
+   
+
+   
+DBOID
+database OID for the block
+   
+
+   
+RELNODE
+relnode id for the block
+   
+
+   
+BLKNO
+the block number of this block
+   
+
+   
+FORK
+
+ if coming from the main fork, will be empty, otherwise will be
+ one of _fsm, _vm,
+ or _init.
+
+   
+  
+ 
+
+   
+   
+ 
+

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-07 Thread David Christensen

Hi Justin et al,

Enclosed is v5 of this patch which now passes the CirrusCI checks for
all supported OSes. I went ahead and reworked the test a bit so it's a
little more amenable to the OS-agnostic approach for testing.

Best,

David


v5-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-str.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread David Christensen

On Fri, Nov 4, 2022 at 1:38 PM Justin Pryzby  wrote:
> > > As I recall, that's due to relying on "cp".  And "rsync", which
> > > shouldn't be assumed to exist by regression tests).

Will poke around other TAP tests to see if there's a more consistent
interface, what perl version we can assume and available modules, etc.
If there's not some trivial wrapper at this point so all TAP tests
could use it regardless of OS, it would definitely be good to
introduce such a method.

> > I will work on supporting the windows compatibility here. Is there some 
> > list of guidelines for what you can and can’t use? I don’t have a windows 
> > machine available to develop on.
>
> I think a lot (most?) developers here don't have a windows environment
> available, so now have been using cirrusci's tests to verify.  If you
> haven't used cirrusci directly (not via cfbot) before, start at:
> src/tools/ci/README

Thanks, good starting point.

> > Was it failing on windows? I was attempting to skip it as I recall.
>
> I don't see anything about skipping, and cirrus's logs from 2
> commitfests ago were pruned.  I looked at this patch earlier this year,
> but never got around to replacing the calls to rsync and cp.

Ah, it's skipped (not fixed) in my git repo, but never got around to
submitting that version through email.  That explains it.

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread Justin Pryzby

On Fri, Nov 04, 2022 at 09:16:29AM -0500, David Christensen wrote:
> On Nov 4, 2022, at 9:02 AM, Justin Pryzby  wrote:
> > On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:
> >> 2022年5月3日(火) 8:45 David Christensen :
> >>> 
> >>> ...and pushing a couple fixups pointed out by cfbot, so here's v4.
> >> 
> >> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
> >> currently underway, this would be an excellent time to update the patch.
> > 
> > More important than needing to be rebased, the patch has never passed
> > its current tests on windows.
> > 
> > As I recall, that's due to relying on "cp".  And "rsync", which
> > shouldn't be assumed to exist by regression tests).
> 
> I will work on supporting the windows compatibility here. Is there some list 
> of guidelines for what you can and can’t use? I don’t have a windows machine 
> available to develop on. 

I think a lot (most?) developers here don't have a windows environment
available, so now have been using cirrusci's tests to verify.  If you
haven't used cirrusci directly (not via cfbot) before, start at:
src/tools/ci/README

There's not much assumed about the build environment, and not much more
assumed about the test environment.  Most of the portability is handled
by using C and perl.  I think there's even no assumption that "tar" is
available (except maybe for building releases).  This patch should avoid
relying on tools that aren't already required.

As a practical matter, cfbot needs to pass, not only to demonstrate that
the patch consistently passes tests, but also because if the patch were
merged while it failed tests in cfbot, it would cause every other patch
to start to fail, too.

> Was it failing on windows? I was attempting to skip it as I recall. 

I don't see anything about skipping, and cirrus's logs from 2
commitfests ago were pruned.  I looked at this patch earlier this year,
but never got around to replacing the calls to rsync and cp.

-- 
Justin

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread David Christensen

On Nov 4, 2022, at 9:02 AM, Justin Pryzby  wrote:
> 
> On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:
>> 2022年5月3日(火) 8:45 David Christensen :
>>> 
>>> ...and pushing a couple fixups pointed out by cfbot, so here's v4.
>> 
>> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
>> currently underway, this would be an excellent time to update the patch.
> 
> More important than needing to be rebased, the patch has never passed
> its current tests on windows.
> 
> As I recall, that's due to relying on "cp".  And "rsync", which
> shouldn't be assumed to exist by regression tests).

I will work on supporting the windows compatibility here. Is there some list of 
guidelines for what you can and can’t use? I don’t have a windows machine 
available to develop on. 

Was it failing on windows? I was attempting to skip it as I recall. 

Best,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-04 Thread Justin Pryzby

On Fri, Nov 04, 2022 at 11:52:59AM +0900, Ian Lawrence Barwick wrote:
> 2022年5月3日(火) 8:45 David Christensen :
> >
> > ...and pushing a couple fixups pointed out by cfbot, so here's v4.
> 
> cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.

More important than needing to be rebased, the patch has never passed
its current tests on windows.

As I recall, that's due to relying on "cp".  And "rsync", which
shouldn't be assumed to exist by regression tests).

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3628

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-11-03 Thread Ian Lawrence Barwick

2022年5月3日(火) 8:45 David Christensen :
>
> ...and pushing a couple fixups pointed out by cfbot, so here's v4.

Hi

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1] http://cfbot.cputube.org/patch_40_3628.log

Thanks

Ian Barwick

Re: pg_waldump: add test for coverage

2022-09-22 Thread Andres Freund

Hi,

On 2022-08-23 10:50:08 +0900, Dong Wook Lee wrote:
> I wrote a test for coverage.

Unfortunately the test doesn't seem to pass on windows, and hasn't ever done so:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3834

Due to the merge of the meson patchset, you should also add 001_basic.pl to
the list of tests in meson.build

Greetings,

Andres Freund

Re: pg_waldump: add test for coverage

2022-09-05 Thread Peter Eisentraut


On 23.08.22 03:50, Dong Wook Lee wrote:

Hi Hackers,
I wrote a test for coverage.
Unfortunately, it seems to take quite a while to run the test.
I want to improve these execution times, but I don't know exactly what to do.
Therefore, I want to hear feedback from many people.


I don't find these tests to be particularly slow.  How long do they take 
for you to run?


A couple of tips:

- You should give each test a name.  That's why each test function has a 
(usually) last argument that takes a string.


- You could use command_like() to run a command and check that it exits 
successfully and check its standard out.  For example, instead of


# test pg_waldump with -F (main)
IPC::Run::run [ 'pg_waldump', "$wal_dump_path", '-F', 'main' ], '>', 
\$stdout, '2>', \$stderr;

isnt($stdout, '', "");

it is better to write

command_like([ 'pg_waldump', "$wal_dump_path", '-F', 'main' ],
 qr/TODO/, 'test -F (main)');

- It would be useful to test the actual output (that is, fill in the 
TODO above).  I don't know what the best way to do that is -- that is 
part of designing these tests.


Also,

- Your patch introduces a spurious blank line at the end of the test file.

- For portability, options must be before non-option arguments.  So 
instead of


[ 'pg_waldump', "$wal_dump_path", '-F', 'main' ]

it should be

[ 'pg_waldump', '-F', 'main', "$wal_dump_path" ]


I think having some more test coverage for pg_waldump would be good, so 
I encourage you to continue working on this.

pg_waldump: add test for coverage

2022-08-22 Thread Dong Wook Lee

Hi Hackers,
I wrote a test for coverage.
Unfortunately, it seems to take quite a while to run the test.
I want to improve these execution times, but I don't know exactly what to do.
Therefore, I want to hear feedback from many people.
---
Regards,
Dong Wook Lee


v1_add_test_pg_waldump.patch
Description: Binary data

Re: pg_waldump got an error with waldump file generated by initdb

2022-07-10 Thread Justin Pryzby

On Sun, Jul 10, 2022 at 02:39:00PM -0700, Andres Freund wrote:
> On 2022-07-10 21:51:04 +0900, Dong Wook Lee wrote:
> > I don't know if this is an error.
> > when I do ./initdb -D ../data and execute pg_waldump like this, In the last 
> > part I got an error.
> > 
> > ```
> > ./pg_waldump ../data/pg_wal/00010001
> > ```
> > 
> > pg_waldump: error: error in WAL record at 0/140: invalid record length 
> > at 0/1499A08: wanted 24, got 0
> > 
> > my environment is `16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
> > 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit`
> > Is this normal?
> 
> Yes, that's likely normal - i.e. pg_waldump has reached the point at which the
> WAL ends.

It's the issue that's fixed by this patch.

38/2490 Make message at end-of-recovery less scary  Kyotaro 
Horiguchi
https://commitfest.postgresql.org/38/2490/

Re: pg_waldump got an error with waldump file generated by initdb

2022-07-10 Thread Andres Freund



Hi,

On 2022-07-10 21:51:04 +0900, Dong Wook Lee wrote:
> I don't know if this is an error.
> when I do ./initdb -D ../data and execute pg_waldump like this, In the last 
> part I got an error.
> 
> ```
> ./pg_waldump ../data/pg_wal/00010001
> ```
> 
> pg_waldump: error: error in WAL record at 0/140: invalid record length at 
> 0/1499A08: wanted 24, got 0
> 
> my environment is `16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
> 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit`
> Is this normal?

Yes, that's likely normal - i.e. pg_waldump has reached the point at which the
WAL ends.

Greetings,

Andres Freund

pg_waldump got an error with waldump file generated by initdb

2022-07-10 Thread Dong Wook Lee

Hi, hackers

I don't know if this is an error.
when I do ./initdb -D ../data and execute pg_waldump like this, In the last 
part I got an error.

```
./pg_waldump ../data/pg_wal/00010001
```

pg_waldump: error: error in WAL record at 0/140: invalid record length at 
0/1499A08: wanted 24, got 0

my environment is `16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit`
Is this normal?

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-06-14 Thread Michael Paquier

On Tue, May 03, 2022 at 10:34:41AM +0200, Alvaro Herrera wrote:
> I remember Greg Mullane posted a tool that attempts to correct page CRC
> mismatches[1].  This new tool might be useful to feed healing attempts,
> too.  (It's of course not in any way a *solution*, because the page
> might have been modified by other WAL records since the last FPI, but it
> could be a start towards building a solution that scoops page contents
> from WAL.)
> 
> [1] https://github.com/turnstep/pg_healer

Fun.  Thanks for mentioning that.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-05-03 Thread Alvaro Herrera

On 2022-Apr-27, Michael Paquier wrote:

> On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:
> > True. :-) This does seem like a tool geared towards "expert mode", so
> > maybe we just assume if you need it you know what you're doing?
> 
> This is definitely an expert mode toy.

I remember Greg Mullane posted a tool that attempts to correct page CRC
mismatches[1].  This new tool might be useful to feed healing attempts,
too.  (It's of course not in any way a *solution*, because the page
might have been modified by other WAL records since the last FPI, but it
could be a start towards building a solution that scoops page contents
from WAL.)

[1] https://github.com/turnstep/pg_healer

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-05-02 Thread David Christensen

...and pushing a couple fixups pointed out by cfbot, so here's v4.


On Mon, May 2, 2022 at 8:42 AM David Christensen
 wrote:
>
> Enclosed is v3 of this patch; this adds two modes for this feature,
> one with the raw page `--save-fullpage/-W` and one with the
> LSN+checksum fixups `--save-fullpage-fixup/-X`.
>
> I've added at least some basic sanity-checking of the underlying
> feature, as well as run the test file and the changes to pg_waldump.c
> through pgindent/perltidy to make them adhere to project standards.
> Threw in a rebase as well.
>
> Would appreciate any additional feedback here.
>
> Best,
>
> David


v4-pg_waldump-save-fpi.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-05-02 Thread David Christensen

Enclosed is v3 of this patch; this adds two modes for this feature,
one with the raw page `--save-fullpage/-W` and one with the
LSN+checksum fixups `--save-fullpage-fixup/-X`.

I've added at least some basic sanity-checking of the underlying
feature, as well as run the test file and the changes to pg_waldump.c
through pgindent/perltidy to make them adhere to project standards.
Threw in a rebase as well.

Would appreciate any additional feedback here.

Best,

David


v3-pg_waldump-save-fpi.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-26 Thread Michael Paquier

On Tue, Apr 26, 2022 at 01:15:05PM -0500, David Christensen wrote:
> True. :-) This does seem like a tool geared towards "expert mode", so
> maybe we just assume if you need it you know what you're doing?

This is definitely an expert mode toy.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-26 Thread David Christensen

On Mon, Apr 25, 2022 at 9:42 PM Michael Paquier  wrote:
> On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:
> > On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
> >  wrote:
> >> Thanks for working on this. I'm just thinking if we can use these FPIs
> >> to repair the corrupted pages? I would like to understand more
> >> detailed usages of the FPIs other than inspecting with pageinspect.
> >
> > My main use case was for being able to look at potential corruption,
> > either in the WAL stream, on heap, or in tools associated with the WAL
> > stream.  I suppose you could use the page images to replace corrupted
> > on-disk pages (and in fact I think I've heard of a tool or two that
> > try to do that), though don't know that I consider this the primary
> > purpose (and having toast tables and the list, as well as clog would
> > make it potentially hard to just drop-in a page version without
> > issues).  Might help in extreme situations though.
>
> You could do a bunch of things with those images, even make things
> worse if you are not careful enough.

True. :-) This does seem like a tool geared towards "expert mode", so
maybe we just assume if you need it you know what you're doing?

> >> 10) Along with pg_pwrite(), can we also fsync the files (of course
> >> users can choose it optionally) so that the writes will be durable for
> >> the OS crashes?
> >
> > Can add; you thinking a separate flag to disable this with default true?
>
> We expect data generated by tools like pg_dump, pg_receivewal
> (depending on the use --synchronous) or pg_basebackup to be consistent
> when we exit from the call.  FWIW, flushing this data does not seem
> like a strong requirement for something aimed at being used page-level
> chirurgy or lookups, because the WAL segments should still be around
> even if the host holding the archives is unplugged.

I have added the fsync to the latest path (forthcoming), but I have no
strong preferences here as to the correct/expected behavior.

Best,

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-26 Thread David Christensen

On Mon, Apr 25, 2022 at 9:54 PM Michael Paquier  wrote:
> On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:
> > On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier  wrote:
> >> I don't think that there is any need to rely on a new logic if there
> >> is already some code in place able to do the same work.  See
> >> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
> >> that relies on pg_check_dir().  I think that you'd better rely at
> >> least on what pgcheckdir.c offers.
> >
> > Yeah, though I am tending towards what another user had suggested and
> > just accepting an existing directory rather than requiring it to be
> > empty, so thinking I might just skip this one. (Will review the file
> > you've pointed out to see if there is a relevant function though.)
>
> OK.  FWIW, pg_check_dir() is used in initdb and pg_basebackup because
> these care about the behavior to use when specifying a target path.
> You could reuse it, but use a different policy depending on its
> returned result for the needs of what you see fit in this case.

I have a new version of the patch (pending tests) that uses
pg_check_dir's return value to handle things appropriately, so at
least some code reuse now.  It did end up simplifying a lot.

> >> +   PageSetLSN(page, record->ReadRecPtr);
> >> +   /* if checksum field is non-zero then we have 
> >> checksums enabled,
> >> +* so recalculate the checksum with new LSN (yes, this 
> >> is a hack)
> >> +*/
> >> Yeah, that looks like a hack, but putting in place a page on a cluster
> >> that has checksums enabled would be more annoying with
> >> zero_damaged_pages enabled if we don't do that, so that's fine by me
> >> as-is.  Perhaps you should mention that FPWs don't have their
> >> pd_checksum updated when written.
> >
> > Can make a mention; you thinking just a comment in the code is
> > sufficient, or want there to be user-visible docs as well?
>
> I was thinking about a comment, at least.

New patch version has significantly more comments.

> > This was to make the page as extracted equivalent to the effect of
> > applying the WAL record block on replay (the LSN and checksum both);
> > since recovery calls this function to mark the LSN where the page came
> > from this is why I did this as well.  (I do see perhaps a case for
> > --raw output that doesn't munge the page whatsoever, just made
> > comparisons easier.)
>
> Hm.  Okay.  The argument goes both ways, I guess, depending on what we
> want to do with those raw pages.  Still you should not need pd_lsn if
> the point is to be able to stick the pages back in place to attempt to
> get back as much data as possible when loading it back to the shared
> buffers?

Yeah, I can see that too; I think there's at least enough of an
argument for a flag to apply the fixups or just extract only the raw
page pre-modification.  Not sure which should be the "default"
behavior; either `--raw` or `--fixup-metadata` or something could
work. (Naming suggestions welcomed.)

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Michael Paquier

On Mon, Apr 25, 2022 at 10:11:10AM -0500, David Christensen wrote:
> On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier  wrote:
>> I don't think that there is any need to rely on a new logic if there
>> is already some code in place able to do the same work.  See
>> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
>> that relies on pg_check_dir().  I think that you'd better rely at
>> least on what pgcheckdir.c offers.
> 
> Yeah, though I am tending towards what another user had suggested and
> just accepting an existing directory rather than requiring it to be
> empty, so thinking I might just skip this one. (Will review the file
> you've pointed out to see if there is a relevant function though.)

OK.  FWIW, pg_check_dir() is used in initdb and pg_basebackup because
these care about the behavior to use when specifying a target path.
You could reuse it, but use a different policy depending on its
returned result for the needs of what you see fit in this case.

>> +   PageSetLSN(page, record->ReadRecPtr);
>> +   /* if checksum field is non-zero then we have checksums 
>> enabled,
>> +* so recalculate the checksum with new LSN (yes, this 
>> is a hack)
>> +*/
>> Yeah, that looks like a hack, but putting in place a page on a cluster
>> that has checksums enabled would be more annoying with
>> zero_damaged_pages enabled if we don't do that, so that's fine by me
>> as-is.  Perhaps you should mention that FPWs don't have their
>> pd_checksum updated when written.
> 
> Can make a mention; you thinking just a comment in the code is
> sufficient, or want there to be user-visible docs as well?

I was thinking about a comment, at least.

> This was to make the page as extracted equivalent to the effect of
> applying the WAL record block on replay (the LSN and checksum both);
> since recovery calls this function to mark the LSN where the page came
> from this is why I did this as well.  (I do see perhaps a case for
> --raw output that doesn't munge the page whatsoever, just made
> comparisons easier.)

Hm.  Okay.  The argument goes both ways, I guess, depending on what we
want to do with those raw pages.  Still you should not need pd_lsn if
the point is to be able to stick the pages back in place to attempt to
get back as much data as possible when loading it back to the shared
buffers?
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Michael Paquier

On Mon, Apr 25, 2022 at 10:24:52AM -0500, David Christensen wrote:
> On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
>  wrote:
>> Thanks for working on this. I'm just thinking if we can use these FPIs
>> to repair the corrupted pages? I would like to understand more
>> detailed usages of the FPIs other than inspecting with pageinspect.
> 
> My main use case was for being able to look at potential corruption,
> either in the WAL stream, on heap, or in tools associated with the WAL
> stream.  I suppose you could use the page images to replace corrupted
> on-disk pages (and in fact I think I've heard of a tool or two that
> try to do that), though don't know that I consider this the primary
> purpose (and having toast tables and the list, as well as clog would
> make it potentially hard to just drop-in a page version without
> issues).  Might help in extreme situations though.

You could do a bunch of things with those images, even make things
worse if you are not careful enough.

>> 10) Along with pg_pwrite(), can we also fsync the files (of course
>> users can choose it optionally) so that the writes will be durable for
>> the OS crashes?
> 
> Can add; you thinking a separate flag to disable this with default true?

We expect data generated by tools like pg_dump, pg_receivewal
(depending on the use --synchronous) or pg_basebackup to be consistent
when we exit from the call.  FWIW, flushing this data does not seem
like a strong requirement for something aimed at being used page-level
chirurgy or lookups, because the WAL segments should still be around
even if the host holding the archives is unplugged.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread David Christensen

On Mon, Apr 25, 2022 at 6:03 AM Bharath Rupireddy
 wrote:
> Thanks for working on this. I'm just thinking if we can use these FPIs
> to repair the corrupted pages? I would like to understand more
> detailed usages of the FPIs other than inspecting with pageinspect.

My main use case was for being able to look at potential corruption,
either in the WAL stream, on heap, or in tools associated with the WAL
stream.  I suppose you could use the page images to replace corrupted
on-disk pages (and in fact I think I've heard of a tool or two that
try to do that), though don't know that I consider this the primary
purpose (and having toast tables and the list, as well as clog would
make it potentially hard to just drop-in a page version without
issues).  Might help in extreme situations though.

> Given that others have realistic use-cases (of course I would like to
> know more about those), +1 for the idea. However, I would suggest
> adding a function to extract raw FPI data to the pg_walinspect
> extension that got recently committed in PG 15, the out of which can
> directly be fed to pageinspect functions or

Yeah, makes sense to have some overlap here; will review what is there
and see if there is some shared code base we can utilize.  (ISTR some
work towards getting these two tools using more of the same code, and
this seems like another such instance.)

> Few comments:
> 1) I think it's good to mention the stored file name format.
> + printf(_("  -W, --raw-fpi=path save found full page images to
> given path\n"));

+1, though I've also thought there could be uses to have multiple
possible output formats here (most immediately, there may be cases
where we want *each* FPI for a block vs the *latest*, so files name
with/without the LSN component seem the easiest way forward here).
That would introduce some additional complexity though, so might need
to see if others think that makes any sense.

> 2)
> + for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
> + {
> + /* we will now extract the fullpage image from the XLogRecord and save
> + * it to a calculated filename */
> +
> + if (XLogRecHasBlockImage(record, block_id))
>
> I think we need XLogRecHasBlockRef to be true to check
> XLogRecHasBlockImage otherwise, we will see some build farms failing,
> recently I've seen this failure for pg_walinspect..
>
> for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
> {
> if (!XLogRecHasBlockRef(record, block_id))
> continue;
>
> if (XLogRecHasBlockImage(record, block_id))
> *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
> }

Good point; my previous patch that got committed here (127aea2a65)
probably also needed this treatment.

> 3) Please correct the commenting format:
> + /* we will now extract the fullpage image from the XLogRecord and save
> + * it to a calculated filename */

Ack.

> 4) Usually we start errors with lower case letters "could not ."
> + pg_fatal("Couldn't open file for output: %s", filename);
> + pg_fatal("Couldn't write out complete FPI to file: %s", filename);
> And the variable name too:
> + FILE *OPF;

Ack.

> 5) Not sure how the FPIs of TOASTed tables get stored, but it would be
> good to check.

What would be different here? Are there issues you can think of, or
just more from the pageinspect side of things?

> 6) Good to specify the known usages of FPIs in the documentation.

Ack. Prob good to get additional info/use cases from others, as mine
is fairly short. :-)

> 7) Isn't it good to emit an error if RestoreBlockImage returns false?
> + if (RestoreBlockImage(record, block_id, page))
> + {

Ack.

> 8) I think I don't mind if a non-empty directory is specified - IMO
> better usability is this - if the directory is non-empty, just go add
> the FPI files if FPI file exists just replace it, if the directory
> isn't existing, create and write the FPI files.
> + /* we accept an empty existing directory */
> + if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
> + {

Agreed; was mainly trying to prevent accidental expansion inside
`pg_wal` when an earlier version of the patch implied `.` as the
current dir with an optional path, but I've since made the path
non-optional and agree that this is unnecessarily restrictive.

>  9) Instead of following:
> + if (XLogRecordHasFPW(xlogreader_state))
> + XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
> I will just do this in XLogRecordSaveFPWs:
> for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
> {
> if (!XLogRecHasBlockRef(record, block_id))
> continue;
>
> if (XLogRecHasBlockImage(record, block_id))
> {
>
> }
> }

Yeah, a little redundant.

> 10) Along with pg_pwrite(), can we also fsync the files (of course
> users can choose it optionally) so that the writes will be durable for
> the OS crashes?

Can add; you thinking a separate flag to disable thi

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread David Christensen

On Mon, Apr 25, 2022 at 2:00 AM Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 4/25/22 8:11 AM, Michael Paquier wrote:
> > On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:
> >> Hi Matthias, great point.  Enclosed is a revised version of the patch
> >> that adds the fork identifier to the end if it's a non-main fork.
> > Like Alvaro, I have seen cases where this would have been really
> > handy.  So +1 from me, as well, to have more tooling like what you are
> > proposing.
>
> +1 on the idea.
>
> FWIW, there is an extension doing this [1] but having the feature
> included in pg_waldump would be great.

Cool, glad to see there is some interest; definitely some overlap in
forensics inside and outside the database both, as there are different
use cases for both.

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread David Christensen

On Mon, Apr 25, 2022 at 1:11 AM Michael Paquier  wrote:
>
> On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:
> > Hi Matthias, great point.  Enclosed is a revised version of the patch
> > that adds the fork identifier to the end if it's a non-main fork.
>
> Like Alvaro, I have seen cases where this would have been really
> handy.  So +1 from me, as well, to have more tooling like what you are
> proposing.  Fine for me to use one file for each block with a name
> like what you are suggesting for each one of them.
>
> +   /* we accept an empty existing directory */
> +   if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
> +   {
> I don't think that there is any need to rely on a new logic if there
> is already some code in place able to do the same work.  See
> verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
> that relies on pg_check_dir().  I think that you'd better rely at
> least on what pgcheckdir.c offers.

Yeah, though I am tending towards what another user had suggested and
just accepting an existing directory rather than requiring it to be
empty, so thinking I might just skip this one. (Will review the file
you've pointed out to see if there is a relevant function though.)

> +   {"raw-fpi", required_argument, NULL, 'W'},
> I think that we'd better rename this option.  "fpi", that is not used
> much in the user-facing docs, is additionally not adapted when we have
> an other option called -w/--fullpage.  I can think of
> --save-fullpage.

Makes sense.

> +   PageSetLSN(page, record->ReadRecPtr);
> +   /* if checksum field is non-zero then we have checksums 
> enabled,
> +* so recalculate the checksum with new LSN (yes, this is 
> a hack)
> +*/
> Yeah, that looks like a hack, but putting in place a page on a cluster
> that has checksums enabled would be more annoying with
> zero_damaged_pages enabled if we don't do that, so that's fine by me
> as-is.  Perhaps you should mention that FPWs don't have their
> pd_checksum updated when written.

Can make a mention; you thinking just a comment in the code is
sufficient, or want there to be user-visible docs as well?

> +   /* we will now extract the fullpage image from the XLogRecord and save
> +* it to a calculated filename */
> The format of this comment is incorrect.
>
> +The LSN of the record with this block, formatted
> +as %08x-%08X instead of the
> +conventional %X/%X due to filesystem naming
> +limits
> The last part of the sentence about %X/%X could just be removed.  That
> could be confusing, at worse.

Makes sense.

> +   PageSetLSN(page, record->ReadRecPtr);
> Why is pd_lsn set?

This was to make the page as extracted equivalent to the effect of
applying the WAL record block on replay (the LSN and checksum both);
since recovery calls this function to mark the LSN where the page came
from this is why I did this as well.  (I do see perhaps a case for
--raw output that doesn't munge the page whatsoever, just made
comparisons easier.)

> git diff --check complains a bit.

Can look into this.

> This stuff should include some tests.  With --end, the tests can
> be cheap.

Yeah, makes sense, will include some in the next version.

David

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-25 Thread Bharath Rupireddy

On Sat, Apr 23, 2022 at 4:21 AM David Christensen
 wrote:
>
> Hi -hackers,
>
> Enclosed is a patch to allow extraction/saving of FPI from the WAL
> stream via pg_waldump.
>
> Description from the commit:
>
> Extracts full-page images from the WAL stream into a target directory,
> which must be empty or not
> exist.  These images are subject to the same filtering rules as normal
> display in pg_waldump, which
> means that you can isolate the full page writes to a target relation,
> among other things.
>
> Files are saved with the filename:  with
> formatting to make things
> somewhat sortable; for instance:
>
> -01C0.1663.1.6117.0
> -01000150.1664.0.6115.0
> -010001E0.1664.0.6114.0
> -01000270.1663.1.6116.0
> -01000300.1663.1.6113.0
> -01000390.1663.1.6112.0
> -01000420.1663.1.8903.0
> -010004B0.1663.1.8902.0
> -01000540.1663.1.6111.0
> -010005D0.1663.1.6110.0
>
> It's noteworthy that the raw images do not have the current LSN stored
> with them in the WAL
> stream (as would be true for on-heap versions of the blocks), nor
> would the checksum be valid in
> them (though WAL itself has checksums, so there is some protection
> there).  This patch chooses to
> place the LSN and calculate the proper checksum (if non-zero in the
> source image) in the outputted
> block.  (This could perhaps be a targetted flag if we decide we don't
> always want this.)
>
> These images could be loaded/inspected via `pg_read_binary_file()` and
> used in the `pageinspect`
> suite of tools to perform detailed analysis on the pages in question,
> based on historical
> information, and may come in handy for forensics work.

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_("  -W, --raw-fpi=path save found full page images to
given path\n"));
2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
*fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
}
3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
4) Usually we start errors with lower case letters "could not ."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;
5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.
6) Good to specify the known usages of FPIs in the documentation.
7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {
8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {
 9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
if (!XLogRecHasBlockRef(record, block_id))
continue;

if (XLogRecHasBlockImage(record, block_id))
{

}
}
10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Regards,
Bharath Rupireddy.

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-24 Thread Michael Paquier

On Sat, Apr 23, 2022 at 01:43:36PM -0500, David Christensen wrote:
> Hi Matthias, great point.  Enclosed is a revised version of the patch
> that adds the fork identifier to the end if it's a non-main fork.

Like Alvaro, I have seen cases where this would have been really
handy.  So +1 from me, as well, to have more tooling like what you are
proposing.  Fine for me to use one file for each block with a name
like what you are suggesting for each one of them. 

+   /* we accept an empty existing directory */
+   if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+   {
I don't think that there is any need to rely on a new logic if there
is already some code in place able to do the same work.  See
verify_dir_is_empty_or_create() in pg_basebackup.c, as one example,
that relies on pg_check_dir().  I think that you'd better rely at
least on what pgcheckdir.c offers.

+   {"raw-fpi", required_argument, NULL, 'W'},
I think that we'd better rename this option.  "fpi", that is not used
much in the user-facing docs, is additionally not adapted when we have
an other option called -w/--fullpage.  I can think of
--save-fullpage.

+   PageSetLSN(page, record->ReadRecPtr);
+   /* if checksum field is non-zero then we have checksums 
enabled,
+* so recalculate the checksum with new LSN (yes, this is a 
hack)
+*/
Yeah, that looks like a hack, but putting in place a page on a cluster
that has checksums enabled would be more annoying with
zero_damaged_pages enabled if we don't do that, so that's fine by me
as-is.  Perhaps you should mention that FPWs don't have their
pd_checksum updated when written.

+   /* we will now extract the fullpage image from the XLogRecord and save
+* it to a calculated filename */
The format of this comment is incorrect.

+The LSN of the record with this block, formatted
+as %08x-%08X instead of the
+conventional %X/%X due to filesystem naming
+limits
The last part of the sentence about %X/%X could just be removed.  That
could be confusing, at worse.

+   PageSetLSN(page, record->ReadRecPtr);
Why is pd_lsn set?

git diff --check complains a bit.

This stuff should include some tests.  With --end, the tests can
be cheap.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-23 Thread David Christensen

On Sat, Apr 23, 2022 at 9:49 AM Matthias van de Meent
 wrote:

> Regardless of my (lack of) opinion on the inclusion of this patch in
> PG (I did not significantly review this patch); I noticed that you do
> not yet identify the 'fork' of the FPI in the file name.
>
> A lack of fork identifier in the exported file names would make
> debugging much more difficult due to the relatively difficult to
> identify data contained in !main forks, so I think this oversight
> should be fixed, be it through `_forkname` postfix like normal fork
> segments, or be it through `.` numerical in- or postfix in
> the filename.
>
> -Matthias

Hi Matthias, great point.  Enclosed is a revised version of the patch
that adds the fork identifier to the end if it's a non-main fork.

Best,

David


v2-pg_waldump-save-fpi.patch
Description: Binary data

Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-23 Thread Matthias van de Meent

On Sat, 23 Apr 2022 at 00:51, David Christensen
 wrote:
>
> Hi -hackers,
>
> Enclosed is a patch to allow extraction/saving of FPI from the WAL
> stream via pg_waldump.
>
> Description from the commit:
>
> Extracts full-page images from the WAL stream into a target directory,
> which must be empty or not
> exist.  These images are subject to the same filtering rules as normal
> display in pg_waldump, which
> means that you can isolate the full page writes to a target relation,
> among other things.
>
> Files are saved with the filename:  with
> formatting to make things

Regardless of my (lack of) opinion on the inclusion of this patch in
PG (I did not significantly review this patch); I noticed that you do
not yet identify the 'fork' of the FPI in the file name.

A lack of fork identifier in the exported file names would make
debugging much more difficult due to the relatively difficult to
identify data contained in !main forks, so I think this oversight
should be fixed, be it through `_forkname` postfix like normal fork
segments, or be it through `.` numerical in- or postfix in
the filename.

-Matthias

1 2 3 >

1 - 100 of 273 matches

Mail list logo