Re: Timeout failure in 019_replslot_limit.pl

2021-09-17 Thread Michael Paquier
On Fri, Sep 17, 2021 at 08:41:00PM -0700, Noah Misch wrote:
> If this fixes things for the OP, I'd like it slightly better than the "ps"
> approach.  It's less robust, but I like the brevity.
> 
> Another alternative might be to have walreceiver reach walsender via a proxy
> Perl script.  Then, make that proxy able to accept an instruction to pause
> passing data until further notice.  However, I like two of your options better
> than this one.

Could it be possible to rely on a combination of the wait events set in
WAL senders and pg_stat_replication to determine that a WAL sender is in
a stopped state?  Off the top of my head, I would think about something
like this (perhaps it would need two WAL senders, one stopped and one
still running):
1) SIGSTOP WAL sender 1.
2) Check that WAL sender 1 is in WalSenderMain.  If not, retry 1) after
a SIGCONT.
3) Generate some WAL, and look at pg_stat_replication to check that WAL
sender 1 has made no progress, given that 2) confirmed it was stopped.
--
Michael




Re: Teach pg_receivewal to use lz4 compression

2021-09-17 Thread Michael Paquier
On Fri, Sep 17, 2021 at 08:12:42AM +, gkokola...@pm.me wrote:
> Yeah, I was considering it to split them over to a separate commit,
> then decided against it. Will do so in the future.

I have been digging into the issue I saw in the TAP tests when closing
a segment, and found the problem.  The way you manipulate
frameInfo.contentSize by just setting it to WalSegSz when *opening*
a segment causes problems in LZ4F_compressEnd(), making the code
throw an LZ4F_ERROR_frameSize_wrong error.  In lz4frame.c, the end of
LZ4F_compressEnd() triggers this check and the error:
if (cctxPtr->prefs.frameInfo.contentSize) {
if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize)
return err0r(LZ4F_ERROR_frameSize_wrong);
}

We don't really care about contentSize as long as a segment is not
completed.  Rather than filling contentSize every time we write
something, we'd better update frameInfo once the segment is
completed and closed.  That would also take care of the error, as this
check is skipped if contentSize is 0.  It seems to me that we should
fill in the information when doing a CLOSE_NORMAL.
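
To double-check the failure mode, here is a tiny standalone sketch
(plain lz4frame API, nothing from our tree; error checks on begin/update
omitted for brevity) that reproduces the check quoted above:

#include <stdio.h>
#include <string.h>
#include <lz4frame.h>

int
main(void)
{
	LZ4F_cctx  *ctx;
	LZ4F_preferences_t prefs;
	static char src[512];		/* zero-filled input */
	char		dst[8192];
	size_t		off, rc;

	LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
	memset(&prefs, 0, sizeof(prefs));
	prefs.frameInfo.contentSize = 1024;	/* claim 1kB up front */

	off = LZ4F_compressBegin(ctx, dst, sizeof(dst), &prefs);
	/* feed only 512 bytes, then close the frame */
	off += LZ4F_compressUpdate(ctx, dst + off, sizeof(dst) - off,
							   src, sizeof(src), NULL);
	rc = LZ4F_compressEnd(ctx, dst + off, sizeof(dst) - off, NULL);
	if (LZ4F_isError(rc))
		printf("%s\n", LZ4F_getErrorName(rc));	/* ERROR_frameSize_wrong */
	LZ4F_freeCompressionContext(ctx);
	return 0;
}

With contentSize left at 0, the same sequence completes fine, which is
consistent with the check being skipped in that case.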

-   if (stream->walmethod->compression() == 0 &&
+   if (stream->walmethod->compression() == COMPRESSION_NONE &&
stream->walmethod->existsfile(fn))
This one was a more serious issue, as the compression() callback would
return an integer for the compression level, but v5 compared it to a
WalCompressionMethod.  In order to take care of this issue, mainly for
pg_basebackup, I think that we have to rename the compression()
callback to compression_method(), and it is cleaner to save the
compression method as well as the compression level for the tar data.

I am attaching a new patch, on which I have done many tweaks and
adjustments while reviewing it.  The attached patch fixes the second
issue, and I have done nothing about the first issue yet, but that
should be simple enough to address as this needs an update of the
frame info when closing a completed segment.  Could you look at it?

Thanks,
--
Michael
From 37e3800d279566445864ed82f29e8d650c72d8cd Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 18 Sep 2021 15:11:49 +0900
Subject: [PATCH v6] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
This commit teaches it to also be able to use LZ4 compression. It is required
that the binary is built using the -llz4 flag, which is enabled via the
--with-lz4 option at configure time.

Previously, the user had to use the option --compress with a value between [0-9]
to denote that gzip compression was required. This specific behaviour has not
been maintained. A newly introduced option --compression-method=[LZ4|gzip] can
be used to ask for the logs to be compressed. A compression value can be
selected only when the compression method is gzip. A compression value of 0 now
returns an error.

Under the hood there is nothing exceptional to be noted. Tar based archives
have not yet been taught to use LZ4 compression. If that is felt useful, it
will be easy to add in the future.

Tests have been added to verify the creation and correctness of the generated
LZ4 files. The latter is achieved by use of the lz4 program, if present in the
installation.
---
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_basebackup.c|   7 +-
 src/bin/pg_basebackup/pg_receivewal.c| 278 +++
 src/bin/pg_basebackup/receivelog.c   |   2 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  81 +-
 src/bin/pg_basebackup/walmethods.c   | 216 --
 src/bin/pg_basebackup/walmethods.h   |  16 +-
 doc/src/sgml/ref/pg_receivewal.sgml  |  28 +-
 src/Makefile.global.in   |   1 +
 src/tools/pgindent/typedefs.list |   1 +
 10 files changed, 546 insertions(+), 85 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..387d728345 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -24,6 +24,7 @@ export TAR
 # used by the command "gzip" to pass down options, so stick with a different
 # name.
 export GZIP_PROGRAM=$(GZIP)
+export LZ4
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index 669aa207a3..18c6a93cec 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -555,10 +555,13 @@ LogStreamerMain(logstreamer_param *param)
stream.replication_slot = replication_slot;
 
if (format == 'p')
-   stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0,
+   stream.walmethod = CreateWalDirectoryMethod(param->xlog,
+  

Re: Timeout failure in 019_replslot_limit.pl

2021-09-17 Thread Noah Misch
On Fri, Sep 17, 2021 at 06:59:24PM -0300, Alvaro Herrera wrote:
> On 2021-Sep-07, Kyotaro Horiguchi wrote:
> > It seems like the "kill 'STOP'" in the script didn't suspend the
> > processes before advancing WAL. The attached uses 'ps' command to
> > check that since I didn't come up with the way to do the same in Perl.
> 
> Ah! so we tell the kernel to send the signal, but there's no guarantee
> about the timing for the reaction from the other process.  Makes sense.

Agreed.

> Your proposal is to examine the other process' state until we see that
> it gets the T flag.  I wonder how portable this is; I suspect not very.
> `ps` is pretty annoying, meaning not consistently implemented -- GNU's
> manpage says there are "UNIX options", "BSD options" and "GNU long
> options", so it seems hard to believe that there is one set of options
> that will work everywhere.

I like this, and it's the most-robust way.  I agree there's no portable way,
so I'd modify it to be fail-open.  Run a "ps" command that works on the OP's
system.  If the output shows the process in a state matching [DRS], we can
confidently sleep a bit for signal delivery to finish.  If the command fails
or prints something else (including state T, which we need to check explicitly),
assume SIGSTOP delivery is complete.  If some other platform shows this race
in the future, we can add an additional "ps" command.

If we ever get the "stop events" system
(https://postgr.es/m/flat/capphfdtseohx8dsk9qp+z++i4bgqoffkip6jdwngea+g7z-...@mail.gmail.com),
it would be useful for crafting this kind of test without the problem seen here.

> I found a Perl module (Proc::ProcessTable) that can be used to get the
> list of processes and their metadata, but it isn't in core Perl and it
> doesn't look very well maintained either, so that one's out.

Agreed, that one's out.

> Another option might be to wait on the kernel -- do something that would
> involve the kernel taking action on the other process, acting like a
> barrier of sorts.  I don't know if this actually works, but we could
> try.  Something like sending SIGSTOP first, then "kill 0" -- or just
> send SIGSTOP twice:
> 
> diff --git a/src/test/recovery/t/019_replslot_limit.pl 
> b/src/test/recovery/t/019_replslot_limit.pl
> index e065c5c008..e8f323066a 100644
> --- a/src/test/recovery/t/019_replslot_limit.pl
> +++ b/src/test/recovery/t/019_replslot_limit.pl
> @@ -346,6 +346,8 @@ $logstart = get_log_size($node_primary3);
>  # freeze walsender and walreceiver. Slot will still be active, but 
> walreceiver
>  # won't get anything anymore.
>  kill 'STOP', $senderpid, $receiverpid;
> +kill 'STOP', $senderpid, $receiverpid;
> +
>  advance_wal($node_primary3, 2);
>  
>  my $max_attempts = 180;

If this fixes things for the OP, I'd like it slightly better than the "ps"
approach.  It's less robust, but I like the brevity.

Another alternative might be to have walreceiver reach walsender via a proxy
Perl script.  Then, make that proxy able to accept an instruction to pause
passing data until further notice.  However, I like two of your options better
than this one.




Re: psql: \dl+ to list large objects privileges

2021-09-17 Thread Neil Chen
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi, I think this is an interesting patch. +1
I tested it against the latest version, and it works well.

Re: Access last_sqlstate from libpq

2021-09-17 Thread Tom Lane
Daniel Frey  writes:
> In case of an error when I received a PGresult*, I can access the SQLSTATE by 
> calling

>PGresult* pgresult = ...;
>const char* sqlstate = PQresultErrorField( pgresult, PG_DIAG_SQLSTATE );

Right ...

> However, this is not possible in a couple of other cases where I don't have a 
> PGresult*, only the PGconn* is available:
> * PQconnectdb (and variants)
> * PQputCopyData
> * PQputCopyEnd
> * PQgetCopyData

In these cases, any error you might get is probably from libpq itself,
not from the server.  libpq does not generate SQLSTATEs for its errors,
so it's likely that last_sqlstate is not relevant at all.

(Getting libpq to assign SQLSTATEs to its errors has been on the to-do
list for a couple of decades.  I'm not holding my breath for somebody
to undertake that.)

> Are there any problems adding a simple accessor to libpq?

I would be strongly against that unless somebody first did the
legwork to ensure it was meaningful.

regards, tom lane




Re: improve pg_receivewal code

2021-09-17 Thread Michael Paquier
On Fri, Sep 17, 2021 at 11:46:33AM +0530, Bharath Rupireddy wrote:
> Thanks. I changed the code that way. PSA v3 patch.

Thanks.  Done.
--
Michael




Re: Access last_sqlstate from libpq

2021-09-17 Thread David G. Johnston
On Friday, September 17, 2021, Daniel Frey  wrote:
> Are you sure or are you guessing?

Guessing regarding the implementations of these interfaces.

David J.


Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-17 Thread Andres Freund
Hi,

On 2021-09-18 02:51:09 +0300, Alexander Korotkov wrote:
> On Tue, Sep 14, 2021 at 6:53 AM Tom Lane  wrote:
> > Without having looked at the details, I think using a forward-declare
> > to avoid including relcache.h in visibilitymap.h might be a reasonably
> > non-painful fix.
> 
> I like that idea, but I didn't find an appropriate existing header for
> forward-declaration of Relation.  relation.h isn't suitable, because
> it includes primnodes.h.  A separate header for just
> forward-definition of Relation seems too much.

I was just thinking of doing something like the attached.


> > TOH, in the long run it might be worth the effort
> > to split visibilitymap.h to separate useful file-contents knowledge
> > from backend function declarations.
> 
> I've drafted a patch splitting visibilitymap_maros.h from
> visibilitymap.h.  What do you think?

I'd name it visibilitymapdefs.h or such, mostly because that's what other
headers are named like...

Greetings,

Andres Freund
diff --git i/src/include/access/visibilitymap.h w/src/include/access/visibilitymap.h
index 57362c36876..d7f7f30b1fb 100644
--- i/src/include/access/visibilitymap.h
+++ w/src/include/access/visibilitymap.h
@@ -17,7 +17,6 @@
 #include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
-#include "utils/relcache.h"
 
 /* Number of bits for one heap page */
 #define BITS_PER_HEAPBLOCK 2
@@ -27,6 +26,7 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 			 * flags bits */
+struct RelationData;
 
 /* Macros for visibilitymap test */
 #define VM_ALL_VISIBLE(r, b, v) \
@@ -34,17 +34,17 @@
 #define VM_ALL_FROZEN(r, b, v) \
 	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
 
-extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
+extern bool visibilitymap_clear(struct RelationData *rel, BlockNumber heapBlk,
 Buffer vmbuf, uint8 flags);
-extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
+extern void visibilitymap_pin(struct RelationData *rel, BlockNumber heapBlk,
 			  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+extern void visibilitymap_set(struct RelationData *rel, BlockNumber heapBlk, Buffer heapBuf,
 			  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
 			  uint8 flags);
-extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
-extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
-extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
+extern uint8 visibilitymap_get_status(struct RelationData *rel, BlockNumber heapBlk, Buffer *vmbuf);
+extern void visibilitymap_count(struct RelationData *rel, BlockNumber *all_visible, BlockNumber *all_frozen);
+extern BlockNumber visibilitymap_prepare_truncate(struct RelationData *rel,
   BlockNumber nheapblocks);
 
 #endif			/* VISIBILITYMAP_H */
diff --git i/src/include/utils/relcache.h w/src/include/utils/relcache.h
index f772855ac69..d2c17575f65 100644
--- i/src/include/utils/relcache.h
+++ w/src/include/utils/relcache.h
@@ -14,7 +14,6 @@
 #ifndef RELCACHE_H
 #define RELCACHE_H
 
-#include "postgres.h"
 #include "access/tupdesc.h"
 #include "nodes/bitmapset.h"
 


Re: Access last_sqlstate from libpq

2021-09-17 Thread Daniel Frey
> On 18. Sep 2021, at 01:45, David G. Johnston  
> wrote:
> 
> 
> 
> On Friday, September 17, 2021, Daniel Frey  wrote:
> 
> However, this is not possible in a couple of other cases where I don't have a 
> PGresult*, only the PGconn* is available:
> 
> * PQconnectdb (and variants)
> 
> * PQputCopyData
> * PQputCopyEnd
> * PQgetCopyData
> 
> * lo_* (large object functions)
> 
> After some research, it appears that PGconn* does have a field called 
> last_sqlstate - it just can't be accessed.
> Are there any problems adding a simple accessor to libpq? Or is there some 
> way to access it that I'm missing?
> 
> I suspect the reason for the omission is that there isn’t any usable data to 
> be gotten.  Those interfaces are not SQL interfaces and thus do not have a 
> relevant last_sqlstate to report.
> 
> David J.

Are you sure or are you guessing? It appears that for PQconnectdb there are a 
couple of SQLSTATEs defined which could help users. The Class 08 "Connection 
Exception" contains at least 08001, 08004, and 08P01, which could be helpful for 
users. For PQputCopyData, etc., Class 22 contains a lot of states that could 
explain what went wrong (in case it's the data); other states potentially also 
apply (like when the connection is lost, etc.). Even for large objects it might 
be helpful to see states that indicate that the server ran out of disk space.

Maybe not all of this is currently implemented (i.e. a reasonable SQLSTATE is 
stored in last_sqlstate), but I would hope that it is in some cases.

Daniel





Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-17 Thread Alexander Korotkov
On Tue, Sep 14, 2021 at 6:53 AM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2021-09-13 22:40:19 -0400, Tom Lane wrote:
> >> As for the fix ... what in the world is pg_upgrade doing including
> >> relcache.h?  It seems like there's a more fundamental problem here:
> >> either relcache.h is declaring something that needs to be elsewhere,
> >> or pg_upgrade is doing something it should not.
>
> > We could split visibilitymap.h into two, or we could forward-declare 
> > Relation
> > and not include relcache...
>
> Without having looked at the details, I think using a forward-declare
> to avoid including relcache.h in visibilitymap.h might be a reasonably
> non-painful fix.

I like that idea, but I didn't find an appropriate existing header for
forward-declaration of Relation.  relation.h isn't suitable, because
it includes primnodes.h.  A separate header for just
forward-definition of Relation seems too much.

> TOH, in the long run it might be worth the effort
> to split visibilitymap.h to separate useful file-contents knowledge
> from backend function declarations.

I've drafted a patch splitting visibilitymap_maros.h from
visibilitymap.h.  What do you think?

--
Regards,
Alexander Korotkov


0001-Split-macros-from-visibilitymap.h-into-a-separate-h-.patch
Description: Binary data


Re: Access last_sqlstate from libpq

2021-09-17 Thread David G. Johnston
On Friday, September 17, 2021, Daniel Frey  wrote:
>
>
> However, this is not possible in a couple of other cases where I don't
> have a PGresult*, only the PGconn* is available:
>
> * PQconnectdb (and variants)
>
> * PQputCopyData
> * PQputCopyEnd
> * PQgetCopyData
>
> * lo_* (large object functions)
>
> After some research, it appears that PGconn* does have a field called
> last_sqlstate - it just can't be accessed.
> Are there any problems adding a simple accessor to libpq? Or is there some
> way to access it that I'm missing?
>

I suspect the reason for the omission is that there isn’t any usable data
to be gotten.  Those interfaces are not SQL interfaces and thus do not have
a relevant last_sqlstate to report.

David J.


Access last_sqlstate from libpq

2021-09-17 Thread Daniel Frey
Hi,

I am the author of a PostgreSQL C++ client library, taoPQ 
(https://github.com/taocpp/taopq), wrapping the C-API of libpq.

In case of an error when I received a PGresult*, I can access the SQLSTATE by 
calling

   PGresult* pgresult = ...;
   const char* sqlstate = PQresultErrorField( pgresult, PG_DIAG_SQLSTATE );

However, this is not possible in a couple of other cases where I don't have a 
PGresult*, only the PGconn* is available:

* PQconnectdb (and variants)

* PQputCopyData
* PQputCopyEnd
* PQgetCopyData

* lo_* (large object functions)

Obviously, I can take the error message from PQerrorMessage and throw a generic 
runtime error - but it would be so much nicer if I could use the SQLSTATE to 
throw the correct exception class and give users more information just like I 
do for PGresult*.

After some research, it appears that PGconn* does have a field called 
last_sqlstate - it just can't be accessed.
Are there any problems adding a simple accessor to libpq? Or is there some way 
to access it that I'm missing?
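
For illustration, what I have in mind is tiny -- something like the
following sketch, living next to the other accessors where libpq-int.h is
visible (the function name is just a placeholder, not an existing API):

/* Hypothetical accessor; not part of libpq today. */
const char *
PQlastSqlstate(const PGconn *conn)
{
	if (conn == NULL)
		return NULL;
	/* last_sqlstate is the existing char[6] field in struct pg_conn */
	return conn->last_sqlstate;
}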

Regards,
Daniel





Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-17 Thread Alexander Korotkov
On Tue, Sep 14, 2021 at 5:40 AM Tom Lane  wrote:
> Andres Freund  writes:
> > I noticed that postgres.h is included from relcache.h (starting in [1]) and
> > wanted to fix that - it violates our usual policy against including 
> > postgres.h
> > from within headers.
>
> Ugh, yeah, that's entirely against policy.

I see. This is my oversight, sorry for that.

--
Regards,
Alexander Korotkov




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-17 Thread Cameron Murdoch
Hi,

I manage a bunch of Postgres servers at Oslo University and we use real ssl
certs on all our servers.

I was actually really surprised to discover that the libpq default is
sslmode=require and that the root cert defaults to a file under the user’s
home directory. I have been planning to use our management system
(CFEngine) to globally change the client settings to verify-ca and to use
the system trust store.

So that’s a +1 to use the system cert store for client connections.

I also agree that the proposed patch is not the right way to go as it is
essentially the same as verify-full, and I think that the correct fix would
be to change the default.

Thanks
C


Re: Timeout failure in 019_replslot_limit.pl

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-07, Kyotaro Horiguchi wrote:

> It seems like the "kill 'STOP'" in the script didn't suspend the
> processes before advancing WAL. The attached uses 'ps' command to
> check that since I didn't come up with the way to do the same in Perl.

Ah! so we tell the kernel to send the signal, but there's no guarantee
about the timing for the reaction from the other process.  Makes sense.

Your proposal is to examine the other process' state until we see that
it gets the T flag.  I wonder how portable this is; I suspect not very.
`ps` is pretty annoying, meaning not consistently implemented -- GNU's
manpage says there are "UNIX options", "BSD options" and "GNU long
options", so it seems hard to believe that there is one set of options
that will work everywhere.

I found a Perl module (Proc::ProcessTable) that can be used to get the
list of processes and their metadata, but it isn't in core Perl and it
doesn't look very well maintained either, so that one's out.

Another option might be to wait on the kernel -- do something that would
involve the kernel taking action on the other process, acting like a
barrier of sorts.  I don't know if this actually works, but we could
try.  Something like sending SIGSTOP first, then "kill 0" -- or just
send SIGSTOP twice:

diff --git a/src/test/recovery/t/019_replslot_limit.pl 
b/src/test/recovery/t/019_replslot_limit.pl
index e065c5c008..e8f323066a 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -346,6 +346,8 @@ $logstart = get_log_size($node_primary3);
 # freeze walsender and walreceiver. Slot will still be active, but walreceiver
 # won't get anything anymore.
 kill 'STOP', $senderpid, $receiverpid;
+kill 'STOP', $senderpid, $receiverpid;
+
 advance_wal($node_primary3, 2);
 
 my $max_attempts = 180;



> + # Haven't found the means to do the same on Windows
> + return if $TestLib::windows_os;

I suppose if it came down to something like your patch, we could do
something simple here like "if Windows, sleep 2s and hope for the best".

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-17 Thread Greg Stark
Hm. Let's Encrypt's FAQ tells me I'm on the right track with that
question but the distinctions are far more coarse than I was worried
about:


Does Let’s Encrypt issue certificates for anything other than SSL/TLS
for websites?

Let’s Encrypt certificates are standard Domain Validation
certificates, so you can use them for any server that uses a domain
name, like web servers, mail servers, FTP servers, and many more.

Email encryption and code signing require a different type of
certificate that Let’s Encrypt does not issue.


So it sounds like, at least for SSL connections, we should use the
same certificate authorities used to authenticate web sites. If ever
we implemented signed extensions, for example, it might require
different certificates -- I don't know what that means for the SSL
validation rules and the storage for them.
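
For what it's worth, the per-certificate "intended use" annotation from my
earlier mail does exist at the X.509 level as the extendedKeyUsage
extension.  A check with OpenSSL could look roughly like this -- just a
sketch, not anything libpq does today:

#include <stdbool.h>
#include <openssl/x509v3.h>

/* Sketch: does this certificate claim the "TLS server auth" purpose? */
static bool
cert_usable_for_server_auth(X509 *cert)
{
	uint32_t	flags = X509_get_extension_flags(cert);

	/* No extendedKeyUsage extension means no restriction on purpose. */
	if ((flags & EXFLAG_XKUSAGE) == 0)
		return true;

	return (X509_get_extended_key_usage(cert) & XKU_SSL_SERVER) != 0;
}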




Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Bossart, Nathan wrote:

> > That was the first implementation, a few versions of the patch ago.  An
> > added benefit of a separate WAL record is that you can carry additional
> > data for validation, such as -- as suggested by Andres -- the CRC of the
> > partial data contained in the message that we're skipping.  I didn't
> > implement that, but it should be trivial to add it.
> 
> I see.  IMO it feels a bit counterintuitive to validate a partial record
> that you are ignoring anyway, but I suppose it's still valuable to
> know when the WAL is badly broken.  It's not expensive, and it doesn't
> add a ton of complexity, either.

Yeah, we don't have any WAL record history validation other than
verifying the LSN of the previous record; I suppose in this particular
case you could argue that we shouldn't bother with any validation
either.  But it seems safer to do it.  It doesn't hurt anything anyway.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: prevent immature WAL streaming

2021-09-17 Thread Bossart, Nathan
On 9/17/21, 1:32 PM, "Alvaro Herrera"  wrote:
> On 2021-Sep-17, Bossart, Nathan wrote:
>
>> I gave the patch a read-through.  I'm wondering if the
>> XLOG_OVERWRITE_CONTRECORD records are actually necessary.  IIUC we
>> will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and
>> xlp_pageaddr on that page will already be validated in
>> XLogReaderValidatePageHeader().  Does adding this new record also help
>> ensure the page header is filled in and flushed to disk?
>
> That was the first implementation, a few versions of the patch ago.  An
> added benefit of a separate WAL record is that you can carry additional
> data for validation, such as -- as suggested by Andres -- the CRC of the
> partial data contained in the message that we're skipping.  I didn't
> implement that, but it should be trivial to add it.

I see.  IMO it feels a bit counterintuitive to validate a partial record
that you are ignoring anyway, but I suppose it's still valuable to
know when the WAL is badly broken.  It's not expensive, and it doesn't
add a ton of complexity, either.

Nathan



Re: Automatic notification of top transaction IDs

2021-09-17 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hello

This patch applies fine to the master branch and the regression tests are passing.

Regarding the parallel worker case, the AssignTransactionId() function is 
already handling that: it will error out if IsParallelWorker() is true. In the 
normal case, this function is only called by the main backend; the parallel 
workers synchronize the transaction ID when they are spawned and will not call 
this function anyway.
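
For reference, the guard in src/backend/access/transam/xact.c looks
roughly like this (quoting from memory, so the wording may differ
slightly):

	/* In AssignTransactionId(), approximately: */
	if (IsInParallelMode() || IsParallelWorker())
		elog(ERROR, "cannot assign XIDs during a parallel operation");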

thank you

Cary Huang

HighGo Software Canada
www.highgo.ca

Re: Add jsonlog log_destination for JSON server logs

2021-09-17 Thread Sehrope Sarkuni
On Thu, Sep 16, 2021 at 9:36 PM Michael Paquier  wrote:

> I am not really impressed by 0001 and the introduction of
> LOG_DESTINATIONS_WITH_FILES.  So I would leave that out and keep the
> list of destinations listed instead.  And we are talking about two
> places here, only within syslogger.c.
>

I've taken that out for now. The idea was to simplify the additions when
jsonlog is added, but I can circle back to that later if it makes sense.


> 0002 looks like an improvement,


Nice. That's left unchanged (renamed to 0001 now).


> but I think that 0003 makes the
> readability of the code worse with the introduction of eight static
> routines to handle all those cases.
>
> file_log_dest_should_rotate_for_size(), file_log_dest_close(),
> file_log_dest_check_rotate_for_size(), get_syslogger_file() don't
> bring major improvements.  On the contrary,
> file_log_dest_write_metadata and file_log_dest_rotate seem to reduce a
> bit the noise.
>

It was helpful to split them out while working on the patch but I see your
point upon re-reading through the result.

The attached version (renamed to 0002) adds only three static functions, for
checking rotation size, performing the actual rotation, and closing, plus one
more that generates the logfile names with a suffix and handles the pfree-ing
(wrapping logfile_open(...)).

The rest of the changes happen in place using the new structures.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
From c85e15ffc6a8e310c10be4c85580980d04846bf2 Mon Sep 17 00:00:00 2001
From: Sehrope Sarkuni 
Date: Thu, 16 Sep 2021 15:57:00 -0400
Subject: [PATCH 1/2] Split out syslogger EXEC_BACKEND fd serialization and
 opening into helper functions

---
 src/backend/postmaster/syslogger.c | 109 -
 1 file changed, 44 insertions(+), 65 deletions(-)

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index bca3883572..48b4c48a18 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -730,9 +730,23 @@ SysLogger_Start(void)
 	return 0;
 }
 
-
 #ifdef EXEC_BACKEND
 
+static long syslogger_get_fileno(FILE *file)
+{
+#ifndef WIN32
+	if (file != NULL)
+		return (long) fileno(file);
+	else
+		return -1;
+#else			/* WIN32 */
+	if (file != NULL)
+		return (long) _get_osfhandle(_fileno(file));
+	else
+		return 0;
+#endif			/* WIN32 */
+}
+
 /*
  * syslogger_forkexec() -
  *
@@ -751,34 +765,9 @@ syslogger_forkexec(void)
 	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
 
 	/* static variables (those not passed by write_backend_variables) */
-#ifndef WIN32
-	if (syslogFile != NULL)
-		snprintf(filenobuf, sizeof(filenobuf), "%d",
- fileno(syslogFile));
-	else
-		strcpy(filenobuf, "-1");
-#else			/* WIN32 */
-	if (syslogFile != NULL)
-		snprintf(filenobuf, sizeof(filenobuf), "%ld",
- (long) _get_osfhandle(_fileno(syslogFile)));
-	else
-		strcpy(filenobuf, "0");
-#endif			/* WIN32 */
+	snprintf(filenobuf, sizeof(filenobuf), "%ld", syslogger_get_fileno(syslogFile));
 	av[ac++] = filenobuf;
-
-#ifndef WIN32
-	if (csvlogFile != NULL)
-		snprintf(csvfilenobuf, sizeof(csvfilenobuf), "%d",
- fileno(csvlogFile));
-	else
-		strcpy(csvfilenobuf, "-1");
-#else			/* WIN32 */
-	if (csvlogFile != NULL)
-		snprintf(csvfilenobuf, sizeof(csvfilenobuf), "%ld",
- (long) _get_osfhandle(_fileno(csvlogFile)));
-	else
-		strcpy(csvfilenobuf, "0");
-#endif			/* WIN32 */
+	snprintf(csvfilenobuf, sizeof(csvfilenobuf), "%ld", syslogger_get_fileno(csvlogFile));
 	av[ac++] = csvfilenobuf;
 
 	av[ac] = NULL;
@@ -787,6 +776,31 @@ syslogger_forkexec(void)
 	return postmaster_forkexec(ac, av);
 }
 
+static FILE* syslogger_fdopen(int fd)
+{
+	FILE *file = NULL;
+
+#ifndef WIN32
+	if (fd != -1)
+	{
+		file = fdopen(fd, "a");
+		setvbuf(file, NULL, PG_IOLBF, 0);
+	}
+#else			/* WIN32 */
+	if (fd != 0)
+	{
+		fd = _open_osfhandle(fd, _O_APPEND | _O_TEXT);
+		if (fd > 0)
+		{
+			file = fdopen(fd, "a");
+			setvbuf(file, NULL, PG_IOLBF, 0);
+		}
+	}
+#endif			/* WIN32 */
+
+	return file;
+}
+
 /*
  * syslogger_parseArgs() -
  *
@@ -795,8 +809,6 @@ syslogger_forkexec(void)
 static void
 syslogger_parseArgs(int argc, char *argv[])
 {
-	int			fd;
-
 	Assert(argc == 5);
 	argv += 3;
 
@@ -807,41 +819,8 @@ syslogger_parseArgs(int argc, char *argv[])
 	 * fails there's not a lot we can do to report the problem anyway.  As
 	 * coded, we'll just crash on a null pointer dereference after failure...
 	 */
-#ifndef WIN32
-	fd = atoi(*argv++);
-	if (fd != -1)
-	{
-		syslogFile = fdopen(fd, "a");
-		setvbuf(syslogFile, NULL, PG_IOLBF, 0);
-	}
-	fd = atoi(*argv++);
-	if (fd != -1)
-	{
-		csvlogFile = fdopen(fd, "a");
-		setvbuf(csvlogFile, NULL, PG_IOLBF, 0);
-	}
-#else			/* WIN32 */
-	fd = atoi(*argv++);
-	if (fd != 0)
-	{
-		fd = _open_osfhandle(fd, _O_APPEND | _O_TEXT);
-		if (fd > 0)
-		{
-			syslogFile = fdopen(fd, "a");
-			setvbuf(syslogFile, NULL, PG_IOLBF, 0);
-		}

Re: Minimal logical decoding on standbys

2021-09-17 Thread Fabrízio de Royes Mello
On Wed, Sep 15, 2021 at 8:36 AM Drouvot, Bertrand 
wrote:
>
> Another rebase attached.
>
> The patch proposal to address Andre's walsender corner cases is still a
dedicated commit (as i think it may be easier to discuss).
>

I ran one more battery of tests and everything went well...

But while doing some manual tests:

1. Setup master/replica (wal_level=logical, hot_standby_feedback=on, etc)
2. Initialize the master instance: "pgbench -i -s10 on master"
3. Terminal1: execute "pgbench -c20 -T 2000"
4. Terminal2: create the logical replication slot:

271480 (replica) fabrizio=# select * from
pg_create_logical_replication_slot('test_logical', 'test_decoding');
-[ RECORD 1 ]---
slot_name | test_logical
lsn   | 1/C7C59E0

Time: 37658.725 ms (00:37.659)

5. Terminal3: start the pg_recvlogical

~/pgsql
➜ pg_recvlogical -p 5433 -S test_logical -d fabrizio -f - --start
pg_recvlogical: error: could not send replication command
"START_REPLICATION SLOT "test_logical" LOGICAL 0/0": ERROR:  replication
slot "test_logical" is active for PID 271480
pg_recvlogical: disconnected; waiting 5 seconds to try again
[the same error repeated every 5 seconds until the slot creation finished]
BEGIN 3767318
COMMIT 3767318
BEGIN 3767319
COMMIT 3767319
BEGIN 3767320
table public.pgbench_history: TRUNCATE: (no-flags)
COMMIT 3767320
BEGIN 3767323
table public.pgbench_accounts: UPDATE: aid[integer]:398507 bid[integer]:4
abalance[integer]:-1307 filler[character]:'
   '
table public.pgbench_tellers: UPDATE: tid[integer]:17 bid[integer]:2
tbalance[integer]:-775356 filler[character]:null
table public.pgbench_branches: UPDATE: bid[integer]:4
bbalance[integer]:1862180 filler[character]:null
table public.pgbench_history: INSERT: tid[integer]:17 bid[integer]:4
aid[integer]:398507 delta[integer]:182 mtime[timestamp without time
zone]:'2021-09-17 17:25:19.811239' filler[character]:null
COMMIT 3767323
BEGIN 3767322
table public.pgbench_accounts: UPDATE: aid[integer]:989789 bid[integer]:10
abalance[integer]:1224 filler[character]:'
   '
table public.pgbench_tellers: UPDATE: tid[integer]:86 bid[integer]:9
tbalance[integer]:-283737 filler[character]:null
table public.pgbench_branches: UPDATE: bid[integer]:9
bbalance[integer]:1277609 filler[character]:null
table public.pgbench_history: INSERT: tid[integer]:86 bid[integer]:9
aid[integer]:989789 delta[integer]:-2934 mtime[timestamp without time
zone]:'2021-09-17 17:25:19.811244' filler[character]:null
COMMIT 3767322

Even with activity on the primary, the creation of the logical replication
slot took ~38s. Can we do something about that, or should we clarify the
documentation further?

Regards,

-- 
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Bossart, Nathan wrote:

> I gave the patch a read-through.  I'm wondering if the
> XLOG_OVERWRITE_CONTRECORD records are actually necessary.  IIUC we
> will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and
> xlp_pageaddr on that page will already be validated in
> XLogReaderValidatePageHeader().  Does adding this new record also help
> ensure the page header is filled in and flushed to disk?

That was the first implementation, a few versions of the patch ago.  An
added benefit of a separate WAL record is that you can carry additional
data for validation, such as -- as suggested by Andres -- the CRC of the
partial data contained in the message that we're skipping.  I didn't
implement that, but it should be trivial to add it.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)




Re: initdb --pwfile /dev/zero

2021-09-17 Thread Tom Lane
Andres Freund  writes:
> On 2021-09-17 14:48:42 -0400, Tom Lane wrote:
>> I don't think '\0' is the problem.  The only fix for this would be to
>> re-introduce some fixed limit on how long a line we'll read, which
>> I'm not too thrilled about.

> Well, '\0' can be classified as the end of a line imo. So I don't think it'd
> require a line length limit.

Meh.  Those functions are specified to act like fgets(), which does not
think that \0 terminates a line AFAIK.

regards, tom lane




Re: initdb --pwfile /dev/zero

2021-09-17 Thread Andres Freund
Hi,

On 2021-09-17 14:48:42 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > A colleague tried PG 14 internally and it failed during cluster creation, 
> > when
> > using the PGDG rpm packages. A bit of debugging shows that the problem is
> > that the packaging script specifies the password using --pwfile /dev/zero.
> 
> > In 14+ this turns out to lead to an endless loop in pg_get_line_append().
> 
> Well, that's because that file will source an infinite amount of stuff.
> 
> > I think we still ought to make pg_get_line() a
> > bit more resilient against '\0'?
> 
> I don't think '\0' is the problem.  The only fix for this would be to
> re-introduce some fixed limit on how long a line we'll read, which
> I'm not too thrilled about.

Well, '\0' can be classified as the end of a line imo. So I don't think it'd
require a line length limit.


> I think this is better classified as user error.

I also can live with that.


I don't really understand how the current PGDG rpms work given this?  Does
nobody use the provided /usr/pgsql-14/bin/postgresql-14-setup?

https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/master/non-common/postgresql-14/main/postgresql-14-setup;h=d111033fc3f3bc03c243f424fd60c3e8ddf2e466;hb=HEAD#l139

Greetings,

Andres Freund




Re: right join with partitioned table crash

2021-09-17 Thread Tom Lane
Justin Pryzby  writes:
> On Wed, Sep 15, 2021 at 07:53:49PM -0400, Tom Lane wrote:
>> Jaime Casanova  writes:
>>> Here's another crash caught by sqlsmith.

>> Fun.  Looks like it fails back to v12, but not in v11,
>> so it's some optimization we added in v12 that's at fault.

> It seems to be a regression (?) in 12.6 (2021-02-11), from
> | 1cce024fd2 Fix pull_varnos' miscomputation of relids set for a 
> PlaceHolderVar.

Yeah, that patch still had a hole in it.  Fix pushed,
thanks for the report!

regards, tom lane




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-17 Thread Tom Lane
Greg Stark  writes:
> However I have a different question. Are the system certificates
> intended as general-purpose certificates? Do they have their intended
> uses annotated on the certificates? Does SSL verification have any
> logic deciding which certificates are appropriate for signing servers?

AFAIK, once you've stuck a certificate into the system store, it
will be trusted by every service on your machine.  Most distros
ship system-store contents that are basically just designed for
web browers, because the web is the only widely-applicable use
case.  Like you said, chicken and egg problem.

regards, tom lane




Re: prevent immature WAL streaming

2021-09-17 Thread Bossart, Nathan
On 9/17/21, 10:35 AM, "Alvaro Herrera"  wrote:
> On 2021-Sep-17, Bossart, Nathan wrote:
>
>> On 9/17/21, 9:37 AM, "Alvaro Herrera"  wrote:
>
>> > If you have some time to try your reproducers with this new proposed
>> > fix, I would appreciate it.
>>
>> I haven't had a chance to look at the patch yet, but it appears to fix
>> things with my original reproduction steps for the archive_status
>> stuff [0].
>
> Thank you, this is good to hear.

I gave the patch a read-through.  I'm wondering if the
XLOG_OVERWRITE_CONTRECORD records are actually necessary.  IIUC we
will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and
xlp_pageaddr on that page will already be validated in
XLogReaderValidatePageHeader().  Does adding this new record also help
ensure the page header is filled in and flushed to disk?

Nathan



Re: initdb --pwfile /dev/zero

2021-09-17 Thread Tom Lane
Andres Freund  writes:
> A colleague tried PG 14 internally and it failed during cluster creation, when
> using the PGDG rpm packages. A bit of debugging shows that the problem is
> that the packaging script specifies the password using --pwfile /dev/zero.

> In 14+ this turns out to lead to an endless loop in pg_get_line_append().

Well, that's because that file will source an infinite amount of stuff.

> I think we still ought to make pg_get_line() a
> bit more resilient against '\0'?

I don't think '\0' is the problem.  The only fix for this would be to
re-introduce some fixed limit on how long a line we'll read, which
I'm not too thrilled about.  I think this is better classified as
user error.

regards, tom lane




initdb --pwfile /dev/zero

2021-09-17 Thread Andres Freund
Hi,

A colleague tried PG 14 internally and it failed during cluster creation, when
using the PGDG rpm packages. A bit of debugging shows that the problem is
that the packaging script specifies the password using --pwfile /dev/zero.

In 14+ this turns out to lead to an endless loop in pg_get_line_append().
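
To illustrate the mechanism (a minimal sketch, not the pg_get_line code
itself):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	char		buf[16];
	FILE	   *f = fopen("/dev/zero", "r");

	/*
	 * fgets() consumes 15 '\0' bytes and returns non-NULL, but the result
	 * looks like an empty string with no trailing newline.  A loop that
	 * keeps reading until it sees '\n' or EOF -- which is what
	 * pg_get_line_append() does -- therefore never terminates.
	 */
	if (fgets(buf, sizeof(buf), f) != NULL)
		printf("line length: %zu\n", strlen(buf));	/* prints 0 */
	fclose(f);
	return 0;
}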

The --pwfile /dev/zero was added in
https://git.postgresql.org/gitweb/?p=pgrpms.git;a=commitdiff;h=8ca418709ef49a1781f0ea8e6166b139106135ff

Devrim, what was the goal? Even in 13 this didn't achieve anything?


While I don't think passing /dev/zero is a good idea (it mostly seems to
circumvent the "password file \"%s\" is empty" error without achieving
anything, given that the password will be empty), I think we still ought to
make pg_get_line() a bit more resilient against '\0'?


Greetings,

Andres Freund




Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Bossart, Nathan wrote:

> On 9/17/21, 9:37 AM, "Alvaro Herrera"  wrote:

> > If you have some time to try your reproducers with this new proposed
> > fix, I would appreciate it.
> 
> I haven't had a chance to look at the patch yet, but it appears to fix
> things with my original reproduction steps for the archive_status
> stuff [0].

Thank you, this is good to hear.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ed is the standard text editor."
  http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3




Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Alvaro Herrera wrote:

> Added Matsumura-san to CC, because he was interested in this topic too
> per the earlier thread.

I failed to do this, so hopefully this serves as a ping.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: prevent immature WAL streaming

2021-09-17 Thread Bossart, Nathan
On 9/17/21, 9:37 AM, "Alvaro Herrera"  wrote:
> OK, this version is much more palatable, because here we verify that the
> OVERWRITE_CONTRECORD we replay matches the record that was lost.  Also,
> I wrote a test script that creates such a broken record (by the simple
> expedient of deleting the WAL file containing the second half while the
> server is down); we then create a standby and we can observe that it
> replays the sequence correctly.
>
> If you have some time to try your reproducers with this new proposed
> fix, I would appreciate it.

I haven't had a chance to look at the patch yet, but it appears to fix
things with my original reproduction steps for the archive_status
stuff [0].

Nathan

[0] https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-17 Thread Greg Stark
On Tue, 7 Sept 2021 at 12:59, Tom Lane  wrote:
>
> I guess what it
> comes down to is whether you think that public or private certs are
> likely to be the majority use-case in the long run.  The shortage of
> previous requests for this feature says that right now, just about
> everyone is using self-signed or private-CA certs for Postgres
> servers.  So it would likely be a long time, if ever, before public-CA
> certs become the majority use-case.

Well the main thing making public CA certs a pain is precisely tools
that are a pain to configure to use public CA certs so it's a bit of a
chicken and egg problem. Projects like LetsEncrypt are all about
making public CA certs work easily without any additional effort.

However I have a different question. Are the system certificates
intended as general-purpose certificates? Do they have their intended
uses annotated on the certificates? Does SSL verification have any
logic deciding which certificates are appropriate for signing servers?

I ask because the only authority I'm personally aware of is the web
browser consortium that approves signers for web site domains. That's
what web browsers need but I'm not sure those are the same authorities
that are appropriate for internal services like databases.


-- 
greg




Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
OK, this version is much more palatable, because here we verify that the
OVERWRITE_CONTRECORD we replay matches the record that was lost.  Also,
I wrote a test script that creates such a broken record (by the simple
expedient of deleting the WAL file containing the second half while the
server is down); we then create a standby and we can observe that it
replays the sequence correctly.

If you have some time to try your reproducers with this new proposed
fix, I would appreciate it.


Added Matsumura-san to CC, because he was interested in this topic too
per the earlier thread.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
From 0404d47ca1456771a1be4c8afb2c1bd1e0fa0f08 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Sep 2021 17:21:46 -0400
Subject: [PATCH v7 1/2] Implement FIRST_IS_ABORTED_CONTRECORD

---
 src/backend/access/rmgrdesc/xlogdesc.c  |  11 ++
 src/backend/access/transam/xlog.c   | 153 ++--
 src/backend/access/transam/xloginsert.c |   1 +
 src/backend/access/transam/xlogreader.c |  54 -
 src/include/access/xlog_internal.h  |  20 +++-
 src/include/access/xlogreader.h |  14 +++
 src/include/catalog/pg_control.h|   1 +
 7 files changed, 244 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index e6090a9dad..0be382f8a5 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -139,6 +139,14 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 		 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_OVERWRITE_CONTRECORD)
+	{
+		xl_overwrite_contrecord xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_overwrite_contrecord));
+		appendStringInfo(buf, "lsn %X/%X",
+		 LSN_FORMAT_ARGS(xlrec.overwritten_lsn));
+	}
 }
 
 const char *
@@ -178,6 +186,9 @@ xlog_identify(uint8 info)
 		case XLOG_END_OF_RECOVERY:
 			id = "END_OF_RECOVERY";
 			break;
+		case XLOG_OVERWRITE_CONTRECORD:
+			id = "OVERWRITE_CONTRECORD";
+			break;
 		case XLOG_FPI:
 			id = "FPI";
 			break;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..b6af3eb258 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -199,6 +199,15 @@ static XLogRecPtr LastRec;
 static XLogRecPtr flushedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
+/*
+ * abortedRecPtr is the start pointer of a broken record at end of WAL when
+ * recovery completes; missingContrecPtr is the location of the first
+ * contrecord that went missing.  See CreateOverwriteContrecordRecord for
+ * details.
+ */
+static XLogRecPtr abortedRecPtr;
+static XLogRecPtr missingContrecPtr;
+
 /*
  * During recovery, lastFullPageWrites keeps track of full_page_writes that
  * the replayed WAL records indicate. It's initialized with full_page_writes
@@ -894,6 +903,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
+static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
@@ -927,6 +937,7 @@ static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 		XLogRecPtr RecPtr, int whichChkpt, bool report);
+static void VerifyOverwrittenRecord(XLogReaderState *record);
 static bool rescanLatestTimeLine(void);
 static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
@@ -1120,8 +1131,8 @@ XLogInsertRecord(XLogRecData *rdata,
 		 * All the record data, including the header, is now ready to be
 		 * inserted. Copy the record in the space reserved.
 		 */
-		CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, rdata,
-			StartPos, EndPos);
+		CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch,
+			rdata, StartPos, EndPos);
 
 		/*
 		 * Unless record is flagged as not important, update LSN of last
@@ -1504,7 +1515,8 @@ checkXLogConsistency(XLogReaderState *record)
  * area in the WAL.
  */
 static void
-CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
+CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
+	XLogRecData *rdata,
 	XLogRecPtr StartPos, XLogRecPtr EndPos)
 {
 	char	   *currpos;
@@ -2246,6 +2258,19 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 		if (!Insert->forcePageWrites)
 			NewPage->xlp_info |= XLP_BKP_REMOVABLE;
 
+		/*
+		 * If a record was found to be broken at the end of recovery, and
+		 * we're going to write on the page where its first contrecord was
+		 * lost, se

Re: POC: Cleaning up orphaned files using undo logs

2021-09-17 Thread Dmitry Dolgov
> On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> Antonin Houska  wrote:
>
> > Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > > * By throwing at the patchset `make installcheck` I'm getting from time 
> > > to time
> > >   an error on the restart:
> > >
> > > TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)",
> > > File: "undorecordset.c", Line: 1098, PID: 6055)
> > >
> > >   From what I see XLogReadBufferForRedoExtended finds an invalid buffer 
> > > and
> > >   returns BLK_NOTFOUND. The commentary says:
> > >
> > >  If the block was not found, then it must be discarded later in
> > >  the WAL.
> > >
> > >   and continues with skip = false, but fails to get a page from an invalid
> > >   buffer a few lines later. It seems that the skip flag is supposed to be
> > >   used in this situation; should it also guard the BufferGetPage part?
> >
> > I could see this sometimes too, but can't reproduce it now. It's also not 
> > clear
> > to me how XLogReadBufferForRedoExtended() can return BLK_NOTFOUND, as the
> > whole undo log segment is created at once, even if only part of it is 
> > needed -
> > see allocate_empty_undo_segment().
>
> I could eventually reproduce the problem. The root cause was that WAL records
> were created even for temporary / unlogged undo, and thus only empty pages
> could be found during replay. I've fixed that and also set up a regular test for
> the BLK_NOTFOUND value. That required a few more fixes to UndoReplay().
>
> Attached is a new version.

Yep, makes sense, thanks. I have a few more questions:

* The use case with orphaned files is working somewhat differently after
  the rebase on the latest master, do you observe it as well? The
  difference is ApplyPendingUndo -> SyncPostCheckpoint doesn't clean up
  an orphaned relation file immediately (only later on checkpoint)
  because of empty pendingUnlinks. I haven't investigated more yet, but
  seems like after this commit:

commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
Author: Thomas Munro 
Date:   Mon Aug 2 17:32:20 2021 +1200

Run checkpointer and bgwriter in crash recovery.

Start up the checkpointer and bgwriter during crash recovery (except in
--single mode), as we do for replication.  This wasn't done back in
commit cdd46c76 out of caution.  Now it seems like a better idea to make
the environment as similar as possible in both cases.  There may also be
some performance advantages.

  something has to be updated (pendingOps are empty right now, so no
  unlink request is remembered).

* What happened with the idea of abandoning the discard worker for the sake
  of simplicity? From what I see, limiting everything to foreground undo
  could reduce the core of the patch series to the first four patches
  (forgetting about test and docs, but I guess it would be enough at
  least for the design review), which is already less overwhelming.




Re: Logical replication timeout problem

2021-09-17 Thread Fabrice Chapuis
The publisher and the subscriber run on the same Postgres instance.

Regards,
Fabrice

On Fri, Sep 17, 2021 at 12:26 PM Amit Kapila 
wrote:

> On Fri, Sep 17, 2021 at 3:29 PM Fabrice Chapuis 
> wrote:
> >
> > Hi,
> >
> > Logical replication is configured on one instance in version 10.18.
> Timeout errors occur regularly and the worker process exit with an exit
> code 1
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1]
> user=postgres,db=foo,client=[local] LOG:  duration: 1281408.171 ms
> statement: COPY schem.tab (col1, col2) FROM stdin;
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:
> automatic analyze of table "foo.schem.tab" system usage: CPU: user: 4.13 s,
> system: 0.55 s, elapsed: 9.58 s
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
> terminating logical replication worker due to timeout
> > 2021-09-16 12:07:50 CEST [12546]: [11-1] user=,db=,client= LOG:  worker
> process: logical replication worker for subscription 24106654 (PID 3770)
> exited with exit code 1
> > 2021-09-16 12:07:50 CEST [13872]: [1-1] user=,db=,client= LOG:  logical
> replication apply worker for subscription "sub" has started
> > 2021-09-16 12:07:50 CEST [13873]: [1-1]
> user=repuser,db=foo,client=127.0.0.1 LOG:  received replication command:
> IDENTIFY_SYSTEM
> >
>
> Can you share the publisher-side log as well?
>
>
> --
> With Regards,
> Amit Kapila.
>


Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Pavel Stehule
> >
> > Initially not, but now, when I am thinking about it, I don't think
> Bison helps. The syntax of the filter file is nicely linear. Now, the code
> of the parser is a little bit larger than minimalistic, but it is due to
> nicer error's messages. The raw implementation in Bison raised just "syntax
> error" and positions. I did code refactoring, and now the scanning, parsing
> and processing are divided into separated routines. Parsing related code
> has 90 lines. In this case, I don't think using a parser grammar file can
> carry any benefit. grammar is more readable, sure, but we need to include
> bison, we need to handle errors, and if we want to raise more helpful
> errors than just "syntax error", then the code will be longer.
>
> I'm not so concerned by code size, but rather by the parsing of quotations
> etc. and being able to reason about its correctness.  IMHO that's easier done
> by reading a defined grammar than a handwritten parser.
>
>
In this case the complex part is not the parser; the scanner is the complex
piece, and writing it in Flex is not too easy. I wrote earlier that a grammar
file can be more readable, but the usual error from Bison is just "syntax
error" plus a position, so it is no win from the user's perspective. When a
grammar is not linear, a generated parser can help a lot, but using one at
this moment is premature.


> Will do a closer review on the patch shortly.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Pavel Stehule
On Fri, Sep 17, 2021 at 14:07, Daniel Gustafsson wrote:

> > On 17 Sep 2021, at 13:59, Pavel Stehule  wrote:
> > On Fri, Sep 17, 2021 at 13:56, Daniel Gustafsson wrote:
> > > On 17 Sep 2021, at 13:51, Pavel Stehule wrote:
> > > On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:
>
> > > I am unable to write a filter statement which can
> > > handle this relname:
> > >
> > > CREATE TABLE "a""
> > > ""b" (a integer);
> > >
> > > Are you able to craft one for that?
> > >
> > > I am not able to dump this directly in pg_dump. Is it possible?
> >
> > Sure, see below:
> >
> > $ ./bin/psql filter
> > psql (15devel)
> > Type "help" for help.
> >
> > I didn't ask about this
> >
> > I asked if you can use -t and some pattern for filtering this name?
>
> I didn't try as I don't see how that's relevant?  Surely we're not
> limiting the
> capabilities of a filtering file format based on the quoting semantics of a
> shell?
>

This patch just uses existing functionality, which can be buggy too.

But I had a bug in this part - when I detect doubled double quotes on input, I
have to send doubled quotes to the output too.

It should be fixed in the attached patch

[pavel@localhost pg_dump]$ echo 'include table "a""\n""b"' | ./pg_dump
--filter=-
--
-- PostgreSQL database dump
--

-- Dumped from database version 15devel
-- Dumped by pg_dump version 15devel
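For reference, the quoting rule at issue - inside a double-quoted name a
doubled "" stands for one literal quote - boils down to something like this
standalone sketch (illustration only, not the patch's code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read a double-quoted name; *s must point at the opening quote.  A
 * doubled "" inside the quotes stands for one literal quote.  Returns a
 * malloc'd unquoted name and advances *s past the closing quote, or
 * returns NULL on unterminated input.
 */
static char *
read_quoted_name(const char **s)
{
    const char *p = *s + 1;     /* skip the opening quote */
    char       *out = malloc(strlen(p) + 1);
    char       *o = out;

    if (out == NULL)
        return NULL;

    while (*p)
    {
        if (*p == '"')
        {
            if (p[1] == '"')    /* "" -> one literal quote */
            {
                *o++ = '"';
                p += 2;
                continue;
            }
            *o = '\0';
            *s = p + 1;         /* past the closing quote */
            return out;
        }
        *o++ = *p++;
    }
    free(out);
    return NULL;                /* unterminated name */
}

int
main(void)
{
    /* the table name from the example above, as written in the filter */
    const char *line = "\"a\"\"\n\"\"b\"";
    char       *name = read_quoted_name(&line);

    printf("unquoted: %s\n", name ? name : "(error)");
    free(name);
    return 0;
}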




> --
> Daniel Gustafsson   https://vmware.com/
>
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..1b74c0eadd 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,55 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Read objects filters from the specified file. Specify "-" to read from
+stdin. Lines of this file must have the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the object is to be included
+or excluded, and the second keyword specifies the type of object
+to be filtered:
+table (table),
+schema (schema),
+foreign_data (foreign server),
+data (table data).
+   
+
+   
+With the following filter file, the dump would include table
+mytable1 and data from foreign tables of
+some_foreign_server foreign server, but exclude data
+from table mytable2.
+
+include table mytable1
+include foreign_data some_foreign_server
+exclude table mytable2
+
+   
+
+   
+Lines starting with # are ignored. The comment
+(started by #) can be placed after filter too.
+Empty lines are ignored too.
+   
+
+   
+The --filter option works just like the other
+options to include or exclude tables, schemas, table data, or foreign
+tables, and both forms may be combined.  Note that there are no options
+to exclude a specific foreign table or to include a specific table's
+data.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a485fb2d07..0e8072a782 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,10 +55,12 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/option_utils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -308,7 +310,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
-
+static void read_filters_from_file(char *filename, DumpOptions *dopt);
 
 int
 main(int argc, char **argv)
@@ -380,6 +382,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -613,6 +616,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* filter implementation */
+read_filters_from_file(optarg, &dopt);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -1038,6 +1045,8 @@ help(const char *progname)
 			 "   access to)\n")

Re: Added schema level support for publication.

2021-09-17 Thread vignesh C
On Thu, Sep 16, 2021 at 11:24 AM Amit Kapila  wrote:
>
> On Wed, Sep 15, 2021 at 4:45 PM Greg Nancarrow  wrote:
> >
> > On Tue, Sep 14, 2021 at 6:38 PM vignesh C  wrote:
> > >
> > > I have handled this in the patch attached.
> > >
> >
> > Regarding the following function in the v28-0002 patch:
> >
> > +/*
> > + * Check if the relation schema is member of the schema list.
> > + */
> > +static void
> > +RelSchemaIsMemberOfSchemaList(List *rels, List *schemaidlist, bool schemacheck)
> >
> > I think this function is not well named or commented, and I don't like
> > how the "schemacheck" bool parameter determines the type of objects in
> > the "rels" list.
> >
>
> I think after fixing the comments in my previous email, the rels list
> will become the same for this function but surely the extra parameter
> is required for giving object-specific errors.
>
> > I would suggest you simply split this function into two separate
> > functions, corresponding to each of the blocks of the "if-else" within
> > the for-loop of the existing RelSchemaIsMemberOfSchemaList function.
> > The "Is" part of the existing "RelSchemaIsMemberOfSchemaList" function
> > name implies a boolean return value, so it seems misleading.
> > So I think the names of the two functions that I am suggesting should
> > be "CheckNotAlreadyInPublication" or something similar.
> >
>
> I think if we write individual functions then we need to add new
> functions as and when we add new object types like sequences. The
> other idea could be to keep a single function like now say
> CheckObjSchemaNotAlreadyInPublication and instead of the bool
> parameter as the patch has now, we can keep an enum parameter
> "add_obj_type" for 'rel', 'schema', 'sequence'. We can either use
> existing enum PublicationObjSpecType or define a new one for the same.

Modified the function name and changed the parameter to
PublicationObjSpecType. The changes are present at the v29 patch
posted at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm1Wb%3D_HGd85wp2WM%2BfLc-8PSJ824TOZEJ6nDz3akWTidw%40mail.gmail.com
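To illustrate the shape this takes, here is an illustrative sketch (the enum
values mirror the parse-node names discussed above; the messages are made up
and this is not the actual v29 code):

#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-in for the enum discussed above. */
typedef enum PublicationObjSpecType
{
    PUBLICATIONOBJ_TABLE,
    PUBLICATIONOBJ_TABLES_IN_SCHEMA
} PublicationObjSpecType;

/*
 * One check function with an object-kind parameter instead of one function
 * per kind: adding a new kind (e.g. sequences) later only needs a new enum
 * value and a new error arm, not a new function.
 */
static void
CheckObjSchemaNotAlreadyInPublication(int nconflicts,
                                      PublicationObjSpecType checkobjtype)
{
    if (nconflicts == 0)
        return;

    switch (checkobjtype)
    {
        case PUBLICATIONOBJ_TABLE:
            fprintf(stderr, "table's schema is already in the publication\n");
            break;
        case PUBLICATIONOBJ_TABLES_IN_SCHEMA:
            fprintf(stderr, "schema already covers a published table\n");
            break;
    }
    exit(1);
}

int
main(void)
{
    CheckObjSchemaNotAlreadyInPublication(0, PUBLICATIONOBJ_TABLE); /* passes */
    CheckObjSchemaNotAlreadyInPublication(1, PUBLICATIONOBJ_TABLE); /* errors */
    return 0;
}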

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-17 Thread vignesh C
On Thu, Sep 16, 2021 at 9:54 AM Amit Kapila  wrote:
>
> On Wed, Sep 15, 2021 at 12:30 PM Amit Kapila  wrote:
> >
> > On Tue, Sep 14, 2021 at 2:08 PM vignesh C  wrote:
> > >
> > > I have handled this in the patch attached.
> > >
> >
> > 4.
> > AlterPublicationSchemas()
> > {
> > ..
> > + /*
> > + * If the table option was not specified remove the existing tables
> > + * from the publication.
> > + */
> > + if (!tables)
> > + {
> > + rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
> > + PublicationDropTables(pubform->oid, rels, false, true);
> > + }
> > +
> > + /* Identify which schemas should be dropped */
> > + delschemas = list_difference_oid(oldschemaids, schemaidlist);
> > +
> > + /* And drop them */
> > + PublicationDropSchemas(pubform->oid, delschemas, true);
> >
> > Here, you have neither locked tables to be dropped nor schemas. I
> > think both need to be locked as we do for tables in similar code in
> > AlterPublicationTables(). Can you please test via debugger what
> > happens if we try to drop without taking lock here and concurrently
> > try to drop the actual object? It should give some error. If we decide
> > to lock here then we should be able to pass the list of relations to
> > PublicationDropTables() instead of Oids which would then obviate the
> > need for any change to that function.
> >
> > Similarly don't we need to lock schemas before dropping them in
> > AlterPublicationTables()?
> >
>
> I think there is one more similar locking problem.
> AlterPublicationSchemas()
> {
> ..
> + if (stmt->action == DEFELEM_ADD)
> + {
> + List *rels;
> +
> + rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
> + RelSchemaIsMemberOfSchemaList(rels, schemaidlist, true);
> ...
> ...
> }
>
> Here, we don't have a lock on the relation. So, what if the relation
> is concurrently dropped after you get the rel list by
> GetPublicationRelations?

This works fine without locking even after a concurrent drop; I felt
this works because of MVCC.

Regards,
Vignesh




Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Stephen Frost
Greetings,

On Fri, Sep 17, 2021 at 14:07 Daniel Gustafsson  wrote:

> > On 17 Sep 2021, at 13:59, Pavel Stehule  wrote:
> > On Fri, Sep 17, 2021 at 13:56, Daniel Gustafsson wrote:
> > > On 17 Sep 2021, at 13:51, Pavel Stehule wrote:
> > > On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:
>
> > > I am unable to write a filter statement which can
> > > handle this relname:
> > >
> > > CREATE TABLE "a""
> > > ""b" (a integer);
> > >
> > > Are you able to craft one for that?
> > >
> > > I am not able to dump this directly in pg_dump. Is it possible?
> >
> > Sure, see below:
> >
> > $ ./bin/psql filter
> > psql (15devel)
> > Type "help" for help.
> >
> > I didn't ask on this
> >
> > I asked if you can use -t and some for filtering this name?
>
> I didn't try as I don't see how that's relevant?  Surely we're not
> limiting the
> capabilities of a filtering file format based on the quoting semantics of a
> shell?


Yeah, agreed. I would think that a DBA might specifically want to be able
to use a config file to get away from having to deal with shell quoting, in
fact…

Thanks,

Stephen

>


Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Daniel Gustafsson
> On 17 Sep 2021, at 13:59, Pavel Stehule  wrote:
> On Fri, Sep 17, 2021 at 13:56, Daniel Gustafsson wrote:
> > On 17 Sep 2021, at 13:51, Pavel Stehule wrote:
> > On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:

> > I am unable to write a filter statement which can
> > handle this relname:
> > 
> > CREATE TABLE "a""
> > ""b" (a integer);
> > 
> > Are you able to craft one for that?
> > 
> > I am not able to dump this directly in pg_dump. Is it possible?
> 
> Sure, see below:
> 
> $ ./bin/psql filter
> psql (15devel)
> Type "help" for help.
> 
> I didn't ask about this
>
> I asked if you can use -t and some pattern for filtering this name?

I didn't try as I don't see how that's relevant?  Surely we're not limiting the
capabilities of a filtering file format based on the quoting semantics of a
shell?

--
Daniel Gustafsson   https://vmware.com/





Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Stephen Frost
Greetings,

On Fri, Sep 17, 2021 at 13:59 Pavel Stehule  wrote:

>
>
> On Fri, Sep 17, 2021 at 13:56, Daniel Gustafsson wrote:
>
>> > On 17 Sep 2021, at 13:51, Pavel Stehule wrote:
>> > On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:
>>
>> > I am unable to write a filter statement which can
>> > handle this relname:
>> >
>> > CREATE TABLE "a""
>> > ""b" (a integer);
>> >
>> > Are you able to craft one for that?
>> >
>> > I am not able to dump this directly in pg_dump. Is it possible?
>>
>> Sure, see below:
>>
>> $ ./bin/psql filter
>> psql (15devel)
>> Type "help" for help.
>>
>>
> I didn't ask about this
>
> I asked if you can use -t and some pattern for filtering this name?
>

For my part, at least, I don’t see that this particularly matters... For a
new feature that’s being developed to allow users to export specific
tables, I would think we’d want to support any table names which can exist.

Thanks,

Stephen


Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Pavel Stehule
On Fri, Sep 17, 2021 at 13:56, Daniel Gustafsson wrote:

> > On 17 Sep 2021, at 13:51, Pavel Stehule  wrote:
> > On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:
>
> > I am unable to write a filter statement which can
> > handle this relname:
> >
> > CREATE TABLE "a""
> > ""b" (a integer);
> >
> > Are you able to craft one for that?
> >
> > I am not able to dump this directly in pg_dump. Is it possible?
>
> Sure, see below:
>
> $ ./bin/psql filter
> psql (15devel)
> Type "help" for help.
>
>
I didn't ask about this.

I asked if you can use -t and some pattern for filtering this name?


> filter=# create table "a""
> filter"# ""b" (a integer);
> CREATE TABLE
> filter=# select relname from pg_class order by oid desc limit 1;
>  relname
> -
>  a" +
>  "b
> (1 row)
>
> filter=# ^D\q
> $ ./bin/pg_dump -s filter
> --
> -- PostgreSQL database dump
> --
>
> -- Dumped from database version 15devel
> -- Dumped by pg_dump version 15devel
>
> SET statement_timeout = 0;
> SET lock_timeout = 0;
> SET idle_in_transaction_session_timeout = 0;
> SET client_encoding = 'UTF8';
> SET standard_conforming_strings = on;
> SELECT pg_catalog.set_config('search_path', '', false);
> SET check_function_bodies = false;
> SET xmloption = content;
> SET client_min_messages = warning;
> SET row_security = off;
>
> SET default_tablespace = '';
>
> SET default_table_access_method = heap;
>
> --
> -- Name: a" "b; Type: TABLE; Schema: public; Owner: danielg
> --
>
> CREATE TABLE public."a""
> ""b" (
> a integer
> );
>
>
> ALTER TABLE public."a""
> ""b" OWNER TO danielg;
>
> --
> -- PostgreSQL database dump complete
> --
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: Added schema level support for publication.

2021-09-17 Thread vignesh C
On Thu, Sep 16, 2021 at 8:59 AM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, September 14, 2021 4:39 PM vignesh C  wrote:
> >
> > I have handled this in the patch attached.
>
> Thanks for updating the patch.
> Here are some comments.
>
> 1)
> +static void
> +AlterPublicationSchemas(AlterPublicationStmt *stmt, Relation rel,
> ...
> +   /*
> +* If the table option was not specified remove the existing 
> tables
> +* from the publication.
> +*/
> +   if (!tables)
> +   {
> +   rels = GetPublicationRelations(pubform->oid, 
> PUBLICATION_PART_ROOT);
> +   PublicationDropTables(pubform->oid, rels, false, 
> true);
> +   }
>
>
> It seems unnatural to drop tables in AlterPublication*Schemas*;
> I think we'd better do it in AlterPublicationTables.

I felt keeping it the current way is better, to avoid additional
checks. Thoughts?

> 2)
>  static void
>  AlterPublicationTables(AlterPublicationStmt *stmt, Relation rel,
>  ...
> +   /*
> +* If ALL TABLES IN SCHEMA option was not specified 
> remove the
> +* existing schemas from the publication.
> +*/
> +   List *pubschemas = GetPublicationSchemas(pubid);
> +   PublicationDropSchemas(pubform->oid, pubschemas, 
> false);
>
> Same as 1): is it better to modify the schema list in AlterPublicationSchemas?

This is similar to above.

> 3)
>  static void
>  AlterPublicationTables(AlterPublicationStmt *stmt, Relation rel,
> ...
> /* check if the relation is member of the schema list 
> specified */
> RelSchemaIsMemberOfSchemaList(rels, schemaidlist, false);
>
> IIRC, the check here is to validate the specified tables and schemas in the
> command. Personally, this seems like a common operation which could be placed
> in the function AlterPublication(). If we move this check to
> AlterPublication() and if
> comments 1) and 2) make sense to you, then we don't need the new function
> parameters in AlterPublicationTables() and AlterPublicationSchemas().

I felt we can keep the checks as they are currently; otherwise we will have to
add extra checks outside and additional calls for conversion from OID to
Relation, like:
  if (stmt->options)
AlterPublicationOptions(pstate, stmt, rel, tup);
  else
  {
if (relations)
{
  if (stmt->action != DEFELEM_DROP)
  {
List *rels = OpenTableList(relations);

/* check if relation is member of the schema list specified */
RelSchemaIsMemberOfSchemaList(rels, schemaidlist, false);
CloseTableList(rels);
  }

  AlterPublicationTables(stmt, rel, tup, relations,
   list_length(schemaidlist));
}
if (schemaidlist)
  AlterPublicationSchemas(stmt, rel, tup, schemaidlist,
  list_length(relations));
  }

Thoughts?

Regards,
Vignesh




Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Daniel Gustafsson
> On 17 Sep 2021, at 13:51, Pavel Stehule  wrote:
> On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:

> I am unable to write a filter statement which can
> handle this relname:
> 
> CREATE TABLE "a""
> ""b" (a integer);
> 
> Are you able to craft one for that?
> 
> I am not able to dump this directly in pg_dump. Is it possible?

Sure, see below:

$ ./bin/psql filter
psql (15devel)
Type "help" for help.

filter=# create table "a""
filter"# ""b" (a integer);
CREATE TABLE
filter=# select relname from pg_class order by oid desc limit 1;
 relname
-
 a" +
 "b
(1 row)

filter=# ^D\q
$ ./bin/pg_dump -s filter
--
-- PostgreSQL database dump
--

-- Dumped from database version 15devel
-- Dumped by pg_dump version 15devel

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_table_access_method = heap;

--
-- Name: a" "b; Type: TABLE; Schema: public; Owner: danielg
--

CREATE TABLE public."a""
""b" (
a integer
);


ALTER TABLE public."a""
""b" OWNER TO danielg;

--
-- PostgreSQL database dump complete
--

--
Daniel Gustafsson   https://vmware.com/





Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Pavel Stehule
On Fri, Sep 17, 2021 at 13:42, Daniel Gustafsson wrote:

> > On 15 Sep 2021, at 19:31, Pavel Stehule  wrote:
> > On Mon, Sep 13, 2021 at 15:01, Daniel Gustafsson wrote:
>
> > One issue with this syntax is that the include keyword can be quite
> misleading
> > as its semantic interpretation of "include table t" can be different from
> > "--table=t".  The former is less clear about the fact that it means
> "exclude
> > all other tables than t" than the latter.  It can be solved with
> documentation,
> > but I think that needs to be made clearer.
> >
> > I invite any documentation enhancing and fixing
>
> Sure, that can be collaborated on.  The gist though is that IMO the
> keywords in
> the filter file aren't as clear on the side effects as the command line
> params,
> even though they are equal in functionality.
>
> > +   pg_log_error("invalid format of filter file \"%s\": %s",
> > +filename,
> > +message);
> > +
> > +   fprintf(stderr, "%d: %s\n", lineno, line);
> > Can't we just include the lineno in the error logging and skip dumping
> the
> > offending line?  Fast-forwarding the pointer to print the offending part
> is
> > less useful than a context marker, and in some cases suboptimal.  With
> this
> > coding, if a pattern is omitted for example the below error message is
> given:
> >
> > pg_dump: error: invalid format of filter file "filter.txt": missing
> object name
> > 1:
> >
> > The error message and the line number in the file should be enough for the
> user
> > to figure out what to fix.
> >
> > I did it like you proposed, but still, I think the content can be useful.
>
> Not when there is no content in the error message; printing an empty
> string for
> a line that isn't blank doesn't seem terribly helpful.  If
> we
> know the error context is empty, printing a tailored error hint seems more
> useful for the user.
>
> > Often you read dynamically generated files, or data from
> stdin, and in complex environments it can be hard to regenerate that content
> for debugging.
>
> That seems odd given that the arguments for this format have been that it's
> likely to be handwritten.
>
> > If I create a table called "a\nb" and try to dump it I get an error in
> parsing
> > the file.  Isn't this supposed to work?
> > $ cat filter.txt
> > include table "a
> > b"
> > $ ./bin/pg_dump --filter=filter.txt
> > pg_dump: error: invalid format of filter file "filter.txt":
> unexpected chars after object name
> > 2:
> >
> > Probably there was some issue, because it should work. I tested the new
> version and this is covered by the new regression tests. Please check
>
> That seems to work, but I am unable to write a filter statement which can
> handle this relname:
>
> CREATE TABLE "a""
> ""b" (a integer);
>
> Are you able to craft one for that?
>

I am not able to dump this directly in pg_dump. Is it possible?



> > Did you consider implementing this in Bison to abstract some of the
> messier
> > parsing logic?
> >
> > Initially not, but now that I am thinking about it, I don't think
> Bison helps. The syntax of the filter file is nicely linear. The code
> of the parser is a little larger than minimalistic, but that is due to
> nicer error messages. The raw implementation in Bison raises just "syntax
> error" and a position. I did some refactoring, and now the scanning, parsing
> and processing are divided into separate routines. The parsing-related code
> has 90 lines. In this case, I don't think using a parser grammar file
> carries any benefit. A grammar is more readable, sure, but we need to include
> Bison, we need to handle errors, and if we want to raise more helpful
> errors than just "syntax error", then the code will be longer.
>
> I'm not so concerned by code size, but rather by the parsing of quotations
> etc. and being able to reason about its correctness.  IMHO that's easier done
> by reading a defined grammar than a handwritten parser.
>
> Will do a closer review on the patch shortly.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Pavel Stehule
Hi


> A main concern among most (all?) participants of the thread, regardless of
> format supported, is that quoting is hard and must be done right for all
> object
> names postgres supports (including any not currently in scope for this
> patch).
>
>
Just a small note - when quoting is designed in from the start, the
implementation is easy. I am confident that my latest code covers all the
possibilities, and it is about 100 lines of code.



> Below is an attempt at summarizing and grouping the proposals so far into
> the
> set of ideas presented:
>
> A) A keyword+object based format to invoke with a switch to essentially
> allow for more filters than the commandline can handle and nothing
> more.
> After a set of revisions, the current proposal is:
> [include|exclude] [<objecttype>] [<objectname>]
>
> B) A format similar to (A) which can also be used for pg_dump
> configuration
>
> C) The format in (A), or a close variant thereof, with the intention
> of it
> being included in/referred to from a future configuration file of
> currently
> unknown format.  One reference being a .gitignore type file.
>
> D) An existing format (JSON and TOML have been suggested, with JSON
> being dismissed due to lack of comment support) which has quoting
> conventions that supports postgres' object names and which can be used
> to
> define a full pg_dump configuration file syntax.
>
> For B), C) and D) there is implicit consensus in the thread that we don't
> need
> to implement the full configuration file as of this patch, merely that it
> *must* be possible to do so without painting ourselves into a
> corner.
>
> At this point it seems to me that B) and C) have the broadest support.  Could
> the
> C) option represent the compromise between a "simple" format for object
> filtering and a more structured format for configuration?  Are there other
> options?
>

What would be the benefit of this variant?

Regards

Pavel


>
> Thoughts?
>
> --
> Daniel Gustafsson   https://vmware.com/
>
> [0] https://postgr.es/m/f6674ff0-5800-4aed-9dc7-13c475707...@yesql.se
> [1]
> https://postgr.es/m/CALAY4q9u30L7oGhbsfY3dPECQ8SrYa8YO=h-xon5xwueien...@mail.gmail.com
> [2] https://postgr.es/m/20201110200904.gu16...@tamriel.snowman.net
> [3]
> https://postgr.es/m/CAEZATCVKMG7+b+_5tNwrNZ-aNDBy3=fmrnea2bo9o4qgcev...@mail.gmail.com
> [4] https://postgr.es/m/502641.1606334...@sss.pgh.pa.us
> [5] https://postgr.es/m/619671.1606406...@sss.pgh.pa.us
> [6]
> https://postgr.es/m/cb545d78-2dae-8d27-f062-822a07ca5...@enterprisedb.com
> [7] https://postgr.es/m/202107122259.n6o5uwb5erza@alvherre.pgsql
> [8] https://postgr.es/m/3183720.1626131...@sss.pgh.pa.us
>
>


Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Daniel Gustafsson
> On 15 Sep 2021, at 19:31, Pavel Stehule  wrote:
> On Mon, Sep 13, 2021 at 15:01, Daniel Gustafsson wrote:

> One issue with this syntax is that the include keyword can be quite misleading
> as its semantic interpretation of "include table t" can be different from
> "--table=t".  The former is less clear about the fact that it means "exclude
> all other tables than t" than the latter.  It can be solved with documentation,
> but I think that needs to be made clearer.
> 
> I invite any documentation enhancing and fixing 

Sure, that can be collaborated on.  The gist though is that IMO the keywords in
the filter file aren't as clear on the side effects as the command line params,
even though they are equal in functionality.

> +   pg_log_error("invalid format of filter file \"%s\": %s",
> +filename,
> +message);
> +
> +   fprintf(stderr, "%d: %s\n", lineno, line);
> Can't we just include the lineno in the error logging and skip dumping the
> offending line?  Fast-forwarding the pointer to print the offending part is
> less useful than a context marker, and in some cases suboptimal.  With this
> coding, if a pattern is omitted for example the below error message is given:
>   
> pg_dump: error: invalid format of filter file "filter.txt": missing 
> object name
> 1:
> 
> The error message and the line number in the file should be enough for the user
> to figure out what to fix.
> 
> I did it like you proposed, but still, I think the content can be useful.

Not when there is no content in the error message; printing an empty string for
a line that isn't blank doesn't seem terribly helpful.  If we
know the error context is empty, printing a tailored error hint seems more
useful for the user.

> Often you read dynamically generated files, or data from stdin,
> and in complex environments it can be hard to regenerate that content for
> debugging.

That seems odd given that the arguments for this format have been that it's
likely to be handwritten.

> If I create a table called "a\nb" and try to dump it I get an error in parsing
> the file.  Isn't this supposed to work?
> $ cat filter.txt
> include table "a
> b"
> $ ./bin/pg_dump --filter=filter.txt
> pg_dump: error: invalid format of filter file "filter.txt": unexpected 
> chars after object name
> 2:
> 
> Probably there was some issue, because it should work. I tested the new version
> and this is covered by the new regression tests. Please check

That seems to work, but I am unable to write a filter statement which can
handle this relname:

CREATE TABLE "a""
""b" (a integer);

Are you able to craft one for that?

> Did you consider implementing this in Bison to abstract some of the messier
> parsing logic?
> 
> Initially not, but now that I am thinking about it, I don't think Bison
> helps. The syntax of the filter file is nicely linear. The code of the
> parser is a little larger than minimalistic, but that is due to nicer
> error messages. The raw implementation in Bison raises just "syntax error"
> and a position. I did some refactoring, and now the scanning, parsing and
> processing are divided into separate routines. The parsing-related code has 90
> lines. In this case, I don't think using a parser grammar file carries any
> benefit. A grammar is more readable, sure, but we need to include Bison, we
> need to handle errors, and if we want to raise more helpful errors than just
> "syntax error", then the code will be longer.

I'm not so concerned by code size, but rather by the parsing of quotations etc.
and being able to reason about its correctness.  IMHO that's easier done by
reading a defined grammar than a handwritten parser.

Will do a closer review on the patch shortly.

--
Daniel Gustafsson   https://vmware.com/





Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Daniel Gustafsson
As there have been a lot of differing opinions raised in this thread, I re-read
it and tried to summarize the discussion so far, to figure out where we agree
(and disagree) and on what, before we get deep into the technicalities wrt
the current patch.  If anyone feels I've misrepresented them below then I
sincerely do apologize.  If I missed a relevant viewpoint I also apologize;
I've tried to objectively represent the thread.

I proposed JSON in [0] which is where the format discussion to some extent
started, Justin and Pavel had up until that point discussed the format by
refining the original proposal.

In [1] Surafel Temesgen brought up --exclude-database from pg_dumpall and
--no-comments, and argued for them being handled by this patch.  This was
objected against on the grounds that pg_dumpall is out of scope, and
all-or-nothing switches not being applicable in a filter option.

Stephen objected to both the proposed, and the suggestion of JSON, in [2] and
argued for a more holistic configuration file approach.  TOML was suggested.
Dean then +1'd the config file approach in [3].

In [4] Tom supported the idea of a more generic config file, and remarked that
the proposed filter for table names only makes sense when the number of exclude
patterns are large enough that we might hit other problems in pg_dump.
Further, in [5] Tom commented that a format with established quoting
conventions would buy us not having to invent our own to cope with complicated
relation names.

The fact that JSON doesn't support comments is brought up in a few emails and
is a very valid point, as the need for comments regardless of format is raised
as well.

Tomas Vondra in [6] wanted the object filter to be a separate file from a config
file, and argued for a simpler format for these lists (while still supporting
multiple object types).

Alvaro agreed with Tomas on [+-] OBJTYPE OBJIDENT in [7] and Tom extended the
proposal to use [include/exclude] keywords in [8] in order to support more than
just excluding and including.  Regardless of stance on format, the use of
keywords instead of [+-] is a rare point of consensus in this thread.

Stephen and myself have also expressed concern in various parts of the thread
that inventing our own format rather than using something with existing broad
library support will end up with third parties (like pgAdmin et al.) having to
all write their own generators and parsers.

A main concern among most (all?) participants of the thread, regardless of
format supported, is that quoting is hard and must be done right for all object
names postgres supports (including any not currently in scope for this patch).

Below is an attempt at summarizing and grouping the proposals so far into the
set of ideas presented:

A) A keyword+object based format to invoke with a switch to essentially
allow for more filters than the commandline can handle and nothing more.
After a set of revisions, the current proposal is:
[include|exclude] [<objecttype>] [<objectname>]

B) A format similar to (A) which can also be used for pg_dump configuration

C) The format in (A), or a close variant thereof, with the intention of it
being included in/referred to from a future configuration file of currently
unknown format.  One reference being a .gitignore type file.

D) An existing format (JSON and TOML have been suggested, with JSON
being dismissed due to lack of comment support) which has quoting
conventions that supports postgres' object names and which can be used to
define a full pg_dump configuration file syntax.

For B), C) and D) there is implicit consensus in the thread that we don't need
to implement the full configuration file as of this patch, merely that it
*must* be possible to do so without painting ourselves into a corner.

At this point it seems to me that B) and C) have the broadest support.  Could
the C) option represent the compromise between a "simple" format for object
filtering and a more structured format for configuration?  Are there other
options?

Thoughts?

--
Daniel Gustafsson   https://vmware.com/

[0] https://postgr.es/m/f6674ff0-5800-4aed-9dc7-13c475707...@yesql.se
[1] 
https://postgr.es/m/CALAY4q9u30L7oGhbsfY3dPECQ8SrYa8YO=h-xon5xwueien...@mail.gmail.com
[2] https://postgr.es/m/20201110200904.gu16...@tamriel.snowman.net
[3] 
https://postgr.es/m/CAEZATCVKMG7+b+_5tNwrNZ-aNDBy3=fmrnea2bo9o4qgcev...@mail.gmail.com
[4] https://postgr.es/m/502641.1606334...@sss.pgh.pa.us
[5] https://postgr.es/m/619671.1606406...@sss.pgh.pa.us
[6] https://postgr.es/m/cb545d78-2dae-8d27-f062-822a07ca5...@enterprisedb.com
[7] https://postgr.es/m/202107122259.n6o5uwb5erza@alvherre.pgsql
[8] https://postgr.es/m/3183720.1626131...@sss.pgh.pa.us





Re: [BUG] Unexpected action when publishing partition tables

2021-09-17 Thread Amit Kapila
On Fri, Sep 17, 2021 at 11:36 AM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, September 16, 2021 6:05 PM Amit Kapila 
> > > On Tuesday, September 14, 2021 10:41 PM vignesh C  
> > > wrote:
> > > > On Tue, Sep 7, 2021 at 11:38 AM houzj.f...@fujitsu.com 
> > > >  wrote:
> > >
> > > Thanks for the comment.
> > > Attach new version patches which clean the table at the end.
> > >
> >
> > + * For partitioned table contained in the publication, we must
> > + * invalidate all partitions contained in the respective partition
> > + * trees, not just the one explicitly mentioned in the publication.
> >
> > Can we slightly change the above comment as: "For the partitioned tables, we
> > must invalidate all partitions contained in the respective partition 
> > hierarchies,
> > not just the one explicitly mentioned in the publication. This is required
> > because we implicitly publish the child tables when the parent table is
> > published."
> >
> > Apart from this, the patch looks good to me.
> >
> > I think we need to back-patch this till v13. What do you think?
>
> I agreed.
>
> Attach patches for back-branch, each has passed regression tests and pgindent.
>

Thanks, your patches look good to me. I'll push them sometime next
week after Tuesday unless there are any comments.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-17 Thread Amit Kapila
On Fri, Sep 17, 2021 at 3:29 PM Fabrice Chapuis  wrote:
>
> Hi,
>
> Logical replication is configured on one instance in version 10.18. Timeout
> errors occur regularly and the worker process exits with exit code 1
>
> 2021-09-16 12:06:50 CEST [24881]: [1-1] user=postgres,db=foo,client=[local] 
> LOG:  duration: 1281408.171 ms  statement: COPY schem.tab (col1, col2) FROM 
> stdin;
> 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic 
> analyze of table "foo.schem.tab" system usage: CPU: user: 4.13 s, system: 
> 0.55 s, elapsed: 9.58 s
> 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  terminating 
> logical replication worker due to timeout
> 2021-09-16 12:07:50 CEST [12546]: [11-1] user=,db=,client= LOG:  worker 
> process: logical replication worker for subscription 24106654 (PID 3770) 
> exited with exit code 1
> 2021-09-16 12:07:50 CEST [13872]: [1-1] user=,db=,client= LOG:  logical 
> replication apply worker for subscription "sub" has started
> 2021-09-16 12:07:50 CEST [13873]: [1-1] user=repuser,db=foo,client=127.0.0.1 
> LOG:  received replication command: IDENTIFY_SYSTEM
>

Can you share the publisher-side log as well?


-- 
With Regards,
Amit Kapila.




Re: walsender timeout on logical replication set

2021-09-17 Thread Amit Kapila
On Fri, Sep 17, 2021 at 12:48 PM Kyotaro Horiguchi
 wrote:
>
> Thank you very much for chiming in!
>
> At Fri, 17 Sep 2021 10:18:11 +0530, Amit Kapila  
> wrote in
> > On Mon, Sep 13, 2021 at 7:01 AM Kyotaro Horiguchi
> >  wrote:
> > >
> > > Hello.
> > >
> > > As reported in [1] it seems that walsender can suffer timeout in
> > > certain cases.  It is not clearly confirmed, but I suspect that
> > > there's the case where LogicalRepApplyLoop keeps running the innermost
> > > loop without receiving keepalive packet for longer than
> > > wal_sender_timeout (not wal_receiver_timeout).
> >
> > Why is that happening? In the previous investigation in this area [1]
> > your tests revealed that after reading a WAL page, we always send keep
> > alive, so even if the transaction is large, we should send some
> > keepalive in-between.
>
> We fixed too-many keepalives (aka keepalive flood) in that thread, but
> this is an issue of a long absence of subscriber response. What I'm
> suspecting, or assuming, here is:
>
> - The publisher is working fine. It doesn't send extra keepalives so
>   much and does send regular keepalives with wal_sender_timeout/2 by
>   the sender's clock.
>

I think the publisher should also send it even after a certain amount
of WAL is consumed via the below code:
WalSndWaitForWal()
{
...
if (MyWalSnd->flush < sentPtr &&
MyWalSnd->write < sentPtr &&
!waiting_for_ping_response)
WalSndKeepalive(false);
...
}

> - The network conveys all the data in time.
>
> - The subscriber consumes received data at less than half the speed at
>   which the publisher sends data.  In this case, while the burst
>   traffic is coming, the publisher keeps sending for
>   wal_sender_timeout/2 seconds and it may not send a keepalive for the
>   same duration. This is the correct behavior.  On the other hand, the
>   subscriber is kept busy without receiving a keepalive for
>   wal_sender_timeout seconds.  AFAICS LogicalRepApplyLoop doesn't send
>   a response unless a keepalive comes while in the inner-most loop.
>

One way this could happen is that the apply is taking a long time
because of contention on the subscriber, say there are a lot of other
operations going on in the subscriber, or it is stuck for some reason.

> If wal_sender_timeout is relatively short (5 seconds, in the report),
> a burst (or gap-less) stretch of logical replication traffic can easily
> continue for more than 2.5 seconds.  If wal_sender_timeout is longer (1
> min, ditto), burst replication traffic lasting more than
> wal_sender_timeout/2 becomes relatively infrequent.
>
> However, I'm not sure how it makes things worse again to increase it
> further to 5 min.
>

There might be a possibility that the subscriber is stuck or is extremely
slow due to other operations.

> Is my diagnosis correct that while the innermost loop in LogicalRepApplyLoop
> [A] is busy, it doesn't have a chance to send a reply until a keepalive
> comes in?  If so, a walsender timeout due to subscriber slowness
> happens, and we might need to break the innermost loop to
> give the subscriber a chance to send a response at appropriate
> intervals. This is what I wanted to propose.
>

I was thinking increasing wal_sender/receiver_timeout should solve
this problem. I am not sure why it leads to loss of WAL in the OP's
case.

> [A]
> backend/replication/logical/worker.c:2565@today's master
> >   /* Loop to process all available data (without blocking). */
> >   for (;;)
>
>
>
> > The other thing that I am not able to understand from Abhishek's reply
> > [2] is why increasing wal_sender_timeout/wal_recevier_timeout leads to
> > the removal of required WAL segments. As per my understanding, we
> > shouldn't remove WAL unless we get confirmation that the subscriber
> > has processed it.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/20210610.150016.1709823354377067679.horikyota.ntt%40gmail.com
> > [2] - 
> > https://www.postgresql.org/message-id/CAEDsCzjEHLxgqa4d563CKFwSbgBvvnM91Cqfq_qoZDXCkyOsiw%40mail.gmail.com
> >
> > Note - I have added Abhishek to see if he has answers to any of these 
> > questions.
>
> Ouch! max_slot_wal_keep_size was introduced in 13. So I have no idea
> how required segments can be removed on the publisher for now.
>
> == From the first report [1]
> sourcedb=# select * from pg_replication_slots;
> ...
> restart_lsn 116D0/C36886F8
> confirmed_flush_lsn 116D0/C3E5D370
>
> targetdb=# show wal_receiver_status_interval;
> wal_receiver_status_interval  2s
>
> targetdb=# select * from pg_stat_subscription;
> ..
> received_lsn  116D1/2BA8F170
> last_msg_send_time2021-08-20 09:05:15.398423+09
> last_msg_receipt_time 2021-08-20 09:05:15.398471+09
> latest_end_lsn116D1/2BA8F170
> latest_end_time   2021-08-20 09:05:15.398423+09
> ==
>
> There is a gap with about 105 segments (1.7GB) between how far the
> subscriber advanced and the publisher's idea of how far the subscriber
> advanced. But that alone cannot cause wal removal..

Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2021-09-17 Thread Pavel Stehule
On Fri, Sep 17, 2021 at 11:10, Justin Pryzby wrote:

> On Wed, Jul 14, 2021 at 07:42:33AM +0200, Laurenz Albe wrote:
> > Besides, schemas are not physical, but logical containers.  So I see a
> point in
> > measuring the storage used in a certain tablespace, but not so much by
> all objects
> > in a certain schema.  It might be useful for accounting purposes, though.
>
> We use only a few schemas, 1) to hide child tables; 2) to exclude some
> extended
> stats from backups, and 1-2 other things.  But it's useful to be able to
> see
> how storage is used by schema, and better to do it conveniently.
>
> I think it'd be even more useful for people who use schemas more widely
> than we
> do:
> "Who's using all our space?"
> \dn++
> "Oh, it's that one - let me clean that up..."
>
> Or, "what's the pg_toast stuff, and do I need to do something about it?"
>
> > But I don't expect it to be in frequent enough demand to add a psql
> command.
> >
> > What about inventing a function pg_schema_size(regnamespace)?
>
> But for "physical" storage it's also possible to get the size from the OS,
> much
> more efficiently, using /bin/df or zfs list (assuming nothing else is using
> those filesystems).  The pg_*_size functions are inefficient, but psql
> \db+ and
> \l+ already call them anyway.
>
> For schemas, there's no way to get the size from the OS, so it's nice to
> make
> the size available from psql, conveniently.
>
> v3 patch:
>  - fixes an off by one in forkNum loop;
>  - removes an unnecessary subquery in describe.c;
>  - returns 0 rather than NULL if the schema is empty;
>  - adds pg_am_size;
>
> regression=# \dA++
>   List of access methods
>   Name  | Type  |   Handler|  Description
>  |  Size
>
> +---+--++-
>  brin   | Index | brinhandler  | block range index (BRIN) access
> method | 744 kB
>  btree  | Index | bthandler| b-tree index access method
>  | 21 MB
>  gin| Index | ginhandler   | GIN index access method
>   | 2672 kB
>  gist   | Index | gisthandler  | GiST index access method
>  | 2800 kB
>  hash   | Index | hashhandler  | hash index access method
>  | 2112 kB
>  heap   | Table | heap_tableam_handler | heap table access method
>  | 60 MB
>  heap2  | Table | heap_tableam_handler |
>   | 120 kB
>  spgist | Index | spghandler   | SP-GiST index access method
>   | 5840 kB
> (8 rows)
>
> regression=# \dn++
>List of schemas
> Name|  Owner  | Access privileges  |  Description
>  |  Size
>
> +-+++-
>  fkpart3| pryzbyj ||
>   | 168 kB
>  fkpart4| pryzbyj ||
>   | 104 kB
>  fkpart5| pryzbyj ||
>   | 40 kB
>  fkpart6| pryzbyj ||
>   | 48 kB
>  mvtest_mvschema| pryzbyj ||
>   | 16 kB
>  public | pryzbyj | pryzbyj=UC/pryzbyj+| standard public
> schema | 69 MB
> | | =UC/pryzbyj|
>   |
>  regress_indexing   | pryzbyj ||
>   | 48 kB
>  regress_rls_schema | pryzbyj ||
>   | 0 bytes
>  regress_schema_2   | pryzbyj ||
>   | 0 bytes
>  testxmlschema  | pryzbyj ||
>   | 24 kB
> (10 rows)
>
>
I tested this patch. It works well. The performance is good enough: I got
the result for a schema with 100K tables in 3 seconds.

I am not sure if leaving \dt+ and \dP+ unchanged is a good idea. I can
imagine \dt+ and \dt++. \dP could exist only in the ++ form, or we could
ignore it (like now) and support \dP+ and \dP++ with the same result.

I can live with the proposed patch, and I understand why ++ was
introduced. But I am still not sure it is really user friendly. I would prefer
to extend \dA and \dn with some columns (\dA has only two columns, and \dn has
two columns too), and then we don't need special ++ variants for sizes.
Using three levels of detail does not look very practical, especially when
the basic \dA and \dn reports are really very simple.

Regards

Pavel



> --
> Justin


Logical replication timeout problem

2021-09-17 Thread Fabrice Chapuis
Hi,

Logical replication is configured on one instance in version 10.18. Timeout
errors occur regularly and the worker process exits with exit code 1

2021-09-16 12:06:50 CEST [24881]: [1-1] user=postgres,db=foo,client=[local]
LOG:  duration: 1281408.171 ms  statement: COPY schem.tab (col1, col2) FROM
stdin;
2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic
analyze of table "foo.schem.tab" system usage: CPU: user: 4.13 s, system:
0.55 s, elapsed: 9.58 s
2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
terminating logical replication worker due to timeout
2021-09-16 12:07:50 CEST [12546]: [11-1] user=,db=,client= LOG:  worker
process: logical replication worker for subscription 24106654 (PID 3770)
exited with exit code 1
2021-09-16 12:07:50 CEST [13872]: [1-1] user=,db=,client= LOG:  logical
replication apply worker for subscription "sub" has started
2021-09-16 12:07:50 CEST [13873]: [1-1]
user=repuser,db=foo,client=127.0.0.1 LOG:  received replication command:
IDENTIFY_SYSTEM

Why does this happen?

Thanks a lot for your help

Fabrice


Re: Logical replication keepalive flood

2021-09-17 Thread Greg Nancarrow
On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila  wrote:
>
> I think here the reason is that the first_lsn of a transaction is
> always equal to end_lsn of the previous transaction (See comments
> above first_lsn and end_lsn fields of ReorderBufferTXN).

That may be the case, but those comments certainly don't make this clear.

>I have not
> debugged but I think in StreamLogicalLog() the cur_record_lsn after
> receiving 'w' message, in this case, will be equal to endpos whereas
> we expect to be greater than endpos to exit. Before the patch, it will
> always get the 'k' message where we expect the received lsn to be
> equal to endpos to conclude that we can exit. Do let me know if your
> analysis differs?
>

Yes, pg_recvlogical seems to be relying on receiving a keepalive for
its "--endpos" logic to work (and the 006 test is relying on empty ('')
output from pg_recvlogical in this case).
But is it correct to be relying on a keepalive for this?
As I already pointed out, there's also code which seems to rely on
the replies to sent keepalives to update the LSN flush and write
locations.
The original problem reporter measured 500 keepalives per second being
sent by walsender (which I also reproduced, for pg_recvlogical and
pub/sub cases).
None of these cases appear to be traditional uses of "keepalive" type
messages to me.
Am I missing something? Documentation?
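For reference, my reading of the relevant exit checks is roughly the following
(a simplified sketch, not the exact pg_recvlogical code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/*
 * Simplified model of the --endpos handling: a data ('w') message only
 * stops the stream once a record lies strictly beyond endpos, while a
 * keepalive ('k') stops it as soon as the server has reached endpos.  So
 * if the last record ends exactly at endpos, exiting depends on a
 * keepalive arriving.
 */
static bool
should_stop(char msgtype, XLogRecPtr lsn, XLogRecPtr endpos)
{
    if (msgtype == 'w')
        return lsn > endpos;    /* data record past endpos */
    if (msgtype == 'k')
        return lsn >= endpos;   /* server reached endpos */
    return false;
}

int
main(void)
{
    XLogRecPtr  endpos = 1000;

    printf("w at endpos: %d\n", should_stop('w', 1000, endpos));   /* 0 */
    printf("k at endpos: %d\n", should_stop('k', 1000, endpos));   /* 1 */
    return 0;
}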


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Logical replication keepalive flood

2021-09-17 Thread Amit Kapila
On Fri, Sep 17, 2021 at 12:42 PM Peter Smith  wrote:
>
> On Thu, Sep 16, 2021 at 10:59 AM houzj.f...@fujitsu.com
>  wrote:
>
> Hello Hou-san, thanks for including the steps. Unfortunately, no
> matter what I tried, I could never get the patch to display the
> problem "BEGIN 709" on the second execution of pg_recvlogical
>
> After discussion offline (thanks Greg!) it was found that the
> pg_recvlogical step 2 posted above is not quite identical to what the
> TAP 006 test is doing.
>
> Specifically, the TAP test also includes some other options (-o
> include-xids=0 -o skip-empty-xacts=1) which are not in your step.
>
> If I include these options then I can reproduce the problem.
> -
> [postgres@CentOS7-x64 ~]$ pg_recvlogical -E '0/150B5E0'  -d postgres
> -S 'test_slot' --start --no-loop -o include-xids=0 -o
> skip-empty-xacts=1 -f -
> BEGIN
> table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
> -
>
> I don't know why these options should make any difference but they do.
>

I think there is a possibility that skip-empty-xacts = 1 is making the
difference. Basically, if there is some empty transaction, say via
autovacuum, it would be skipped, and a keepalive message could be sent
before transaction id 709 is sent. Then you won't see the problem with
Hou-san's test. Can you try with autovacuum = off and without the
skip-empty-xacts option?

-- 
With Regards,
Amit Kapila.




Re: Correct handling of blank/commented lines in PSQL interactive-mode history

2021-09-17 Thread Peter Eisentraut

On 07.09.21 21:16, Tom Lane wrote:

[ this is a digression from the main point of the thread, but ... ]

Alvaro Herrera  writes:

I am particularly bothered by the uselessness
that M-# results in -- namely, inserting a # at the start of the buffer.


Fixing that might be as simple as the attached.  I've not beat on
it hard though.


I see this in my .inputrc, although I don't remember when/how I put it 
there:


$if psql
set comment-begin --
set expand-tilde on
$endif




Re: Teach pg_receivewal to use lz4 compression

2021-09-17 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Friday, September 17th, 2021 at 09:39, Michael Paquier  
wrote:

> On Thu, Sep 16, 2021 at 03:17:15PM +, gkokola...@pm.me wrote:
>
> > Hopefully fixed.
>
> Thanks for the new version. I have put my hands on the patch, and
> began reviewing its internals with LZ4. I am not done with it yet,
> and I have noticed some places that could be improved (error handling,
> some uses of LZ4F_flush() that should be replaced with LZ4F_compressEnd(),
> and more tweaks). I'll send an updated version once I complete my
> review, but that looks rather solid overall.

Thanks! Looking forward to seeing it!

> The changes done in close_walfile()@receivelog.c are useful taken
> independently, so I have applied these separately.

Yeah, I was considering it to split them over to a separate commit,
then decided against it. Will do so in the future.

Cheers,
//Georgios

> 
> Michael




Re: Patch to avoid orphaned dependencies

2021-09-17 Thread Ronan Dunklau
Hello Bertrand,

On Tuesday, 4 May 2021 at 11:55:43 CEST, Drouvot, Bertrand wrote:
> 
> Implementation overview:
> 
>   * A new catalog snapshot is added: DirtyCatalogSnapshot.
>   * This catalog snapshot is a dirty one to be able to look for
> in-flight dependencies.
>   * Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
>   * Any time this variable is being set to true, then the next call to
> GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
>   * This snapshot is being used to check for in-flight dependencies and
> also to get the objects description to generate the error messages.
> 

I quickly tested the patch; it behaves as advertised and passes the tests.

Isolation tests should be added to demonstrate the issues it is solving.

However, I am a bit wary of this behaviour of setting the DirtyCatalogSnapshot
global variable, which is then reset after each snapshot acquisition: I'm
having trouble understanding all the implications of that, i.e. whether it
would be possible for an unforeseen snapshot acquisition to slip in before the
one we actually want to be dirty.
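To spell out the pattern I am wary of, here is a minimal sketch of my reading
of the overview above (illustration only, not the actual patch code):

#include <stdbool.h>
#include <stdio.h>

/* One-shot flag, per the patch overview: the *next* catalog snapshot
 * handed out is the dirty one, then the flag resets itself. */
static bool UseDirtyCatalogSnapshot = false;

static const char *
GetNonHistoricCatalogSnapshot(void)
{
    if (UseDirtyCatalogSnapshot)
    {
        UseDirtyCatalogSnapshot = false;    /* reset on acquisition */
        return "dirty snapshot";
    }
    return "normal snapshot";
}

int
main(void)
{
    UseDirtyCatalogSnapshot = true;

    /*
     * If any code path acquires a catalog snapshot between setting the
     * flag and the dependency check, it silently consumes the flag, and
     * the check we cared about runs with a normal snapshot instead.
     */
    printf("%s\n", GetNonHistoricCatalogSnapshot());    /* dirty */
    printf("%s\n", GetNonHistoricCatalogSnapshot());    /* normal */
    return 0;
}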

I don't want to derail this thread, but couldn't predicate locks on the
pg_depend index pages corresponding to the dropped object / referenced objects
work as a different approach? I'm not familiar enough with them, so maybe there
is some fundamental misunderstanding on my end.

-- 
Ronan Dunklau






Re: Teach pg_receivewal to use lz4 compression

2021-09-17 Thread Michael Paquier
On Thu, Sep 16, 2021 at 03:17:15PM +, gkokola...@pm.me wrote:
> Hopefully fixed.

Thanks for the new version.  I have put my hands on the patch, and
began reviewing its internals with LZ4.  I am not done with it yet,
and I have noticed some places that could be improved (error handling,
some uses of LZ4F_flush() that should be replaced with LZ4F_compressEnd(),
and more tweaks).  I'll send an updated version once I complete my
review, but that looks rather solid overall.
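To spell out why the LZ4F_flush() point matters: LZ4F_flush() only drains
whatever input is buffered, while LZ4F_compressEnd() additionally writes the
end-of-frame mark (and checksum, if enabled), without which the frame is
incomplete.  A minimal sketch of finishing a frame, with error handling
trimmed for brevity:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4frame.h>

int
main(void)
{
    const char *src = "some WAL-ish payload";
    LZ4F_cctx  *ctx;
    /* compressBound() covers the data; add a margin for the frame header */
    size_t      cap = LZ4F_compressBound(strlen(src), NULL) + 64;
    char       *dst = malloc(cap);
    size_t      pos = 0;

    LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);

    pos += LZ4F_compressBegin(ctx, dst + pos, cap - pos, NULL);
    pos += LZ4F_compressUpdate(ctx, dst + pos, cap - pos,
                               src, strlen(src), NULL);

    /*
     * LZ4F_flush() here would only push out buffered input; the frame
     * would still lack its end mark.  LZ4F_compressEnd() flushes and
     * terminates the frame in one call.
     */
    pos += LZ4F_compressEnd(ctx, dst + pos, cap - pos, NULL);

    LZ4F_freeCompressionContext(ctx);
    printf("complete frame: %zu bytes\n", pos);
    free(dst);
    return 0;
}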

The changes done in close_walfile()@receivelog.c are useful taken
independently, so I have applied these separately.
--
Michael


signature.asc
Description: PGP signature


Re: walsender timeout on logical replication set

2021-09-17 Thread Kyotaro Horiguchi
Thank you very much for chiming in!

At Fri, 17 Sep 2021 10:18:11 +0530, Amit Kapila  wrote 
in 
> On Mon, Sep 13, 2021 at 7:01 AM Kyotaro Horiguchi
>  wrote:
> >
> > Hello.
> >
> > As reported in [1] it seems that walsender can suffer timeout in
> > certain cases.  It is not clearly confirmed, but I suspect that
> > there's the case where LogicalRepApplyLoop keeps running the innermost
> > loop without receiving keepalive packet for longer than
> > wal_sender_timeout (not wal_receiver_timeout).
> 
> Why is that happening? In the previous investigation in this area [1]
> your tests revealed that after reading a WAL page, we always send keep
> alive, so even if the transaction is large, we should send some
> keepalive in-between.

We fixed too-many keepalives (aka keepalive flood) in that thread, but
this is an issue of a long absence of subscriber response. What I'm
suspecting, or assuming, here is:

- The publisher is working fine. It doesn't send extra keepalives so
  much and does send regular keepalives with wal_sender_timeout/2 by
  the sender's clock.

- The network conveys all the data in time.

- The subscriber consumes received data at less than half the speed at
  which the publisher sends data.  In this case, while the burst
  traffic is coming, the publisher keeps sending for
  wal_sender_timeout/2 seconds and it may not send a keepalive for the
  same duration. This is the correct behavior.  On the other hand, the
  subscriber is kept busy without receiving a keepalive for
  wal_sender_timeout seconds.  AFAICS LogicalRepApplyLoop doesn't send
  a response unless a keepalive comes while in the inner-most loop.
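To put rough numbers on this scenario (a simplified model with hypothetical
values, seconds as the unit):

#include <stdio.h>

int
main(void)
{
    double  wal_sender_timeout = 60.0;

    /* the walsender requests a reply halfway to the timeout... */
    double  keepalive_sent = wal_sender_timeout / 2.0;

    /* ...but the subscriber only reaches that keepalive in its input
     * after working through the burst that precedes it */
    double  apply_backlog = 35.0;
    double  reply_sent = keepalive_sent + apply_backlog;

    if (reply_sent > wal_sender_timeout)
        printf("timeout at t=%.0fs; the reply would only arrive at t=%.0fs\n",
               wal_sender_timeout, reply_sent);
    return 0;
}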

If wal_sender_timeout is relatively short (5 seconds, in the report),
a burst (or gap-less) stretch of logical replication traffic can easily
continue for more than 2.5 seconds.  If wal_sender_timeout is longer (1
min, ditto), burst replication traffic lasting more than
wal_sender_timeout/2 becomes relatively infrequent.

However, I'm not sure how it makes things worse again to increase it
further to 5 min.

Is my diagnosis correct that while the innermost loop in LogicalRepApplyLoop
[A] is busy, it doesn't have a chance to send a reply until a keepalive
comes in?  If so, a walsender timeout due to subscriber slowness
happens, and we might need to break the innermost loop to
give the subscriber a chance to send a response at appropriate
intervals. This is what I wanted to propose.

[A]
backend/replication/logical/worker.c:2565@today's master
>   /* Loop to process all available data (without blocking). */
>   for (;;)
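Concretely, the kind of change I have in mind is shaped like the following
(a runnable toy with made-up names and numbers, not an actual patch against
worker.c):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    /* hypothetical numbers; only the loop shape matters */
    const int   status_interval = 1;    /* seconds between forced replies */
    time_t      start = time(NULL);
    time_t      last_feedback = start;

    /*
     * Stand-in for the loop at [A]: a burst keeps it busy for a long
     * time, so send a status reply on a timer instead of waiting for a
     * keepalive to appear in the input stream.
     */
    for (int msg = 0; msg < 5000; msg++)
    {
        usleep(1000);           /* pretend to apply one message */

        if (time(NULL) - last_feedback >= status_interval)
        {
            printf("reply sent at +%lds\n", (long) (time(NULL) - start));
            last_feedback = time(NULL);
        }
    }
    return 0;
}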



> The other thing that I am not able to understand from Abhishek's reply
> [2] is why increasing wal_sender_timeout/wal_recevier_timeout leads to
> the removal of required WAL segments. As per my understanding, we
> shouldn't remove WAL unless we get confirmation that the subscriber
> has processed it.
> 
> [1] - 
> https://www.postgresql.org/message-id/20210610.150016.1709823354377067679.horikyota.ntt%40gmail.com
> [2] - 
> https://www.postgresql.org/message-id/CAEDsCzjEHLxgqa4d563CKFwSbgBvvnM91Cqfq_qoZDXCkyOsiw%40mail.gmail.com
> 
> Note - I have added Abhishek to see if he has answers to any of these 
> questions.

Ouch! max_slot_wal_keep_size was introduced in 13.  So I have no idea
how the required segments could have been removed on the publisher.
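
For what it's worth, on 13 and later one could check directly whether a
slot has lost required WAL; a quick sketch (the wal_status and
safe_wal_size columns exist only on 13+):

SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots;
-- wal_status reports 'lost' when max_slot_wal_keep_size has allowed
-- required segments to be removed.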

== From the first report [1]
sourcedb=# select * from pg_replication_slots;
...
restart_lsn 116D0/C36886F8
confirmed_flush_lsn 116D0/C3E5D370

targetdb=# show wal_receiver_status_interval;
wal_receiver_status_interval  2s

targetdb=# select * from pg_stat_subscription;
..
received_lsn  116D1/2BA8F170
last_msg_send_time2021-08-20 09:05:15.398423+09
last_msg_receipt_time 2021-08-20 09:05:15.398471+09
latest_end_lsn116D1/2BA8F170
latest_end_time   2021-08-20 09:05:15.398423+09
==

There is a gap of about 105 segments (1.7GB) between how far the
subscriber has advanced and the publisher's idea of how far the
subscriber has advanced.  But that alone cannot cause WAL removal..
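
For reference, that gap can be double-checked from the two LSNs above;
a quick sketch (pg_wal_lsn_diff() returns bytes, and the segment count
assumes the default 16MB segment size):

SELECT pg_size_pretty(pg_wal_lsn_diff('116D1/2BA8F170', '116D0/C36886F8'))
           AS gap,
       pg_wal_lsn_diff('116D1/2BA8F170', '116D0/C36886F8')
           / (16 * 1024 * 1024) AS segments;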

[1] 
https://www.postgresql.org/message-id/flat/CAEDsCzjEHLxgqa4d563CKFwSbgBvvnM91Cqfq_qoZDXCkyOsiw%40mail.gmail.com#72da631f3af885b06669ddc1636a0a63

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Logical replication keepalive flood

2021-09-17 Thread Peter Smith
On Thu, Sep 16, 2021 at 10:59 AM houzj.f...@fujitsu.com
 wrote:
>
> From Tuesday, September 14, 2021 1:39 PM Greg Nancarrow  
> wrote:
> > However, the problem I found is that, with the patch applied, there is
> > a test failure when running “make check-world”:
> >
> >  t/006_logical_decoding.pl  4/14
> > #   Failed test 'pg_recvlogical acknowledged changes'
> > #   at t/006_logical_decoding.pl line 117.
> > #  got: 'BEGIN
> > # table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
> > # expected: ''
> > # Looks like you failed 1 test of 14.
> > t/006_logical_decoding.pl  Dubious, test returned 1 (wstat
> > 256, 0x100) Failed 1/14 subtests
> >
> >
>
> After applying the patch,
> I saw the same problem and can reproduce it by the following steps:
>
> 1) execute the SQLs.
> ---SQL---
> CREATE TABLE decoding_test(x integer, y text);
> SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
> INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,4) s;
>
> -- use the lsn here to execute pg_recvlogical later
> SELECT lsn FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL) ORDER 
> BY lsn DESC LIMIT 1;
> INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(5,50) s;
> --
>
> 2) Then, if I execute the following command twice:
> # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> BEGIN 708
> table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> COMMIT 708
>
> # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> BEGIN 709
>
> It still generated 'BEGIN 709' on the second execution.
> But without the patch, the second execution outputs nothing.
>

Hello Hou-san, thanks for including the steps. Unfortunately, no
matter what I tried, I could never get the patch to display the
problem "BEGIN 709" on the second execution of pg_recvlogical.

After discussion offline (thanks Greg!) it was found that the
pg_recvlogical step 2 posted above is not quite identical to what the
TAP 006 test is doing.

Specifically, the TAP test also includes some other options (-o
include-xids=0 -o skip-empty-xacts=1) which are not in your steps.

If I include these options then I can reproduce the problem.
-
[postgres@CentOS7-x64 ~]$ pg_recvlogical -E '0/150B5E0'  -d postgres
-S 'test_slot' --start --no-loop -o include-xids=0 -o
skip-empty-xacts=1 -f -
BEGIN
table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
-

I don't know why these options should make any difference, but they do.
Perhaps they cause a fluke of millisecond-level timing differences
between our different VM environments.

--
Kind Regards,
Peter Smith.
Fujitsu Australia