Re: Prevent writes on large objects in read-only transactions

2022-05-31 Thread Michael Paquier
On Tue, May 31, 2022 at 05:17:42PM -0400, Robert Haas wrote:
> On Tue, May 31, 2022 at 3:49 PM Tom Lane  wrote:
>> What I'm wondering about is how far the principle of read-only-ness
>> ought to be expected to go.  Should a read-only transaction fail
>> to execute adminpack's pg_file_write(), for example?  Should it
>> refuse to execute random() on the grounds that that changes the
>> session's PRNG state?  The latter seems obviously silly, but
>> I'm not very sure about pg_file_write().  Maybe the restriction
>> should be "no changes to database state that's visible to other
>> sessions", which would leave outside-the-DB changes out of the
>> discussion.
> 
> Yeah, I think that's a pretty good idea. It's really pretty hard to
> imagine preventing outside-the-database writes in any kind of
> principled way. Somebody can install a C function that does anything,
> and we can do a pretty fair job preventing it from e.g. acquiring a
> transaction ID or writing WAL by making changes in PostgreSQL itself,
> but we can't prevent it from doing whatever it wants outside the
> database. Nor is it even a very clear concept definitionally. I
> wouldn't consider a function read-write solely on the basis that it
> can cause data to be written to the PostgreSQL log file, for instance,
> so it doesn't seem correct to suppose that a C function provided by an
> extension is read-write just because it calls write(2) -- not that we
> can detect that anyway, but even if we could.

Agreed.  There are a couple of arguments for authorizing
pg_file_write() in a read-only state, or writes in general as long as
they do not affect WAL or the data.  For example, a change to a
configuration file can be very useful during recovery if one wants to
switch the configuration (ALTER SYSTEM SET, etc.), so it does not make
sense to restrict functions that perform writes outside the scope of
WAL or the data.  That's not counting updates to the control file, but
that's a different matter.

Now the LO handling is quite old, and I am not sure if this is worth
changing as we have seen no actual complaints about it with read-only
transactions, even if I agree that it is inconsistent.  Changing it
could cause more harm than the consistency benefit is worth :/
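
As a minimal illustration of the inconsistency being discussed (assuming
the current, pre-patch behavior described in this thread):

BEGIN TRANSACTION READ ONLY;
SELECT lo_create(0);     -- currently accepted, even though it creates a new large object
ROLLBACK;

BEGIN TRANSACTION READ ONLY;
CREATE TABLE t (a int);  -- rejected: "cannot execute CREATE TABLE in a read-only transaction"
ROLLBACK;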
--
Michael




Re: Unicode Variation Selector and Combining character

2022-05-31 Thread Peter Eisentraut

On 30.05.22 02:27, 荒井元成 wrote:
I tried it on PostgreSQL 13. If you use a Unicode variation selector or
a combining character, the base character plus the variation selector
will be 2 in length. Since it is displayed as one character, we expect
the length to be 1. Please provide functions that handle the Unicode
variation selector. I hope this can be provided as an extension.


The functions that need to be supported are as follows:

char_length|character_length|substring|trim|btrim|left|length|lpad|ltrim
|regexp_match|regexp_matches|regexp_replace|regexp_split_to_array
|regexp_split_to_table|replace|reverse|right|rpad|rtrim|split_part
|strpos|substr|starts_with


Please show a test case of what you mean.  For example,

select char_length(...) returns X but should return Y

Examples with Unicode escapes (U&'\...') would be the most robust.
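
For instance, a test case of that shape might look like the following
(the code points are only an illustration: a base character followed by
variation selector VS17, U+E0100):

select char_length(U&'\+00845B\+0E0100');  -- returns 2 today; the report above expects 1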




Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-31 Thread Amit Kapila
On Wed, Jun 1, 2022 at 7:30 AM Masahiko Sawada  wrote:
>
> On Tue, May 31, 2022 at 5:53 PM Amit Kapila  wrote:
> >
> > On Mon, May 30, 2022 at 5:08 PM Amit Kapila  wrote:
> > >
> > > On Mon, May 30, 2022 at 2:22 PM wangw.f...@fujitsu.com
> > >  wrote:
> > > >
> > > > Attach the new patches(only changed 0001 and 0002)
> > > >
> > >
> >
> > This patch allows the same replication origin to be used by the main
> > apply worker and the bgworker that uses it to apply streaming
> > transactions. See the changes [1] in the patch. I am not completely
> > sure whether that is a good idea even though I could not spot or think
> > of problems that can't be fixed in your patch. I see that currently
> > both the main apply worker and bgworker will assign MyProcPid to the
> > assigned origin slot, this can create the problem because
> > ReplicationOriginExitCleanup() can clean it up even though the main
> > apply worker or another bgworker is still using that origin slot.
>
> Good point.
>
> > Now,
> > one way to fix is that we assign only the main apply worker's
> > MyProcPid to session_replication_state->acquired_by. I have tried to
> > think about the concurrency issues as multiple workers could now point
> > to the same replication origin state. I think it is safe because the
> > patch maintains the commit order by allowing only one process to
> > commit at a time, so no two workers will be operating on the same
> > origin at the same time. Even though there is no case where the patch
> > will try to advance the session's origin concurrently, it appears safe
> > to do so as we change/advance the session_origin LSNs under
> > replicate_state LWLock.
>
> Right. That way, the cleanup is done only by the main apply worker.
> Probably the bgworker can check if the origin is already acquired by
> its (leader) main apply worker process for safety.
>

Yeah, that makes sense.

> >
> > Another idea could be that we allow multiple replication origins (one
> > for each bgworker and one for the main apply worker) for the apply
> > workers corresponding to a subscription. Then on restart, we can find
> > the highest LSN among all the origins for a subscription. This should
> > work primarily because we will maintain the commit order. Now, for
> > this to work we need to somehow map all the origins for a subscription
> > and one possibility is that we have a subscription id in each of the
> > origin names. Currently we use ("pg_%u", MySubscription->oid) as
> > origin_name. We can probably append some unique identifier number for
> > each worker to allow each origin to have a subscription id. We need to
> > drop all origins for a particular subscription on DROP SUBSCRIPTION. I
> > think having multiple origins for the same subscription will have some
> > additional work when we try to filter changes based on origin.
>
> It also seems to work but needs additional work and resources.
>
> > The advantage of the first idea is that it won't increase the need to
> > have more origins per subscription but it is quite possible that I am
> > missing something and there are problems due to which we can't use
> > that approach.
>
> I prefer the first idea as it's simpler than the second one. I don't
> see any concurrency problem so far unless I'm missing something.
>

Thanks for evaluating it and sharing your opinion.

-- 
With Regards,
Amit Kapila.




Re: Ignore heap rewrites for materialized views in logical replication

2022-05-31 Thread Amit Kapila
On Tue, May 31, 2022 at 8:28 PM Euler Taveira  wrote:
>
> On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote:
>
> I think we don't need the retry logic to check errors, a simple
> wait_for_caught_up should be sufficient as we are doing in other
> tests. See attached. I have slightly modified the commit message as
> well. Kindly let me know what you think?
>
> Your modification will hang until the test timeout without the patch. That's
> why I avoided using wait_for_caught_up and used a loop for fast exit on
> success or failure.
>

Right, but that is true for other tests as well, and we are not
expecting to face this or other errors. I think keeping it simple and
similar to other tests seems enough for this case.

> I'm fine with a simple test case like you proposed.
>

Thanks, I'll push this in a day or two unless I see any other
suggestions/comments. Note to others: this is a v10-only fix. As
mentioned by Euler in his initial email, this is not required from v11
onwards due to a different solution for this problem via commit
325f2ec555.

-- 
With Regards,
Amit Kapila.




Re: convert libpq uri-regress tests to tap test

2022-05-31 Thread Michael Paquier
On Tue, May 31, 2022 at 01:58:25PM +0900, Michael Paquier wrote:
> Why don't you just use src/interfaces/ instead of adding a direct
> path to libpq?

So, this leads to something like the attached.  Does that sound fine
to you?
--
Michael
From 893ef90ce07084c4e4837205a805c590798ac115 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 1 Jun 2022 13:57:46 +0900
Subject: [PATCH v2] libpq tests were not being run

---
 GNUmakefile.in | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 2352fc1171..38713b5d12 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -68,10 +68,10 @@ check check-tests installcheck installcheck-parallel installcheck-tests: CHECKPR
 check check-tests installcheck installcheck-parallel installcheck-tests: submake-generated-headers
 	$(MAKE) -C src/test/regress $@
 
-$(call recurse,check-world,src/test src/pl src/interfaces/ecpg contrib src/bin,check)
-$(call recurse,checkprep,  src/test src/pl src/interfaces/ecpg contrib src/bin)
+$(call recurse,check-world,src/test src/pl src/interfaces contrib src/bin,check)
+$(call recurse,checkprep,  src/test src/pl src/interfaces contrib src/bin)
 
-$(call recurse,installcheck-world,src/test src/pl src/interfaces/ecpg contrib src/bin,installcheck)
+$(call recurse,installcheck-world,src/test src/pl src/interfaces contrib src/bin,installcheck)
 $(call recurse,install-tests,src/test/regress,install-tests)
 
 GNUmakefile: GNUmakefile.in $(top_builddir)/config.status
-- 
2.36.1





Re: Multi-Master Logical Replication

2022-05-31 Thread Amit Kapila
On Tue, May 31, 2022 at 7:36 PM Bruce Momjian  wrote:
>
> On Wed, May 25, 2022 at 10:32:50PM -0400, Bruce Momjian wrote:
> > On Wed, May 25, 2022 at 12:13:17PM +0530, Amit Kapila wrote:
> > >
> > > It helps with setting up logical replication among two or more nodes
> > > (data flows both ways) which is important for use cases where
> > > applications are data-aware. For such apps, it will be beneficial to
> >
> > That does make sense, thanks.
>
> Uh, thinking some more, why would anyone set things up this way ---
> having part of a table being primary on one server and a different part
> of the table be a subscriber.  Seems it would be simpler and safer to
> create two child tables and have one be primary on only one server.
> Users can access both tables using the parent.
>

Yes, users can choose to do it that way but still, to keep the nodes in
sync and maintain continuity of operations, it will be very difficult
to manage the operations without the LRG APIs. Let us consider a simple
two-node example where on each node there is a table T that has
partitions P1 and P2. As far as I can understand, one needs the below
kind of set-up to allow local operations on geographically distributed
nodes.

Node-1:
node1 writes to P1
node1 publishes P1
node2 subscribes to P1 of node1

Node-2:
node2 writes to P2
node2 publishes P2
node1 subscribes to P2 on node2

In this setup, we need to publish individual partitions, otherwise we
will face the loop problem where the data sent by node-1 to node-2 via
logical replication will come back to it again, causing problems like
constraint violations, duplicate data, etc. There could be other ways
to do this setup with the current logical replication commands (for
example, publishing via the root table) but that would require ways to
avoid loops and could have other challenges.
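
A rough SQL sketch of that setup (hypothetical object names and
connection strings; it assumes table T with partitions P1 and P2 already
exists on both nodes):

-- On node-1:
CREATE PUBLICATION pub_p1 FOR TABLE p1;
-- On node-2:
CREATE PUBLICATION pub_p2 FOR TABLE p2;

-- On node-1, once pub_p2 exists:
CREATE SUBSCRIPTION sub_p2
  CONNECTION 'host=node2 dbname=postgres' PUBLICATION pub_p2;
-- On node-2, once pub_p1 exists:
CREATE SUBSCRIPTION sub_p1
  CONNECTION 'host=node1 dbname=postgres' PUBLICATION pub_p1;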

Now, in such a setup/scheme, consider a scenario (scenario-1) where
node-2 goes down (it crashes, drops off the network, just dies, etc.)
and comes up after some time. One can either make node-2 available
again by fixing its problem or promote a standby in that location (if
any) to become the master; both might require some time. In the
meantime, to continue operations (which provides a seamless experience
to users), users will be connected to node-1 to perform the required
write operations. Now, to achieve this without LRG APIs, it will be
quite complex for users to keep the data in sync. One needs to perform
various steps to get the partition P2 data that went to node-1 while
node-2 was unavailable. On node-1, one has to publish the P2 changes
made until node-2 becomes available, with the help of Create/Drop
Publication commands. And when node-2 comes back, it has to create a
subscription for that new publication to get the data, ensure both
nodes are in sync, and then allow operations on node-2.

Not only this, but if there are more nodes in this set-up (say 10), one
has to change (drop/create) subscriptions corresponding to partition
P2 on all other nodes, as each individual node is the owner of some
partition.

Another possibility is that the entire data center where node-2 was
present is gone due to some unfortunate incident, in which case one
needs to set up a new data center and hence a new node. In such a
case, the user needs to do all the steps mentioned in the previous
scenario and additionally needs to ensure that the new node is set up
to sync all the existing data (of all partitions) before it again
starts receiving write changes for partition P2.

I think all this should be relatively simple with LRG APIs, wherein
for the second scenario the user ideally just needs to use the
lrg_attach* API, and in the first scenario it should automatically
sync the missing data once node-2 comes back.

Now, the other important point that we should also consider for these
LRG APIs is the ease of setup even in the normal case where we are
just adding a new node, as mentioned by Peter Smith in his email [1]
(LRG makes setup easier). For example, even if there are many nodes we
only need a single lrg_attach by the joining node instead of needing
N-1 subscriptions on all the existing nodes.

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPsvvfTWWwE8vkgUg4q%2BQLyoCyNE7NU%3DmEiYHcMcXciXdg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: [RFC] building postgres with meson -v8

2022-05-31 Thread Peter Eisentraut


On 06.05.22 23:27, Andres Freund wrote:

I added pkgconfig since then. They're not exactly the same, but pretty close,
except for one thing: Looks like some of the ecpg libraries really should link
to some other ecpg libs? I think we're missing something there... That then
leads to missing requirements in the .pc files.


I took a closer look at the generated pkgconfig files.  I think they are 
ok.  There are a couple of insignificant textual differences that we 
could reduce by patching Makefile.shlib.  But technically they are ok.


There is one significant difference: the ecpg libraries now get a 
Requires.private for openssl, which I think is technically correct since 
both libpgcommon and libpgport require openssl.


Attached is a tiny patch to make the description in one file backward
consistent.

From 8320c520e01b52933f6688e210ffa9cdfdefffe9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 31 May 2022 20:46:20 +0200
Subject: [PATCH] fixup! meson: Add meson based buildsystem.

Fix description in pkg-config file for backward consistency.
---
 src/interfaces/ecpg/compatlib/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/interfaces/ecpg/compatlib/meson.build 
b/src/interfaces/ecpg/compatlib/meson.build
index 3cd61217e3..84cd42ccee 100644
--- a/src/interfaces/ecpg/compatlib/meson.build
+++ b/src/interfaces/ecpg/compatlib/meson.build
@@ -11,6 +11,6 @@ ecpg_compat = both_libraries('ecpg_compat',
 pkgconfig.generate(
   ecpg_compat.get_shared_lib(),
   name: 'libecpg_compat',
-  description: 'PostgreSQL ecpg_compat library',
+  description: 'PostgreSQL libecpg_compat library',
   url: pg_url,
 )
-- 
2.36.1



Re: ParseTzFile doesn't FreeFile on error

2022-05-31 Thread Kyotaro Horiguchi
At Tue, 31 May 2022 14:21:28 -0400, Tom Lane  wrote in 
> Actually the problem *is* reachable, if you intentionally break the
> already-active timezone abbreviation file: newly started sessions
> produce file-leak warnings after failing to apply the setting.
> I concede that's not a likely scenario, but that's why I think it's
> worth fixing.

Ah, I see. Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Kyotaro Horiguchi
At Wed, 01 Jun 2022 11:42:01 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> At Tue, 31 May 2022 16:10:05 -0400, Tom Lane  wrote in 
me> tgl> Robert Haas  writes:
me> tgl> > Yeah, so when I created this stuff in the first place, I figured that
me> tgl> > it wasn't a problem if we reserved relptr == 0 to mean a NULL pointer,

Mmm. Sorry. It's just an accidental shooting.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Kyotaro Horiguchi
At Tue, 31 May 2022 15:57:14 -0400, Robert Haas  wrote 
in 
> 1. Using a relative pointer value other than 0 to represent a null
> pointer. Andres suggested (Size) -1.

I thought of relptr as a part of DSM, so the use of offset=0 seemed
somewhat illegal. But I like this; we can fix it with this
modification. I think ((Size) -1) is a natural way to signal something
special. (I see glibc uses "(size_t) -1".)

> 2. Not storing the free page manager for the DSM in the main shared
> memory segment at byte offset 0.
> 3. Dropping the assertion while loudly singing "la la la la la la".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/include/utils/relptr.h b/src/include/utils/relptr.h
index cc80a7200d..c6d39a1360 100644
--- a/src/include/utils/relptr.h
+++ b/src/include/utils/relptr.h
@@ -41,7 +41,7 @@
 #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P
 #define relptr_access(base, rp) \
 	(AssertVariableIsOfTypeMacro(base, char *), \
-	 (__typeof__((rp).relptr_type)) ((rp).relptr_off == 0 ? NULL : \
+	 (__typeof__((rp).relptr_type)) ((rp).relptr_off == ((Size) -1) ? NULL : \
 		(base + (rp).relptr_off)))
 #else
 /*
@@ -50,21 +50,21 @@
  */
 #define relptr_access(base, rp) \
 	(AssertVariableIsOfTypeMacro(base, char *), \
-	 (void *) ((rp).relptr_off == 0 ? NULL : (base + (rp).relptr_off)))
+	 (void *) ((rp).relptr_off == ((Size) -1) ? NULL : (base + (rp).relptr_off)))
 #endif
 
 #define relptr_is_null(rp) \
-	((rp).relptr_off == 0)
+	((rp).relptr_off == ((Size) -1))
 
 /* We use this inline to avoid double eval of "val" in relptr_store */
 static inline Size
 relptr_store_eval(char *base, char *val)
 {
 	if (val == NULL)
-		return 0;
+		return ((Size) -1);
 	else
 	{
-		Assert(val > base);
+		Assert(val >= base);
 		return val - base;
 	}
 }


Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Kyotaro Horiguchi
At Tue, 31 May 2022 16:10:05 -0400, Tom Lane  wrote in 
tgl> Robert Haas  writes:
tgl> > Yeah, so when I created this stuff in the first place, I figured that
tgl> > it wasn't a problem if we reserved relptr == 0 to mean a NULL pointer,
tgl> > because you would never have a relative pointer pointing to the
tgl> > beginning of a DSM, because it would probably always start with a
tgl> > dsm_toc. But when Thomas made it so that DSM allocations could happen
tgl> > in the main shared memory segment, that ceased to be true. This
tgl> > example happened not to break because we never use relptr_access() on
tgl> > fpm->self. We do use fpm_segment_base(), but that accidentally fails
tgl> > to break, because instead of using relptr_access() it drills right
tgl> > through the abstraction and doesn't have any kind of special case for
tgl> > 0.
tgl> 
tgl> Seems like that in itself is a lousy idea.  Either the code should
tgl> respect the abstraction, or it shouldn't be declaring the variable
tgl> as a relptr in the first place.
tgl> 
tgl> > So we can fix this by:
tgl> > 1. Using a relative pointer value other than 0 to represent a null
tgl> > pointer. Andres suggested (Size) -1.
tgl> > 2. Not storing the free page manager for the DSM in the main shared
tgl> > memory segment at byte offset 0.
tgl> > 3. Dropping the assertion while loudly singing "la la la la la la".
tgl> 
tgl> I'm definitely down on #3, because that just leaves the ambiguity
tgl> in place to bite somewhere else in future.  #1 would work as long
tgl> as nobody expects memset-to-zero to produce null relptrs, but that
tgl> doesn't seem very nice either.
tgl> 
tgl> On the whole, wasting MAXALIGN worth of memory seems like the least bad
tgl> alternative, but I wonder if we ought to do it right here as opposed
tgl> to somewhere in the DSM code proper.  Why is this DSM space not like
tgl> other DSM spaces in starting with a TOC?
tgl> 
tgl>regards, tom lane




RE: Build-farm - intermittent error in 031_column_list.pl

2022-05-31 Thread osumi.takami...@fujitsu.com
On Thursday, May 26, 2022 11:37 AM Amit Kapila  wrote:
> On Wed, May 25, 2022 at 6:54 PM Tomas Vondra
>  wrote:
> >
> > On 5/25/22 13:26, Amit Kapila wrote:
> > > On Wed, May 25, 2022 at 8:16 AM Kyotaro Horiguchi
> > >  wrote:
> > >>
> > >> It does "fix" the case of [1].  But AFAIS
> > >> RelationSyncEntry.replicate_valid is only used to inhibit repeated
> > >> loading in get_rel_sync_entry and the function doesn't seem to be
> > >> assumed to return an invalid entry. (Since the flag is not checked
> > >> anywhere else.)
> > >>
> > >> For example pgoutput_change does not check for the flag of the
> > >> entry returned from the function before uses it, which is not
> > >> seemingly safe. (I didn't check further, though)
> > >>
> > >> Don't we need to explicitly avoid using invalid entries outside the
> > >> function?
> > >>
> > >
> > > We decide that based on pubactions in the callers, so even if entry
> > > is valid, it won't do anything.  Actually, we don't need to avoid
> > > setting replication_valid flag as some of the publications for the
> > > table may be already present. We can check if the publications_valid
> > > flag is set while trying to validate the entry. Now, even if we
> > > don't find any publications the replicate_valid flag will be set but
> > > none of the actions will be set, so it won't do anything in the
> > > caller. Is this better than the previous approach?
> > >
> >
> > For the record, I'm not convinced this is the right way to fix the
> > issue, as it may easily mask the real problem.
> >
> > We do silently ignore missing objects in various places, but only when
> > either requested or when it's obvious it's expected and safe to ignore.
> > But I'm not sure that applies here, in a clean way.
> >
> > Imagine you have a subscriber using two publications p1 and p2, and
> > someone comes around and drops p1 by mistake. With the proposed patch,
> > the subscription will notice this, but it'll continue sending data
> > ignoring the missing publication. Yes, it will continue working, but
> > it's quite possible this breaks the subscriber and it's be better to
> > fail and stop replicating.
> >
> 
> Ideally, shouldn't we disallow drop of publication in such cases where it is
> part of some subscription? I know it will be tricky because some
> subscriptions could be disabled.
> 
> > The other aspect I dislike is that we just stop caching publication
> > info, forcing us to reload it for every replicated change/row. So even
> > if dropping the publication happens not to "break" the subscriber (i.e.
> > the data makes sense), this may easily cause performance issues, lag
> > in the replication, and so on. And the users will have no idea why
> > and/or how to fix it, because we just do this silently.
> >
> 
> Yeah, it is true that if there are missing publications, it needs to reload
> all the publications info again unless we build a mechanism to change the
> existing cached entry by loading only required info. The other thing we could
> do here is to LOG the info for missing publications to make users aware of
> the fact. I think we can also introduce a new option while defining/altering
> a subscription to indicate whether to continue on missing publications or
> not; that way, by default we will stop replication as we are doing now but
> users will have a way to keep replication moving.
> 
> BTW, what are the other options we have to fix the cases where replication is
> broken (or the user has no clue on how to proceed) as we are discussing the
> case here or the OP reported yet another case on pgsql-bugs [1]?
Hi, 


FYI, I've noticed that after the last report by Peter-san
we've gotten the same errors on the buildfarm.
We need to keep discussing this in order to reach a conclusion.


1. Details for system "xenodermus" failure at stage subscriptionCheck, snapshot 
taken 2022-05-31 13:00:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04


2. Details for system "phycodurus" failure at stage subscriptionCheck, snapshot 
taken 2022-05-26 17:30:04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04


Best Regards,
Takamichi Osumi



Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-31 Thread Masahiko Sawada
On Tue, May 31, 2022 at 5:53 PM Amit Kapila  wrote:
>
> On Mon, May 30, 2022 at 5:08 PM Amit Kapila  wrote:
> >
> > On Mon, May 30, 2022 at 2:22 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > Attach the new patches(only changed 0001 and 0002)
> > >
> >
>
> This patch allows the same replication origin to be used by the main
> apply worker and the bgworker that uses it to apply streaming
> transactions. See the changes [1] in the patch. I am not completely
> sure whether that is a good idea even though I could not spot or think
> of problems that can't be fixed in your patch. I see that currently
> both the main apply worker and bgworker will assign MyProcPid to the
> assigned origin slot, this can create the problem because
> ReplicationOriginExitCleanup() can clean it up even though the main
> apply worker or another bgworker is still using that origin slot.

Good point.

> Now,
> one way to fix is that we assign only the main apply worker's
> MyProcPid to session_replication_state->acquired_by. I have tried to
> think about the concurrency issues as multiple workers could now point
> to the same replication origin state. I think it is safe because the
> patch maintains the commit order by allowing only one process to
> commit at a time, so no two workers will be operating on the same
> origin at the same time. Even though there is no case where the patch
> will try to advance the session's origin concurrently, it appears safe
> to do so as we change/advance the session_origin LSNs under
> replicate_state LWLock.

Right. That way, the cleanup is done only by the main apply worker.
Probably the bgworker can check if the origin is already acquired by
its (leader) main apply worker process for safety.

>
> Another idea could be that we allow multiple replication origins (one
> for each bgworker and one for the main apply worker) for the apply
> workers corresponding to a subscription. Then on restart, we can find
> the highest LSN among all the origins for a subscription. This should
> work primarily because we will maintain the commit order. Now, for
> this to work we need to somehow map all the origins for a subscription
> and one possibility is that we have a subscription id in each of the
> origin names. Currently we use ("pg_%u", MySubscription->oid) as
> origin_name. We can probably append some unique identifier number for
> each worker to allow each origin to have a subscription id. We need to
> drop all origins for a particular subscription on DROP SUBSCRIPTION. I
> think having multiple origins for the same subscription will have some
> additional work when we try to filter changes based on origin.

It also seems to work but needs additional work and resources.

> The advantage of the first idea is that it won't increase the need to
> have more origins per subscription but it is quite possible that I am
> missing something and there are problems due to which we can't use
> that approach.

I prefer the first idea as it's simpler than the second one. I don't
see any concurrency problem so far unless I'm missing something.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Tom Lane
David Rowley  writes:
> Maybe "columns per result set" would have been a better title for consistency.

I can't quite put my finger on why, but that wording seems odd to me,
even though "columns per table" is natural enough.  "In a" reads much
better here IMO.  Anyway, I see you committed it that way, and it's
certainly not worth the effort to change further.

regards, tom lane




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread David Rowley
On Wed, 1 Jun 2022 at 12:42, Tom Lane  wrote:
>
> David Rowley  writes:
> > I've adjusted the patch to use the wording proposed by Alvaro. See attached.
>
> Should we also change the adjacent item to "columns in a table",
> for consistency of wording?  Not sure though, because s/per/in a/
> throughout the list doesn't seem like it'd be an improvement.

I might agree if there weren't so many other "per"s in the list.

Maybe "columns per result set" would have been a better title for consistency.

David




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Gavin Flower

On 1/06/22 12:42, Tom Lane wrote:

David Rowley  writes:

I've adjusted the patch to use the wording proposed by Alvaro. See attached.

Should we also change the adjacent item to "columns in a table",
for consistency of wording?  Not sure though, because s/per/in a/
throughout the list doesn't seem like it'd be an improvement.

regards, tom lane


I like the word 'per' better than the phrase 'in a', at least in this 
context.


(Though I'm not too worried either way!)


Cheers,
Gavin





Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Tom Lane
David Rowley  writes:
> I've adjusted the patch to use the wording proposed by Alvaro. See attached.

Should we also change the adjacent item to "columns in a table",
for consistency of wording?  Not sure though, because s/per/in a/
throughout the list doesn't seem like it'd be an improvement.

regards, tom lane




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Dave Cramer
On Tue, 31 May 2022 at 20:33, David Rowley  wrote:

> On Wed, 1 Jun 2022 at 07:08, Dave Cramer 
> wrote:
> >
> > On Tue, 31 May 2022 at 14:51, Tom Lane  wrote:
> >>
> >> Alvaro Herrera  writes:
> >> > I think it's reasonable to have two adjacent rows in the table for these
> >> > two closely related things, but rather than "columns per tuple" I would
> >> > label the second one "columns in a result set".  This is easy enough to
> >> > understand and to differentiate from the other limit.
> >>
> >> OK, with that wording it's probably clear enough.
>
> > Reworded patch attached
>
> I see the patch does not have the same text as what was proposed and
> seconded above.  My personal preferences would be "result set
> columns", but "columns in a result set" seems fine too.
>
> I've adjusted the patch to use the wording proposed by Alvaro. See
> attached.
>
> I will push this shortly.
>
> David
>

Thanks David. Apparently I am truly unable to multi-task.

Dave


Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread David Rowley
On Wed, 1 Jun 2022 at 07:08, Dave Cramer  wrote:
>
> On Tue, 31 May 2022 at 14:51, Tom Lane  wrote:
>>
>> Alvaro Herrera  writes:
>> > I think it's reasonable to have two adjacent rows in the table for these
>> > two closely related things, but rather than "columns per tuple" I would
>> > label the second one "columns in a result set".  This is easy enough to
>> > understand and to differentiate from the other limit.
>>
>> OK, with that wording it's probably clear enough.

> Reworded patch attached

I see the patch does not have the same text as what was proposed and
seconded above.  My personal preference would be "result set
columns", but "columns in a result set" seems fine too.

I've adjusted the patch to use the wording proposed by Alvaro. See attached.

I will push this shortly.

David
diff --git a/doc/src/sgml/limits.sgml b/doc/src/sgml/limits.sgml
index 7713ff7177..d5b2b627dd 100644
--- a/doc/src/sgml/limits.sgml
+++ b/doc/src/sgml/limits.sgml
@@ -62,6 +62,12 @@
  below
 
 
+
+ columns in a result set
+ 1664
+ 
+
+
 
  field size
  1 GB


Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Bruce Momjian
On Tue, May 31, 2022 at 01:22:44PM -0400, Dave Cramer wrote:
> 
> 
> On Tue, 31 May 2022 at 10:49, Tom Lane  wrote:
> 
> Dave Cramer  writes:
> > On Tue, 31 May 2022 at 10:16, Tom Lane  wrote:
> >> We've generally felt that the existing "columns per table" limit is
> >> sufficient detail here.
> 
> > ISTM that adding detail is free whereas the reader's time to figure out
> > why and where this number came from is not.
> 
> Detail is far from "free".  Most readers are going to spend more time
> wondering what the difference is between "columns per table" and "columns
> per tuple", and which limit applies when, than they are going to save by
> having the docs present them with two inconsistent numbers.
> 
> 
> Sounds to me like we are discussing different sides of the same coin. On one
> hand we have readers of the documentation who may be confused, 
> and on the other hand we have developers who run into this and have to spend
> time digging into the code to figure out what's what.

How many people ask about this limit?  I can't remember one.
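
For reference, the limit under discussion is easy to observe directly;
a sketch (the exact error wording is quoted from memory and may differ):

DO $$
BEGIN
  -- Build and run a SELECT with 1665 target-list entries.
  EXECUTE 'SELECT ' || (SELECT string_agg('1', ', ') FROM generate_series(1, 1665));
END
$$;
-- ERROR:  target lists can have at most 1664 entries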

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 31, 2022 at 6:14 PM Tom Lane  wrote:
>> However, now that I've corrected that mistaken image ... I wonder if
>> it could make sense to redefine relptr as self-relative?  That ought
>> to provide some notational savings since you'd only need to carry
>> around the relptr's own address not that plus a base address.
>> Probably not something to consider for v15 though.

> I think that would be pretty hard to make work, since copying around a
> relative pointer would change its meaning. Code like "relptr_foo x =
> *y" would be broken, for example, but the compiler would not complain.

Sure, but the current definition is far from error-proof as well:
nothing stops you from using the wrong base address with a relptr's
value.  Anyway, it's just idle speculation at this point.

regards, tom lane




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 6:14 PM Tom Lane  wrote:
> However, now that I've corrected that mistaken image ... I wonder if
> it could make sense to redefine relptr as self-relative?  That ought
> to provide some notational savings since you'd only need to carry
> around the relptr's own address not that plus a base address.
> Probably not something to consider for v15 though.

I think that would be pretty hard to make work, since copying around a
relative pointer would change its meaning. Code like "relptr_foo x =
*y" would be broken, for example, but the compiler would not complain.
Or maybe I misunderstand your idea?

Also keep in mind that the major use case here is DSM segments, which
can be mapped at different addresses in different processes. Mainly,
we expect to store relative pointers in the segment to other things in
the same segment. Sometimes, we might read the values from there into
local variables - or maybe global variables - in code that is
accessing those DSM segments for some purpose.

There is little use for a relative pointer that can access all of the
address space that exists. For that, it is better to just use a regular
pointer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-05-31 Thread Jacob Champion
v10 is rebased over latest; I've also added a PGDLLIMPORT to the new global.

--Jacob
From c8b3d2df4ce461fc65a27699419a54a5b7bb2001 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 14 Feb 2022 08:10:53 -0800
Subject: [PATCH v10 1/2] Add API to retrieve authn_id from SQL

The authn_id field in MyProcPort is currently only accessible to the
backend itself.  Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.
---
 doc/src/sgml/func.sgml| 26 +++
 src/backend/utils/adt/name.c  | 12 ++-
 src/include/catalog/pg_proc.dat   |  3 +++
 src/test/authentication/t/001_password.pl | 11 ++
 src/test/ssl/t/001_ssltests.pl|  7 ++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index db3147d1c4..0532e2d605 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23344,6 +23344,32 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);

   
 
+  
+   
+
+ pg_session_authn_id
+
+pg_session_authn_id ()
+text
+   
+   
+Returns the authenticated identity for the current connection, or
+NULL if the user has not been authenticated.
+   
+   
+The authenticated identity is an immutable identifier for the user
+presented during the connection handshake; the exact format depends on
+the authentication method in use. (For example, when using the
+scram-sha-256 auth method, the authenticated identity
+is simply the username. When using the cert auth
+method, the authenticated identity is the Distinguished Name of the
+client certificate.) Even for auth methods which use the username as
+the authenticated identity, this function differs from
+session_user in that its return value cannot be
+changed after login.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index e8bba3670c..662a7943ed 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "libpq/libpq-be.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str)
 
 
 /*
- * SQL-functions CURRENT_USER, SESSION_USER
+ * SQL-functions CURRENT_USER, SESSION_USER, PG_SESSION_AUTHN_ID
  */
 Datum
 current_user(PG_FUNCTION_ARGS)
@@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false;
 }
 
+Datum
+pg_session_authn_id(PG_FUNCTION_ARGS)
+{
+	if (!MyProcPort || !MyProcPort->authn_id)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
+}
+
 
 /*
  * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571a33..8e181b4771 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1508,6 +1508,9 @@
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
+{ oid => '9774', descr => 'session authenticated identity',
+  proname => 'pg_session_authn_id', provolatile => 's', proparallel => 'r',
+  prorettype => 'text', proargtypes => '', prosrc => 'pg_session_authn_id' },
 
 { oid => '744',
   proname => 'array_eq', prorettype => 'bool',
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..f0bdeda52d 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -82,6 +82,10 @@ test_role($node, 'scram_role', 'trust', 0,
 test_role($node, 'md5_role', 'trust', 0,
 	log_unlike => [qr/connection authenticated:/]);
 
+my $res =
+  $node->safe_psql('postgres', "SELECT pg_session_authn_id() IS NULL;");
+is($res, 't', "users with trust authentication have NULL authn_id");
+
 # For plain "password" method, all users should also be able to connect.
 reset_pg_hba($node, 'password');
 test_role($node, 'scram_role', 'password', 0,
@@ -91,6 +95,13 @@ test_role($node, 'md5_role', 'password', 0,
 	log_like =>
 	  [qr/connection authenticated: identity="md5_role" method=password/]);
 
+$res = $node->safe_psql(
+	'postgres',
+	"SELECT pg_session_authn_id();",
+	connstr => "user=md5_role");
+is($res, 'md5_role',
+	"users with md5 authentication have authn_id matching role name");
+
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
 reset_pg_hba($node, 'scram-sha-256');
 test_role(
diff --git a/src/test/ssl/t/001_ssltests.pl 

Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Tom Lane
Robert Haas  writes:
> Seems backwards to me. A relative pointer is supposed to point to
> something inside some range of memory, like a DSM gment -- it can
> never be legally used to point to anything outside that segment. So it
> seems to me that you could perfectly legally point to the second byte
> of the segment, but never to the -1'th byte.

Okay, I was thinking about it slightly wrong: relptr is defined as an
offset relative to some base address, not to its own address.  As long
as you're prepared to assume that the base address really is the start
of the addressable area, then yeah the above argument works.

However, now that I've corrected that mistaken image ... I wonder if
it could make sense to redefine relptr as self-relative?  That ought
to provide some notational savings since you'd only need to carry
around the relptr's own address not that plus a base address.
Probably not something to consider for v15 though.

regards, tom lane




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2022-05-31 Thread Zhihong Yu
On Tue, May 31, 2022 at 1:43 PM Zhihong Yu  wrote:

>
>
> On Tue, May 31, 2022 at 12:43 PM Dmitry Koval 
> wrote:
>
>>  >Just out of curiosity, why is SPLIT / MERGE support not included for
>>  >HASH partitions? Because sibling partitions can have a different
>>  >modulus, you should be able to e.g. split a partition with (modulus,
>>  >remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and
>>  >(6, 4) respectively, with the reverse being true for merge operations,
>>  >right?
>>
>> You are right, SPLIT/MERGE operations can be added for HASH-partitioning
>> in the future. But HASH-partitioning is rarer than RANGE- and
>> LIST-partitioning and I decided to skip it in the first step.
>> Maybe the community will say that SPLIT/MERGE commands are not needed... (At
>> the first step I would like to make sure that this is not true.)
>>
>> P.S. I attached patch with 1-line warning fix (for cfbot).
>> --
>> With best regards,
>> Dmitry Koval
>>
>> Postgres Professional: http://postgrespro.com
>
>
> Hi,
> For attachPartTable, the parameter wqueue is missing from comment.
> The parameters of CloneRowTriggersToPartition are called parent
> and partition. I think it is better to name the parameters to
> attachPartTable in a similar manner.
>
> For struct SplitPartContext, SplitPartitionContext would be better name.
>
> +   /* Store partition contect into list. */
> contect -> context
>
> Cheers
>
Hi,
For transformPartitionCmdForMerge(), a nested loop is used to detect
duplicate names.
If the number of partitions in partcmd->partlist is large, we should
utilize a map to speed up the check.

For check_parent_values_in_new_partitions():

+   if (!find_value_in_new_partitions(&key->partsupfunc[0],
+                                     key->partcollation, parts,
+                                     nparts, datum, false))
+       found = false;

It seems we can break out of the loop when found is false.

Cheers


Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 5:52 PM Tom Lane  wrote:
> Thomas Munro  writes:
> Could we make the relptrs 1-based, so that 0 is reserved as a sentinel
> > that has the nice memset(0) property?
>
> Hm ... almost.  A +1 offset would mean that zero is ambiguous with a
> pointer to the byte just before the relptr.  Maybe that case never
> arises in practice, but now that we've seen this problem I'm not real
> comfortable with such an assumption.  But how about a -1 offset?
> Then zero would be ambiguous with a pointer to the second byte of the
> relptr, and I think I *am* prepared to assume that that has no use-cases.
>
> The other advantage of such a definition is that it'd help flush out
> anybody breaking the relptr abstraction ;-)

Seems backwards to me. A relative pointer is supposed to point to
something inside some range of memory, like a DSM segment -- it can
never be legally used to point to anything outside that segment. So it
seems to me that you could perfectly legally point to the second byte
of the segment, but never to the -1'th byte.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Tom Lane
Thomas Munro  writes:
> Count we make the relptrs 1-based, so that 0 is reserved as a sentinel
> that has the nice memset(0) property?

Hm ... almost.  A +1 offset would mean that zero is ambiguous with a
pointer to the byte just before the relptr.  Maybe that case never
arises in practice, but now that we've seen this problem I'm not real
comfortable with such an assumption.  But how about a -1 offset?
Then zero would be ambiguous with a pointer to the second byte of the
relptr, and I think I *am* prepared to assume that that has no use-cases.

The other advantage of such a definition is that it'd help flush out
anybody breaking the relptr abstraction ;-)

regards, tom lane




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Thomas Munro
On Wed, Jun 1, 2022 at 9:09 AM Tom Lane  wrote:
> Robert Haas  writes:
> > Could it use something other than its own address as the base address?
>
> Hmm, maybe we could make something of that idea ...
>
> > One way to do this would be to put it at the *end* of the
> > "Preallocated DSM" space, rather than the beginning.
>
> ... but that way doesn't sound good.  Doesn't it just move the
> problem to the first object allocated inside the FPM?

Could we make the relptrs 1-based, so that 0 is reserved as a sentinel
that has the nice memset(0) property?




Re: Prevent writes on large objects in read-only transactions

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 3:49 PM Tom Lane  wrote:
> Yeah.  Certainly we'd not want to back-patch this change, in case
> anyone is relying on the current behavior ... but it's hard to argue
> that it's not wrong.

Agreed.

> What I'm wondering about is how far the principle of read-only-ness
> ought to be expected to go.  Should a read-only transaction fail
> to execute adminpack's pg_file_write(), for example?  Should it
> refuse to execute random() on the grounds that that changes the
> session's PRNG state?  The latter seems obviously silly, but
> I'm not very sure about pg_file_write().  Maybe the restriction
> should be "no changes to database state that's visible to other
> sessions", which would leave outside-the-DB changes out of the
> discussion.

Yeah, I think that's a pretty good idea. It's really pretty hard to
imagine preventing outside-the-database writes in any kind of
principled way. Somebody can install a C function that does anything,
and we can do a pretty fair job preventing it from e.g. acquiring a
transaction ID or writing WAL by making changes in PostgreSQL itself,
but we can't prevent it from doing whatever it wants outside the
database. Nor is it even a very clear concept definitionally. I
wouldn't consider a function read-write solely on the basis that it
can cause data to be written to the PostgreSQL log file, for instance,
so it doesn't seem correct to suppose that a C function provided by an
extension is read-write just because it calls write(2) -- not that we
can detect that anyway, but even if we could.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Tom Lane
Robert Haas  writes:
> Could it use something other than its own address as the base address?

Hmm, maybe we could make something of that idea ...

> One way to do this would be to put it at the *end* of the
> "Preallocated DSM" space, rather than the beginning.

... but that way doesn't sound good.  Doesn't it just move the
problem to the first object allocated inside the FPM?

regards, tom lane




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 4:32 PM Thomas Munro  wrote:
> This FPM isn't in a DSM.  (It happens to have DSMs *inside it*,
> because I'm using it as a separate DSM allocator: instead of making
> them with dsm_impl.c mechanisms, this one recycles space from the main
> shmem area).  I view FPM as a reusable 4kb page-based memory allocator
> that could have many potential uses, not as a thing that must live
> inside another thing with a TOC.  The fact that it uses the relptr
> thing makes it possible to use FPM inside DSMs too, but that doesn't
> mean it has to be used inside a DSM.

Could it use something other than its own address as the base address?
One way to do this would be to put it at the *end* of the
"Preallocated DSM" space, rather than the beginning.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 4:10 PM Tom Lane  wrote:
> Seems like that in itself is a lousy idea.  Either the code should
> respect the abstraction, or it shouldn't be declaring the variable
> as a relptr in the first place.

Yep. I think it should be respecting the abstraction, but the 2016
version of me failed to realize the issue when committing 13e14a78ea1.
Hindsight is 20-20, perhaps.

> > So we can fix this by:
> > 1. Using a relative pointer value other than 0 to represent a null
> > pointer. Andres suggested (Size) -1.
> > 2. Not storing the free page manager for the DSM in the main shared
> > memory segment at byte offset 0.
> > 3. Dropping the assertion while loudly singing "la la la la la la".
>
> I'm definitely down on #3, because that just leaves the ambiguity
> in place to bite somewhere else in future.  #1 would work as long
> as nobody expects memset-to-zero to produce null relptrs, but that
> doesn't seem very nice either.

Well, that's a good point that I hadn't considered, actually. I was
thinking I'd only picked 0 as the value out of adherence to
convention, but I might have had this in mind too, at the time.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generate_series for timestamptz and time zone problem

2022-05-31 Thread Tom Lane
Przemysław Sztoch  writes:
> generate_series ( start timestamp with time zone, stop timestamp with
> time zone, step interval )
> produces results depending on the timezone value set:

That's intentional.  If you don't want it, maybe you should be using
generate_series on timestamp without time zone?
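
For example (a sketch; the exact instants depend on the timezone
database and DST rules):

SET timezone = 'UTC';
SELECT * FROM generate_series('2022-03-26 00:00+00'::timestamptz,
                              '2022-03-28 00:00+00'::timestamptz,
                              '1 day');

SET timezone = 'Europe/Warsaw';
SELECT * FROM generate_series('2022-03-26 00:00+00'::timestamptz,
                              '2022-03-28 00:00+00'::timestamptz,
                              '1 day');
-- Across the DST transition the '1 day' step follows local wall-clock
-- time, so the second query generates different instants than the first.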

regards, tom lane




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2022-05-31 Thread Zhihong Yu
On Tue, May 31, 2022 at 12:43 PM Dmitry Koval 
wrote:

>  >Just out of curiosity, why is SPLIT / MERGE support not included for
>  >HASH partitions? Because sibling partitions can have a different
>  >modulus, you should be able to e.g. split a partition with (modulus,
>  >remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and
>  >(6, 4) respectively, with the reverse being true for merge operations,
>  >right?
>
> You are right, SPLIT/MERGE operations can be added for HASH-partitioning
> in the future. But HASH-partitioning is rarer than RANGE- and
> LIST-partitioning and I decided to skip it in the first step.
> Maybe the community will say that SPLIT/MERGE commands are not needed... (At
> the first step I would like to make sure that this is not true.)
>
> P.S. I attached patch with 1-line warning fix (for cfbot).
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com


Hi,
For attachPartTable, the parameter wqueue is missing from comment.
The parameters of CloneRowTriggersToPartition are called parent
and partition. I think it is better to name the parameters to
attachPartTable in a similar manner.

For struct SplitPartContext, SplitPartitionContext would be better name.

+   /* Store partition contect into list. */
contect -> context

Cheers


Logging query parmeters in auto_explain

2022-05-31 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

Inspired by a question on IRC, I noticed that while the core statement
logging system gained the option to log statement parameters in PG 13,
auto_explain was left out.

Here's a patch that adds a corresponding
auto_explain.log_parameter_max_length config setting, which controls the
"Query Parameters" node in the logged plan.  Just like in core, the
default is -1, which logs the parameters in full, and 0 disables
parameter logging, while any other value truncates each parameter to
that many bytes.
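
For example, a session-level sketch (assuming the patched module below
is installed; the other settings shown already exist in auto_explain):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;           -- log every statement's plan
SET auto_explain.log_analyze = on;
SET auto_explain.log_parameter_max_length = 32;  -- proposed setting: truncate each logged parameter to 32 bytes
PREPARE get_proc(name) AS SELECT * FROM pg_proc WHERE proname = $1;
EXECUTE get_proc('int4pl');
-- The logged plan should then contain a line like:
--   Query Parameters: $1 = 'int4pl'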

- ilmari

>From 24a31273475f26b3174a6e2178d0121f5e23b5f5 Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker 
Date: Tue, 31 May 2022 21:12:12 +0100
Subject: [PATCH] Log query parameters in auto_explain

Add an auto_explain.log_parameter_max_length config setting, similar
to the corresponding core setting, that controls the inclusion of
query parameters in the logged explain output.
---
 contrib/auto_explain/auto_explain.c| 15 +
 contrib/auto_explain/t/001_auto_explain.pl | 25 --
 doc/src/sgml/auto-explain.sgml | 19 
 src/backend/commands/explain.c | 22 +++
 src/include/commands/explain.h |  1 +
 5 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c9a0d947c8..1ba7536879 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -19,12 +19,14 @@
 #include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "jit/jit.h"
+#include "nodes/params.h"
 #include "utils/guc.h"
 
 PG_MODULE_MAGIC;
 
 /* GUC variables */
 static int	auto_explain_log_min_duration = -1; /* msec or -1 */
+static int	auto_explain_log_parameter_max_length = -1; /* bytes or -1 */
 static bool auto_explain_log_analyze = false;
 static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
@@ -105,6 +107,18 @@ _PG_init(void)
 			NULL,
 			NULL);
 
+	DefineCustomIntVariable("auto_explain.log_parameter_max_length",
+			"Sets the maximum length of query parameters to log.",
+			"Zero logs no query parameters, -1 logs them in full.",
+			&auto_explain_log_parameter_max_length,
+			-1,
+			-1, INT_MAX,
+			PGC_SUSET,
+			GUC_UNIT_BYTE,
+			NULL,
+			NULL,
+			NULL);
+
 	DefineCustomBoolVariable("auto_explain.log_analyze",
 			 "Use EXPLAIN ANALYZE for plan logging.",
 			 NULL,
@@ -389,6 +403,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 
 			ExplainBeginOutput(es);
 			ExplainQueryText(es, queryDesc);
+			ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length);
 			ExplainPrintPlan(es, queryDesc);
 			if (es->analyze && auto_explain_log_triggers)
 				ExplainPrintTriggers(es, queryDesc);
diff --git a/contrib/auto_explain/t/001_auto_explain.pl b/contrib/auto_explain/t/001_auto_explain.pl
index 82e4d9d15c..ac57c8d6bf 100644
--- a/contrib/auto_explain/t/001_auto_explain.pl
+++ b/contrib/auto_explain/t/001_auto_explain.pl
@@ -13,20 +13,21 @@
 $node->append_conf('postgresql.conf',
 	"shared_preload_libraries = 'auto_explain'");
 $node->append_conf('postgresql.conf', "auto_explain.log_min_duration = 0");
+$node->append_conf('postgresql.conf', "auto_explain.log_parameter_max_length = -1");
 $node->append_conf('postgresql.conf', "auto_explain.log_analyze = on");
 $node->start;
 
 # run a couple of queries
 $node->safe_psql("postgres", "SELECT * FROM pg_class;");
 $node->safe_psql("postgres",
-	"SELECT * FROM pg_proc WHERE proname = 'int4pl';");
+	q{PREPARE get_proc(name) AS SELECT * FROM pg_proc WHERE proname = $1; EXECUTE get_proc('int4pl');});
 
 # emit some json too
 $node->append_conf('postgresql.conf', "auto_explain.log_format = json");
 $node->reload;
 $node->safe_psql("postgres", "SELECT * FROM pg_proc;");
 $node->safe_psql("postgres",
-	"SELECT * FROM pg_class WHERE relname = 'pg_class';");
+	q{PREPARE get_class(name) AS SELECT * FROM pg_class WHERE relname = $1; EXECUTE get_class('pg_class');});
 
 $node->stop('fast');
 
@@ -34,6 +35,16 @@
 
 my $log_contents = slurp_file($log);
 
+like(
+	$log_contents,
+	qr/Query Text: SELECT \* FROM pg_class;/,
+	"query text logged, text mode");
+
+like(
+	$log_contents,
+	qr/Query Parameters: \$1 = 'int4pl'/,
+	"query parameters logged, text mode");
+
 like(
 	$log_contents,
 	qr/Seq Scan on pg_class/,
@@ -44,6 +55,16 @@
 	qr/Index Scan using pg_proc_proname_args_nsp_index on pg_proc/,
 	"index scan logged, text mode");
 
+like(
+	$log_contents,
+	qr/"Query Text": "SELECT \* FROM pg_proc;"/,
+	"query text logged, json mode");
+
+like(
+	$log_contents,
+	qr/"Query Parameters": "\$1 = 'pg_class'"/,
+	"query parameters logged, json mode");
+
 like(
 	$log_contents,
 	qr/"Node Type": "Seq Scan"[^}]*"Relation Name": "pg_proc"/s,
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 

Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Thomas Munro
On Wed, Jun 1, 2022 at 8:10 AM Tom Lane  wrote:
> Robert Haas  writes:
> > So we can fix this by:
> > 1. Using a relative pointer value other than 0 to represent a null
> > pointer. Andres suggested (Size) -1.
> > 2. Not storing the free page manager for the DSM in the main shared
> > memory segment at byte offset 0.
> > 3. Dropping the assertion while loudly singing "la la la la la la".
>
> I'm definitely down on #3, because that just leaves the ambiguity
> in place to bite somewhere else in future.  #1 would work as long
> as nobody expects memset-to-zero to produce null relptrs, but that
> doesn't seem very nice either.
>
> On the whole, wasting MAXALIGN worth of memory seems like the least bad
> alternative, but I wonder if we ought to do it right here as opposed
> to somewhere in the DSM code proper.  Why is this DSM space not like
> other DSM spaces in starting with a TOC?

This FPM isn't in a DSM.  (It happens to have DSMs *inside it*,
because I'm using it as a separate DSM allocator: instead of making
them with dsm_impl.c mechanisms, this one recycles space from the main
shmem area).  I view FPM as a reusable 4kb page-based memory allocator
that could have many potential uses, not as a thing that must live
inside another thing with a TOC.  The fact that it uses the relptr
thing makes it possible to use FPM inside DSMs too, but that doesn't
mean it has to be used inside a DSM.

I vote for #1.




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2022-05-31 Thread Dmitry Koval

I didn't read the patch, but what lock level does that place on the
partitioned table?  Anything more than ACCESS SHARE?


The current patch locks the partitioned table with an ACCESS EXCLUSIVE lock.
Unfortunately, only this lock guarantees that other sessions cannot work
with partitions that are being split or merged.


I want to add a CONCURRENTLY mode in the future. With this mode the partitioned
table will be locked with SHARE UPDATE EXCLUSIVE during the SPLIT/MERGE
operation (as the ATTACH/DETACH PARTITION commands do in CONCURRENTLY mode).
But in this case queries from other sessions that want to work with partitions
that are being split/merged at that time would have to receive an error (like
"Partition data is moving. Repeat the operation later"), because the old
partitions will be deleted at the end of the SPLIT/MERGE operation.

I hope a better solution exists, but I don't know of one right now...

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Assorted small doc patches

2022-05-31 Thread David G. Johnston
Anything I should be doing differently here to get a bit of
reviewer/committer time on these?  I'll add them to the commitfest for next
month if needed but I'm seeing quick patches going in every week and the
batch format done at the beginning of the month got processed through
without issue.

Per: https://wiki.postgresql.org/wiki/Submitting_a_Patch
I was hoping for Workflow A especially as I acquit myself more than
adequately on the "How do you get someone to respond to you?" items.

I was going to chalk it up to bad timing but the volume of doc patches this
month hasn't really dipped even with the couple of bad bugs being worked on.

Thank you!

David J.

On Fri, Apr 29, 2022 at 6:52 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Updated status of the set.
>
> On Wed, Apr 20, 2022 at 5:59 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> v0001-database-default-name (-bugs, with a related cleanup suggestion as
>> well)
>>
>> https://www.postgresql.org/message-id/flat/CAKFQuwZvHH1HVSOu7EYjvshynk4pnDwC5RwkF%3DVfZJvmUskwrQ%40mail.gmail.com#0e6d799478d88aee93402bec35fa64a2
>>
>>
>> v0002-doc-extension-dependent-routine-behavior (-general, reply to user
>> confusion)
>>
>> https://www.postgresql.org/message-id/CAKFQuwb_QtY25feLeh%3D8uNdnyo1H%3DcN4R3vENsUwQzJP4-0xZg%40mail.gmail.com
>>
>>
>> v0001-doc-savepoint-name-reuse (-docs, reply to user request for
>> improvement)
>>
>> https://www.postgresql.org/message-id/CAKFQuwYzSb9OW5qTFgc0v9RWMN8bX83wpe8okQ7x6vtcmfA2KQ%40mail.gmail.com
>>
>
> Pending discussion of alternate presentation of transaction sequence; if
> not favorable, I can just go with a simple factual fix of the mistake in
> v0001 (see this thread).
>
>
>>
>> v0001-on-conflict-excluded-is-name-not-table (-docs, figured out while
>> trying to improve the docs to reduce user confusion in this area)
>>
>> https://www.postgresql.org/message-id/flat/CAKFQuwYN20c0%2B7kKvm3PBgibu77BzxDvk9RvoXBb1%3Dj1mDODPw%40mail.gmail.com#ea79c88b55fdccecbd2c4fe549f321c9
>>
>>
>> v0001-doc-make-row-estimation-example-match-prose (-docs, reply to user
>> pointing of an inconsistency)
>>
>> https://www.postgresql.org/message-id/CAKFQuwax7V5R_rw%3DEOWmy%3DTBON6v3sveBx_WvwsENskCL5CLQQ%40mail.gmail.com
>>
>
> David J.
>


Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Tom Lane
Robert Haas  writes:
> Yeah, so when I created this stuff in the first place, I figured that
> it wasn't a problem if we reserved relptr == 0 to mean a NULL pointer,
> because you would never have a relative pointer pointing to the
> beginning of a DSM, because it would probably always start with a
> dsm_toc. But when Thomas made it so that DSM allocations could happen
> in the main shared memory segment, that ceased to be true. This
> example happened not to break because we never use relptr_access() on
> fpm->self. We do use fpm_segment_base(), but that accidentally fails
> to break, because instead of using relptr_access() it drills right
> through the abstraction and doesn't have any kind of special case for
> 0.

Seems like that in itself is a a lousy idea.  Either the code should
respect the abstraction, or it shouldn't be declaring the variable
as a relptr in the first place.

> So we can fix this by:
> 1. Using a relative pointer value other than 0 to represent a null
> pointer. Andres suggested (Size) -1.
> 2. Not storing the free page manager for the DSM in the main shared
> memory segment at byte offset 0.
> 3. Dropping the assertion while loudly singing "la la la la la la".

I'm definitely down on #3, because that just leaves the ambiguity
in place to bite somewhere else in future.  #1 would work as long
as nobody expects memset-to-zero to produce null relptrs, but that
doesn't seem very nice either.

On the whole, wasting MAXALIGN worth of memory seems like the least bad
alternative, but I wonder if we ought to do it right here as opposed
to somewhere in the DSM code proper.  Why is this DSM space not like
other DSM spaces in starting with a TOC?

regards, tom lane




Re: support for MERGE

2022-05-31 Thread Justin Pryzby
On Wed, May 18, 2022 at 11:57:15AM -0500, Justin Pryzby wrote:
> I prefer my original, but the most important thing is to include the link at
> *somewhere*.

Any other opinions ?




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-05-31 Thread Robert Haas
On Thu, May 19, 2022 at 11:00 PM Kyotaro Horiguchi
 wrote:
> The path is taken only when a valid value is given to the
> parameter. If we don't use preallocated dsm, it is initialized
> elsewhere.  In those cases the first bytes of the base address (the
> second parameter of FreePageManagerInitialize) are used for
> dsa_segment_header so the relptr won't be zero (!= NULL).
>
> It can be silenced by wasting the first MAXALIGN bytes of
> dsm_main_space_begin..

Yeah, so when I created this stuff in the first place, I figured that
it wasn't a problem if we reserved relptr == 0 to mean a NULL pointer,
because you would never have a relative pointer pointing to the
beginning of a DSM, because it would probably always start with a
dsm_toc. But when Thomas made it so that DSM allocations could happen
in the main shared memory segment, that ceased to be true. This
example happened not to break because we never use relptr_access() on
fpm->self. We do use fpm_segment_base(), but that accidentally fails
to break, because instead of using relptr_access() it drills right
through the abstraction and doesn't have any kind of special case for
0. So we can fix this by:

1. Using a relative pointer value other than 0 to represent a null
pointer. Andres suggested (Size) -1.
2. Not storing the free page manager for the DSM in the main shared
memory segment at byte offset 0.
3. Dropping the assertion while loudly singing "la la la la la la".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




generate_series for timestamptz and time zone problem

2022-05-31 Thread Przemysław Sztoch
generate_series ( start timestamp with time zone, stop timestamp with time zone, step interval )

produces results depending on the timezone value set:

SET timezone = 'UTC';
SELECT ts, ts AT TIME ZONE 'UTC'
FROM generate_series('2022-03-26 00:00:00+01'::timestamptz, '2022-03-30 
00:00:00+01'::timestamptz, '1 day') AS ts;


SET timezone = 'Europe/Warsaw';
SELECT ts, ts AT TIME ZONE 'UTC'
FROM generate_series('2022-03-26 00:00:00+01'::timestamptz, '2022-03-30 
00:00:00+01'::timestamptz, '1 day') AS ts;


Sometimes this is a very big problem.

A fourth argument specifying the time zone would be very useful:
generate_series ( start timestamp with time zone, stop timestamp with time zone, step interval [, zone text ] )


The situation is similar with the function timestamptz_pl_interval. A
third parameter for specifying the zone would be very useful.
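
For what it's worth, a workaround that is already possible today (a sketch,
assuming the intent is "one step per local day in the given zone"): generate
the series over timestamp without time zone and convert each value explicitly,
so the instants produced no longer depend on the session's timezone setting:

SELECT ts AT TIME ZONE 'Europe/Warsaw' AS warsaw_midnight
FROM generate_series('2022-03-26 00:00'::timestamp,
                     '2022-03-30 00:00'::timestamp,
                     '1 day') AS ts;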


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Prevent writes on large objects in read-only transactions

2022-05-31 Thread Tom Lane
Robert Haas  writes:
> On Sat, May 28, 2022 at 5:01 AM Michael Paquier  wrote:
>> Well, there is an actual risk to break applications that have worked
>> until now for a behavior that has existed for years with zero
>> complaints on the matter, so I would leave that alone.  Saying that, I
>> don't really disagree with improving the error messages a bit if we
>> are in recovery.

> On the other hand, there's a good argument that the existing behavior
> is simply incorrect.

Yeah.  Certainly we'd not want to back-patch this change, in case
anyone is relying on the current behavior ... but it's hard to argue
that it's not wrong.

What I'm wondering about is how far the principle of read-only-ness
ought to be expected to go.  Should a read-only transaction fail
to execute adminpack's pg_file_write(), for example?  Should it
refuse to execute random() on the grounds that that changes the
session's PRNG state?  The latter seems obviously silly, but
I'm not very sure about pg_file_write().  Maybe the restriction
should be "no changes to database state that's visible to other
sessions", which would leave outside-the-DB changes out of the
discussion.
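
For concreteness, here is the sort of inconsistency at issue as of current
releases (the sequence name below is made up):

BEGIN TRANSACTION READ ONLY;
SELECT lo_create(0);    -- currently succeeds, and the new large object
                        -- persists after COMMIT, visible to other sessions
COMMIT;

BEGIN TRANSACTION READ ONLY;
SELECT nextval('some_seq');  -- by contrast, this is rejected:
                             -- ERROR: cannot execute nextval() in a read-only transaction
ROLLBACK;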

regards, tom lane




Re: [RFC] building postgres with meson

2022-05-31 Thread Andres Freund
Hi,

On 2022-05-31 16:49:17 +0300, Aleksander Alekseev wrote:
> I tried the branch on GitHub on MacOS Monterey 12.3.1 and Ubuntu 20.04 LTS.
> I was going to test it against several third party extensions, but it looks
> like it is a bit early for this. On Ubuntu I got the following error:

What do those extensions use to build? Since the unconference I added some
rudimentary PGXS compatibility, but it's definitely not complete yet.


> ```
> ../src/include/parser/kwlist.h:332:25: error: ‘PARAMETER’ undeclared here (not
> in a function)
> 332 | PG_KEYWORD("parameter", PARAMETER, UNRESERVED_KEYWORD, BARE_LABEL)
> 
> ../src/interfaces/ecpg/preproc/keywords.c:32:55: note: in definition of macro
> ‘PG_KEYWORD’
> 32 | #define PG_KEYWORD(kwname, value, category, collabel) value,
> ```

Huh. I've not seen this before - could you provide a bit more detail about
what you did? CI isn't testing ubuntu, but it is testing Debian, so I'd expect
this to work.


> On MacOS I got multiple errors regarding LDAP:

Ah, yes. Sorry, that's an open issue that I need to fix. -Dldap=disabled to
the rescue.  There's some crazy ordering dependency in the macOS framework
headers. The ldap framework contains an "ldap.h" header that includes
"ldap.h". So if you end up with the framework on the include path, you get
endless recursion.

Greetings,

Andres Freund




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Dave Cramer
On Tue, 31 May 2022 at 14:51, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > I think it's reasonable to have two adjacent rows in the table for these
> > two closely related things, but rather than "columns per tuple" I would
> > label the second one "columns in a result set".  This is easy enough to
> > understand and to differentiate from the other limit.
>
> OK, with that wording it's probably clear enough.
>
> regards, tom lane
>
> Reworded patch attached


columns_per_tuple.patch
Description: Binary data


Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Tom Lane
Alvaro Herrera  writes:
> I think it's reasonable to have two adjacent rows in the table for these
> two closely related things, but rather than "columns per tuple" I would
> label the second one "columns in a result set".  This is easy enough to
> understand and to differentiate from the other limit.

OK, with that wording it's probably clear enough.

regards, tom lane




Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

2022-05-31 Thread Tomas Vondra
On 5/31/22 16:36, Ranier Vilela wrote:
> On Sun, May 29, 2022 at 17:10, Ranier Vilela wrote:
> 
> On Sun, May 29, 2022 at 15:21, Andres Freund wrote:
> 
> On 2022-05-29 18:00:14 +0200, Tomas Vondra wrote:
> > Also, there's the question of correctness, and I'd bet Andres
> is right
> > getting snapshot without holding ProcArrayLock is busted.
> 
> Unless there's some actual analysis of this by Rainier, I'm just
> going to
> ignore this thread going forward. It's pointless to invest time when
> everything we say is just ignored.
> 
> Sorry, just not my intention to ignore this important point.
> Of course, any performance gain is good, but robustness comes first.
> 
> As soon as I have some time.
> 
> I redid the benchmarks, with getting a snapshot with holding ProcArrayLock.
> 
> Average Results
> 
> Connections   tps head            tps patch           diff             %
> 1             39196,3088985       39858,0207936        661,71189518    101,69%
> 2             65050,8643819       65245,9852367        195,1208548     100,30%
> 5             91486,0298359       91862,9026528        376,87281685    100,41%
> 10            131318,0774955      131547,1404573       229,06296175    100,17%
> 50            116531,2334144      116687,0325522       155,79913781    100,13%
> 100           98969,4650449       98808,6778717       -160,78717311     99,84%
> 200           89514,5238649       89463,6196075        -50,90425745     99,94%
> 300           88426,3612183       88457,2695151         30,908296802   100,03%
> 400           88078,1686912       88338,2859163        260,11722505    100,30%
> 500           87791,1620039       88074,3418504        283,17984653    100,32%
> 600           87552,3343394       87930,8645184        378,53017894    100,43%
> 1000          86538,3772895       86771,1946099        232,81732045    100,27%
> avg           89204,4088731917    89420,444631825     1981,0816042     100,24%
> 
> For clients with 1 connection, the results are good.

Isn't that a bit strange, considering the aim of this patch was
scalability, which should improve higher client counts in the first place?

> But for clients with 100 and 200 connections, the results are not good.
> I can't say why these two tests were so bad.
> Because of the 100 and 200 results, I'm not sure if this should go ahead, or
> if it's worth the effort.
> 

I'd argue this is either just noise, with no actual difference at all.
This could be verified by some sort of statistical testing (e.g. the
well known t-test).

Another option is that this is simply due to differences in binary
layout - this can result in small differences (easily 1-2%) that are
completely unrelated to what the patch does. This is exactly what the
"stabilizer" talk I mentioned a couple days ago was about.

FWIW, when a patch improves scalability, the improvement usually
increases with the number of clients. So you'd see no/small improvement
for 10 clients, 100 clients would be improved more, 200 more, etc. We
see nothing like that here. So either the patch does not really improve
anything, or perhaps the benchmark doesn't even hit the bottleneck the
patch is meant to improve (which was already suggested in this thread
repeatedly).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Prevent writes on large objects in read-only transactions

2022-05-31 Thread Robert Haas
On Sat, May 28, 2022 at 5:01 AM Michael Paquier  wrote:
> Well, there is an actual risk to break applications that have worked
> until now for a behavior that has existed for years with zero
> complaints on the matter, so I would leave that alone.  Saying that, I
> don't really disagree with improving the error messages a bit if we
> are in recovery.

On the other hand, there's a good argument that the existing behavior
is simply incorrect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ParseTzFile doesn't FreeFile on error

2022-05-31 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Mon, 30 May 2022 13:11:04 -0400, Tom Lane  wrote in 
>> BTW, my first thought about it was "what if one of the callees throws
>> elog(ERROR), eg palloc out-of-memory"?  But I think that's all right
>> since then we'll reach transaction abort cleanup, which won't whine
>> about open files.  The problem is limited to the case where no error
>> gets thrown.

> Right. This "issue" is not a problem unless the caller continues
> without throwing an exception after the function errors out, which is
> not done by the current code.

Actually the problem *is* reachable, if you intentionally break the
already-active timezone abbreviation file: newly started sessions
produce file-leak warnings after failing to apply the setting.
I concede that's not a likely scenario, but that's why I think it's
worth fixing.

regards, tom lane




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Alvaro Herrera
On 2022-May-31, Tom Lane wrote:

> Detail is far from "free".  Most readers are going to spend more time
> wondering what the difference is between "columns per table" and "columns
> per tuple", and which limit applies when, than they are going to save by
> having the docs present them with two inconsistent numbers.

I think it's reasonable to have two adjacent rows in the table for these
two closely related things, but rather than "columns per tuple" I would
label the second one "columns in a result set".  This is easy enough to
understand and to differentiate from the other limit.

(Replacing "in a" with "per" sounds OK to me but less natural, not sure
why.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Dave Cramer
On Tue, 31 May 2022 at 10:49, Tom Lane  wrote:

> Dave Cramer  writes:
> > On Tue, 31 May 2022 at 10:16, Tom Lane  wrote:
> >> We've generally felt that the existing "columns per table" limit is
> >> sufficient detail here.
>
> > ISTM that adding detail is free whereas the readers time to figure out
> why
> > and where this number came from is not.
>
> Detail is far from "free".  Most readers are going to spend more time
> wondering what the difference is between "columns per table" and "columns
> per tuple", and which limit applies when, than they are going to save by
> having the docs present them with two inconsistent numbers.
>

Sounds to me like we are discussing different sides of the same coin. On
one hand we have readers of the documentation who may be confused,
and on the other hand we have developers who run into this and have to
spend time digging into the code to figure out what's what.

For me, while I have some familiarity with the server code it takes me
quite a while to load and find what I am looking for.
Then we have less-than-clear names like "resno", which I still
haven't grokked. So imagine someone who has no familiarity
with the backend code trying to figure out why 1664 is relevant when the
docs mention 1600. Surely there must be some middle ground
where we can give them some clues without having to wade through the source
code?

Dave


Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 11:09 AM Tom Lane  wrote:
> Yeah, we don't have any hard data here.  It could be that it's a win to
> switch to a rule that chunks must present an offset (instead of a pointer)
> back to a block header, which'd then be required to contain a link to the
> actual context, meaning that every context has to do something like what
> I proposed for generation.c.  But nobody's coded that up let alone done
> any testing on it, and I feel like it's too late in the v15 cycle for
> changes as invasive as that.  Quite aside from that change in itself,
> you wouldn't get any actual space savings in aset.c contexts unless
> you did something with the chunk-size field too, and that seems a lot
> messier.
>
> Right now my vote would be to leave things as they stand for v15 ---
> the performance loss that started this thread occurs in a narrow
> enough set of circumstances that I don't feel too much angst about
> it being the price of winning in most other circumstances.  We can
> investigate these options at leisure for v16 or later.

I don't think it's all that narrow a case, but I think it's narrow
enough that I agree that we can live with it if we don't feel like the
risk-reward trade-offs are looking favorable. I don't think tuples
whose size, after rounding to a multiple of 8, is exactly a power of 2
are going to be terribly uncommon. It is not very likely that many
people will have tuples between 25 and 31 bytes, but tables with lots
of 57-64 byte tuples are probably pretty common, and tables with lots
of 121-128 and 249-256 byte tuples are somewhat plausible as well. I
think we will win more than we lose, but I think we will lose often
enough that I wouldn't be very surprised if we get >0 additional
complaints about this over the next 5 years. While it may seem risky
to do something about that now, it will certainly seem a lot riskier
once the release is out, and there is some risk in doing nothing,
because we don't know how many people are going to run into the bad
cases or how severely they might be impacted.

I think the biggest risk in your patch as presented is that slowing
down pfree() might suck in some set of workloads upon which we can't
easily identify now. I don't really know how to rule out that
possibility. In general terms, I think we're not doing ourselves any
favors by relying partly on memory context cleanup and partly on
pfree. The result is that pfree() has to work for every memory context
type and can't be unreasonably slow, which rules out a lot of really
useful ideas, both in terms of how to make allocation faster, and also
in terms of how to make it denser. If you're ever of a mind to put
some efforts into really driving things forward in this area, I think
figuring out some way to break that hard dependence would be effort
really well-spent.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Vladimir Sitnikov
>Most readers are going to spend more time
>wondering what the difference is between "columns per table" and "columns
>per tuple"

"tuple" is already mentioned 10 times on "limits" page, so adding "columns
per tuple" is not really obscure.
The comment could be like "for instance, max number of expressions in each
SELECT clause"


I know I visited current/limits.html many times (mostly for things like
"max field length")
However, I was really surprised there's an easy to hit limit on the number
of expressions in SELECT.

I don't ask to lift the limit, however, I am sure documenting the limit
would make it clear
for the application developers that the limit exists and they should plan
for it in advance.



I bumped into "target lists can have at most 1664 entries" when I was
trying to execute a statement with 65535 parameters.
I know wire format uses unsigned int2 for the number of parameters, so I
wanted to test if the driver supports that.

a) My first test was like select ? c1, ? c2, ? c3, ..., ? c65535
Then it failed with "ERROR: target lists can have at most 1664 entries".
I do not think "columns per table" is applicable to select like that

b) Then I tried select ?||?||?||?||||?
I wanted to verify that the driver sent all the values properly, so I don't
want to just ignore them and I concatenated the values.
Unfortunately, it failed with "stack depth limit exceeded. Increase the
configuration parameter "max_stack_depth" (currently 2048kB), after
ensuring the platform's stack depth limit is adequate"

Finally, I settled on select ARRAY[?, ?, ... ?] which worked up to 65535
parameters just fine.
Please do not suggest that I avoid 65535 parameters. What I wanted was just to
test that the driver was able to handle 65535 parameters.

Vladimir


Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-31 Thread Tom Lane
Robert Haas  writes:
> I don't want to take the position that we ought necessarily to commit
> your patch, because I don't really have a clear sense of what the wins
> and losses actually are.

Yeah, we don't have any hard data here.  It could be that it's a win to
switch to a rule that chunks must present an offset (instead of a pointer)
back to a block header, which'd then be required to contain a link to the
actual context, meaning that every context has to do something like what
I proposed for generation.c.  But nobody's coded that up let alone done
any testing on it, and I feel like it's too late in the v15 cycle for
changes as invasive as that.  Quite aside from that change in itself,
you wouldn't get any actual space savings in aset.c contexts unless
you did something with the chunk-size field too, and that seems a lot
messier.

Right now my vote would be to leave things as they stand for v15 ---
the performance loss that started this thread occurs in a narrow
enough set of circumstances that I don't feel too much angst about
it being the price of winning in most other circumstances.  We can
investigate these options at leisure for v16 or later.

regards, tom lane




Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word

2022-05-31 Thread Andrew Dunstan


On 2022-05-29 Su 16:19, Tom Lane wrote:
> More generally, I feel like we have a process problem: there needs to
> be a higher bar to adding new fully- or even partially-reserved words.
> I might've missed it, but I don't recall that there was any discussion
> of the compatibility implications of this change.
>

Thanks for fixing this while I was away.

I did in fact raise the issue on 1 Feb, see
,
but nobody responded that I recall. I guess I should have pushed the
discussion further


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Ignore heap rewrites for materialized views in logical replication

2022-05-31 Thread Euler Taveira
On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote:
> I think we don't need the retry logic to check the error; a simple
> wait_for_caught_up should be sufficient as we are doing in other
> tests. See attached. I have slightly modified the commit message as
> well. Kindly let me know what you think?
Your modification will hang until the test timeout without the patch. That's
why I avoided using wait_for_caught_up and used a loop for a fast exit on
success or failure. I'm fine with a simple test case like you proposed.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-31 Thread Robert Haas
On Fri, May 27, 2022 at 10:52 AM Tom Lane  wrote:
> Given David's results in the preceding message, I don't think I am.
> A scheme like this would add more arithmetic and at least one more
> indirection to GetMemoryChunkContext(), and we already know that
> adding even a test-and-branch there has measurable cost.

I think you're being too negative here. It's a 7% regression on 8-byte
allocations in a tight loop. In real life, people allocate memory
because they want to do something with it, and the allocation overhead
therefore figures to be substantially less. They also nearly always
allocate memory more than 8 bytes at a time, since there's very little
of interest that can fit into an 8 byte allocation, and if you're
dealing with one of the things that can, you're likely to allocate an
array rather than each item individually. I think it's quite plausible
that saving space is going to be more important for performance than
the tiny cost of a test-and-branch here.

I don't want to take the position that we ought necessarily to commit
your patch, because I don't really have a clear sense of what the wins
and losses actually are. But, I am worried that our whole memory
allocation infrastructure is stuck at a local maximum, and I think
your patch pushes in a generally healthy direction: let's optimize for
wasting less space, instead of for the absolute minimum number of CPU
cycles consumed.

aset.c's approach is almost unbeatable for small numbers of
allocations in tiny contexts, and PostgreSQL does a lot of that. But
when you do have cases where a lot of data needs to be stored in
memory, it starts to look pretty lame. To really get out from under
that problem, we'd need to find a way to remove the requirement of a
per-allocation header altogether, and I don't think this patch really
helps us see how we could ever get all the way to that point.
Nonetheless, I like the fact that it puts more flexibility into the
mechanism seemingly at very little real cost, and that it seems to
mean less memory spent on header information rather than user data.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Tom Lane
Dave Cramer  writes:
> On Tue, 31 May 2022 at 10:16, Tom Lane  wrote:
>> We've generally felt that the existing "columns per table" limit is
>> sufficient detail here.

> ISTM that adding detail is free whereas the reader's time to figure out why
> and where this number came from is not.

Detail is far from "free".  Most readers are going to spend more time
wondering what the difference is between "columns per table" and "columns
per tuple", and which limit applies when, than they are going to save by
having the docs present them with two inconsistent numbers.

regards, tom lane




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Dave Cramer
On Tue, 31 May 2022 at 10:16, Tom Lane  wrote:

> Amul Sul  writes:
> > On Tue, May 31, 2022 at 12:46 PM Vladimir Sitnikov
> >  wrote:
> >> I suggest that the limit of "1664 columns per tuple" (or whatever is
> the right term) should be added
> >> to the list at https://www.postgresql.org/docs/current/limits.html
> e.g. after "columns per table".
>
> We've generally felt that the existing "columns per table" limit is
> sufficient detail here.
>

ISTM that adding detail is free whereas the reader's time to figure out why
and where this number came from is not.

I think it deserves mention.

Regards,
Dave.


Re: adding status for COPY progress report

2022-05-31 Thread Zhihong Yu
On Tue, May 31, 2022 at 1:20 AM Amit Langote 
wrote:

> Hi,
>
> On Thu, May 26, 2022 at 11:35 AM Zhihong Yu  wrote:
> >> The changes in pgstat_progress_end_command() and
> >> pg_stat_get_progress_info() update st_progress_command_target
> >> depending on the command type involved, breaking the existing contract
> >> of  those routines, particularly the fact that the progress fields
> >> *should* be reset in an error stack.
>
> +1 to what Michael said here.  I don't think the following changes are
> acceptable:
>
> @@ -106,7 +106,13 @@ pgstat_progress_end_command(void)
> return;
>
> PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
> -   beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
> -   beentry->st_progress_command_target = InvalidOid;
> +   if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
> +   // We want to show the relation for the most recent COPY command
> +   beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
> +   else
> +   {
> +   beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
> +   beentry->st_progress_command_target = InvalidOid;
> +   }
> PGSTAT_END_WRITE_ACTIVITY(beentry);
>  }
>
> pgstat_progress_end_command() is generic infrastructure and there
> shouldn't be anything COPY-specific there.
>
> > I searched the code base for how st_progress_command_target is used.
> > In case there is subsequent command following the COPY,
> st_progress_command_target would be set to the Oid
> > of the subsequent command.
> > In case there is no subsequent command following the COPY command, it
> seems leaving st_progress_command_target as
> > the Oid of the COPY command wouldn't hurt.
> >
> > Maybe you can let me know what side effect not resetting
> st_progress_command_target would have.
> >
> > As an alternative, upon seeing PROGRESS_COMMAND_COPY_DONE, we can
> transfer the value of
> > st_progress_command_target to a new field called, say,
> st_progress_command_previous_target (
> > and resetting st_progress_command_target as usual).
>
> That doesn't sound like a good idea.
>
> As others have said, there's no point in adding a status field to
> pg_stat_progress_copy that only tells whether a COPY is running or
> not.  You can already do that by looking at the output of `select *
> from pg_stat_progress_copy`.  If the COPY you're interested in is
> running, you'll find the corresponding row in the view.  The row is
> made to disappear from the view the instance the COPY finishes, either
> successfully or due to an error.  Whichever is the case will be known
> in the connection that initiated the COPY and you may find it in the
> server log.  I don't think we should make Postgres remember anything
> about that in the shared memory, or at least not with one-off
> adjustments of the shared progress reporting state like in the
> proposed patch.
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com


Hi, Matthias, Michael and Amit:
Thanks for your time reviewing my patch.

I took note of what you said.

If I can make the changes more general, I would circle back.

Cheers


Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Tom Lane
Amul Sul  writes:
> On Tue, May 31, 2022 at 12:46 PM Vladimir Sitnikov
>  wrote:
>> I suggest that the limit of "1664 columns per tuple" (or whatever is the 
>> right term) should be added
>> to the list at https://www.postgresql.org/docs/current/limits.html e.g. 
>> after "columns per table".

We've generally felt that the existing "columns per table" limit is
sufficient detail here.

> Rather, I think the "columns per table" limit needs to be updated to 1664.

That number is not wrong.  See MaxTupleAttributeNumber and
MaxHeapAttributeNumber:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/htup_details.h;h=51a60eda088578188b41f4506f6053c2fb77ef0b;hb=HEAD#l23

regards, tom lane




Re: Ignore heap rewrites for materialized views in logical replication

2022-05-31 Thread Amit Kapila
On Tue, May 31, 2022 at 6:27 AM Euler Taveira  wrote:
>
> On Sat, May 28, 2022, at 7:07 AM, Amit Kapila wrote:
>
> I agree with your analysis and the fix looks correct to me.
>
> Thanks for checking.
>
> Instead of waiting for an error, we can try to insert into a new table
> created by the test case after the 'Refresh ..' command and wait for
> the change to be replicated by using wait_for_caught_up.
>
> That's a good idea. [modifying the test...] I used the same table. Whenever
> the new row arrives on the subscriber or it reads that error message, it
> bails out.
>

I think we don't need the retry logic to check the error; a simple
wait_for_caught_up should be sufficient as we are doing in other
tests. See attached. I have slightly modified the commit message as
well. Kindly let me know what you think?

-- 
With Regards,
Amit Kapila.


v3-0001-Ignore-heap-rewrites-for-materialized-views-in-lo.patch
Description: Binary data


Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Dave Cramer
>
>
>
>
> On Tue, 31 May 2022 at 09:56, Amul Sul  wrote:
>
>> On Tue, May 31, 2022 at 12:46 PM Vladimir Sitnikov
>>  wrote:
>> >
>> > Hi,
>> >
>> > Today I hit "ERROR: target lists can have at most 1664 entries", and I
>> was surprised the limit was not documented.
>> >
>> > I suggest that the limit of "1664 columns per tuple" (or whatever is
>> the right term) should be added
>> > to the list at https://www.postgresql.org/docs/current/limits.html
>> e.g. after "columns per table".
>> >
>>
>> Rather, I think the "columns per table" limit needs to be updated to 1664.
>>
>
> Actually that is correct. Columns per table is MaxHeapAttributeNumber
> which is 1600.
>
> MaxTupleAttributeNumber  is 1664  and is the limit of user columns in a
> tuple.
>
> Dave
>

Attached is a patch to limits.sgml. I'm not sure this is where it belongs,
as it's not a physical limit per-se but I am not familiar enough with the
docs to propose another location.

Note this was suggested by Vladimir.

see attached


columns_per_tuple.patch
Description: Binary data


Re: Multi-Master Logical Replication

2022-05-31 Thread Bruce Momjian
On Wed, May 25, 2022 at 10:32:50PM -0400, Bruce Momjian wrote:
> On Wed, May 25, 2022 at 12:13:17PM +0530, Amit Kapila wrote:
> > > You still have not answered my question above.  "Without these features,
> > > what workload would this help with?"  You have only explained how the
> > > patch would fix one of the many larger problems.
> > >
> > 
> > It helps with setting up logical replication among two or more nodes
> > (data flows both ways) which is important for use cases where
> > applications are data-aware. For such apps, it will be beneficial to
> 
> That does make sense, thanks.

Uh, thinking some more, why would anyone set things up this way ---
having part of a table being primary on one server and a different part
of the table being a subscriber?  Seems it would be simpler and safer to
create two child tables and have each one be primary on only one server.
Users can access both tables using the parent.
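
Something like the following, using stock logical replication (all names and
the connection string are made up; node B is the mirror image):

-- on node A, which is authoritative for region 'A':
CREATE TABLE t (id int, region text, PRIMARY KEY (id, region))
    PARTITION BY LIST (region);
CREATE TABLE t_a PARTITION OF t FOR VALUES IN ('A');
CREATE TABLE t_b PARTITION OF t FOR VALUES IN ('B');
CREATE PUBLICATION pub_a FOR TABLE t_a;
CREATE SUBSCRIPTION sub_from_b
    CONNECTION 'host=nodeb dbname=app' PUBLICATION pub_b;
-- node B publishes t_b as pub_b and subscribes to pub_a; local writes go only
-- to the locally published partition, while queries on the parent t see both.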

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Dave Cramer
On Tue, 31 May 2022 at 09:56, Amul Sul  wrote:

> On Tue, May 31, 2022 at 12:46 PM Vladimir Sitnikov
>  wrote:
> >
> > Hi,
> >
> > Today I hit "ERROR: target lists can have at most 1664 entries", and I
> was surprised the limit was not documented.
> >
> > I suggest that the limit of "1664 columns per tuple" (or whatever is the
> right term) should be added
> > to the list at https://www.postgresql.org/docs/current/limits.html e.g.
> after "columns per table".
> >
>
> Rather, I think the "columns per table" limit needs to be updated to 1664.
>

Actually that is correct. Columns per table is MaxHeapAttributeNumber which
is 1600.

MaxTupleAttributeNumber  is 1664  and is the limit of user columns in a
tuple.
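
A quick psql sketch that shows where each number kicks in (the generated DDL
and query are throwaway; \gexec runs the statement built by the outer SELECT):

SELECT 'CREATE TABLE too_wide (' || string_agg('c' || i || ' int', ', ') || ')'
FROM generate_series(1, 1601) AS i \gexec
-- ERROR:  tables can have at most 1600 columns

SELECT 'SELECT ' || string_agg('1 AS c' || i, ', ')
FROM generate_series(1, 1665) AS i \gexec
-- ERROR:  target lists can have at most 1664 entries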

Dave


Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Amul Sul
On Tue, May 31, 2022 at 12:46 PM Vladimir Sitnikov
 wrote:
>
> Hi,
>
> Today I hit "ERROR: target lists can have at most 1664 entries", and I was 
> surprised the limit was not documented.
>
> I suggest that the limit of "1664 columns per tuple" (or whatever is the 
> right term) should be added
> to the list at https://www.postgresql.org/docs/current/limits.html e.g. after 
> "columns per table".
>

Rather, I think the "columns per table" limit needs to be updated to 1664.

Regards,
Amul




Re: [RFC] building postgres with meson

2022-05-31 Thread Aleksander Alekseev
Hi Andres,

> Not having a ninja backend etc didn't strike me as great either - the builds
> with scons I've done weren't fast at all.

I must admit that personally I have never used Scons; I just know that it was considered
(an / the only?) alternative to CMake for many years. The Scons 4.3.0 release
notes say that Ninja is supported [1], but according to the user guide [2]
Ninja support is considered experimental.

Don't get me wrong, I don't insist on using Scons. I was just curious if it was
considered. Actually, a friend of mine pointed out that the fact that Scons
build files are literally Python code could be a disadvantage. There is less
control over this code; basically, it can do anything. It could complicate the
diagnosis of certain issues, etc.

Since you invested so much effort into Meson already let's just focus on it.

I tried the branch on GitHub on MacOS Monterey 12.3.1 and Ubuntu 20.04 LTS.
I was going to test it against several third party extensions, but it looks like
it is a bit early for this. On Ubuntu I got the following error:

```
../src/include/parser/kwlist.h:332:25: error: ‘PARAMETER’ undeclared here (not
in a function)
332 | PG_KEYWORD("parameter", PARAMETER, UNRESERVED_KEYWORD, BARE_LABEL)

../src/interfaces/ecpg/preproc/keywords.c:32:55: note: in definition of macro
‘PG_KEYWORD’
32 | #define PG_KEYWORD(kwname, value, category, collabel) value,
```

On MacOS I got multiple errors regarding LDAP:

```
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/
LDAP.framework/Headers/ldap.h:1:10: error: #include nested too deeply
#include 

../src/interfaces/libpq/fe-connect.c:4816:2: error: use of undeclared
identifier 'LDAP'
LDAP  *ld = NULL;
^

../src/interfaces/libpq/fe-connect.c:4817:2: error: use of undeclared
identifier 'LDAPMessage'
LDAPMessage *res,
^
... etc...
```

I didn't invest much time into investigating these issues. For now I just
wanted to report them. Please let me know if you need any help with these
and/or additional information.

[1]: https://scons.org/scons-430-is-available.html
[2]: https://scons.org/doc/production/PDF/scons-user.pdf

-- 
Best regards,
Aleksander Alekseev




Re: Shmem queue is not flushed if receiver is not yet attached

2022-05-31 Thread Robert Haas
On Mon, May 30, 2022 at 3:06 AM Pavan Deolasee  wrote:
>> I think that this patch is basically correct, except that it's not
>> correct to set mqh_counterparty_attached when receiver is still NULL.
>> Here's a v2 where I've attempted to correct that while preserving the
>> essence of your proposed fix.
>
> This looks good to me,

Thanks for checking. Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: "ERROR: latch already owned" on gharial

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 8:20 AM Robert Haas  wrote:
> On Mon, May 30, 2022 at 8:31 PM Thomas Munro  wrote:
> > On Sat, May 28, 2022 at 1:56 AM Robert Haas  wrote:
> > > What I'm inclined to do is get gharial and anole removed from the
> > > buildfarm. anole was set up by Heikki in 2011. I don't know when
> > > gharial was set up, or by whom. I don't think anyone at EDB cares
> > > about these machines any more, or has any interest in maintaining
> > > them. I think the only reason they're still running is that, just by
> > > good fortune, they haven't fallen over and died yet. The hardest part
> > > of getting them taken out of the buildfarm is likely to be finding
> > > someone who has a working username and password to log into them and
> > > take the jobs out of the crontab.
> >
> > FWIW, in a previous investigation, Semab and Sandeep had access:
> >
> > https://www.postgresql.org/message-id/CABimMB4mRs9N3eivR-%3DqF9M8oWc5E6OX7GywsWF0DXN4P5gNEA%40mail.gmail.com
>
> Yeah, I'm in touch with Sandeep but not able to get in yet for some
> reason. Will try to sort it out.

OK, I have access to the box now. I guess I might as well leave the
crontab jobs enabled until the next time this happens, since Thomas
just took steps to improve the logging, but I do think these BF
members are overdue to be killed off, and would like to do that as
soon as it seems like a reasonable step to take.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT

2022-05-31 Thread Robert Haas
On Mon, May 30, 2022 at 2:27 AM Etsuro Fujita  wrote:
> On Fri, May 27, 2022 at 9:22 PM Amit Langote  wrote:
> > On Fri, May 27, 2022 at 7:15 PM Etsuro Fujita  
> > wrote:
> > >  Attached is a patch for that.
>
> > I think we should also rewrite the description to match the CREATE
> > TABLE's text, as in the attached updated patch.
>
> Actually, I thought the description would be OK as-is, because it says
> “See the similar form of CREATE TABLE for more details”, but I agree
> with you; it’s much better to also rewrite the description as you
> suggest.

I would probably just update the synopsis. It's not very hard to
figure out what's likely to happen even without clicking through the
link, so it seems like it's just being long-winded to duplicate the
stuff here. But I don't care much if you feel otherwise.
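
For reference, the form whose synopsis is being discussed looks roughly like
this (parent table, server, and option values are made up):

CREATE FOREIGN TABLE measurement_default
    PARTITION OF measurement DEFAULT
    SERVER remote_srv
    OPTIONS (table_name 'measurement_default');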

> I’ll commit the patch unless Robert wants to.

Please go ahead.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Allowing REINDEX to have an optional name

2022-05-31 Thread Alvaro Herrera
On 2022-May-31, Michael Paquier wrote:

> The case with CONCURRENTLY is different though: the option will never
> work on system catalogs so we have to skip them.  Echoing with others
> on this thread, I don't think that we should introduce a different
> behavior on what's basically the same grammar.  That's just going to
> lead to more confusion.  So REINDEX DATABASE with or without a
> database name appended to it should always mean to reindex the
> catalogs on top of the existing relations.

I was thinking the opposite: REINDEX DATABASE with or without a database
name should always process the user relations and skip system catalogs.
If the user wants to do both, then they can use REINDEX SYSTEM in
addition.

The reason for doing it like this is that there is no way to process
only user tables and skip catalogs.  So this is better for
composability.
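
Under that reading, the composable usage would be (the bare, name-less forms
are what this thread proposes, not released syntax):

REINDEX DATABASE;   -- user relations in the current database only
REINDEX SYSTEM;     -- system catalogs only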

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
 sources, so let's move on."   (Nathaniel Smith)




Re: "ERROR: latch already owned" on gharial

2022-05-31 Thread Robert Haas
On Mon, May 30, 2022 at 8:31 PM Thomas Munro  wrote:
> On Sat, May 28, 2022 at 1:56 AM Robert Haas  wrote:
> > What I'm inclined to do is get gharial and anole removed from the
> > buildfarm. anole was set up by Heikki in 2011. I don't know when
> > gharial was set up, or by whom. I don't think anyone at EDB cares
> > about these machines any more, or has any interest in maintaining
> > them. I think the only reason they're still running is that, just by
> > good fortune, they haven't fallen over and died yet. The hardest part
> > of getting them taken out of the buildfarm is likely to be finding
> > someone who has a working username and password to log into them and
> > take the jobs out of the crontab.
>
> FWIW, in a previous investigation, Semab and Sandeep had access:
>
> https://www.postgresql.org/message-id/CABimMB4mRs9N3eivR-%3DqF9M8oWc5E6OX7GywsWF0DXN4P5gNEA%40mail.gmail.com

Yeah, I'm in touch with Sandeep but not able to get in yet for some
reason. Will try to sort it out.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Allowing REINDEX to have an optional name

2022-05-31 Thread Michael Paquier
On Tue, May 31, 2022 at 11:04:58AM +0200, Bernd Helmle wrote:
> And we already have a situation where this already happens with REINDEX
> DATABASE: if you use CONCURRENTLY, it skips system catalogs already and
> prints a warning. In both cases there are good technical reasons to
> skip catalog indexes and to change the workflow to use separate
> commands.

The case with CONCURRENTLY is different though: the option will never
work on system catalogs so we have to skip them.  Echoing with others
on this thread, I don't think that we should introduce a different
behavior on what's basically the same grammar.  That's just going to
lead to more confusion.  So REINDEX DATABASE with or without a
database name appended to it should always mean to reindex the
catalogs on top of the existing relations.
--
Michael


signature.asc
Description: PGP signature


Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2022-05-31 Thread Laurenz Albe
On Tue, 2022-05-31 at 12:32 +0300, Dmitry Koval wrote:
> There are not many commands in PostgreSQL for working with partitioned 
> tables. This is an obstacle to their widespread use.
> Adding SPLIT PARTITION/MERGE PARTITIONS operations can make it easier to 
> use partitioned tables in PostgreSQL.
> (This is especially important when migrating projects from ORACLE DBMS.)
> 
> SPLIT PARTITION/MERGE PARTITIONS commands are supported for range 
> partitioning (BY RANGE) and for list partitioning (BY LIST).
> For hash partitioning (BY HASH) these operations are not supported.

+1 on the general idea.

At least, it will make these operations simpler, and probably also less
invasive (no need to detach the affected partitions).


I didn't read the patch, but what lock level does that place on the
partitioned table?  Anything more than ACCESS SHARE?


Yours,
Laurenz Albe




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2022-05-31 Thread Matthias van de Meent
On Tue, 31 May 2022 at 11:33, Dmitry Koval  wrote:
>
> Hi, hackers!
>
> There are not many commands in PostgreSQL for working with partitioned
> tables. This is an obstacle to their widespread use.
> Adding SPLIT PARTITION/MERGE PARTITIONS operations can make it easier to
> use partitioned tables in PostgreSQL.

That is quite a nice and useful feature to have.

> (This is especially important when migrating projects from ORACLE DBMS.)
>
> SPLIT PARTITION/MERGE PARTITIONS commands are supported for range
> partitioning (BY RANGE) and for list partitioning (BY LIST).
> For hash partitioning (BY HASH) these operations are not supported.

Just out of curiosity, why is SPLIT / MERGE support not included for
HASH partitions? Because sibling partitions can have a different
modulus, you should be able to e.g. split a partition with (modulus,
remainder) of (3, 1) into two partitions with (mod, rem) of (6, 1) and
(6, 4) respectively, with the reverse being true for merge operations,
right?
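
For what it's worth, the modulus arithmetic can already be written down with
today's syntax; a sketch (SPLIT itself for hash partitions remains hypothetical
here):

CREATE TABLE t (id int) PARTITION BY HASH (id);
CREATE TABLE t_m3r0 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE t_m3r2 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 2);
-- the remaining slot could be covered by one coarse partition:
--   CREATE TABLE t_m3r1 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 1);
-- or, equivalently, by the two finer-grained partitions a SPLIT of t_m3r1
-- would produce:
CREATE TABLE t_m6r1 PARTITION OF t FOR VALUES WITH (MODULUS 6, REMAINDER 1);
CREATE TABLE t_m6r4 PARTITION OF t FOR VALUES WITH (MODULUS 6, REMAINDER 4);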


Kind regards,

Matthias van de Meent




Re: Remove useless tests about TRUNCATE on foreign table

2022-05-31 Thread Yugo NAGATA
On Tue, 31 May 2022 09:49:40 +0900
Michael Paquier  wrote:

> On Mon, May 30, 2022 at 05:08:10PM +0900, Michael Paquier wrote:
> > Partitions have also some coverage as far as I can see, so I agree
> > that it makes little sense to keep the tests you are removing here.
> 
> And done as of 0efa513.

Thank you!


-- 
Yugo NAGATA 




Re: Allowing REINDEX to have an optional name

2022-05-31 Thread Bernd Helmle
Am Dienstag, dem 10.05.2022 um 15:00 +0100 schrieb Simon Riggs:

[...]

> 
> > I think REINDEX DATABASE reindexing the current database is a good
> > usability improvement in itself. But skipping the shared catalogs
> > needs an explicity syntax. Not sure how feasible it is but
> > something
> > like REINDEX DATABASE skip SHARED/SYSTEM.
> 
> There are two commands:
> 
> REINDEX DATABASE does every except system catalogs
> REINDEX SYSTEM does system catalogs only
> 
> So taken together, the two commands seem intuitive to me.
> 
> It is designed like this because it is dangerous to REINDEX the
> system
> catalogs because of potential deadlocks, so we want a way to avoid
> that problem.
> 
> Perhaps I can improve the docs more, will look.
> 

And we already have a situation where this already happens with REINDEX
DATABASE: if you use CONCURRENTLY, it skips system catalogs already and
prints a warning. In both cases there are good technical reasons to
skip catalog indexes and to change the workflow to use separate
commands.

Bernd





Re: Perform streaming logical transactions by background workers and parallel apply

2022-05-31 Thread Amit Kapila
On Mon, May 30, 2022 at 5:08 PM Amit Kapila  wrote:
>
> On Mon, May 30, 2022 at 2:22 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > Attach the new patches(only changed 0001 and 0002)
> >
>

This patch allows the same replication origin to be used by the main
apply worker and the bgworker that uses it to apply streaming
transactions. See the changes [1] in the patch. I am not completely
sure whether that is a good idea even though I could not spot or think
of problems that can't be fixed in your patch. I see that currently
both the main apply worker and bgworker will assign MyProcPid to the
assigned origin slot; this can create a problem because
ReplicationOriginExitCleanup() can clean it up even though the main
apply worker or another bgworker is still using that origin slot. Now,
one way to fix this is to assign only the main apply worker's
MyProcPid to session_replication_state->acquired_by. I have tried to
think about the concurrency issues as multiple workers could now point
to the same replication origin state. I think it is safe because the
patch maintains the commit order by allowing only one process to
commit at a time, so no two workers will be operating on the same
origin at the same time. Even though there is no case where the patch
will try to advance the session's origin concurrently, it appears safe
to do so as we change/advance the session_origin LSNs under
replicate_state LWLock.

Another idea could be that we allow multiple replication origins (one
for each bgworker and one for the main apply worker) for the apply
workers corresponding to a subscription. Then on restart, we can find
the highest LSN among all the origins for a subscription. This should
work primarily because we will maintain the commit order. Now, for
this to work we need to somehow map all the origins for a subscription
and one possibility is that we have a subscription id in each of the
origin names. Currently we use ("pg_%u", MySubscription->oid) as
origin_name. We can probably append some unique identifier number for
each worker to allow each origin to have a subscription id. We need to
drop all origins for a particular subscription on DROP SUBSCRIPTION. I
think having multiple origins for the same subscription will have some
additional work when we try to filter changes based on origin.
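
For reference, the existing origin bookkeeping that either scheme would build
on can be inspected like this (the OID in the comment is made up):

SELECT roident, roname FROM pg_replication_origin;
-- today: one row per subscription, e.g. roname = 'pg_16394' for subscription OID 16394
SELECT external_id, remote_lsn, local_lsn FROM pg_replication_origin_status;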

The advantage of the first idea is that it won't increase the need to
have more origins per subscription but it is quite possible that I am
missing something and there are problems due to which we can't use
that approach.

Thoughts?

[1]:
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, bool acquire)
 {
  static bool registered_cleanup;
  int i;
@@ -1110,7 +1110,7 @@ replorigin_session_setup(RepOriginId node)
  if (curstate->roident != node)
  continue;

- else if (curstate->acquired_by != 0)
+ else if (curstate->acquired_by != 0 && acquire)
  {
...
...

+ /*
+ * The apply bgworker doesn't need to monopolize this replication origin
+ * which was already acquired by its leader process.
+ */
+ replorigin_session_setup(originid, false);
+ replorigin_session_origin = originid;

-- 
With Regards,
Amit Kapila.




Re: Allowing REINDEX to have an optional name

2022-05-31 Thread Bernd Helmle
On Friday, 27.05.2022 at 19:08 +, Cary Huang wrote:


[...]

> The patch applies and tests fine and I think this patch has good
> intentions to prevent the default behavior of REINDEX DATABASE from
> causing a deadlock. However, I am not in favor of simply omitting the
> database name after DATABASE clause because of consistency. Almost
> all other queries involving the DATABASE clause require database name
> to be given following after. For example, ALTER DATABASE [dbname].  
> 
> Being able to omit database name for REINDEX DATABASE seems
> inconsistent to me.
> 
> The documentation states that REINDEX DATABASE only works on the
> current database, but it still requires the user to provide a
> database name and requires that it match the current database.
> Not a very useful option, is it? But it is still required of the
> user to stay consistent with other DATABASE clauses.
> 

Hmm, right, but you can see this from another perspective, too: for
example, ALTER DATABASE works fine for adjusting properties of other
databases, and SET TABLESPACE can only be used when not connected to
the target database, so you are required to specify its name in
that case.
REINDEX DATABASE cannot reindex any database other than the one we're
connected to. Seen from that point, I currently can't see the logical
justification for having the database name there, other than "yes, I
really meant the database I am connected to" or consistency.
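
To make the asymmetry concrete (the bare form below is the syntax
proposed in this thread, not something that exists today):

REINDEX DATABASE mydb;   -- accepted, but only when connected to mydb
REINDEX DATABASE;        -- proposed: reindex the current database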

> Maybe the best way is to keep the query clause as is (with the
> database name still required) and simply don’t let it reindex system
> catalog to prevent deadlock. At the end, give user a notification
> that system catalogs have not been reindexed, and tell them to use
> REINDEX SYSTEM instead.

+1





Re: Fix spelling mistake in README file

2022-05-31 Thread Amit Kapila
On Tue, May 31, 2022 at 1:58 PM Peter Smith  wrote:
>
> PSA a patch to fix a spelling mistake that I happened upon...
>

LGTM. I'll push this in some time. Thanks!

-- 
With Regards,
Amit Kapila.




Fix spelling mistake in README file

2022-05-31 Thread Peter Smith
PSA a patch to fix a spelling mistake that I happened upon...

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Fix-typo.patch
Description: Binary data


Re: adding status for COPY progress report

2022-05-31 Thread Amit Langote
Hi,

On Thu, May 26, 2022 at 11:35 AM Zhihong Yu  wrote:
>> The changes in pgstat_progress_end_command() and
>> pg_stat_get_progress_info() update st_progress_command_target
>> depending on the command type involved, breaking the existing contract
>> of  those routines, particularly the fact that the progress fields
>> *should* be reset in an error stack.

+1 to what Michael said here.  I don't think the following changes are
acceptable:

@@ -106,7 +106,13 @@ pgstat_progress_end_command(void)
return;

PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
-   beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
-   beentry->st_progress_command_target = InvalidOid;
+   if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
+   // We want to show the relation for the most recent COPY command
+   beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
+   else
+   {
+   beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
+   beentry->st_progress_command_target = InvalidOid;
+   }
PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

pgstat_progress_end_command() is generic infrastructure and there
shouldn't be anything COPY-specific there.

> I searched the code base for how st_progress_command_target is used.
> In case there is a subsequent command following the COPY, 
> st_progress_command_target would be set to the Oid
> of the subsequent command.
> In case there is no subsequent command following the COPY command, it seems 
> leaving st_progress_command_target as
> the Oid of the COPY command wouldn't hurt.
>
> Maybe you can let me know what side effect not resetting 
> st_progress_command_target would have.
>
> As an alternative, upon seeing PROGRESS_COMMAND_COPY_DONE, we can transfer 
> the value of
> st_progress_command_target to a new field called, say, 
> st_progress_command_previous_target (
> and resetting st_progress_command_target as usual).

That doesn't sound like a good idea.

As others have said, there's no point in adding a status field to
pg_stat_progress_copy that only tells whether a COPY is running or
not.  You can already do that by looking at the output of `select *
from pg_stat_progress_copy`.  If the COPY you're interested in is
running, you'll find the corresponding row in the view.  The row is
made to disappear from the view the instance the COPY finishes, either
successfully or due to an error.  Whichever is the case will be known
in the connection that initiated the COPY and you may find it in the
server log.  I don't think we should make Postgres remember anything
about that in the shared memory, or at least not with one-off
adjustments of the shared progress reporting state like in the
proposed patch.
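
For example, checking whether any COPY is in flight is already just a
matter of querying the existing view:

-- one row per backend currently running a COPY
SELECT pid, relid::regclass, command, type,
       bytes_processed, tuples_processed
  FROM pg_stat_progress_copy;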

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: pg_upgrade test writes to source directory

2022-05-31 Thread Michael Paquier
On Sat, May 28, 2022 at 04:14:01PM -0400, Tom Lane wrote:
> Yeah, I'd noticed the obsoleted comments too, but not bothered to complain
> since that was just WIP and not an officially proposed patch.  I'll be
> happy to review if you want to put up a full patch.

Well, here is a formal patch set, then.  Please feel free to comment.

FWIW, I am on the fence about dropping TESTDIR, as it could be used by
out-of-core test code as well.  If there are doubts about
back-patching the first part, doing that only on HEAD would be fine to
fix the problem of this thread.
--
Michael
From 3e6b833deafdfafed949abae49f171c8def689dc Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 31 May 2022 15:44:49 +0900
Subject: [PATCH v3 1/2] Add TESTOUTDIR as directory for the output of the TAP
 tests

TESTDIR is the directory of the build, while TESTOUTDIR would be
generally $TESTDIR/tmp_check/.
---
 src/bin/psql/t/010_tab_completion.pl   | 40 +-
 src/test/perl/PostgreSQL/Test/Utils.pm |  4 +--
 src/Makefile.global.in | 11 ---
 src/tools/msvc/vcregress.pl|  2 ++
 4 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index 2eea515e87..1f41ce47e6 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -67,26 +67,26 @@ delete $ENV{TERM};
 delete $ENV{LS_COLORS};
 
 # In a VPATH build, we'll be started in the source directory, but we want
-# to run in the build directory so that we can use relative paths to
-# access the tmp_check subdirectory; otherwise the output from filename
-# completion tests is too variable.
-if ($ENV{TESTDIR})
+# to run in the test output directory so that we can use relative paths from
+# the tmp_check subdirectory; otherwise the output from filename completion
+# tests is too variable.
+if ($ENV{TESTOUTDIR})
 {
-	chdir $ENV{TESTDIR} or die "could not chdir to \"$ENV{TESTDIR}\": $!";
+	chdir "$ENV{TESTOUTDIR}" or die "could not chdir to \"$ENV{TESTOUTDIR}\": $!";
 }
 
 # Create some junk files for filename completion testing.
 my $FH;
-open $FH, ">", "tmp_check/somefile"
-  or die("could not create file \"tmp_check/somefile\": $!");
+open $FH, ">", "somefile"
+  or die("could not create file \"somefile\": $!");
 print $FH "some stuff\n";
 close $FH;
-open $FH, ">", "tmp_check/afile123"
-  or die("could not create file \"tmp_check/afile123\": $!");
+open $FH, ">", "afile123"
+  or die("could not create file \"afile123\": $!");
 print $FH "more stuff\n";
 close $FH;
-open $FH, ">", "tmp_check/afile456"
-  or die("could not create file \"tmp_check/afile456\": $!");
+open $FH, ">", "afile456"
+  or die("could not create file \"afile456\": $!");
 print $FH "other stuff\n";
 close $FH;
 
@@ -272,16 +272,16 @@ clear_query();
 
 # check filename completion
 check_completion(
-	"\\lo_import tmp_check/some\t",
-	qr|tmp_check/somefile |,
+	"\\lo_import some\t",
+	qr|somefile |,
 	"filename completion with one possibility");
 
 clear_query();
 
 # note: readline might print a bell before the completion
 check_completion(
-	"\\lo_import tmp_check/af\t",
-	qr|tmp_check/af\a?ile|,
+	"\\lo_import af\t",
+	qr|af\a?ile|,
 	"filename completion with multiple possibilities");
 
 # broken versions of libedit require clear_line not clear_query here
@@ -291,15 +291,15 @@ clear_line();
 # note: broken versions of libedit want to backslash the closing quote;
 # not much we can do about that
 check_completion(
-	"COPY foo FROM tmp_check/some\t",
-	qr|'tmp_check/somefile\\?' |,
+	"COPY foo FROM some\t",
+	qr|'somefile\\?' |,
 	"quoted filename completion with one possibility");
 
 clear_line();
 
 check_completion(
-	"COPY foo FROM tmp_check/af\t",
-	qr|'tmp_check/afile|,
+	"COPY foo FROM af\t",
+	qr|'afile|,
 	"quoted filename completion with multiple possibilities");
 
 # some versions of readline/libedit require two tabs here, some only need one
@@ -307,7 +307,7 @@ check_completion(
 # the quotes might appear, too
 check_completion(
 	"\t\t",
-	qr|afile123'? +'?(tmp_check/)?afile456|,
+	qr|afile123'? +'?afile456|,
 	"offer multiple file choices");
 
 clear_line();
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 1ca2cc5917..6cd3ff2caf 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -190,9 +190,9 @@ INIT
 	$SIG{PIPE} = 'IGNORE';
 
 	# Determine output directories, and create them.  The base path is the
-	# TESTDIR environment variable, which is normally set by the invoking
+	# TESTOUTDIR environment variable, which is normally set by the invoking
 	# Makefile.
-	$tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check";
+	$tmp_check = $ENV{TESTOUTDIR} ? $ENV{TESTOUTDIR} : "tmp_check";
 	$log_path = "$tmp_check/log";
 
 	mkdir $tmp_check;
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 051718e4fe..4a4e1bd4c9 100644
--- 

PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Vladimir Sitnikov
Hi,

Today I hit "ERROR: target lists can have at most 1664 entries", and I was
surprised the limit was not documented.

I suggest that the limit of "1664 columns per tuple" (or whatever is the
right term) should be added
to the list at https://www.postgresql.org/docs/current/limits.html e.g.
after "columns per table".

Could someone please commit that 1-2 line doc improvement or do you need a
patch for it?
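
For what it's worth, a quick way to reproduce the error from psql is a
generated query (a sketch using \gexec):

SELECT format('SELECT %s',
              string_agg('1 AS c' || g, ', '))
  FROM generate_series(1, 1665) AS g \gexec

The generated SELECT has 1665 target entries and fails as above.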

Vladimir


Re: Skipping schema changes in publication

2022-05-31 Thread Peter Smith
Here are my review comments for patch v7-0002.

==

1. doc/src/sgml/logical-replication.sgml

@@ -1167,8 +1167,9 @@ CONTEXT:  processing remote data for replication
origin "pg_16395" during "INSER
   
To add tables to a publication, the user must have ownership rights on the
table. To add all tables in schema to a publication, the user must be a
-   superuser. To create a publication that publishes all tables or
all tables in
-   schema automatically, the user must be a superuser.
+   superuser. To add all tables to a publication, the user must be a superuser.
+   To create a publication that publishes all tables or all tables in schema
+   automatically, the user must be a superuser.
   

I felt that maybe this whole paragraph should be rearranged: put the
"create publication" parts before the "alter publication" parts and
re-word the sentences more consistently. I also felt that ALL TABLES and
ALL TABLES IN SCHEMA etc. should be written uppercase/literal since
that is what was meant.

SUGGESTION
To create a publication using FOR ALL TABLES or FOR ALL TABLES IN
SCHEMA, the user must be a superuser. To add ALL TABLES or ALL TABLES
IN SCHEMA to a publication, the user must be a superuser. To add
tables to a publication, the user must have ownership rights on the
table.
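
To make that concrete, roughly (object names are hypothetical):

CREATE PUBLICATION pub_all FOR ALL TABLES;                  -- superuser
CREATE PUBLICATION pub_s1 FOR ALL TABLES IN SCHEMA sales;   -- superuser
ALTER PUBLICATION pub_t1 ADD TABLE orders;                  -- owner of the table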

~~~

2. doc/src/sgml/ref/alter_publication.sgml

@@ -82,8 +88,8 @@ ALTER PUBLICATION name RESET

   
You must own the publication to use ALTER PUBLICATION.
-   Adding a table to a publication additionally requires owning that table.
-   The ADD ALL TABLES IN SCHEMA,
+   Adding a table to or excluding a table from a publication additionally
+   requires owning that table. The ADD ALL TABLES IN SCHEMA,
SET ALL TABLES IN SCHEMA to a publication and

Isn't this missing some information that says ADD ALL TABLES requires
the invoking user to be a superuser?

~~~

3. doc/src/sgml/ref/alter_publication.sgml - examples

+  
+   Alter publication production_publication to publish
+   all tables except users and
+   departments tables:
+
+ALTER PUBLICATION production_publication ADD ALL TABLES EXCEPT users,
departments;
+
+

I don't think it needs to say "tables" twice (e.g. remove the last "tables")

~~~

4. doc/src/sgml/ref/create_publication.sgml - examples

+  
+   Create a publication that publishes all changes in all the tables except for
+   the changes of users and
+   departments tables:
+
+CREATE PUBLICATION mypublication FOR ALL TABLES EXCEPT users, departments;
+
+  

I don't think it needs to say "tables" twice (e.g. remove the last "tables")

~~~

5. src/backend/catalog/pg_publication.c

  foreach(lc, ancestors)
  {
  Oid ancestor = lfirst_oid(lc);
- List*apubids = GetRelationPublications(ancestor);
- List*aschemaPubids = NIL;
+ List*apubids = GetRelationPublications(ancestor, false);
+ List*aschemapubids = NIL;
+ List*aexceptpubids = NIL;

  level++;

- if (list_member_oid(apubids, puboid))
+ /* check if member of table publications */
+ if (!list_member_oid(apubids, puboid))
  {
- topmost_relid = ancestor;
-
- if (ancestor_level)
- *ancestor_level = level;
- }
- else
- {
- aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
- if (list_member_oid(aschemaPubids, puboid))
+ /* check if member of schema publications */
+ aschemapubids = GetSchemaPublications(get_rel_namespace(ancestor));
+ if (!list_member_oid(aschemapubids, puboid))
  {
- topmost_relid = ancestor;
-
- if (ancestor_level)
- *ancestor_level = level;
+ /*
+ * If the publication is all tables publication and the table
+ * is not part of exception tables.
+ */
+ if (puballtables)
+ {
+ aexceptpubids = GetRelationPublications(ancestor, true);
+ if (list_member_oid(aexceptpubids, puboid))
+ goto next;
+ }
+ else
+ goto next;
  }
  }

+ topmost_relid = ancestor;
+
+ if (ancestor_level)
+ *ancestor_level = level;
+
+next:
  list_free(apubids);
- list_free(aschemaPubids);
+ list_free(aschemapubids);
+ list_free(aexceptpubids);
  }


I felt those negative (!) conditions and those gotos make this
logic hard to understand. Can't it be simplified more than this? Even
just having another bool flag might help make it easier.

e.g. Perhaps something a bit like this (but add some comments)

foreach(lc, ancestors)
{
Oid ancestor = lfirst_oid(lc);
List*apubids = GetRelationPublications(ancestor, false);
List*aschemaPubids = NIL;
List*aexceptpubids = NIL;
bool set_top = false;
level++;

set_top = list_member_oid(apubids, puboid);
if (!set_top)
{
aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
set_top = list_member_oid(aschemaPubids, puboid);

if (!set_top && puballtables)
{
aexceptpubids = GetRelationPublications(ancestor, true);
set_top = !list_member_oid(aexceptpubids, puboid);
}
}
if (set_top)
{
topmost_relid = ancestor;

if (ancestor_level)
*ancestor_level = level;
}

list_free(apubids);
list_free(aschemapubids);
list_free(aexceptpubids);
}

--

6. src/backend/commands/publicationcmds.c -