Re: add non-option reordering to in-tree getopt_long

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 08:49:03PM -0700, Nathan Bossart wrote:
> After a couple more small edits, I've committed this.  I looked through all
> uses of getopt_long() in PostgreSQL earlier today, and of the programs that
> accepted non-options, most accepted only one, some others accepted 2-3, and
> ecpg and pg_regress accepted any number.  Given this analysis, I'm not too
> worried about the O(n^2) behavior that the patch introduces.  You'd likely
> need to provide an enormous number of non-options for it to be noticeable,
> and I'm dubious such use-cases exist.
> 
> During my analysis, I noticed that pg_ctl contains a workaround for the
> lack of argument reordering.  I think this can now be removed.  Patch
> attached.

Interesting piece of history that you have found here, coming from
f3d6d94 back in 2004.  The old pg_ctl.sh before that did not need any
of this.  It seems sensible to do something about it.

Something does not seem right as seen from here: a CI run with
Windows 2019 fails when using pg_ctl at the beginning of most of the
tests, like:
[04:56:07.404] - 8< -
[04:56:07.404] stderr:
[04:56:07.404] pg_ctl: too many command-line arguments (first is "-D")
--
Michael


signature.asc
Description: PGP signature


Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Michael Paquier
On Thu, Jul 13, 2023 at 09:35:17AM +0530, Dilip Kumar wrote:
> Yeah, it seems that using pg_index tuples from relcache is not safe,
> at least for updating the catalog tuples.  However, are there known
> rules or do we need to add some comments saying that the pg_index
> tuple from the relcache cannot be used to update the catalog tuple?

I don't recall any implied rule written in the tree about that, off
the top of my head.  Perhaps something about it could be documented
around the declaration of RelationData in rel.h, for instance.

> Or do we actually need to update all the tuple header information as
> well in RelationReloadIndexInfo() in order to fix that invariant so
> that it can be used for catalog tuple updates as well?

RelationReloadIndexInfo() is designed to be minimal, so I am not
really excited about extending it more than necessary without a case
in favor of it.  indisreplident is clearly on the list of things to
update here.  The others would need a more careful evaluation, though
we don't really have a case for doing more, IMO, particularly in the
scope of stable branches.
--
Michael




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Masahiko Sawada
On Wed, Jul 12, 2023 at 11:15 PM Masahiko Sawada  wrote:
>
> On Wed, Jul 12, 2023 at 7:08 PM Amit Kapila  wrote:
> >
> > On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
> > > >
> > >
> > > I don't think we have concluded on any action for it. I agree that
> > > IsIndexOnlyOnExpression() is redundant; we don't actually need to
> > > check *all* index fields. I've attached a draft patch. It removes
> > > IsIndexOnlyOnExpression() and merges
> > > RemoteRelContainsLeftMostColumnOnIdx() into
> > > FindUsableIndexForReplicaIdentityFull(). One concern is that we no
> > > longer do the assertion check with
> > > IsIndexUsableForReplicaIdentityFull(). What do you think?
> > >
> >
> > I think this is a valid concern. Can't we move all the checks
> > (including the remote attrs check) inside
> > IsIndexUsableForReplicaIdentityFull() and then call it from both
> > places? Won't we have attrmap information available in the callers of
> > FindReplTupleInLocalRel() via ApplyExecutionData?
>
> You mean to pass ApplyExecutionData or attrmap down to
> RelationFindReplTupleByIndex()? I think it would be better to call it
> from FindReplTupleInLocalRel() instead.

I've attached a draft patch for this idea.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


remove_redundant_check_v2.patch
Description: Binary data


Re: Changing types of block and chunk sizes in memory contexts

2023-07-12 Thread David Rowley
On Tue, 11 Jul 2023 at 02:41, Melih Mutlu  wrote:
> > - Afaics AllocSet->keeper is unnecessary these days, as it is always 
> > allocated
> >   together with the context itself. Saves 8 bytes.
>
> This seemed like a safe change and removed the keeper field in
> AllocSet and Generation contexts. It saves an additional 8 bytes.

Seems like a good idea for an additional 8 bytes.

I looked at your v2 patch. The only thing that really looked wrong
was the (Size) casts in the context creation functions.  These should
have been casts to uint32 rather than Size. Basically, all the casts
do is say to the compiler "Yes, I know this could cause truncation due
to assigning to a size smaller than the source type's size". Some
compilers would likely warn without the casts, and the casts stop them.
We know there can't be any truncation due to the Asserts. There's also
the fundamental limitation that MemoryChunk can't store block offsets
larger than 1GBs anyway, so things will go bad if we tried to have
blocks bigger than 1GB.

Aside from that, I thought that a couple of other slab.c fields could
be shrunk to uint32, as the v2 patch only reduces the size of one
field, which just creates a 4-byte hole in SlabContext.  The
fullChunkSize field is just MAXALIGN(chunkSize) + sizeof(MemoryChunk).
We should never be using slab contexts for any chunks anywhere near
that size; aset.c would be a better context for that, so it seems fine
to me to further restrict the maximum supported chunk size by another
8 bytes.

I've attached your patch again along with a small delta of what I adjusted.

My thoughts on these changes are that it's senseless to have
Size-typed fields for storing a value that's never larger than 2^30.
Getting rid of the keeper pointer seems like a cleanup, as it's pretty
much a redundant field.  For small contexts like the ones used for
storing index relcache entries, I think it makes sense to save 20 more
bytes.  Each backend can have many thousands of those, and there could
be many hundreds of backends. If we can fit more allocations on that
initial 1-kilobyte keeper block without having to allocate any
additional blocks, then that's great.

I feel that Andres's results showing several hundred fewer block
allocations show this working.  Albeit, his patch reduced the size of
the structs even further than what v3 does. I think v3 is enough for
now as the additional changes Andres mentioned require some more
invasive code changes to make work.

If nobody objects or has other ideas about this, modulo commit
message, I plan to push the attached on Monday.

David
From e5e7da6a9880b0cbc7fa96d26cdd73d8bb02c88f Mon Sep 17 00:00:00 2001
From: Melih Mutlu 
Date: Mon, 12 Jun 2023 08:50:28 +0300
Subject: [PATCH v3 1/2] Change memory context fields to uint32

Block sizes and chunk sizes are upper-bounded by
MEMORYCHUNK_MAX_BLOCKOFFSET and MEMORYCHUNK_MAX_VALUE, respectively.
Both bounds correspond to 1 less than 1GB.  This allows us to store
the size of any block or chunk limited by those bounds in 32 bits.

This patch changes the types of such fields that represent block or
chunk sizes from the 64-bit Size to 32-bit unsigned integers.

Also, the keeper fields are removed from the AllocSet and Generation
structs, and AllocSetKeeper/GenerationKeeper macros are added where
needed.
---
 src/backend/utils/mmgr/aset.c   | 52 +
 src/backend/utils/mmgr/generation.c | 51 
 src/backend/utils/mmgr/slab.c   |  6 ++--
 3 files changed, 62 insertions(+), 47 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0bbbf93672..797abef1b5 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -156,11 +156,10 @@ typedef struct AllocSetContext
AllocBlock  blocks; /* head of list of blocks in 
this set */
MemoryChunk *freelist[ALLOCSET_NUM_FREELISTS];  /* free chunk lists */
/* Allocation parameters for this context: */
-   SizeinitBlockSize;  /* initial block size */
-   SizemaxBlockSize;   /* maximum block size */
-   SizenextBlockSize;  /* next block size to allocate */
-   SizeallocChunkLimit;/* effective chunk size limit */
-   AllocBlock  keeper; /* keep this block over resets 
*/
+   uint32  initBlockSize;  /* initial block size */
+   uint32  maxBlockSize;   /* maximum block size */
+   uint32  nextBlockSize;  /* next block size to allocate */
+   uint32  allocChunkLimit;/* effective chunk size limit */
/* freelist this context could be put in, or -1 if not a candidate: */
int freeListIndex;  /* index in 
context_freelists[], or -1 */
 } AllocSetContext;
@@ -241,6 +240,14 @@ typedef struct AllocBlockData
  */
 #define MAX_FREE_CONTEXTS 100  /* arbitrary limit on freelist length */
 

Configurable FP_LOCK_SLOTS_PER_BACKEND

2023-07-12 Thread Nikolay Samokhvalov
We're observing a few cases with lockmanager spikes in a few quite loaded
systems.

These cases are different; queries are different, Postgres versions are 12,
13, and 14.

But in all cases, the servers are quite beefy (96-128 vCPUs, ~600-800
GiB) and receive a lot of TPS (a few dozen thousand). Most queries that
suffer from wait_event=lockmanager involve a substantial number of
tables/indexes, often with partitioning.

FP_LOCK_SLOTS_PER_BACKEND is currently hard-coded to 16 in
storage/proc.h, and it is now very easy to hit this threshold in a
loaded system, especially, for example, if a table with a dozen indexes
was partitioned. It seems any system with good growth hits it sooner or
later.

I wonder, would it make sense to:
1) either consider increasing this hard-coded value, taking into account
that 16 seems to be very low for modern workloads, schemas, and hardware
(say, it could be 64),
2) or even make it configurable as a new GUC.

Thanks,
Nikolay Samokhvalov
Founder, Postgres.ai


Re: Consistent coding for the naming of LR workers

2023-07-12 Thread Peter Smith
On Wed, Jul 12, 2023 at 9:35 PM Peter Eisentraut  wrote:
>
> On 21.06.23 09:18, Alvaro Herrera wrote:
> > That is a terrible pattern in relatively new code.  Let's get rid of it
> > entirely rather than continue to propagate it.
> >
> >> So, I don't think it is fair to say that these format strings are OK
> >> for the existing HEAD code, but not OK for the patch code, when they
> >> are both the same.
> >
> > Agreed.  Let's remove them all.
>
> This is an open issue for PG16 translation.  I propose the attached
> patch to fix this.  Mostly, this just reverts to the previous wordings.
> (I don't think that for these messages the difference between "apply worker"
> and "parallel apply worker" is interesting enough to justify exploding the
> number of messages.  AFAICT, the table sync worker case wasn't even
> used, since callers always handled it separately.)

I thought the get_worker_name() function was reachable by tablesync workers as well.

Since ApplyWorkerMain is a common entry point for both leader apply
workers and tablesync workers, any logs in that code path could
potentially be from either kind of worker. At least, when the function
was first introduced (around patch v43-0001? [1]) it was also
replacing some tablesync logs.

--
[1] v43-0001 - 
https://www.postgresql.org/message-id/OS0PR01MB5716E366874B253B58FC0A23943C9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-07-12 Thread Andrey Chudnovsky
> >   - Parameters are strings, so callback is not provided yet.
> > 2. Client gets PgConn from PQconnectStart return value and updates
> > conn->async_auth to its own callback.
>
> This is where some sort of official authn callback registration (see
> above reply to Daniele) would probably come in handy.
+1

> > 3. Client polls PQconnectPoll and checks conn->sasl_state until the
> > value is SASL_ASYNC
>
> In my head, the client's custom callback would always be invoked
> during the call to PQconnectPoll, rather than making the client do
> work in between calls. That way, a client can use custom flows even
> with a synchronous PQconnectdb().
The way I see this API working is that the asynchronous client needs at
least two PQconnectPoll calls:
1. To be notified of what the authentication requirements are and get
parameters.
2. When it acquires the token, the callback is used to inform libpq of the
token and return PGRES_POLLING_OK.

For the synchronous client, the callback implementation would need to be
aware that the synchronous implementation invokes the callback
frequently, and be implemented accordingly.

Bottom line, I don't see much of a problem with the current proposal. It
is just that the way the callback learns that an OAuth token is requested,
and gets its parameters, relies on PQconnectPoll being invoked after the
corresponding parameters of the conn object are populated.

> > > 5. Expectations on async_auth:
> > > a. It returns PGRES_POLLING_READING while token acquisition is
going on
> > > b. It returns PGRES_POLLING_OK and sets conn->sasl_state->token
> > > when token acquisition succeeds.
> >
> > Yes. Though the token should probably be returned through some
> > explicit part of the callback, now that you mention it...
>
> > 6. Is the client supposed to do anything with the altsock parameter?
>
> The callback needs to set the altsock up with a select()able
> descriptor, which wakes up the client when more work is ready to be
> done. Without that, you can't handle multiple connections on a single
> thread.

Ok, thanks for clarification.

> > On a separate note:
> > The backend code currently spawns an external command for token
validation.
> > As we discussed before, an extension hook would be a more efficient
> > extensibility option.
> > We see clients make 10k+ connections using OAuth tokens per minute to
> > our service, and starting external processes would be too much overhead
> > here.
>
> +1. I'm curious, though -- what language do you expect to use to write
> a production validator hook? Surely not low-level C...?

For the server-side code, it would likely be identity providers
publishing extensions to validate their tokens. Those can be written in
C too, or extensions can now be implemented in Rust using pgrx, which is
developer-friendly enough in my opinion.

> Yeah, I'm really interested in seeing which existing high-level flows
> can be mixed in through a driver. Trying not to get too far ahead of
> myself :D

I can think of the following as the most common:
1. Authorization code with PKCE. This is by far the most common for user
login flows. It requires spinning up a browser and listening on a redirect
URL/port. Most high-level platforms have libraries to do both.
2. Client certificates. This requires an identity-provider-specific library
to construct and sign the token. The providers publish SDKs to do that for
most common app development platforms.


Re: Preventing non-superusers from altering session authorization

2023-07-12 Thread Nathan Bossart
On Mon, Jul 10, 2023 at 01:49:55PM -0700, Nathan Bossart wrote:
> Great.  I'm going to wait a few more days in case anyone has additional
> feedback, but otherwise I intend to commit this shortly.

I've committed 0001 for now.  I'm hoping to commit the other two patches
within the next couple of days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: numeric datatype for current release not available

2023-07-12 Thread Ashutosh Bapat
On Wed, Jul 12, 2023 at 8:55 PM Julien Rouhaud  wrote:
>
> it's working here. probably a transient error, it happens from time to time.

Thanks Julien for looking into it. It's working now.


-- 
Best Wishes,
Ashutosh Bapat




RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-12 Thread Hayato Kuroda (Fujitsu)
Dear Melih,

> > > Thanks for the 0003 patch. But it did not work for me. Can you create
> > > a subscription successfully with patch 0003 applied?
> > > I get the following error: " ERROR:  table copy could not start
> > > transaction on publisher: another command is already in progress".
> >
> > You got the ERROR when all the patches (0001-0005) were applied, right?
> > I have focused on 0001 and 0002 only, so I may have missed something.
> > If that is not correct, please attach the logfile and the test script
> > you used.
> 
> Yes, I did get an error with all patches applied. But with only 0001
> and 0002, your version seems like working and mine does not.

Hmm, really? IIUC I did not modify the 0001 and 0002 patches; I just
re-assigned the version number. I compared yours and mine, but found no
meaningful differences. E.g., the following command compared v4-0002 and v16-0002:

```
diff --git a/../reuse_workers/v4-0002-Reuse-Tablesync-Workers.patch 
b/../reuse_workers/hayato/v16-0002-Reuse-Tablesync-Workers.patch
index 5350216e98..7785a573e4 100644
--- a/../reuse_workers/v4-0002-Reuse-Tablesync-Workers.patch
+++ b/../reuse_workers/hayato/v16-0002-Reuse-Tablesync-Workers.patch
@@ -1,7 +1,7 @@
-From d482022b40e0a5ce1b74fd0e320cb5b45da2f671 Mon Sep 17 00:00:00 2001
+From db3e8e2d7aadea79126c5816bce8b06dc82f33c2 Mon Sep 17 00:00:00 2001
 From: Melih Mutlu 
 Date: Tue, 4 Jul 2023 22:04:46 +0300
-Subject: [PATCH 2/5] Reuse Tablesync Workers
+Subject: [PATCH v16 2/5] Reuse Tablesync Workers
 
 This commit allows reusing tablesync workers for syncing more than one
 table sequentially during their lifetime, instead of exiting after
@@ -324,5 +324,5 @@ index 7aba034774..1e9f8e6e72 100644
  static inline bool
  am_tablesync_worker(void)
 -- 
-2.25.1
+2.27.0
```

For confirmation, please attach the logfile and the test script you used
if you can reproduce it.

> What do you think about combining 0002 and 0003? Or should those stay
> separate?

I have no strong opinion, but it may be useful to keep them pluggable.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Dilip Kumar
On Thu, Jul 13, 2023 at 3:56 AM Michael Paquier  wrote:
>
> On Wed, Jul 12, 2023 at 04:02:23PM -0400, Robert Haas wrote:
> > Oh, interesting. The fact that indisreplident isn't copied seems like
> > a pretty clear mistake, but I'm guessing that the fact that t_self
> > wasn't refreshed was deliberate and that the author of this code
> > didn't really intend for callers to look at the t_self value. We could
> > change our mind about whether that ought to be allowed, though. But,
> > like, none of the other tuple header fields are copied either... xmax,
> > xvac, infomask, etc.
>
> See 3c84046 and 8ec9438, mainly, from Tom.  I didn't know that this is
> used as a shortcut to reload index information in the cache because it
> is much cheaper than a full index information rebuild.  I agree that
> not copying indisreplident in this code path is a mistake as this bug
> shows, because any follow-up command run in a transaction that changed
> this field would see incorrect information.
>
> Now, I have to admit that I am not completely sure what the
> consequences of this addition are when it comes to concurrent index
> operations (CREATE/DROP INDEX, REINDEX):
> /* Copy xmin too, as that is needed to make sense of indcheckxmin */
> HeapTupleHeaderSetXmin(relation->rd_indextuple->t_data,
>HeapTupleHeaderGetXmin(tuple->t_data));
> +   ItemPointerCopy(&tuple->t_self, &relation->rd_indextuple->t_self);
>
> Anyway, as I have pointed upthread, I think that the craziness is also
> in validatePartitionedIndex() where this stuff thinks that it is OK to
> use a copy of the pg_index tuple coming from the relcache.  As this
> report proves, it is *not* safe, because we may miss a lot of
> information not updated by RelationReloadIndexInfo() that other
> commands in the same transaction block may have updated, and the
> code would insert into the catalog an inconsistent tuple for a
> partitioned index switched to become valid.
>
> I agree that updating indisreplident in this cheap index reload path
> is necessary, as well.  Does my suggestion of using the syscache not
> make sense for somebody here?  Note that this is what all the other
> code paths do for catalog updates of pg_index when retrieving a copy
> of its tuples.

Yeah, it seems that using pg_index tuples from relcache is not safe,
at least for updating the catalog tuples.  However, are there known
rules or do we need to add some comments saying that the pg_index
tuple from the relcache cannot be used to update the catalog tuple?
Or do we actually need to update all the tuple header information as
well in RelationReloadIndexInfo() in order to fix that invariant so
that it can be used for catalog tuple updates as well?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Use COPY for populating all pgbench tables

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 09:29:35AM -0500, Tristan Partin wrote:
> On Wed Jul 12, 2023 at 1:06 AM CDT, Michael Paquier wrote:
>> This would use the freeze option only on pgbench_accounts when no
>> partitioning is defined, but my point was a bit different.  We could
>> use the FREEZE option on the teller and branch tables as well, no?
>> Okay, the impact is limited compared to accounts in terms of amount of
>> data loaded, but perhaps some people like playing with large scaling
>> factors where this could show a benefit in the initial data loading.
> 
> Perhaps, should they all be keyed off the same option? It seemed like in
> your previous comment you wanted multiple options. Sorry for not reading
> your comment correctly.

I would have thought that --partition should only apply to the
pgbench_accounts table, while FREEZE should apply wherever it is possible
to use it, i.e. all the COPY queries except when pgbench_accounts is a
partition.  Would you do something different, like not applying FREEZE
to pgbench_tellers and pgbench_branches as these have far fewer tuples
than pgbench_accounts?
--
Michael




Re: add non-option reordering to in-tree getopt_long

2023-07-12 Thread Nathan Bossart
On Tue, Jul 11, 2023 at 09:32:32AM -0700, Nathan Bossart wrote:
> Sure.  І did it this way in v7.

After a couple more small edits, I've committed this.  I looked through all
uses of getopt_long() in PostgreSQL earlier today, and of the programs that
accepted non-options, most accepted only one, some others accepted 2-3, and
ecpg and pg_regress accepted any number.  Given this analysis, I'm not too
worried about the O(n^2) behavior that the patch introduces.  You'd likely
need to provide an enormous number of non-options for it to be noticeable,
and I'm dubious such use-cases exist.

During my analysis, I noticed that pg_ctl contains a workaround for the
lack of argument reordering.  I think this can now be removed.  Patch
attached.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 10931d7b83f3ef02f510385e1e595bc260b69a5c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 12 Jul 2023 19:57:44 -0700
Subject: [PATCH v8 2/2] Rely on getopt_long() argument reordering in pg_ctl.

---
 src/bin/pg_ctl/pg_ctl.c | 268 +++-
 1 file changed, 128 insertions(+), 140 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 465948e707..807d7023a9 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -2260,163 +2260,151 @@ main(int argc, char **argv)
 	if (env_wait != NULL)
 		wait_seconds = atoi(env_wait);
 
-	/*
-	 * 'Action' can be before or after args so loop over both. Some
-	 * getopt_long() implementations will reorder argv[] to place all flags
-	 * first (GNU?), but we don't rely on it. Our /port version doesn't do
-	 * that.
-	 */
-	optind = 1;
-
 	/* process command-line options */
-	while (optind < argc)
+	while ((c = getopt_long(argc, argv, "cD:e:l:m:N:o:p:P:sS:t:U:wW",
+			long_options, _index)) != -1)
 	{
-		while ((c = getopt_long(argc, argv, "cD:e:l:m:N:o:p:P:sS:t:U:wW",
-long_options, _index)) != -1)
+		switch (c)
 		{
-			switch (c)
-			{
-case 'D':
-	{
-		char	   *pgdata_D;
-
-		pgdata_D = pg_strdup(optarg);
-		canonicalize_path(pgdata_D);
-		setenv("PGDATA", pgdata_D, 1);
-
-		/*
-		 * We could pass PGDATA just in an environment
-		 * variable but we do -D too for clearer postmaster
-		 * 'ps' display
-		 */
-		pgdata_opt = psprintf("-D \"%s\" ", pgdata_D);
-		free(pgdata_D);
-		break;
-	}
-case 'e':
-	event_source = pg_strdup(optarg);
-	break;
-case 'l':
-	log_file = pg_strdup(optarg);
-	break;
-case 'm':
-	set_mode(optarg);
-	break;
-case 'N':
-	register_servicename = pg_strdup(optarg);
-	break;
-case 'o':
-	/* append option? */
-	if (!post_opts)
-		post_opts = pg_strdup(optarg);
-	else
-	{
-		char	   *old_post_opts = post_opts;
-
-		post_opts = psprintf("%s %s", old_post_opts, optarg);
-		free(old_post_opts);
-	}
-	break;
-case 'p':
-	exec_path = pg_strdup(optarg);
-	break;
-case 'P':
-	register_password = pg_strdup(optarg);
-	break;
-case 's':
-	silent_mode = true;
+			case 'D':
+{
+	char	   *pgdata_D;
+
+	pgdata_D = pg_strdup(optarg);
+	canonicalize_path(pgdata_D);
+	setenv("PGDATA", pgdata_D, 1);
+
+	/*
+	 * We could pass PGDATA just in an environment variable
+	 * but we do -D too for clearer postmaster 'ps' display
+	 */
+	pgdata_opt = psprintf("-D \"%s\" ", pgdata_D);
+	free(pgdata_D);
 	break;
-case 'S':
+}
+			case 'e':
+event_source = pg_strdup(optarg);
+break;
+			case 'l':
+log_file = pg_strdup(optarg);
+break;
+			case 'm':
+set_mode(optarg);
+break;
+			case 'N':
+register_servicename = pg_strdup(optarg);
+break;
+			case 'o':
+/* append option? */
+if (!post_opts)
+	post_opts = pg_strdup(optarg);
+else
+{
+	char	   *old_post_opts = post_opts;
+
+	post_opts = psprintf("%s %s", old_post_opts, optarg);
+	free(old_post_opts);
+}
+break;
+			case 'p':
+exec_path = pg_strdup(optarg);
+break;
+			case 'P':
+register_password = pg_strdup(optarg);
+break;
+			case 's':
+silent_mode = true;
+break;
+			case 'S':
 #ifdef WIN32
-	set_starttype(optarg);
+set_starttype(optarg);
 #else
-	write_stderr(_("%s: -S option not supported on this platform\n"),
- progname);
-	exit(1);
+write_stderr(_("%s: -S option not supported on this platform\n"),
+			 progname);
+exit(1);
 #endif
-	break;
-case 't':
-	wait_seconds = atoi(optarg);
-	wait_seconds_arg = true;
-	break;
-case 'U':
-	if (strchr(optarg, '\\'))
-		register_username = pg_strdup(optarg);
-	else
-		/* Prepend .\ for local accounts */
-		register_username = psprintf(".\\%s", optarg);
-	break;
-case 'w':
-	do_wait = true;
-	break;
-case 'W':
-	do_wait = false;
-	break;
-case 'c':
-	

Re: Duplicated LLVMJitHandle->lljit assignment?

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 06:41:37PM -0700, Gurjeet Singh wrote:
> LGTM.

Indeed.  It looks like a small thinko from 6c57f2e when support for
LLVM12 was added.
--
Michael




Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-12 Thread Amit Kapila
On Wed, Jul 12, 2023 at 8:14 PM Önder Kalacı  wrote:
>
>>
>> > - return is_btree && !is_partial && !is_only_on_expression;
>> > + /* Check whether the index is supported or not */
>> > + is_suitable_type = ((get_equal_strategy_number_for_am(indexInfo->ii_Am)
>> > + != InvalidStrategy));
>> > +
>> > + is_partial = (indexInfo->ii_Predicate != NIL);
>> > + is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
>> > +
>> > + return is_suitable_type && !is_partial && !is_only_on_expression;
>> >
>
>
> I don't want to repeat this too much, as it is a minor note. Just
> sharing my perspective here.
>
> As discussed in the other email [1], I feel like keeping
> IsIndexUsableForReplicaIdentityFull()  function readable is useful
> for documentation purposes as well.
>
> So, I'm more inclined to see something like your old code, maybe with
> a different variable name.
>
>> bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
>
>
> to
>>
>> bool has_equal_strategy = get_equal_strategy_number_for_am...
>> 
>>  return  has_equal_strategy   && !is_partial && !is_only_on_expression;
>

+1 for the readability. I think we can do as you suggest, or I feel it
would be better if we return false as soon as we find any condition
that is false. The current patch has a mixed style, such that for
InvalidStrategy it returns immediately, but for the others it does a
combined check. The other point we should consider in this regard is
the below assert check:

+#ifdef USE_ASSERT_CHECKING
+ {
+ /* Check that the given index access method has amgettuple routine */
+ IndexAmRoutine *amroutine = GetIndexAmRoutineByAmId(indexInfo->ii_Am,
+ false);
+ Assert(amroutine->amgettuple != NULL);
+ }
+#endif

Apart from having an assert, we have the following two options: (a)
check this condition as well and return false if the AM doesn't support
amgettuple, or (b) report elog(ERROR, ..) in this case.

I am of the opinion that we should either have an assert for this or
do (b), because if we do (a), currently there is no case where it can
return false. What do you think?

-- 
With Regards,
Amit Kapila.




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Peter Smith
On Thu, Jul 13, 2023 at 12:22 PM Masahiko Sawada  wrote:
>
> On Thu, Jul 13, 2023 at 11:12 AM Peter Smith  wrote:
> >
> > On Thu, Jul 13, 2023 at 11:28 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Jul 13, 2023 at 8:03 AM Peter Smith  wrote:
> > > >
> > > > On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  
> > > > > wrote:
> > > > > >
> > ...
> > > >
> > > > I checked v5-0001 and noticed the following:
> > > >
> > > > ==
> > > > doc/src/sgml/logical-replication.sgml
> > > >
> > > > BEFORE
> > > > ... and the leftmost index field must be a  column (not an expression)
> > > > that reference a published table column.
> > > >
> > > > SUGGESTION ("references the", instead of "reference a")
> > > > ... and the leftmost index field must be a  column (not an expression)
> > > > that references the published table column.
> > >
> > > Thanks, will fix.
> > >
> > > >
> > > > (maybe that last word "column" is also unnecessary?)
> > >
> > > But an index column doesn't reference the published table itself; it
> > > references the published table's column, no?
> > >
> >
> > Yeah, but there is some inconsistency with the other code comment that
> > just says "... that references the remote relation.", so I thought one
> > of them needs to change. If not this one, then the other one.
>
> Right. So let's add "column" in both places. Attached is the updated patch.
>

v6-0001 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Support to define custom wait events for extensions

2023-07-12 Thread Michael Paquier
On Thu, Jul 13, 2023 at 10:26:35AM +0900, Masahiro Ikeda wrote:
> Thanks for your quick response. I'll rebase for the commit.

Okay, thanks.  I'll wait for the rebased version before moving on with
the next review, then.
--
Michael




Re: CREATE INDEX CONCURRENTLY on partitioned index

2023-07-12 Thread Justin Pryzby
On Mon, Mar 27, 2023 at 01:28:24PM +0300, Alexander Pyhalov wrote:
> Justin Pryzby wrote on 2023-03-26 17:51:
> > On Sun, Dec 04, 2022 at 01:09:35PM -0600, Justin Pryzby wrote:
> > > This currently handles partitions with a loop around the whole CIC
> > > implementation, which means that things like WaitForLockers() happen
> > > once for each index, the same as REINDEX CONCURRENTLY on a partitioned
> > > table.  Contrast that with ReindexRelationConcurrently(), which handles
> > > all the indexes on a table in one pass by looping around indexes within
> > > each phase.
> > 
> > Rebased over the progress reporting fix (27f5c712b).
> > 
> > I added a list of (intermediate) partitioned tables, rather than looping
> > over the list of inheritors again, to save calling rel_get_relkind().
> > 
> > I think this patch is done.
> 
> Overall looks good to me. However, I think that using 'partitioned' as list
> of partitioned index oids in DefineIndex() is a bit misleading - we've just
> used it as boolean, specifying if we are dealing with a partitioned
> relation.

Right.  This is also rebased on 8c852ba9a4 (Allow some exclusion
constraints on partitions).

-- 
Justin
From 3f60cbdd12b67115f86854ff60a4009028b8b99f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 6 Jun 2020 17:42:23 -0500
Subject: [PATCH] Allow CREATE INDEX CONCURRENTLY on partitioned table

https://www.postgresql.org/message-id/flat/20201031063117.gf3...@telsasoft.com
---
 doc/src/sgml/ddl.sgml  |   4 +-
 doc/src/sgml/ref/create_index.sgml |  14 +-
 src/backend/commands/indexcmds.c   | 201 ++---
 src/test/regress/expected/indexing.out | 127 +++-
 src/test/regress/sql/indexing.sql  |  26 +++-
 5 files changed, 297 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 58aaa691c6a..afa982154a8 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4161,9 +4161,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
  so that they are applied automatically to the entire hierarchy.
  This is very
  convenient, as not only will the existing partitions become indexed, but
- also any partitions that are created in the future will.  One limitation is
- that it's not possible to use the CONCURRENTLY
- qualifier when creating such a partitioned index.  To avoid long lock
+ also any partitions that are created in the future will.  To avoid long lock
  times, it is possible to use CREATE INDEX ON ONLY
  the partitioned table; such an index is marked invalid, and the partitions
  do not get the index applied automatically.  The indexes on partitions can
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 40986aa502f..b05102efdaf 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -645,7 +645,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 If a problem arises while scanning the table, such as a deadlock or a
 uniqueness violation in a unique index, the CREATE INDEX
-command will fail but leave behind an invalid index. This index
+command will fail but leave behind an invalid index.
+If this happens while building an index concurrently on a partitioned
+table, the command can also leave behind valid or
+invalid indexes on table partitions.  The invalid index
 will be ignored for querying purposes because it might be incomplete;
 however it will still consume update overhead. The psql
 \d command will report such an index as INVALID:
@@ -692,15 +695,6 @@ Indexes:
 cannot.

 
-   
-Concurrent builds for indexes on partitioned tables are currently not
-supported.  However, you may concurrently build the index on each
-partition individually and then finally create the partitioned index
-non-concurrently in order to reduce the time where writes to the
-partitioned table will be locked out.  In this case, building the
-partitioned index is a metadata only operation.
-   
-
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index baf3e6e57a5..dfe64052b81 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -92,6 +92,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 			 bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
+static void DefineIndexConcurrentInternal(Oid relationId,
+		  Oid indexRelationId,
+		  IndexInfo *indexInfo,
+		  LOCKTAG heaplocktag,
+		  LockRelId heaprelid);
 static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params,
 		 bool isTopLevel);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -555,7 +560,6 @@ DefineIndex(Oid relationId,
 	bool		

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Masahiko Sawada
On Thu, Jul 13, 2023 at 11:12 AM Peter Smith  wrote:
>
> On Thu, Jul 13, 2023 at 11:28 AM Masahiko Sawada  
> wrote:
> >
> > On Thu, Jul 13, 2023 at 8:03 AM Peter Smith  wrote:
> > >
> > > On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  
> > > > wrote:
> > > > >
> ...
> > >
> > > I checked v5-0001 and noticed the following:
> > >
> > > ==
> > > doc/src/sgml/logical-replication.sgml
> > >
> > > BEFORE
> > > ... and the leftmost index field must be a  column (not an expression)
> > > that reference a published table column.
> > >
> > > SUGGESTION ("references the", instead of "reference a")
> > > ... and the leftmost index field must be a  column (not an expression)
> > > that references the published table column.
> >
> > Thanks, will fix.
> >
> > >
> > > (maybe that last word "column" is also unnecessary?)
> >
> > But an index column doesn't reference the published table, but the
> > published table's column, no?
> >
>
> Yeah, but there is some inconsistency with the other code comment that
> just says "... that references the remote relation.", so I thought one
> of them needs to change. If not this one, then the other one.

Right. So let's add "column" in both places. Attached the updated patch
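As a small illustration of the rule being documented (using a hypothetical subscriber table; only the first index satisfies all the conditions):

```sql
CREATE TABLE sub_tab (id int, payload text);

CREATE INDEX ON sub_tab (id);              -- usable: btree, non-partial,
                                           -- leftmost field is a plain column
CREATE INDEX ON sub_tab ((id + 1));        -- not usable: leftmost field
                                           -- is an expression
CREATE INDEX ON sub_tab (id) WHERE id > 0; -- not usable: partial index
```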

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v6-0001-Doc-clarify-the-conditions-of-usable-indexes-for-.patch
Description: Binary data


Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Peter Smith
On Thu, Jul 13, 2023 at 11:28 AM Masahiko Sawada  wrote:
>
> On Thu, Jul 13, 2023 at 8:03 AM Peter Smith  wrote:
> >
> > On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
> > > >
...
> >
> > I checked v5-0001 and noticed the following:
> >
> > ==
> > doc/src/sgml/logical-replication.sgml
> >
> > BEFORE
> > ... and the leftmost index field must be a  column (not an expression)
> > that reference a published table column.
> >
> > SUGGESTION ("references the", instead of "reference a")
> > ... and the leftmost index field must be a  column (not an expression)
> > that references the published table column.
>
> Thanks, will fix.
>
> >
> > (maybe that last word "column" is also unnecessary?)
>
> But an index column doesn't reference the published table, but the
> published table's column, no?
>

Yeah, but there is some inconsistency with the other code comment that
just says "... that references the remote relation.", so I thought one
of them needs to change. If not this one, then the other one.

> >
> > ==
> > src/backend/replication/logical/relation.c
> >
> > BEFORE
> > The index must be btree, non-partial, and the leftmost field must be a
> > column (not an expression) that reference the remote relation.
> >
> > SUGGESTION ("references", instead of "reference")
> > The index must be btree, non-partial, and the leftmost field must be a
> > column (not an expression) that references the remote relation.
> >
>
> Will fix.
>

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-12 Thread Peter Smith
Here are some review comments for the v5 patch

==
Commit message.

1.
89e46d allowed using indexes other than PRIMARY KEY or REPLICA IDENTITY on the
subscriber, but only the BTree index could be used. This commit extends the
limitation, now the Hash index can be also used. These 2 types of
indexes are the
only ones supported because only these
- use fix strategy numbers
- implement the "equality" strategy
- supported by tuples_equal()
- implement the function amgettuple()

~

1a.
/fix/fixed/

~

1b.
/supported by tuples_equal()/are supported by tuples_equal()/

~~~

2.
Index access methods other than them don't have a fixed strategy for equality
operation. Note that in other index access methods, the support routines of each
operator class interpret the strategy numbers according to the operator class's
definition. In build_replindex_scan_key(), the operator which corresponds to
"equality comparison" is searched by using the strategy number, but it is
difficult for such indexes.

~

/Index access methods other than them/Other index access methods/

~~~

3.
tuples_equal() cannot accept a datatype that does not have an operator class for
Btree or Hash. One motivation for other types of indexes to be used is to define
an index to attributes such as datatypes like point and box. But
lookup_type_cache(),
which is called from tuples_equal(), cannot return the valid value if
input datatypes
do not have such a opclass.

~

That paragraph looks the same as the code comment in
IsIndexUsableForReplicaIdentityFull(). I wrote some review comments
about that (see #5d below) so the same maybe applies here too.

==
src/backend/executor/execReplication.c

4.
+/*
+ * Return the strategy number which corresponds to the equality operator for
+ * given index access method.
+ *
+ * XXX: Currently, only Btree and Hash indexes are supported. This is because
+ * index access methods other than them don't have a fixed strategy for
+ * equality operation. Note that in other index access methods, the support
+ * routines of each operator class interpret the strategy numbers according to
+ * the operator class's definition. So, we return the InvalidStrategy number
+ * for index access methods other than BTREE and HASH.
+ */
+int
+get_equal_strategy_number_for_am(Oid am)

The function comment seems a bit long. Maybe it can be simplified a bit:

SUGGESTION
Return the strategy number which corresponds to the equality operator
for the given index access method, otherwise, return InvalidStrategy.

XXX: Currently, only Btree and Hash indexes are supported. The other
index access methods don't have a fixed strategy for equality
operation - instead, the support routines of each operator class
interpret the strategy numbers according to the operator class's
definition.
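The fixed strategy numbers the comment refers to can also be seen from the catalogs; an illustrative query (using integer equality as the example operator):

```sql
-- Btree and Hash expose equality under fixed strategy numbers
-- (BTEqualStrategyNumber = 3, HTEqualStrategyNumber = 1), which is what
-- makes the lookup in build_replindex_scan_key() possible:
SELECT am.amname, amop.amopstrategy
FROM pg_amop amop
JOIN pg_am am ON am.oid = amop.amopmethod
WHERE amop.amopopr = '=(integer,integer)'::regoperator
  AND am.amname IN ('btree', 'hash');
-- btree | 3
-- hash  | 1
```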

==
src/backend/replication/logical/relation.c

5. FindUsableIndexForReplicaIdentityFull

 /*
  * Returns true if the index is usable for replica identity full. For details,
  * see FindUsableIndexForReplicaIdentityFull.
+ *
+ * XXX: Currently, only Btree and Hash indexes can be returned as usable one.
+ * This is because mainly two reasons:
+ *
+ * 1) Other index access methods other than Btree and Hash don't have a fixed
+ * strategy for equality operation. Refer comments atop
+ * get_equal_strategy_number_for_am.
+ * 2) tuples_equal cannot accept a datatype that does not have an operator
+ * class for Btree or Hash. One motivation for other types of indexes to be
+ * used is to define an index to attributes such as datatypes like point and
+ * box. But lookup_type_cache, which is called from tuples_equal, cannot return
+ * the valid value if input datatypes do not have such a opclass.
+ *
+ * Furthermore, BRIN and GIN indexes do not implement "amgettuple".
+ * RelationFindReplTupleByIndex assumes that tuples will be fetched one by
+ * one via index_getnext_slot, but this currently requires the "amgettuple"
+ * function.
  */

5a.
/as usable one./as useable./

~

5b.
/Other index access methods other than Btree and Hash/Other index
access methods/

~

5c.
Maybe a blank line before 2) will help readability.

~

5d.
"One motivation for other types of indexes to be used is to define an
index to attributes such as datatypes like point and box."

Isn't it enough to omit that sentence and just say:

SUGGESTION
2) tuples_equal() cannot accept a datatype (e.g. point or box) that
does not have an operator class for Btree or Hash. This is because
lookup_type_cache(), which is called from tuples_equal(), cannot
return the valid value if input datatypes do not have such an opclass.
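The point/box situation is easy to check from SQL; for example:

```sql
-- point has operator classes only for AMs like gist and spgist; there is
-- no btree or hash opclass, so lookup_type_cache() finds no equality
-- operator for tuples_equal() to use:
SELECT am.amname, opc.opcname
FROM pg_opclass opc
JOIN pg_am am ON am.oid = opc.opcmethod
WHERE opc.opcintype = 'point'::regtype
ORDER BY am.amname;
-- returns gist/spgist (and similar) rows, but none for btree or hash
```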

~~~

6. FindUsableIndexForReplicaIdentityFull

+ /* Check whether the index is supported or not */
+ if (get_equal_strategy_number_for_am(indexInfo->ii_Am) == InvalidStrategy)
+ return false;

6a.

Really, this entire function is for checking "is supported or not" so
IMO this is not the correct comment just for this statement. BTW, I
noted Onder suggested keeping this as a variable assignment 

Re: Duplicated LLVMJitHandle->lljit assignment?

2023-07-12 Thread Gurjeet Singh
On Wed, Jul 12, 2023 at 5:22 PM Matheus Alcantara  wrote:
>
> I was reading the jit implementation and I noticed that the lljit field of
> LLVMJitHandle is being assigned twice in the llvm_compile_module function. Is
> this correct? I'm attaching a proposed fix that removes the second assignment.
> I ran meson test and all tests passed.

-handle->lljit = compile_orc;

LGTM.

Best regards,
Gurjeet
http://Gurje.et




Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread David G. Johnston
On Wed, Jul 12, 2023 at 5:57 PM Dave Cramer  wrote:

> On Wed, 12 Jul 2023 at 20:00,  wrote:
>
>> Dave Cramer  writes:
>> > Obviously I am biased by the JDBC API which would like to have
>> > PreparedStatement.execute() return the number of rows inserted
>> > without having to wait to read all of the rows returned
>>
>> Huh ... just how *is* PreparedStatement.execute() supposed
>> to behave when the statement is an INSERT RETURNING?
>>
>
> It's really executeUpdate which is supposed to return the number of rows
> updated.
>

Right, and executeUpdate is the wrong API method to use, in the PostgreSQL
world, when executing insert/update/delete with the non-SQL-standard
returning clause.  executeQuery is the method to use.  And execute() should
behave as if executeQuery was called, i.e., return true, which it is
capable of doing since it has resultSet data that it needs to handle.

The addition of returning turns the insert/update/delete into a select in
terms of effective client-seen behavior.
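To illustrate with a hypothetical table:

```sql
CREATE TEMP TABLE t (id serial PRIMARY KEY, v text);

-- Without RETURNING, only a command tag comes back, e.g. INSERT 0 2:
INSERT INTO t (v) VALUES ('a'), ('b');

-- With RETURNING, the client sees a row description and data rows,
-- just as it would for a SELECT:
INSERT INTO t (v) VALUES ('c'), ('d') RETURNING id, v;
```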

ISTM that you are trying to make user-error less painful.  While that is
laudable it apparently isn't practical.  They can either discard the
results and get a count by omitting returning or obtain the result and
derive the count by counting rows alongside whatever else they needed the
returned data for.

David J.


Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Masahiko Sawada
On Thu, Jul 13, 2023 at 8:03 AM Peter Smith  wrote:
>
> On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada  wrote:
> >
> > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
> > >
> > > Here are my comments for v4.
> > >
> > > ==
> > >
> > > Docs/Comments:
> > >
>
> > > 
> >
> > Agreed. I've attached the updated patch. I'll push it barring any 
> > objections.
> >
> > >
>
> I checked v5-0001 and noticed the following:
>
> ==
> doc/src/sgml/logical-replication.sgml
>
> BEFORE
> ... and the leftmost index field must be a  column (not an expression)
> that reference a published table column.
>
> SUGGESTION ("references the", instead of "reference a")
> ... and the leftmost index field must be a  column (not an expression)
> that references the published table column.

Thanks, will fix.

>
> (maybe that last word "column" is also unnecessary?)

But an index column doesn't reference the published table, but the
published table's column, no?

>
> ==
> src/backend/replication/logical/relation.c
>
> BEFORE
> The index must be btree, non-partial, and the leftmost field must be a
> column (not an expression) that reference the remote relation.
>
> SUGGESTION ("references", instead of "reference")
> The index must be btree, non-partial, and the leftmost field must be a
> column (not an expression) that references the remote relation.
>

Will fix.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Autogenerate some wait events code and documentation

2023-07-12 Thread Michael Paquier
On Mon, Jul 10, 2023 at 07:52:23AM +0200, Drouvot, Bertrand wrote:
> On 7/10/23 7:20 AM, Michael Paquier wrote:
>> Hmm.  Something like that could be done, for instance:
>> 
>>   #   src/backend/utils/activity/wait_event_types.h
>> -#  typedef enum definitions for wait events.
>> +#  typedef enum definitions for wait events, generated from the first
>> +#  field.
> 
> Yeah, it looks a good place for it.

I am not sure where we are on that based on the objection from Alvaro
to not remove the first column in wait_event_names.txt about
greppability.  Anyway, I am not seeing any objections behind my
suggestion to simplify the second column and remove the quotes from
the event names, either.  Besides, the suggestion of Andres to improve
the error message on parsing and show the line information is
something useful in itself.

Hence, attached is a rebased patch set that separates the work into
more patches:
- 0001 removes the quotes from the second column, improving the
readability of the .txt file.
- 0002 fixes the report from Andres to improve the error message on
parsing.
- 0003 is the rename of the wait events, in preparation for...
- 0004 that removes entirely the first column (enum element names)
from wait_event_names.txt.

I would like to apply 0001 and 0002 to improve the format if there are
no objections.  0003 and 0004 are still here for discussion, as it
does not seem like a consensus has been reached for that yet.  Getting
more opinions would be a good next step for the last two patches, I
assume.

So, any comments?
--
Michael
From 4e10ef4726dc7179c02d23ec7e31f90d931b734a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 13 Jul 2023 09:51:28 +0900
Subject: [PATCH v2 1/4] Remove quotes from second column of
 wait_event_names.txt

These are not required, as the fields quoted are single words.
---
 .../activity/generate-wait_event_types.pl |   8 +-
 .../utils/activity/wait_event_names.txt   | 490 +-
 2 files changed, 250 insertions(+), 248 deletions(-)

diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 75cddf6fa2..2a9e341c58 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -73,7 +73,7 @@ my @lines_sorted =
 foreach my $line (@lines_sorted)
 {
 	die "unable to parse wait_event_names.txt"
-	  unless $line =~ /^(\w+)\t+(\w+)\t+("\w+")\t+("\w.*\.")$/;
+	  unless $line =~ /^(\w+)\t+(\w+)\t+(\w+)\t+("\w.*\.")$/;
 
 	(   my $waitclassname,
 		my $waiteventenumname,
@@ -168,7 +168,9 @@ if ($gen_code)
 			$firstpass = 0;
 
 			printf $c "\t\t case %s:\n", $wev->[0];
-			printf $c "\t\t\t event_name = %s;\n\t\t\t break;\n", $wev->[1];
+			# Apply quotes to the wait event name string.
+			printf $c "\t\t\t event_name = \"%s\";\n\t\t\t break;\n",
+			  $wev->[1];
 		}
 
 		printf $h "\n} $waitclass;\n\n";
@@ -221,7 +223,7 @@ elsif ($gen_docs)
 		{
 			printf $s "     <row>\n";
 			printf $s "      <entry><literal>%s</literal></entry>\n",
-			  substr $wev->[1], 1, -1;
+			  $wev->[1];
 			printf $s "      <entry>%s</entry>\n", substr $wev->[2], 1, -1;
 			printf $s "     </row>\n";
 		}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7af1a61f95..3fabad96d9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -44,19 +44,19 @@
 
 Section: ClassName - WaitEventActivity
 
-WAIT_EVENT_ARCHIVER_MAIN	"ArchiverMain"	"Waiting in main loop of archiver process."
-WAIT_EVENT_AUTOVACUUM_MAIN	"AutoVacuumMain"	"Waiting in main loop of autovacuum launcher process."
-WAIT_EVENT_BGWRITER_HIBERNATE	"BgWriterHibernate"	"Waiting in background writer process, hibernating."
-WAIT_EVENT_BGWRITER_MAIN	"BgWriterMain"	"Waiting in main loop of background writer process."
-WAIT_EVENT_CHECKPOINTER_MAIN	"CheckpointerMain"	"Waiting in main loop of checkpointer process."
-WAIT_EVENT_LOGICAL_APPLY_MAIN	"LogicalApplyMain"	"Waiting in main loop of logical replication apply process."
-WAIT_EVENT_LOGICAL_LAUNCHER_MAIN	"LogicalLauncherMain"	"Waiting in main loop of logical replication launcher process."
-WAIT_EVENT_LOGICAL_PARALLEL_APPLY_MAIN	"LogicalParallelApplyMain"	"Waiting in main loop of logical replication parallel apply process."
-WAIT_EVENT_RECOVERY_WAL_STREAM	"RecoveryWalStream"	"Waiting in main loop of startup process for WAL to arrive, during streaming recovery."
-WAIT_EVENT_SYSLOGGER_MAIN	"SysLoggerMain"	"Waiting in main loop of syslogger process."
-WAIT_EVENT_WAL_RECEIVER_MAIN	"WalReceiverMain"	"Waiting in main loop of WAL receiver process."
-WAIT_EVENT_WAL_SENDER_MAIN	"WalSenderMain"	"Waiting in main loop of WAL sender process."
-WAIT_EVENT_WAL_WRITER_MAIN	"WalWriterMain"	"Waiting in main loop of WAL writer process."
+WAIT_EVENT_ARCHIVER_MAIN	ArchiverMain	"Waiting in main loop of archiver process."
+WAIT_EVENT_AUTOVACUUM_MAIN	AutoVacuumMain	"Waiting in 

Re: Support to define custom wait events for extensions

2023-07-12 Thread Masahiro Ikeda

On 2023-07-13 09:12, Michael Paquier wrote:

On Wed, Jul 12, 2023 at 05:46:31PM +0900, Michael Paquier wrote:

On Wed, Jul 12, 2023 at 04:52:38PM +0900, Masahiro Ikeda wrote:

If the behavior is unexpected, we need to change the current code.
I have created a patch for the areas that I felt needed to be 
changed.

- 0001-change-the-die-condition-in-generate-wait_event_type.patch
 (In addition to the above, "$continue = ",\n";" doesn't appear to be
necessary.)


die "wait event names must start with 'WAIT_EVENT_'"
  if ( $trimmedwaiteventname eq $waiteventenumname
-   && $waiteventenumname !~ /^LWLock/
-   && $waiteventenumname !~ /^Lock/);
-   $continue = ",\n";
+   && $waitclassname !~ /^WaitEventLWLock$/
+   && $waitclassname !~ /^WaitEventLock$/);

Indeed, this looks wrong as-is.  $waiteventenumname refers to the
names of the enum elements, so we could just apply a filter based on
the class names in full.  The second check in for the generation of
the c/h files uses the class names.


At the end, I have gone with an even simpler way and removed the
checks for LWLock and Lock as their hardcoded values marked as DOCONLY
satisfy this check.  The second check when generating the C and header
code has also been simplified a bit to use an exact match with the
class name.


Thanks for your quick response. I'll rebase for the commit.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread Dave Cramer
On Wed, 12 Jul 2023 at 20:00,  wrote:

> Dave Cramer  writes:
> > Obviously I am biased by the JDBC API which would like to have
> > PreparedStatement.execute() return the number of rows inserted
> > without having to wait to read all of the rows returned
>
> Huh ... just how *is* PreparedStatement.execute() supposed
> to behave when the statement is an INSERT RETURNING?
>

It's really executeUpdate which is supposed to return the number of rows
updated.
Without a cursor it returns right away, as all of the results are returned
by the server. However, with a cursor you have to wait until you fetch the
rows before you can get the CommandComplete message, which btw is wrong as
it returns INSERT 0 0 instead of INSERT 2 0

Dave


Duplicated LLVMJitHandle->lljit assignment?

2023-07-12 Thread Matheus Alcantara
Hi,

I was reading the jit implementation and I noticed that the lljit field of
LLVMJitHandle is being assigned twice in the llvm_compile_module function. Is
this correct? I'm attaching a proposed fix that removes the second assignment.
I ran meson test and all tests passed.

--
Matheus Alcantara

From 2a2c773b2437b2c491576db8d7ed6b6d1ba2c815 Mon Sep 17 00:00:00 2001
From: Matheus Alcantara 
Date: Wed, 12 Jul 2023 18:57:52 -0300
Subject: [PATCH] Remove duplicated LLVMJitHandle->lljit assignment

---
 src/backend/jit/llvm/llvmjit.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 5c30494fa1..09650e2c70 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -722,8 +722,6 @@ llvm_compile_module(LLVMJitContext *context)
 			elog(ERROR, "failed to JIT module: %s",
  llvm_error_message(error));
 
-		handle->lljit = compile_orc;
-
 		/* LLVMOrcLLJITAddLLVMIRModuleWithRT takes ownership of the module */
 	}
 #elif LLVM_VERSION_MAJOR > 6
-- 
2.34.1



Re: Support to define custom wait events for extensions

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 05:46:31PM +0900, Michael Paquier wrote:
> On Wed, Jul 12, 2023 at 04:52:38PM +0900, Masahiro Ikeda wrote:
>> If the behavior is unexpected, we need to change the current code.
>> I have created a patch for the areas that I felt needed to be changed.
>> - 0001-change-the-die-condition-in-generate-wait_event_type.patch
>>  (In addition to the above, "$continue = ",\n";" doesn't appear to be
>> necessary.)
> 
> die "wait event names must start with 'WAIT_EVENT_'"
>   if ( $trimmedwaiteventname eq $waiteventenumname
> -   && $waiteventenumname !~ /^LWLock/
> -   && $waiteventenumname !~ /^Lock/);
> -   $continue = ",\n";
> +   && $waitclassname !~ /^WaitEventLWLock$/
> +   && $waitclassname !~ /^WaitEventLock$/);
> 
> Indeed, this looks wrong as-is.  $waiteventenumname refers to the
> names of the enum elements, so we could just apply a filter based on
> the class names in full.  The second check in for the generation of
> the c/h files uses the class names.

At the end, I have gone with an even simpler way and removed the
checks for LWLock and Lock as their hardcoded values marked as DOCONLY
satisfy this check.  The second check when generating the C and header
code has also been simplified a bit to use an exact match with the
class name. 
--
Michael




Re: MERGE ... RETURNING

2023-07-12 Thread Vik Fearing

On 7/13/23 01:48, Jeff Davis wrote:

On Wed, 2023-07-12 at 03:47 +0200, Vik Fearing wrote:


There is no RETURNING clause in Standard SQL, and the way they would
do
this is:

  SELECT ...
  FROM OLD TABLE (
  MERGE ...
  ) AS m

The rules for that for MERGE are well defined.


I only see OLD TABLE referenced as part of a trigger definition. Where
is it defined for MERGE?


Look up <data change delta table> for that syntax.  For how MERGE
generates those, see 9075-2:2023 Section 14.12 <merge statement> General
Rules 6.b and 6.c.
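For the curious, a sketch of that standard syntax (PostgreSQL does not accept it; the table names here are invented):

```sql
-- Standard SQL: OLD TABLE exposes the pre-change rows of the MERGE,
-- which can then be queried like any other table expression.
SELECT m.id, m.balance
FROM OLD TABLE (
    MERGE INTO accounts a
    USING updates u ON a.id = u.id
    WHEN MATCHED THEN UPDATE SET balance = u.balance
) AS m;
```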



In any case, as long as the SQL standard doesn't conflict, then we're
fine. And it looks unlikely to cause a conflict right now that wouldn't
also be a conflict with our existing RETURNING clause elsewhere, so I'm
not seeing a problem here.


I do not see a problem either, which was what I was trying to express 
(perhaps poorly).  At least not with the syntax.  I have not yet tested 
that the returned rows match the standard.

--
Vik Fearing





Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread chap

Dave Cramer  writes:

Obviously I am biased by the JDBC API which would like to have
PreparedStatement.execute() return the number of rows inserted
without having to wait to read all of the rows returned


Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

execute() -> true
getResultSet() -> the rows
getMoreResults() -> false
getUpdateCount() -> number inserted?

It seems that would fit the portal's behavior easily enough.

Or is the JDBC spec insisting on some other order?

Regards,
-Chap




Re: Meson build updates

2023-07-12 Thread Tristan Partin
Did you need anything more from the "Make finding pkg-config(python3)
more robust" patch? That one doesn't seem to have been applied yet.

Thanks for your reviews thus far.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: MERGE ... RETURNING

2023-07-12 Thread Jeff Davis
On Wed, 2023-07-12 at 03:47 +0200, Vik Fearing wrote:

> There is no RETURNING clause in Standard SQL, and the way they would
> do 
> this is:
> 
>  SELECT ...
>  FROM OLD TABLE (
>  MERGE ...
>  ) AS m
> 
> The rules for that for MERGE are well defined.

I only see OLD TABLE referenced as part of a trigger definition. Where
is it defined for MERGE?

In any case, as long as the SQL standard doesn't conflict, then we're
fine. And it looks unlikely to cause a conflict right now that wouldn't
also be a conflict with our existing RETURNING clause elsewhere, so I'm
not seeing a problem here.

Regards,
Jeff Davis





Re: Meson build updates

2023-07-12 Thread Andres Freund
Hi,

On 2023-06-29 13:34:42 -0500, Tristan Partin wrote:
> On Thu Jun 29, 2023 at 12:35 PM CDT, Andres Freund wrote:
> > Hm. One minor disadvantage of this is that if no c++ compiler was found, you
> > can't really see anything about llvm in the the output, nor in 
> > meson-log.txt,
> > making it somewhat hard to figure out why llvm was disabled.
> >
> > I think something like
> >
> > elif llvmopt.auto()
> >   message('llvm requires a C++ compiler')
> > endif
> >
> > Should do the trick?
> 
> Your patch looks great to me.
> 
> > > v5-0011-Pass-feature-option-through-to-required-kwarg.patch
> >
> > I'm a bit confused how it ended how it's looking like it is right now, but
> > ... :)
> >
> > I'm thinking of merging 0011 and relevant parts of 0012 and 0015 into one.
> 
> Your patch looks great to me.

Pushed these. Thanks!

Greetings,

Andres Freund




Re: Remove distprep

2023-07-12 Thread Andres Freund
Hi,

On 2023-06-09 11:10:14 +0200, Peter Eisentraut wrote:
> Per discussion at the unconference[0], I started to write a patch that
> removes the make distprep target.  A more detailed explanation of the
> rationale is also in the patch.

Thanks for tackling this!


> It needs some polishing around the edges, but I wanted to put it out here to
> get things moving and avoid duplicate work.

It'd be nice if we could get this in soon, so we could move ahead with the
src/tools/msvc removal.


> One thing in particular that isn't clear right now is how "make world"
> should behave if the documentation tools are not found.  Maybe we should
> make a build option, like "--with-docs", to mirror the meson behavior.

Isn't that somewhat unrelated to distprep?  I see that you removed the
"missing" script, but I don't really understand why that is part of this
commit?


> -# If there are any files in the source directory that we also generate in the
> -# build directory, they might get preferred over the newly generated files,
> -# e.g. because of a #include "file", which always will search in the current
> -# directory first.
> -message('checking for file conflicts between source and build directory')

You're thinking this can be removed because distclean is now reliable? There
were some pretty annoying to debug issues early on, where people switched from
an in-tree autoconf build to meson, with some files left over from the source
build, causing problems at a *later* time (when the files should have changed,
but the wrong ones were picked up).  That's not really related distprep etc,
so I'd change this separately, if we want to change it.


> diff --git a/src/tools/pginclude/cpluspluscheck 
> b/src/tools/pginclude/cpluspluscheck
> index 4e09c4686b..287395887c 100755
> --- a/src/tools/pginclude/cpluspluscheck
> +++ b/src/tools/pginclude/cpluspluscheck
> @@ -134,6 +134,9 @@ do
>   test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
>   test "$f" = src/test/isolation/specparse.h && continue
>
> + # FIXME
> + test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
> +
>   # ppport.h is not under our control, so we can't make it standalone.
>   test "$f" = src/pl/plperl/ppport.h && continue

Hm, what's that about?

We really ought to replace these scripts with something better, which
understands concurrency...

Greetings,

Andres Freund




Re: Correct the documentation for work_mem

2023-07-12 Thread David Rowley
On Tue, 25 Apr 2023 at 04:20, Imseih (AWS), Sami  wrote:
>
> Based on the feedback, here is a v1 of the suggested doc changes.
>
> I modified Gurjeets suggestion slightly to make it clear that a specific
> query execution could have operations simultaneously using up to
> work_mem.

> -Note that for a complex query, several sort or hash operations might 
> be
> -running in parallel; each operation will generally be allowed
> +Note that a complex query may include several sort and hash 
> operations,
> +and more than one of these operations may be in progress 
> simultaneously
> +for a given query execution; each such operation will generally be 
> allowed
> to use as much memory as this value specifies before it starts
> to write data into temporary files.  Also, several running
> sessions could be doing such operations concurrently.

I'm wondering about adding "and more than one of these operations may
be in progress simultaneously".  Are you talking about concurrent
sessions running other queries which are using work_mem too?  If so,
isn't that already covered by the final sentence in the quoted text
above? if not, what is running simultaneously?

I think Tom's suggestion looks fine. I'd maybe change "sort or hash"
to "sort and hash" per the suggestion from Gurjeet above.

David
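
The worst case being discussed can be sketched with simple arithmetic. This is an illustrative model only, not PostgreSQL code, and it ignores refinements such as hash_mem_multiplier, which lets hash operations use a multiple of work_mem:

```python
# Illustrative arithmetic only (not PostgreSQL code): an upper bound on
# memory used by sort/hash operations under the wording being discussed.
def worst_case_memory(work_mem_mb, ops_per_query, concurrent_sessions):
    """Each operation may use up to work_mem, several operations may run
    within one query execution, and several sessions run concurrently."""
    return work_mem_mb * ops_per_query * concurrent_sessions

# e.g. the default work_mem of 4MB, 5 sorts/hashes per query, 100 sessions
print(worst_case_memory(4, 5, 100))  # → 2000 (MB)
```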




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Peter Smith
On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada  wrote:
>
> On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
> >
> > Here are my comments for v4.
> >
> > ==
> >
> > Docs/Comments:
> >

> > 
>
> Agreed. I've attached the updated patch. I'll push it barring any objections.
>
> >

I checked v5-0001 and noticed the following:

==
doc/src/sgml/logical-replication.sgml

BEFORE
... and the leftmost index field must be a column (not an expression)
that reference a published table column.

SUGGESTION ("references the", instead of "reference a")
... and the leftmost index field must be a column (not an expression)
that references the published table column.

(maybe that last word "column" is also unnecessary?)

==
src/backend/replication/logical/relation.c

BEFORE
The index must be btree, non-partial, and the leftmost field must be a
column (not an expression) that reference the remote relation.

SUGGESTION ("references", instead of "reference")
The index must be btree, non-partial, and the leftmost field must be a
column (not an expression) that references the remote relation.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 10:01:49AM -0400, Robert Haas wrote:
> I'm not sure exactly what is happening here, but it looks to me like
> ATExecReplicaIdentity() only takes ShareLock on the index and
> nevertheless feels entitled to update the pg_index tuple, which is
> pretty strange. We normally require AccessExclusiveLock to perform DDL
> on an object, and in the limited exceptions that we have to that rule
> - see AlterTableGetLockLevel - it's pretty much always a
> self-exclusive lock. Otherwise, two backends might try to do the same
> DDL operation at the same time, which would lead to low-level failures
> trying to update the same tuple such as the one seen here.
> 
> But even if that doesn't happen or is prevented by some other
> mechanism, there's still a synchronization problem. Suppose backend B1
> modifies some state via a DDL operation on table T and then afterward
> backend B2 wants to perform a non-DDL operation that depends on that
> state. Well, B1 takes some lock on the relation, and B2 takes a lock
> that would conflict with it, and that guarantees that B2 starts after
> B1 commits. That means that B2 is guaranteed to see the invalidations
> that were queued by B1, which means it will flush any state out of its
> cache that was made stale by the operation performed by B1. If the
> locks didn't conflict, B2 might start before B1 committed and either
> fail to update its caches or update them but with a version of the
> tuples that's about to be made obsolete when B1 commits. So ShareLock
> doesn't feel like a very safe choice here.

Yes, I also got to wonder whether it is OK to hold only a ShareLock
for the index being used as a replica identity.  We hold an AEL on the
parent table, and ShareLock is sufficient to prevent concurrent schema
operations until the transaction that took the lock commits.  But
surely, that feels inconsistent with the common practices in
tablecmds.c.
--
Michael




Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 04:02:23PM -0400, Robert Haas wrote:
> Oh, interesting. The fact that indisreplident isn't copied seems like
> a pretty clear mistake, but I'm guessing that the fact that t_self
> wasn't refreshed was deliberate and that the author of this code
> didn't really intend for callers to look at the t_self value. We could
> change our mind about whether that ought to be allowed, though. But,
> like, none of the other tuple header fields are copied either... xmax,
> xvac, infomask, etc.

See 3c84046 and 8ec9438, mainly, from Tom.  I didn't know that this is
used as a shortcut to reload index information in the cache because it
is much cheaper than a full index information rebuild.  I agree that
not copying indisreplident in this code path is a mistake as this bug
shows, because any follow-up command run in a transaction that changed
this field would get an incorrect information reference.

Now, I have to admit that I am not completely sure what the
consequences of this addition are when it comes to concurrent index
operations (CREATE/DROP INDEX, REINDEX):
/* Copy xmin too, as that is needed to make sense of indcheckxmin */
HeapTupleHeaderSetXmin(relation->rd_indextuple->t_data,
   HeapTupleHeaderGetXmin(tuple->t_data));
+   ItemPointerCopy(&tuple->t_self, &relation->rd_indextuple->t_self);

Anyway, as I have pointed upthread, I think that the craziness is also
in validatePartitionedIndex() where this stuff thinks that it is OK to
use a copy the pg_index tuple coming from the relcache.  As this
report proves, it is *not* safe, because we may miss a lot of
information not updated by RelationReloadIndexInfo() that other
commands in the same transaction block may have updated, and the
code would insert into the catalog an inconsistent tuple for a
partitioned index switched to become valid.

I agree that updating indisreplident in this cheap index reload path
is necessary, as well.  Does my suggestion of using the syscache not
make sense for somebody here?  Note that this is what all the other
code paths do for catalog updates of pg_index when retrieving a copy
of its tuples.
--
Michael




Re: document the need to analyze partitioned tables

2023-07-12 Thread David Rowley
On Wed, 25 Jan 2023 at 21:43, David Rowley  wrote:
> While I agree that the majority of partitions are likely to be
> relkind='r', which you might ordinarily consider a "normal table", you
> just might change your mind when you try to INSERT or UPDATE records
> that would violate the partition constraint. Some partitions might
> also be themselves partitioned tables and others might be foreign
> tables. That does not really matter much when it comes to what
> autovacuum does or does not do, but I'm not really keen to imply in
> our documents that partitions are "normal tables".

Based on the above, I'm setting this to waiting on author.

David




Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread David G. Johnston
On Wed, Jul 12, 2023 at 2:59 PM Dave Cramer  wrote:

> On Wed, 12 Jul 2023 at 17:49, Tom Lane  wrote:
>
>> Dave Cramer  writes:
>> > Obviously I am biased by the JDBC API which would like to have
>> > PreparedStatement.execute() return the number of rows inserted
>> > without having to wait to read all of the rows returned
>>
>> Umm ... you do realize that we return the rows on-the-fly?
>>
> I do realize that.
>
>> The server does not know how many rows got inserted/returned
>>
> Well I haven't looked at the code, but it seems unintuitive that adding
> the returning clause changes the semantics of insert.
>
>
It doesn't have to - the insertions are always "as rows are produced", it
is just that in the non-returning case the final row can be sent to
/dev/null instead of the client (IOW, there is always some destination).
In both cases the total number of rows inserted are only reliably known
when the top executor node requests a new tuple and its immediate
predecessor says "no more rows present".

David J.


Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread Dave Cramer
On Wed, 12 Jul 2023 at 17:49, Tom Lane  wrote:

> Dave Cramer  writes:
> > Obviously I am biased by the JDBC API which would like to have
> > PreparedStatement.execute() return the number of rows inserted
> > without having to wait to read all of the rows returned
>
> Umm ... you do realize that we return the rows on-the-fly?
>
I do realize that.

> The server does not know how many rows got inserted/returned
>
Well I haven't looked at the code, but it seems unintuitive that adding the
returning clause changes the semantics of insert.

until it's run the query to completion, at which point all
> the data has already been sent to the client.  There isn't
> any way to return the rowcount before the data, and it wouldn't
> be some trivial protocol adjustment to make that work differently.
> (What it *would* be is expensive, because we'd have to store
> those rows somewhere.)
>
I wasn't asking for that, I just want the number of rows inserted.

Thanks,

Dave


Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread Tom Lane
Dave Cramer  writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Umm ... you do realize that we return the rows on-the-fly?
The server does not know how many rows got inserted/returned
until it's run the query to completion, at which point all
the data has already been sent to the client.  There isn't
any way to return the rowcount before the data, and it wouldn't
be some trivial protocol adjustment to make that work differently.
(What it *would* be is expensive, because we'd have to store
those rows somewhere.)

regards, tom lane
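
A toy model of the message flow described above (hypothetical, not the actual server or libpq code): each DataRow is streamed as soon as it is produced, so the command tag carrying the row count can only be assembled after the last row has already been sent.

```python
# A toy model (hypothetical, not server code) of the extended-protocol
# message order for INSERT ... RETURNING: rows stream back as they are
# produced, so the final count exists only after the last DataRow.
def run_insert_returning(rows_to_insert):
    inserted = 0
    for row in rows_to_insert:
        inserted += 1
        yield ("DataRow", row)              # already on the wire
    yield ("CommandComplete", f"INSERT 0 {inserted}")  # tag comes last

msgs = list(run_insert_returning([(1,), (2,)]))
print(msgs[-1])  # → ('CommandComplete', 'INSERT 0 2')
```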




Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread Dave Cramer
Dave Cramer


On Wed, 12 Jul 2023 at 16:31, David G. Johnston 
wrote:

> On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer  wrote:
>
>>
>> INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id
>>
>> if a portal is used to get the results then the CommandStatus
>>
>
> IIUC the portal is not optional if you include the RETURNING clause.
>
From my testing it isn't required.

>
> There is no CommandStatus message in the protocol, the desired information
> is part of the command tag returned in the CommandComplete message.  You
> get that at the end of the command, which has been defined as when any
> portal produced by the command has been fully executed.
>

I could argue that the insert is fully completed whether I read the data or
not.

>
> You probably should add your desire to the Version 4 protocol ToDo on the
> wiki.
>
> https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes_.2F_v4_Protocol
>

thx, will do.

Dave

>


Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread Gurjeet Singh
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer  wrote:
>
> With a simple insert such as
>
> INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id
>
> if a portal is used to get the results then the CommandStatus is not returned 
> on the execute only when the portal is closed. After looking at this more it 
> is really after all of the data is read which is consistent if you don't use 
> a portal, however it would be much more useful if we received the 
> CommandStatus after the insert was completed and before the data
>
> Obviously I am biased by the JDBC API which would like to have
>
> PreparedStatement.execute() return the number of rows inserted without having 
> to wait to read all of the rows returned

I believe if the RETURNING clause is used, the protocol-level behaviour of
INSERT is expected to match that of SELECT. If the SELECT command
behaves like that (resultset followed by CommandStatus), then I'd say
the INSERT RETURNING is behaving as expected.

Best regards,
Gurjeet
http://Gurje.et




Re: CommandStatus from insert returning when using a portal.

2023-07-12 Thread David G. Johnston
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer  wrote:

>
> INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id
>
> if a portal is used to get the results then the CommandStatus
>

IIUC the portal is not optional if you include the RETURNING clause.

There is no CommandStatus message in the protocol, the desired information
is part of the command tag returned in the CommandComplete message.  You
get that at the end of the command, which has been defined as when any
portal produced by the command has been fully executed.

You probably should add your desire to the Version 4 protocol ToDo on the
wiki.

https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes_.2F_v4_Protocol

If that ever becomes an active project working through the details of that
list for desirability and feasibility would be the first thing to happen.

David J.
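
In the meantime, the row count is recoverable by parsing the command tag once CommandComplete arrives, which is what drivers do today. A minimal sketch — the tag formats are the documented CommandComplete tags, the helper name is made up:

```python
# rows_from_command_tag is a hypothetical helper; the tag formats are the
# documented CommandComplete tags ("INSERT <oid> <rows>", "UPDATE <rows>",
# "DELETE <rows>", "SELECT <rows>"), all of which end in the row count.
def rows_from_command_tag(tag):
    return int(tag.split()[-1])

print(rows_from_command_tag("INSERT 0 2"))  # → 2
print(rows_from_command_tag("UPDATE 7"))    # → 7
```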


CommandStatus from insert returning when using a portal.

2023-07-12 Thread Dave Cramer
Greetings,

With a simple insert such as

INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id

if a portal is used to get the results, then the CommandStatus is not
returned on the execute, only when the portal is closed. After looking at
this more, it is really after all of the data is read, which is consistent
with the non-portal case; however, it would be much more useful if we
received the CommandStatus after the insert was completed and before the
data.

Obviously I am biased by the JDBC API which would like to have

PreparedStatement.execute() return the number of rows inserted
without having to wait to read all of the rows returned


Dave Cramer


Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Robert Haas
On Wed, Jul 12, 2023 at 12:28 PM Shruthi Gowda  wrote:
>   I reviewed the function RelationReloadIndexInfo() and observed that the 
> 'indisreplident' field and the SelfItemPointer 't_self' are not refreshed to 
> the pg_index tuple of the index.
>   Attached is the patch that fixes the above issue.

Oh, interesting. The fact that indisreplident isn't copied seems like
a pretty clear mistake, but I'm guessing that the fact that t_self
wasn't refreshed was deliberate and that the author of this code
didn't really intend for callers to look at the t_self value. We could
change our mind about whether that ought to be allowed, though. But,
like, none of the other tuple header fields are copied either... xmax,
xvac, infomask, etc.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: COPY table FROM STDIN via SPI

2023-07-12 Thread Joe Conway

On 7/12/23 14:43, c...@anastigmatix.net wrote:

On 2023-07-12 14:18, Joe Conway wrote:

On 7/11/23 22:52, James Sewell wrote:

What about running a COPY directly from C - is that possible?


https://www.postgresql.org/docs/current/libpq-copy.html


Or is the question about a COPY kicked off from server-side
C code (following up a question about SPI)?

If the idea is to kick off a COPY that reads from the connected
client's STDIN, the wire protocol doesn't really have a way to
work that out with the client, as Tom pointed out.

Or is the goal for some server-side code to quickly populate
a table from some file that's readable on the server and has
the same format that COPY FROM expects?



You can still use this in a server-side extension in the same way that 
dblink works. Perhaps ugly, but I have used it in the past and it worked 
*really* well for us.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: COPY table FROM STDIN via SPI

2023-07-12 Thread chap

On 2023-07-12 14:18, Joe Conway wrote:

On 7/11/23 22:52, James Sewell wrote:

What about running a COPY directly from C - is that possible?


https://www.postgresql.org/docs/current/libpq-copy.html


Or is the question about a COPY kicked off from server-side
C code (following up a question about SPI)?

If the idea is to kick off a COPY that reads from the connected
client's STDIN, the wire protocol doesn't really have a way to
work that out with the client, as Tom pointed out.

Or is the goal for some server-side code to quickly populate
a table from some file that's readable on the server and has
the same format that COPY FROM expects?

Regards,
-Chap
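
For that last case, the server-side code mostly needs to produce lines in COPY's default text format: tab-separated columns, \N for NULL, with backslash, tab, newline, and carriage return escaped. A small illustrative formatter under those assumptions (the helper name is made up):

```python
# copy_text_row is a hypothetical helper: it renders one row in COPY's
# default text format (tab-separated, \N for NULL, backslash escapes).
def copy_text_row(values):
    cols = []
    for v in values:
        if v is None:
            cols.append(r"\N")
        else:
            s = str(v)
            s = (s.replace("\\", "\\\\")   # escape backslash first
                  .replace("\t", "\\t")
                  .replace("\n", "\\n")
                  .replace("\r", "\\r"))
            cols.append(s)
    return "\t".join(cols)

print(copy_text_row([1, "a\tb", None]))  # one line ready for COPY FROM
```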




Re: COPY table FROM STDIN via SPI

2023-07-12 Thread Joe Conway

On 7/11/23 22:52, James Sewell wrote:


No.  It'd be a wire protocol break, and even if it weren't I would not
expect many clients to be able to deal with it.  They're in the middle
of a query cycle (for the SELECT or CALL that got you into SPI), and
suddenly the backend asks for COPY data?  What are they supposed to
send, or where are they supposed to put it for the COPY-out case?
There's just not provision for nesting protocol operations like that.


What about running a COPY directly from C - is that possible?



https://www.postgresql.org/docs/current/libpq-copy.html


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Better help output for pgbench -I init_steps

2023-07-12 Thread Gurjeet Singh
On Wed, Jul 12, 2023 at 3:08 AM Peter Eisentraut  wrote:
>
> On 12.07.23 09:42, Gurjeet Singh wrote:
> > These two variants show the two extremes; bare minimum vs. everything
> > but the kitchen sink. So one may feel the need to find a middle ground
> > and provide a "sufficient, but not too much" variant. I have attempted
> > that in variants 3 and 4; also attached.
>
> If you end up with variant 3 or 4, please use double quotes instead of
> single quotes.

Will do.

I believe you're suggesting this because in the neighboring help text
the string literals use double quotes. I see that other utilities,
such as psql also use double quotes in help text.

If there's a convention, documented somewhere in our sources, I'd love
to know and learn other conventions.

Best regards,
Gurjeet
http://Gurje.et




Re: Meson build updates

2023-07-12 Thread Tristan Partin
On Wed Jul 12, 2023 at 11:39 AM CDT, Alvaro Herrera wrote:
> On 2023-Jul-12, Tristan Partin wrote:
>
> > Attached is a patch which does just that without overriding the
> > binaries. I investigated the bin_targets stuff, but some test
> > executables get added there, so it wouldn't work out that well.
>
> This seems useful.  Maybe we should have some documentation changes to
> go with it, because otherwise it seems a bit too obscure.

Do you have a place in mind on where to document it?

> Maybe there are other subdirs where this would be useful.  ecpg maybe?
> (Much less widely used, but if it's this simple, it shouldn't be much of
> a burden)

A previous version of this patch[0] did it to all public facing
binaries. Andres wasn't super interested in that.

[0]: https://www.postgresql.org/message-id/CSPIJVUDZFKX.3KHMOAVGF94RV%40c3po

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Meson build updates

2023-07-12 Thread Alvaro Herrera
On 2023-Jul-12, Tristan Partin wrote:

> Attached is a patch which does just that without overriding the
> binaries. I investigated the bin_targets stuff, but some test
> executables get added there, so it wouldn't work out that well.

This seems useful.  Maybe we should have some documentation changes to
go with it, because otherwise it seems a bit too obscure.

Maybe there are other subdirs where this would be useful.  ecpg maybe?
(Much less widely used, but if it's this simple, it shouldn't be much of
a burden)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Shruthi Gowda
On Wed, Jul 12, 2023 at 5:46 PM Dilip Kumar  wrote:

> On Wed, Jul 12, 2023 at 1:56 PM Michael Paquier 
> wrote:
> >
> > On Wed, Jul 12, 2023 at 11:38:05AM +0530, Dilip Kumar wrote:
> > > On Wed, Jul 12, 2023 at 11:12 AM Michael Paquier 
> wrote:
> > >>
> > >> On Wed, Jul 12, 2023 at 09:38:41AM +0900, Michael Paquier wrote:
> > >> > While working recently on what has led to cfc43ae and fc55c7f, I
> > >> > really got the feeling that there could be some command sequences
> that
> > >> > lacked some CCIs (or CommandCounterIncrement calls) to make sure
> that
> > >> > the catalog updates are visible in any follow-up steps in the same
> > >> > transaction.
> > >>
> > >> Wait a minute.  The validation of a partitioned index uses a copy of
> > >> the pg_index tuple from the relcache, which be out of date:
> > >>newtup = heap_copytuple(partedIdx->rd_indextuple);
> > >>((Form_pg_index) GETSTRUCT(newtup))->indisvalid = true;
> > >
> > > But why the recache entry is outdated, does that mean that in the
> > > previous command, we missed registering the invalidation for the
> > > recache entry?
> >
> > Yes, something's still a bit off here, even if switching a partitioned
> > index to become valid should use a fresh tuple copy from the syscache.
> >
> > Taking the test case of upthread, from what I can see, the ALTER TABLE
> > .. REPLICA IDENTITY registers two relcache invalidations for pk_foo
> > (via RegisterRelcacheInvalidation), which is the relcache entry whose
> > stuff is messed up.  I would have expected a refresh of the cache of
> > pk_foo to happen when doing the ALTER INDEX .. ATTACH PARTITION, but
> > for some reason it does not happen when running the whole in a
> > transaction block.
>
> I think there is something to do with this code here[1], basically, we
> are in a transaction block so while processing the invalidation we
> have first cleared the entry for the pk_foo but then we have partially
> recreated it using 'RelationReloadIndexInfo', in this function we
> haven't build complete relation descriptor but marked
> 'relation->rd_isvalid' as true and due to that next relation_open in
> (ALTER INDEX .. ATTACH PARTITION) will reuse this half backed entry.
> I am still not sure what is the purpose of just reloading the index
> and marking the entry as valid which is not completely valid.
>
> RelationClearRelation()
> {
> ..
> /*
> * Even non-system indexes should not be blown away if they are open and
> * have valid index support information. This avoids problems with active
> * use of the index support information. As with nailed indexes, we
> * re-read the pg_class row to handle possible physical relocation of the
> * index, and we check for pg_index updates too.
> */
> if ((relation->rd_rel->relkind == RELKIND_INDEX ||
> relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) &&
> relation->rd_refcnt > 0 &&
> relation->rd_indexcxt != NULL)
> {
> if (IsTransactionState())
> RelationReloadIndexInfo(relation);
> return;
> }
>
  I reviewed the function RelationReloadIndexInfo() and observed that the
'indisreplident' field and the SelfItemPointer 't_self' are not refreshed
to the pg_index tuple of the index.
  Attached is the patch that fixes the above issue.


v1-0001-Fix-the-relcache-invalidation-issue-for-index-tab.patch
Description: Binary data


Re: Meson build updates

2023-07-12 Thread Tristan Partin
On Thu Jun 29, 2023 at 2:17 PM CDT, Tristan Partin wrote:
> On Thu Jun 29, 2023 at 2:13 PM CDT, Andres Freund wrote:
> > Hi,
> >
> > On 2023-06-29 14:07:19 -0500, Tristan Partin wrote:
> > > I still think the overrides are important, at the very least for libpq,
> > > but I will defer to your aforementioned decision for now.
> >
> > libpq makes sense to me, fwiw. Just doing it for all binaries individually
> > didn't seem as obviously beneficial.
> >
> > FWIW, it seems it could be handled somewhat centrally for binaries, the
> > bin_targets array should have all that's needed?
>
> I will send a follow-up patch with at least libpq overridden. I will
> investigate the bin_targets.

Attached is a patch which does just that without overriding the
binaries. I investigated the bin_targets stuff, but some test
executables get added there, so it wouldn't work out that well.

-- 
Tristan Partin
Neon (https://neon.tech)
From b52a13d1575e4965e8c1794e7a62f24ecfeeb1cf Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Wed, 17 May 2023 10:36:52 -0500
Subject: [PATCH v6] Add Meson override for libpq

Meson has the ability to do transparent overrides when projects are used
as subprojects. For instance, say I am building a Postgres extension. I
can define Postgres to be a subproject of my extension given the
following wrap file:

[wrap-git]
url = https://git.postgresql.org/git/postgresql.git
revision = master
depth = 1

[provide]
dependency_names = libpq

Then in my extension (root project), I can have the following line
snippet:

libpq = dependency('libpq')

This will tell Meson to transparently compile libpq prior to it
compiling my extension (because I depend on libpq) if libpq isn't found
on the host system.
---
 src/interfaces/libpq/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/interfaces/libpq/meson.build b/src/interfaces/libpq/meson.build
index 80e6a15adf..6d18970e81 100644
--- a/src/interfaces/libpq/meson.build
+++ b/src/interfaces/libpq/meson.build
@@ -84,6 +84,8 @@ libpq = declare_dependency(
   include_directories: [include_directories('.')]
 )
 
+meson.override_dependency('libpq', libpq)
+
 pkgconfig.generate(
   name: 'libpq',
   description: 'PostgreSQL libpq library',
-- 
Tristan Partin
Neon (https://neon.tech)



Re: numeric datatype for current release not available

2023-07-12 Thread Julien Rouhaud
On Wed, 12 Jul 2023, 23:15 Ashutosh Bapat, 
wrote:

> Hi All,
> https://www.postgresql.org/docs/current/datatype-numeric.html gives me
> "bad gateway" error. Attached screen shot. Date/Time datatype
> documentation is accessible at
> https://www.postgresql.org/docs/current/datatype-datetime.html.
>
> Just got this while wrapping up for the day. Didn't look at what's going
> wrong.
>

it's working here. probably a transient error, it happens from time to
time.

>


numeric datatype for current release not available

2023-07-12 Thread Ashutosh Bapat
Hi All,
https://www.postgresql.org/docs/current/datatype-numeric.html gives me
"bad gateway" error. Attached screen shot. Date/Time datatype
documentation is accessible at
https://www.postgresql.org/docs/current/datatype-datetime.html.

Just got this while wrapping up for the day. Didn't look at what's going wrong.

-- 
Best Wishes,
Ashutosh Bapat


Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-12 Thread Önder Kalacı
Hi Hayato, all


> > - return is_btree && !is_partial && !is_only_on_expression;
> > + /* Check whether the index is supported or not */
> > + is_suitable_type = ((get_equal_strategy_number_for_am(indexInfo->ii_Am)
> > + != InvalidStrategy));
> > +
> > + is_partial = (indexInfo->ii_Predicate != NIL);
> > + is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> > +
> > + return is_suitable_type && !is_partial && !is_only_on_expression;
> >
>

I don't want to repeat this too much, as it is a minor note. Just
sharing my perspective here.

As discussed in the other email [1], I feel like keeping
IsIndexUsableForReplicaIdentityFull()  function readable is useful
for documentation purposes as well.

So, I'm more inclined to see something like your old code, maybe with
a different variable name.

bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);


to

> bool has_equal_strategy = get_equal_strategy_number_for_am...
> 
>  return  has_equal_strategy   && !is_partial && !is_only_on_expression;



> 4a.
> > The code can be written more optimally, because if it is deemed
> > already that 'is_suitable_type' is false, then there is no point to
> > continue to do the other checks and function calls.


Sure, there are maybe few cycles of CPU gains, but this code is executed
infrequently, and I don't see much value optimizing it. I think keeping it
slightly
more readable is nicer.

Other than that, I think the code/test looks good. For the
comments/documentation,
I think Amit and Peter have already given quite a bit of useful feedback,
so nothing
much to add from my end.

Thanks,
Onder

[1]:
https://www.postgresql.org/message-id/CACawEhUWH1qAZ8QNeCve737Qe1_ye%3DvTW9P22ePiFssT7%2BHaaQ%40mail.gmail.com


Hayato Kuroda (Fujitsu) , 12 Tem 2023 Çar, 10:07
tarihinde şunu yazdı:

> Dear Amit,
>
> Thanks for checking my patch! The patch can be available at [1].
>
> > > > ==
> > > > .../subscription/t/032_subscribe_use_index.pl
> > > >
> > > > 11.
> > > > AFAIK there is no test to verify that the leftmost index field must
> be
> > > > a column (e.g. cannot be an expression). Yes, there are some tests
> for
> > > > *only* expressions like (expr, expr, expr), but IMO there should be
> > > > another test for an index like (expr, expr, column) which should also
> > > > fail because the column is not the leftmost field.
> > >
> > > I'm not sure they should be fixed in the patch. You have raised these
> points
> > > at the Sawada-san's thread[1] and not sure he has done.
> > > Furthermore, these points are not related with our initial motivation.
> > > Fork, or at least it should be done by another patch.
> > >
> >
> > I think it is reasonable to discuss the existing code-related
> > improvements in a separate thread rather than trying to change this
> > patch.
>
> OK, so I will not touch in this thread.
>
> > I have some other comments about your patch:
> >
> > 1.
> > + * 1) Other indexes do not have a fixed set of strategy numbers.
> > + * In build_replindex_scan_key(), the operator which corresponds to
> "equality
> > + * comparison" is searched by using the strategy number, but it is
> difficult
> > + * for such indexes. For example, GiST index operator classes for
> > + * two-dimensional geometric objects have a comparison "same", but its
> > strategy
> > + * number is different from the gist_int4_ops, which is a GiST index
> operator
> > + * class that implements the B-tree equivalent.
> > + *
> > ...
> > + */
> > +int
> > +get_equal_strategy_number_for_am(Oid am)
> > ...
> >
> > I think this comment is slightly difficult to understand. Can we make
> > it a bit generic by saying something like: "The index access methods
> > other than BTREE and HASH don't have a fixed strategy for equality
> > operation. Note that in other index access methods, the support
> > routines of each operator class interpret the strategy numbers
> > according to the operator class's definition. So, we return
> > InvalidStrategy number for index access methods other than BTREE and
> > HASH."
>
> Sounds better. Fixed with some adjustments.
>
> > 2.
> > + * 2) Moreover, BRIN and GIN indexes do not implement "amgettuple".
> > + * RelationFindReplTupleByIndex() assumes that tuples will be fetched
> one by
> > + * one via index_getnext_slot(), but this currently requires the
> "amgetuple"
> > + * function. To make it use index_getbitmap() must be used, which
> fetches all
> > + * the tuples at once.
> > + */
> > +int
> > +get_equal_strategy_number_for_am(Oid am)
> > {
> > ..
> >
> > I don't think this is a good place for such a comment. We can probably
> > move this atop IsIndexUsableForReplicaIdentityFull(). I think you need
> > to mention two reasons in IsIndexUsableForReplicaIdentityFull() why we
> > support only BTREE and HASH index access methods (a) Refer to comments
> > atop get_equal_strategy_number_for_am(); (b) mention the reason
> > related to tuples_equal() as discussed in email [1]. Then you can say
> > that we also need to ensure 

Re: RelationGetIndexAttrBitmap comment outdated

2023-07-12 Thread Daniel Gustafsson
> On 12 Jul 2023, at 16:37, Alvaro Herrera  wrote:
> 
> I realized that commit 19d8e2308bc5 (and 5753d4ee320b before that) added
> a new output type to RelationGetIndexAttrBitmap but forgot to list its
> effect in the function's documenting comment.  Here's a patch that
> updates it, making it more specific (and IMO more readable).  I also add
> a comment to the enum definition, to remind people that the other one
> needs to be modified.

LGTM, a clear improvement.

> This ought to be backpatched to 16.

+1

--
Daniel Gustafsson





RelationGetIndexAttrBitmap comment outdated

2023-07-12 Thread Alvaro Herrera
I realized that commit 19d8e2308bc5 (and 5753d4ee320b before that) added
a new output type to RelationGetIndexAttrBitmap but forgot to list its
effect in the function's documenting comment.  Here's a patch that
updates it, making it more specific (and IMO more readable).  I also add
a comment to the enum definition, to remind people that the other one
needs to be modified.

This ought to be backpatched to 16.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The conclusion we can draw from these studies is that
we cannot draw any conclusion from them" (Tanenbaum)
>From db07c83e4baa04146a4c8e5ebc663c7eb5cfaa10 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 12 Jul 2023 14:19:23 +0200
Subject: [PATCH 1/1] document RelationGetIndexAttrBitmap

---
 src/backend/utils/cache/relcache.c | 12 +---
 src/include/utils/relcache.h   |  4 
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 8a08463c2b7..2f3b580cb0d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5149,9 +5149,15 @@ RelationGetIndexPredicate(Relation relation)
  * simple index keys, but attributes used in expressions and partial-index
  * predicates.)
  *
- * Depending on attrKind, a bitmap covering the attnums for all index columns,
- * for all potential foreign key columns, or for all columns in the configured
- * replica identity index is returned.
+ * Depending on attrKind, a bitmap covering attnums for certain columns is
+ * returned:
+ *	INDEX_ATTR_BITMAP_KEY			Columns in non-partial unique indexes not
+ *	in expressions (i.e., usable for FKs)
+ *	INDEX_ATTR_BITMAP_PRIMARY_KEY	Columns in the table's primary key
+ *	INDEX_ATTR_BITMAP_IDENTITY_KEY	Columns in the table's replica identity
+ *	index (empty if FULL)
+ *	INDEX_ATTR_BITMAP_HOT_BLOCKING	Columns that block updates from being HOT
+ *	INDEX_ATTR_BITMAP_SUMMARIZED	Columns included in summarizing indexes
  *
  * Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
  * we can include system attributes (e.g., OID) in the bitmap representation.
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index beeb28b83cb..1637ff7152b 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -54,6 +54,10 @@ extern List *RelationGetIndexPredicate(Relation relation);
 extern Datum *RelationGetIndexRawAttOptions(Relation indexrel);
 extern bytea **RelationGetIndexAttOptions(Relation relation, bool copy);
 
+/*
+ * Possible set types returned by RelationGetIndexAttrBitmap.  Update
+ * the comment atop that function if you modify this enum.
+ */
 typedef enum IndexAttrBitmapKind
 {
 	INDEX_ATTR_BITMAP_KEY,
-- 
2.39.2



Re: Clean up some signal usage mainly related to Windows

2023-07-12 Thread Tristan Partin
On Wed Jul 12, 2023 at 9:31 AM CDT, Peter Eisentraut wrote:
> On 12.07.23 16:23, Tristan Partin wrote:
> > It has come to my attention that STDOUT_FILENO might not be portable and
> > fileno(3) isn't marked as signal-safe, so I have just used the raw 1 for
> > stdout, which as far as I know is portable.
>
> We do use STDOUT_FILENO elsewhere in the code, and there are even 
> workaround definitions for Windows, so it appears it is meant to be used.

v3 is back to the original patch with newline being printed. Thanks.

-- 
Tristan Partin
Neon (https://neon.tech)
From 11b9da218a28b606a1b70d35c9cfefec91e09206 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Thu, 6 Jul 2023 15:17:36 -0500
Subject: [PATCH v3 2/2] Cleanup some signal usage on Windows

SIGTERM can be handled the same as on a UNIX-like system. SIGINT, on the
other hand, requires special considerations. The Windows documentation
says the following:

> SIGINT is not supported for any Win32 application. When a CTRL+C
> interrupt occurs, Win32 operating systems generate a new thread to
> specifically handle that interrupt. This can cause a single-thread
> application, such as one in UNIX, to become multithreaded and cause
> unexpected behavior.

Therefore, if all the SIGINT handler does is read from memory and/or
exit, we can use the same handler. If the signal handler writes to
memory, such as setting a boolean, the original thread would never
become aware of the changed state.
---
 src/bin/initdb/initdb.c| 17 -
 src/bin/pg_basebackup/pg_receivewal.c  |  5 +
 src/bin/pg_basebackup/pg_recvlogical.c |  9 -
 src/bin/pg_test_fsync/pg_test_fsync.c  |  3 ---
 4 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3f4167682a..cac58eae2a 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2773,26 +2773,17 @@ void
 setup_signals(void)
 {
 	/* some of these are not valid on Windows */
-#ifdef SIGHUP
-	pqsignal(SIGHUP, trapsig);
-#endif
-#ifdef SIGINT
+	pqsignal(SIGTERM, trapsig);
+
+#ifndef WIN32
 	pqsignal(SIGINT, trapsig);
-#endif
-#ifdef SIGQUIT
+	pqsignal(SIGHUP, trapsig);
 	pqsignal(SIGQUIT, trapsig);
-#endif
-#ifdef SIGTERM
-	pqsignal(SIGTERM, trapsig);
-#endif
 
 	/* Ignore SIGPIPE when writing to backend, so we can clean up */
-#ifdef SIGPIPE
 	pqsignal(SIGPIPE, SIG_IGN);
-#endif
 
 	/* Prevent SIGSYS so we can probe for kernel calls that might not work */
-#ifdef SIGSYS
 	pqsignal(SIGSYS, SIG_IGN);
 #endif
 }
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d0a4079d50..55143c4037 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -611,14 +611,11 @@ StreamLog(void)
  * When SIGINT/SIGTERM are caught, just tell the system to exit at the next
  * possible moment.
  */
-#ifndef WIN32
-
 static void
 sigexit_handler(SIGNAL_ARGS)
 {
 	time_to_stop = true;
 }
-#endif
 
 int
 main(int argc, char **argv)
@@ -838,9 +835,9 @@ main(int argc, char **argv)
 	 * Trap signals.  (Don't do this until after the initial password prompt,
 	 * if one is needed, in GetConnection.)
 	 */
+	pqsignal(SIGTERM, sigexit_handler);
 #ifndef WIN32
 	pqsignal(SIGINT, sigexit_handler);
-	pqsignal(SIGTERM, sigexit_handler);
 #endif
 
 	/*
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index f3c7937a1d..1228dc03db 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -644,10 +644,6 @@ error:
 	conn = NULL;
 }
 
-/*
- * Unfortunately we can't do sensible signal handling on windows...
- */
-#ifndef WIN32
 
 /*
  * When SIGINT/SIGTERM are caught, just tell the system to exit at the next
@@ -659,6 +655,9 @@ sigexit_handler(SIGNAL_ARGS)
 	time_to_abort = true;
 }
 
+/* SIGHUP is not defined on Windows */
+#ifndef WIN32
+
 /*
  * Trigger the output file to be reopened.
  */
@@ -921,9 +920,9 @@ main(int argc, char **argv)
 	 * Trap signals.  (Don't do this until after the initial password prompt,
 	 * if one is needed, in GetConnection.)
 	 */
+	pqsignal(SIGTERM, sigexit_handler);
 #ifndef WIN32
 	pqsignal(SIGINT, sigexit_handler);
-	pqsignal(SIGTERM, sigexit_handler);
 	pqsignal(SIGHUP, sighup_handler);
 #endif
 
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index d6d0548e89..61c4db76bf 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -109,9 +109,6 @@ main(int argc, char *argv[])
 	pqsignal(SIGTERM, signal_cleanup);
 #ifndef WIN32
 	pqsignal(SIGALRM, process_alarm);
-#endif
-#ifdef SIGHUP
-	/* Not defined on win32 */
 	pqsignal(SIGHUP, signal_cleanup);
 #endif
 
-- 
Tristan Partin
Neon (https://neon.tech)

From f442c9bbe4bdf228a0ac89919e312e3bb663290c Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Thu, 6 Jul 2023 15:25:14 -0500
Subject: [PATCH v3 1/2] Use signal-safe functions in 

Re: Clean up some signal usage mainly related to Windows

2023-07-12 Thread Peter Eisentraut

On 12.07.23 16:23, Tristan Partin wrote:

It has come to my attention that STDOUT_FILENO might not be portable and
fileno(3) isn't marked as signal-safe, so I have just used the raw 1 for
stdout, which as far as I know is portable.


We do use STDOUT_FILENO elsewhere in the code, and there are even 
workaround definitions for Windows, so it appears it is meant to be used.






Re: Use COPY for populating all pgbench tables

2023-07-12 Thread Tristan Partin
On Wed Jul 12, 2023 at 1:06 AM CDT, Michael Paquier wrote:
> On Tue, Jul 11, 2023 at 09:46:43AM -0500, Tristan Partin wrote:
> > On Tue Jul 11, 2023 at 12:03 AM CDT, Michael Paquier wrote:
> >> This seems a bit incorrect because partitioning only applies to
> >> pgbench_accounts, no?  This change means that the teller and branch
> >> tables would not benefit from FREEZE under a pgbench compiled with
> >> this patch and a backend version of 14 or newer if --partitions is
> >> used.  So, perhaps "partitions" should be an argument of
> >> initPopulateTable() specified for each table? 
> > 
> > Whoops, looks like I didn't read the comment for what the partitions
> > variable means. It only applies to pgbench_accounts as you said. I don't
> > think passing it in as an argument would make much sense. Let me know
> > what you think about the change in this new version, which only hits the
> > first portion of the `if` statement if the table is pgbench_accounts.
>
> -   /* use COPY with FREEZE on v14 and later without partitioning */
> -   if (partitions == 0 && PQserverVersion(con) >= 14)
> -   copy_statement = "copy pgbench_accounts from stdin with (freeze on)";
> +   if (partitions == 0 && strcmp(table, "pgbench_accounts") == 0 && 
> PQserverVersion(con) >= 14)
> +   copy_statement_fmt = "copy %s from stdin with (freeze on)";
>
> This would use the freeze option only on pgbench_accounts when no
> partitioning is defined, but my point was a bit different.  We could
> use the FREEZE option on the teller and branch tables as well, no?
> Okay, the impact is limited compared to accounts in terms of amount of
> data loaded, but perhaps some people like playing with large scaling
> factors where this could show a benefit in the initial data loading.

Perhaps they should all be keyed off the same option? It seemed like in
your previous comment you wanted multiple options. Sorry for not reading
your comment correctly.

> In passing, I have noticed the following sentence in pgbench.sgml:
>Using g causes logging to print one message
>every 100,000 rows while generating data for the   
>pgbench_accounts table.
> It would become incorrect as the same code path would be used for the
> teller and branch tables.

I will amend the documentation to mention all tables rather than being
specific to pgbench_accounts in the next patch revision pending what you
want to do about the partition CLI argument.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: tablecmds.c/MergeAttributes() cleanup

2023-07-12 Thread Peter Eisentraut

On 11.07.23 20:17, Alvaro Herrera wrote:

I spent a few minutes doing a test merge of this to my branch with NOT
NULL changes.  Here's a quick review.


Subject: [PATCH 01/17] Remove obsolete comment about OID support


Obvious, trivial.  +1


Subject: [PATCH 02/17] Remove ancient special case code for adding oid columns


LGTM; deletes dead code.


Subject: [PATCH 03/17] Remove ancient special case code for dropping oid
  columns


Hmm, interesting.  Yay for more dead code removal.  Didn't verify it.


I have committed these first three.  I'll leave it at that for now.


Subject: [PATCH 08/17] Improve some catalog documentation

Point out that typstorage and attstorage are never '\0', even for
fixed-length types.  This is different from attcompression.  For this
reason, some of the handling of these columns in tablecmds.c etc. is
different.  (catalogs.sgml already contained this information in an
indirect way.)


I don't understand why we must point out that they're never '\0'.  I
mean, if we're doing that, why not say that they can never be \xFF?
The valid values are already listed.  The other parts of this patch look
okay.


While working through the storage and compression handling, which look 
similar, I was momentarily puzzled by this.  While attcompression can be 
'\0' to mean "use the default", this is not possible/allowed for attstorage, 
but it took looking around three corners to verify that.  It could be more 
explicit, I thought.






Re: Clean up some signal usage mainly related to Windows

2023-07-12 Thread Tristan Partin
On Wed Jul 12, 2023 at 3:56 AM CDT, Peter Eisentraut wrote:
> On 06.07.23 22:43, Tristan Partin wrote:
> > /* Finish incomplete line on stdout */
> > -   puts("");
> > -   exit(1);
> > +   write(STDOUT_FILENO, "", 1);
> > +   _exit(1);
>
> puts() writes a newline, so it should probably be something like
>
>  write(STDOUT_FILENO, "\n", 1);

Silly mistake. Thanks. v2 attached.

It has come to my attention that STDOUT_FILENO might not be portable and
fileno(3) isn't marked as signal-safe, so I have just used the raw 1 for
stdout, which as far as I know is portable.

-- 
Tristan Partin
Neon (https://neon.tech)
From 2ec5f78e8c8b54ea8953a943d672d02b0d6f448d Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Thu, 6 Jul 2023 15:17:36 -0500
Subject: [PATCH v2 2/2] Cleanup some signal usage on Windows

SIGTERM can be handled the same as on a UNIX-like system. SIGINT, on the
other hand, requires special considerations. The Windows documentation
says the following:

> SIGINT is not supported for any Win32 application. When a CTRL+C
> interrupt occurs, Win32 operating systems generate a new thread to
> specifically handle that interrupt. This can cause a single-thread
> application, such as one in UNIX, to become multithreaded and cause
> unexpected behavior.

Therefore, if all the SIGINT handler does is read from memory and/or
exit, we can use the same handler. If the signal handler writes to
memory, such as setting a boolean, the original thread would never
become aware of the changed state.
---
 src/bin/initdb/initdb.c| 17 -
 src/bin/pg_basebackup/pg_receivewal.c  |  5 +
 src/bin/pg_basebackup/pg_recvlogical.c |  9 -
 src/bin/pg_test_fsync/pg_test_fsync.c  |  3 ---
 4 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3f4167682a..cac58eae2a 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2773,26 +2773,17 @@ void
 setup_signals(void)
 {
 	/* some of these are not valid on Windows */
-#ifdef SIGHUP
-	pqsignal(SIGHUP, trapsig);
-#endif
-#ifdef SIGINT
+	pqsignal(SIGTERM, trapsig);
+
+#ifndef WIN32
 	pqsignal(SIGINT, trapsig);
-#endif
-#ifdef SIGQUIT
+	pqsignal(SIGHUP, trapsig);
 	pqsignal(SIGQUIT, trapsig);
-#endif
-#ifdef SIGTERM
-	pqsignal(SIGTERM, trapsig);
-#endif
 
 	/* Ignore SIGPIPE when writing to backend, so we can clean up */
-#ifdef SIGPIPE
 	pqsignal(SIGPIPE, SIG_IGN);
-#endif
 
 	/* Prevent SIGSYS so we can probe for kernel calls that might not work */
-#ifdef SIGSYS
 	pqsignal(SIGSYS, SIG_IGN);
 #endif
 }
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d0a4079d50..55143c4037 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -611,14 +611,11 @@ StreamLog(void)
  * When SIGINT/SIGTERM are caught, just tell the system to exit at the next
  * possible moment.
  */
-#ifndef WIN32
-
 static void
 sigexit_handler(SIGNAL_ARGS)
 {
 	time_to_stop = true;
 }
-#endif
 
 int
 main(int argc, char **argv)
@@ -838,9 +835,9 @@ main(int argc, char **argv)
 	 * Trap signals.  (Don't do this until after the initial password prompt,
 	 * if one is needed, in GetConnection.)
 	 */
+	pqsignal(SIGTERM, sigexit_handler);
 #ifndef WIN32
 	pqsignal(SIGINT, sigexit_handler);
-	pqsignal(SIGTERM, sigexit_handler);
 #endif
 
 	/*
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index f3c7937a1d..1228dc03db 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -644,10 +644,6 @@ error:
 	conn = NULL;
 }
 
-/*
- * Unfortunately we can't do sensible signal handling on windows...
- */
-#ifndef WIN32
 
 /*
  * When SIGINT/SIGTERM are caught, just tell the system to exit at the next
@@ -659,6 +655,9 @@ sigexit_handler(SIGNAL_ARGS)
 	time_to_abort = true;
 }
 
+/* SIGHUP is not defined on Windows */
+#ifndef WIN32
+
 /*
  * Trigger the output file to be reopened.
  */
@@ -921,9 +920,9 @@ main(int argc, char **argv)
 	 * Trap signals.  (Don't do this until after the initial password prompt,
 	 * if one is needed, in GetConnection.)
 	 */
+	pqsignal(SIGTERM, sigexit_handler);
 #ifndef WIN32
 	pqsignal(SIGINT, sigexit_handler);
-	pqsignal(SIGTERM, sigexit_handler);
 	pqsignal(SIGHUP, sighup_handler);
 #endif
 
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 42d0845b06..a3787d6d39 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -109,9 +109,6 @@ main(int argc, char *argv[])
 	pqsignal(SIGTERM, signal_cleanup);
 #ifndef WIN32
 	pqsignal(SIGALRM, process_alarm);
-#endif
-#ifdef SIGHUP
-	/* Not defined on win32 */
 	pqsignal(SIGHUP, signal_cleanup);
 #endif
 
-- 
Tristan Partin
Neon (https://neon.tech)

From 8c912ed353240ecfa49ee3a9f1f7c65141e9d0d6 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Thu, 6 Jul 2023 15:25:14 -0500

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Masahiko Sawada
On Wed, Jul 12, 2023 at 7:08 PM Amit Kapila  wrote:
>
> On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada  
> wrote:
> >
> > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
> > >
> >
> > I don't think we have concluded any action for it. I agree that
> > IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> > index fields actually. I've attached a draft patch. It removes
> > IsIndexOnlyOnExpression() and merges
> > RemoteRelContainsLeftMostColumnOnIdx() to
> > FindUsableIndexForReplicaIdentityFull. One concern is that we no
> > longer do the assertion check with
> > IsIndexUsableForReplicaIdentityFull(). What do you think?
> >
>
> I think this is a valid concern. Can't we move all the checks
> (including the remote attrs check) inside
> IsIndexUsableForReplicaIdentityFull() and then call it from both
> places? Won't we have attrmap information available in the callers of
> FindReplTupleInLocalRel() via ApplyExecutionData?

You mean to pass ApplyExecutionData or attrmap down to
RelationFindReplTupleByIndex()? I think it would be better to call it
from FindReplTupleInLocalRel() instead.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Önder Kalacı
Hi Amit, all

Amit Kapila , 12 Tem 2023 Çar, 13:09 tarihinde
şunu yazdı:

> On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada 
> wrote:
> >
> > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith 
> wrote:
> > >
> >
> > I don't think we have concluded any action for it. I agree that
> > IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> > index fields actually. I've attached a draft patch. It removes
> > IsIndexOnlyOnExpression() and merges
> > RemoteRelContainsLeftMostColumnOnIdx() to
> > FindUsableIndexForReplicaIdentityFull. One concern is that we no
> > longer do the assertion check with
> > IsIndexUsableForReplicaIdentityFull(). What do you think?
> >
>
> I think this is a valid concern. Can't we move all the checks
> (including the remote attrs check) inside
> IsIndexUsableForReplicaIdentityFull() and then call it from both
> places? Won't we have attrmap information available in the callers of
> FindReplTupleInLocalRel() via ApplyExecutionData?
>
>
>
I think such an approach is slightly better than the proposed changes in
remove_redundant_check.patch.

I think one reason we ended up with IsIndexUsableForReplicaIdentityFull()
is that it is a nice way of documenting the requirements in the code.

However, as you also alluded to in the thread,
RemoteRelContainsLeftMostColumnOnIdx() breaks this documentation.

I agree that it is nice to have all the logic in the same place. I think
remove_redundant_check.patch does that by inlining
IsIndexUsableForReplicaIdentityFull() and
RemoteRelContainsLeftMostColumnOnIdx() into
FindUsableIndexForReplicaIdentityFull().

As Amit noted, the other way around might be more interesting: we expand
IsIndexUsableForReplicaIdentityFull() such that it also includes
RemoteRelContainsLeftMostColumnOnIdx(). With that, readers of
IsIndexUsableForReplicaIdentityFull() can follow the requirements more
easily.

Though I'm not sure yet whether we can get all the information necessary
for the Assert via ApplyExecutionData in FindReplTupleInLocalRel().
Perhaps yes.

Thanks,
Onder


Re: The same 2PC data maybe recovered twice

2023-07-12 Thread Andy Fan
Hi:



On Sat, Jul 8, 2023 at 2:53 AM 蔡梦娟(玊于)  wrote:

> Hi, all
> I add a patch for pg11 to fix this bug, hope you can check it.
>
>
Thanks for the bug report and patch!  Usually we discuss bugs
against the master branch; few people want to check out a historical
branch and hold the discussion there :)  This bug is reproducible on
master, IIUC.

I dislike the patch as proposed because it spends extra CPU cycles
detecting duplication for every 2PC record, and how many cycles we
spend depends on how many 2PC transactions are in use.  How about
detecting such duplication only at the restoreTwoPhaseData() stage?
Instead of

ProcessTwoPhaseBuffer:
if (TransactionIdFollowsOrEquals(xid, origNextXid))
{
  ...
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
RemoveTwoPhaseFile(xid, true);
  ...
}

we use:

if (TwoPhaseFileHeader.startup_lsn > checkpoint.redo)
{
ereport(WARNING,
(errmsg("removing future two-phase state file for transaction %u",
xid)));
}

This approach has several advantages: a) we only do the check in
restoreTwoPhaseData(), not for every WAL record during recovery;
b) we use a constant comparison rather than a loop over an array;
c) it is a better design, since we avoid the issue in the first place
rather than allowing it at one stage and fixing it at a later one.

The only blocker I know of is that we currently don't write startup_lsn
into the 2PC state file, and if we start doing that, decoding of old
2PC state files will fail.  We have several choices here:

a) document that users must complete all pending 2PC transactions
before upgrading; b) use a different magic number in the 2PC state
file to distinguish the two versions.  I prefer option a).

Any suggestion is welcome.


>
> --
> From: 蔡梦娟(玊于) 
> Sent: Thursday, July 6, 2023, 10:02
> To: pgsql-hackers 
> Cc: pgsql-bugs 
> Subject: The same 2PC data maybe recovered twice
>
> Hi, all. I want to report a bug in the recovery of 2PC data. In the
> current implementation of crash recovery, there are two ways to recover
> 2PC data:
> 1. before redo, restoreTwoPhaseData() restores 2PC data whose xid
> < ShmemVariableCache->nextXid, which is initialized from
> checkPoint.nextXid;
> 2. during redo, xact_redo() adds 2PC state from WAL.
>
> The following scenario can cause the same 2PC data to be added twice:
> 1. checkpoint_1 starts being created; checkpoint_1.redo is set to curInsert;
> 2. before checkPoint_1.nextXid is set, a new 2PC transaction is prepared;
> suppose its xid is 100, so ShmemVariableCache->nextXid is advanced
> to 101;
> 3. checkPoint_1.nextXid is set to 101;
> 4. in CheckPointTwoPhase() of checkpoint_1, 2pc_100 is not copied to
> disk because its prepare_end_lsn > checkpoint_1.redo;
> 5. checkPoint_1 finishes; after checkpoint_timeout, creation of
> checkpoint_2 starts;
> 6. during checkpoint_2, the data of 2pc_100 is copied to disk;
> 7. before UpdateControlFile() of checkpoint_2, a crash happens;
> 8. during crash recovery, redo starts from checkpoint_1, and 2pc_100
> is restored first by restoreTwoPhaseData() because xid_100 < 
> checkPoint_1.nextXid, which is 101;
> 9. because prepare_start_lsn of 2pc_100 > checkpoint_1.redo, 2pc_100 is
> added again by xact_redo() during WAL replay, resulting in the same
> 2PC data being added twice;
> 10. in RecoverPreparedTransactions() -> lock_twophase_recover(), locking
> the same 2PC transaction twice causes a PANIC.
>
> Is the above scenario reasonable, and do you have any good ideas for
> fixing this bug?
>
> Thanks & Best Regard
>
>

-- 
Best Regards
Andy Fan


Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Robert Haas
On Wed, Jul 12, 2023 at 4:26 AM Michael Paquier  wrote:
> Taking the test case of upthread, from what I can see, the ALTER TABLE
> .. REPLICA IDENTITY registers two relcache invalidations for pk_foo
> (via RegisterRelcacheInvalidation), which is the relcache entry whose
> stuff is messed up.  I would have expected a refresh of the cache of
> pk_foo to happen when doing the ALTER INDEX .. ATTACH PARTITION, but
> for some reason it does not happen when running the whole in a
> transaction block.  I cannot put my finger on what's wrong for the
> moment, but based on my current impressions the inval requests are
> correctly registered when switching the replica identity, but nothing
> about pk_foo is updated when attaching a partition to it in the last
> step where the invisible update happens :/

I'm not sure exactly what is happening here, but it looks to me like
ATExecReplicaIdentity() only takes ShareLock on the index and
nevertheless feels entitled to update the pg_index tuple, which is
pretty strange. We normally require AccessExclusiveLock to perform DDL
on an object, and in the limited exceptions that we have to that rule
- see AlterTableGetLockLevel - it's pretty much always a
self-exclusive lock. Otherwise, two backends might try to do the same
DDL operation at the same time, which would lead to low-level failures
trying to update the same tuple such as the one seen here.

But even if that doesn't happen or is prevented by some other
mechanism, there's still a synchronization problem. Suppose backend B1
modifies some state via a DDL operation on table T and then afterward
backend B2 wants to perform a non-DDL operation that depends on that
state. Well, B1 takes some lock on the relation, and B2 takes a lock
that would conflict with it, and that guarantees that B2 starts after
B1 commits. That means that B2 is guaranteed to see the invalidations
that were queued by B1, which means it will flush any state out of its
cache that was made stale by the operation performed by B1. If the
locks didn't conflict, B2 might start before B1 committed and either
fail to update its caches or update them but with a version of the
tuples that's about to be made obsolete when B1 commits. So ShareLock
doesn't feel like a very safe choice here.

But I'm not quite sure exactly what's going wrong, either. Every
update is going to call CacheInvalidateHeapTuple(), and updating
either an index's pg_class tuple or its pg_index tuple should queue up
a relcache invalidation, and CommandEndInvalidationMessages() should
cause that to be processed. If this were multiple transactions, the
only thing that would be different is that the invalidation messages
would be in the shared queue, so maybe there's something going on with
the timing of CommandEndInvalidationMessages() vs.
AcceptInvalidationMessages() that accounts for the problem occurring
in one case but not the other. But I do wonder whether the underlying
problem is that what ATExecReplicaIdentity() is doing is not really
safe.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Better help output for pgbench -I init_steps

2023-07-12 Thread Alvaro Herrera
On 2023-Jul-12, Gurjeet Singh wrote:

> The init-steps are severely under-documented in pgbench --help output.
> I think at least a pointer to the the pgbench docs should be mentioned
> in the pgbench --help output; an average user may not rush to read the
> code to find the explanation, but a hint to where to find more details
> about what the letters in --init-steps mean, would save them a lot of
> time.

Agreed.

I would do it the way `pg_waldump --rmgr=list` or `psql
--help=variables` are handled: they give you a list of what is
supported.  You don't have to put the list in the main --help output.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The meaning of things does not come from the things themselves, but from
the intelligences that apply them to their daily problems
in pursuit of progress." (Ernesto Hernández-Novich)




Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Robert Haas
On Tue, Jul 11, 2023 at 1:22 PM Shruthi Gowda  wrote:

> BEGIN;
>
> CREATE TABLE foo (
>   id INT NOT NULL,
>   ts TIMESTAMP WITH TIME ZONE NOT NULL
> ) PARTITION BY RANGE (ts);
>
> CREATE TABLE foo_2023 (
>   id INT NOT NULL,
>   ts TIMESTAMP WITH TIME ZONE NOT NULL
> );
>
> ALTER TABLE ONLY foo
>ATTACH PARTITION foo_2023
>FOR VALUES FROM ('2023-01-01 00:00:00+09') TO ('2024-01-01 00:00:00+09');
>
> CREATE UNIQUE INDEX pk_foo
>   ON ONLY foo USING btree (id, ts);
>
> ALTER TABLE ONLY foo REPLICA IDENTITY USING INDEX pk_foo;
>
> CREATE UNIQUE INDEX foo_2023_id_ts_ix ON foo_2023 USING btree (id, ts);
>
> ALTER INDEX pk_foo ATTACH PARTITION foo_2023_id_ts_ix;
>
>
This example confused me quite a bit when I first read it. I think that the
documentation for CREATE INDEX .. ONLY is pretty inadequate. All it says is
"Indicates not to recurse creating indexes on partitions, if the table is
partitioned. The default is to recurse." But that would just create a
permanently empty index, which is of no use to anyone. I think we should
somehow explain the intent of this, namely that this creates an initially
invalid index which can be made valid by using ALTER INDEX ... ATTACH
PARTITION once per partition.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Dilip Kumar
On Wed, Jul 12, 2023 at 1:56 PM Michael Paquier  wrote:
>
> On Wed, Jul 12, 2023 at 11:38:05AM +0530, Dilip Kumar wrote:
> > On Wed, Jul 12, 2023 at 11:12 AM Michael Paquier  
> > wrote:
> >>
> >> On Wed, Jul 12, 2023 at 09:38:41AM +0900, Michael Paquier wrote:
> >> > While working recently on what has led to cfc43ae and fc55c7f, I
> >> > really got the feeling that there could be some command sequences that
> >> > lacked some CCIs (or CommandCounterIncrement calls) to make sure that
> >> > the catalog updates are visible in any follow-up steps in the same
> >> > transaction.
> >>
> >> Wait a minute.  The validation of a partitioned index uses a copy of
> >> the pg_index tuple from the relcache, which be out of date:
> >>newtup = heap_copytuple(partedIdx->rd_indextuple);
> >>((Form_pg_index) GETSTRUCT(newtup))->indisvalid = true;
> >
> > But why the recache entry is outdated, does that mean that in the
> > previous command, we missed registering the invalidation for the
> > recache entry?
>
> Yes, something's still a bit off here, even if switching a partitioned
> index to become valid should use a fresh tuple copy from the syscache.
>
> Taking the test case of upthread, from what I can see, the ALTER TABLE
> .. REPLICA IDENTITY registers two relcache invalidations for pk_foo
> (via RegisterRelcacheInvalidation), which is the relcache entry whose
> stuff is messed up.  I would have expected a refresh of the cache of
> pk_foo to happen when doing the ALTER INDEX .. ATTACH PARTITION, but
> for some reason it does not happen when running the whole in a
> transaction block.

I think it has something to do with the code below: we are in a
transaction block, so while processing the invalidation we first cleared
the entry for pk_foo but then partially recreated it using
'RelationReloadIndexInfo'.  In this function we haven't built a complete
relation descriptor but have marked 'relation->rd_isvalid' as true, and
because of that the next relation_open (in ALTER INDEX .. ATTACH
PARTITION) will reuse this half-baked entry.  I am still not sure what
the purpose is of just reloading the index and marking the entry as
valid when it is not completely valid.

RelationClearRelation()
{
..
/*
* Even non-system indexes should not be blown away if they are open and
* have valid index support information. This avoids problems with active
* use of the index support information. As with nailed indexes, we
* re-read the pg_class row to handle possible physical relocation of the
* index, and we check for pg_index updates too.
*/
if ((relation->rd_rel->relkind == RELKIND_INDEX ||
relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) &&
relation->rd_refcnt > 0 &&
relation->rd_indexcxt != NULL)
{
if (IsTransactionState())
RelationReloadIndexInfo(relation);
return;
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH]Add a tip to the check mode

2023-07-12 Thread Wen Yi
I'm so sorry for my carelessness; you're right.
But I still think we should show a tip to the user when the check succeeds,
because when I use the check mode, it doesn't give me any message (if there's no
error) and just exits, like this:


[beginnerc@bogon devel]$ postgres --check -D /home/beginnerc/pgsql/data
[beginnerc@bogon devel]$ 

[beginnerc@bogon devel]$ echo $?
0


That confused me, until I printed the return value.
So I think we should add this tip.


I've fixed the patch and attached a new version; thanks very much for your reply.


Yours,
Wen Yi





--Original--
From: "Matthias van de Meent" (https://neon.tech)

Add-a-tip-to-the-check-mode[fix].patch
Description: Binary data


Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue

2023-07-12 Thread Justin Pryzby
On Mon, Jul 10, 2023 at 09:01:37PM -0500, Justin Pryzby wrote:
> An instance compiled locally, without assertions, failed like this:
> 
...
> 
> => REINDEX was running, with parallel workers, but deadlocked with
> ANALYZE, and then crashed.
> 
> It looks like parallel workers are needed to hit this issue.
> I'm not sure if the issue is specific to extended stats - probably not.
> 
> I reproduced the crash with manual REINDEX+ANALYZE, and with assertions (which
> were not hit), and on a more recent commit (1124cb2cf).  The crash is hit 
> about
> 30% of the time when running a loop around REINDEX and then also running
> ANALYZE.
> 
> I hope someone has a hunch where to look; so far, I wasn't able to create a
> minimal reproducer.  

I was able to reproduce this in isolation by reloading data into a test
instance, ANALYZEing the DB to populate pg_statistic_ext_data (so it's
over 3MB in size), and then REINDEXing the stats_ext index in a loop
while ANALYZEing a table with extended stats.

I still don't have a minimal reproducer, but on a hunch I found that
this fails at 5764f611e but not its parent.

commit 5764f611e10b126e09e37fdffbe884c44643a6ce
Author: Andres Freund 
Date:   Wed Jan 18 11:41:14 2023 -0800

Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks

I tried compiling with -DILIST_DEBUG, but that shows nothing beyond
segfaulting, which seems to show that the lists themselves are fine.

-- 
Justin




Re: Synchronizing slots from primary to standby

2023-07-12 Thread Peter Eisentraut

On 14.04.23 15:22, Drouvot, Bertrand wrote:
Now that the "Minimal logical decoding on standby" patch series 
(mentioned up-thread) has been
committed, I think we can resume working on this one ("Synchronizing 
slots from primary to standby").


Maybe you have seen this extension that was released a few months ago: 
https://github.com/EnterpriseDB/pg_failover_slots .  This contains the 
same functionality packaged as an extension.  Maybe this can give some 
ideas about how this should behave and what options to provide etc.


Note that pg_failover_slots doesn't use logical decoding on standby, 
because that would be too slow in practice.  Earlier in this thread we 
had some discussion about which of the two approaches was preferred. 
Anyway, that's what's out there.







Re: Testing autovacuum wraparound (including failsafe)

2023-07-12 Thread Daniel Gustafsson
> On 12 Jul 2023, at 09:52, Masahiko Sawada  wrote:

> Agreed. The timeout can be set by manually setting
> PG_TEST_TIMEOUT_DEFAULT, but I bumped it to 10 min by default. And it
> now requires setting PG_TEST_EXTRA to run it.

+# bump the query timeout to avoid false negatives on slow test syetems.
typo: s/syetems/systems/


+# bump the query timeout to avoid false negatives on slow test syetems.
+$ENV{PG_TEST_TIMEOUT_DEFAULT} = 600;
Does this actually work?  Utils.pm reads the environment variable at compile
time in the BEGIN block, so this setting won't be seen.  A quick test program
seems to confirm this, but I might be missing something.
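To illustrate the failure mode in a standalone way, here is a small C analogy (not PostgreSQL or Utils.pm code; it assumes a GCC/Clang-style constructor attribute): a value captured once at program load, like a Perl module reading %ENV in a BEGIN block at compile time, does not see later changes to the environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Captured once at load time, before main() runs -- analogous to
 * Utils.pm reading $ENV{PG_TEST_TIMEOUT_DEFAULT} in a BEGIN block
 * at compile time. */
static const char *timeout_at_load;

__attribute__((constructor))
static void capture_timeout(void)
{
    const char *v = getenv("PG_TEST_TIMEOUT_DEFAULT");
    timeout_at_load = v ? v : "180";    /* fallback default, for illustration */
}
```

Setting the variable after this point (here with setenv(), in the test script with `$ENV{PG_TEST_TIMEOUT_DEFAULT} = 600;`) changes the environment but not the already-captured value.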

--
Daniel Gustafsson





Re: Consistent coding for the naming of LR workers

2023-07-12 Thread Peter Eisentraut

On 21.06.23 09:18, Alvaro Herrera wrote:

That is a terrible pattern in relatively new code.  Let's get rid of it
entirely rather than continue to propagate it.


So, I don't think it is fair to say that these format strings are OK
for the existing HEAD code, but not OK for the patch code, when they
are both the same.


Agreed.  Let's remove them all.


This is an open issue for PG16 translation.  I propose the attached 
patch to fix this.  Mostly, this just reverts to the previous wordings. 
(I don't think that for these messages the difference between "apply worker" 
and "parallel apply worker" is interesting enough to justify exploding the 
number of messages.  AFAICT, the table sync worker case wasn't even 
used, since callers always handled it separately.)
From 6a1558629f97d83c8b11f204b39aceffc94dee8f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 12 Jul 2023 13:31:08 +0200
Subject: [PATCH] Fix untranslatable error message assembly

Discussion: 
https://www.postgresql.org/message-id/flat/CAHut%2BPt1xwATviPGjjtJy5L631SGf3qjV9XUCmxLu16cHamfgg%40mail.gmail.com
---
 src/backend/replication/logical/worker.c | 44 +++-
 1 file changed, 12 insertions(+), 32 deletions(-)

diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index 0ee764d68f..95b36ced13 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -435,20 +435,6 @@ static inline void reset_apply_error_context_info(void);
 static TransApplyAction get_transaction_apply_action(TransactionId xid,

 ParallelApplyWorkerInfo **winfo);
 
-/*
- * Return the name of the logical replication worker.
- */
-static const char *
-get_worker_name(void)
-{
-   if (am_tablesync_worker())
-   return _("logical replication table synchronization worker");
-   else if (am_parallel_apply_worker())
-   return _("logical replication parallel apply worker");
-   else
-   return _("logical replication apply worker");
-}
-
 /*
  * Form the origin name for the subscription.
  *
@@ -3904,9 +3890,8 @@ maybe_reread_subscription(void)
if (!newsub)
{
ereport(LOG,
-   /* translator: first %s is the name of logical replication 
worker */
-   (errmsg("%s for subscription \"%s\" will stop 
because the subscription was removed",
-   get_worker_name(), 
MySubscription->name)));
+   (errmsg("logical replication apply worker for 
subscription \"%s\" will stop because the subscription was removed",
+   MySubscription->name)));
 
/* Ensure we remove no-longer-useful entry for worker's start 
time */
if (!am_tablesync_worker() && !am_parallel_apply_worker())
@@ -3918,9 +3903,8 @@ maybe_reread_subscription(void)
if (!newsub->enabled)
{
ereport(LOG,
-   /* translator: first %s is the name of logical replication 
worker */
-   (errmsg("%s for subscription \"%s\" will stop 
because the subscription was disabled",
-   get_worker_name(), 
MySubscription->name)));
+   (errmsg("logical replication apply worker for 
subscription \"%s\" will stop because the subscription was disabled",
+   MySubscription->name)));
 
apply_worker_exit();
}
@@ -3954,9 +3938,8 @@ maybe_reread_subscription(void)
MySubscription->name)));
else
ereport(LOG,
-   /* translator: first %s is the name of logical 
replication worker */
-   (errmsg("%s for subscription \"%s\" 
will restart because of a parameter change",
-   get_worker_name(), 
MySubscription->name)));
+   (errmsg("logical replication apply 
worker for subscription \"%s\" will restart because of a parameter change",
+   MySubscription->name)));
 
apply_worker_exit();
}
@@ -4478,9 +4461,8 @@ InitializeApplyWorker(void)
if (!MySubscription)
{
ereport(LOG,
-   /* translator: %s is the name of logical replication worker */
-   (errmsg("%s for subscription %u will not start 
because the subscription was removed during startup",
-   get_worker_name(), 
MyLogicalRepWorker->subid)));
+   (errmsg("logical replication apply worker for 

Re: psql: Add role's membership options to the \du+ command

2023-07-12 Thread Pavel Luzanov

On 09.07.2023 13:56, Pavel Luzanov wrote:

On 08.07.2023 20:07, Tom Lane wrote:

1. I was thinking in terms of dropping the "Member of" column entirely
in \du and \dg.  It doesn't tell you enough, and the output of those
commands is often too wide already.



2. You have describeRoleGrants() set up to localize "ADMIN", "INHERIT",
and "SET".  Since those are SQL keywords, our usual practice is to not
localize them; this'd simplify the code.




3. Not sure about use of LEFT JOIN in the query.  That will mean we
get a row out even for roles that have no grants, which seems like
clutter.  The LEFT JOINs to r and g are fine, but I suggest changing
the first join to a plain join.


So, I accepted all three suggestions. I will wait for other opinions and
plan to implement discussed changes within a few days.


Please review the updated version with suggested changes.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
From ca303798c14b75cbd0131f850d93c68508ac62e9 Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Wed, 12 Jul 2023 13:00:07 +0300
Subject: [PATCH v9] psql: add \drg command to show role memberships and grants
 information.

New command shows assigned privileges for each membership
(ADMIN, INHERIT or SET) and who is grantor. This is important to know
which privileges can be used and how to revoke membership.
Without this command you need to query pg_auth_members directly.

This patch also removes the "Member of" column from the \du & \dg
commands, as it does not provide enough information about membership.
---
 doc/src/sgml/ref/psql-ref.sgml | 21 
 src/bin/psql/command.c |  2 +
 src/bin/psql/describe.c| 87 +-
 src/bin/psql/describe.h|  3 ++
 src/bin/psql/help.c|  1 +
 src/bin/psql/tab-complete.c|  6 ++-
 src/test/regress/expected/psql.out | 46 ++--
 src/test/regress/sql/psql.sql  | 26 +
 8 files changed, 174 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 35aec6d3ce..e09fd06105 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1883,6 +1883,27 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 
   
 
+
+  
+\drg[S] [ pattern ]
+
+
+Lists detailed information about each role membership, including
+assigned options (ADMIN,
+INHERIT or SET) and grantor.
+See the GRANT
+command for their meaning.
+
+
+By default, only user-created roles are shown; supply the
+S modifier to include system roles.
+If pattern is specified,
+only those roles whose names match the pattern are listed.
+
+
+  
+
+
   
 \drds [ role-pattern [ database-pattern ] ]
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 511debbe81..88a653a709 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -918,6 +918,8 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 
 	free(pattern2);
 }
+else if (strncmp(cmd, "drg", 3) == 0)
+	success = describeRoleGrants(pattern, show_system);
 else
 	status = PSQL_CMD_UNKNOWN;
 break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 9325a46b8f..434043594a 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3617,7 +3617,7 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 	PGresult   *res;
 	printTableContent cont;
 	printTableOpt myopt = pset.popt.topt;
-	int			ncols = 3;
+	int			ncols = 2;
 	int			nrows = 0;
 	int			i;
 	int			conns;
@@ -3631,11 +3631,7 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 	printfPQExpBuffer(,
 	  "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
 	  "  r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
-	  "  r.rolconnlimit, r.rolvaliduntil,\n"
-	  "  ARRAY(SELECT b.rolname\n"
-	  "FROM pg_catalog.pg_auth_members m\n"
-	  "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
-	  "WHERE m.member = r.oid) as memberof");
+	  "  r.rolconnlimit, r.rolvaliduntil\n");
 
 	if (verbose)
 	{
@@ -3675,8 +3671,6 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 
 	printTableAddHeader(, gettext_noop("Role name"), true, align);
 	printTableAddHeader(, gettext_noop("Attributes"), true, align);
-	/* ignores implicit memberships from superuser & pg_database_owner */
-	printTableAddHeader(, gettext_noop("Member of"), true, align);
 
 	if (verbose)
 		printTableAddHeader(, gettext_noop("Description"), true, align);
@@ -3701,11 +3695,11 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 		if (strcmp(PQgetvalue(res, i, 5), "t") != 0)
 			add_role_attribute(, _("Cannot login"));
 
-		if (strcmp(PQgetvalue(res, i, 

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Amit Kapila
On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada  wrote:
>
> On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
> >
>
> I don't think we have concluded any action for it. I agree that
> IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> index fields actually. I've attached a draft patch. It removes
> IsIndexOnlyOnExpression() and merges
> RemoteRelContainsLeftMostColumnOnIdx() to
> FindUsableIndexForReplicaIdentityFull. One concern is that we no
> longer do the assertion check with
> IsIndexUsableForReplicaIdentityFull(). What do you think?
>

I think this is a valid concern. Can't we move all the checks
(including the remote attrs check) inside
IsIndexUsableForReplicaIdentityFull() and then call it from both
places? Won't we have attrmap information available in the callers of
FindReplTupleInLocalRel() via ApplyExecutionData?

-- 
With Regards,
Amit Kapila.




Re: Better help output for pgbench -I init_steps

2023-07-12 Thread Peter Eisentraut

On 12.07.23 09:42, Gurjeet Singh wrote:

These two variants show the two extremes; bare minimum vs. everything
but the kitchen sink. So one may feel the need to find a middle ground
and provide a "sufficient, but not too much" variant. I have attempted
that in variants 3 and 4; also attached.


If you end up with variant 3 or 4, please use double quotes instead of 
single quotes.





Re: DROP DATABASE is interruptible

2023-07-12 Thread Daniel Gustafsson
> On 12 Jul 2023, at 03:59, Andres Freund  wrote:
> On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
>>> On 25 Jun 2023, at 19:03, Andres Freund  wrote:
>>> On 2023-06-21 12:02:04 -0700, Andres Freund wrote:

>>> There don't need to be explict checks, because pg_upgrade will fail, because
>>> it connects to every database. Obviously the error could be nicer, but it
>>> seems ok for something hopefully very rare. I did add a test ensuring that 
>>> the
>>> behaviour is caught.
>> 
>> I don't see any pg_upgrade test in the patch?
> 
> Oops, I stashed them alongside some unrelated changes... Included this time.

Looking more at this I wonder if we in HEAD should make this a bit nicer by
extending the --check phase to catch this?  I did a quick hack along these
lines in the 0003 commit attached here (0001 and 0002 are your unchanged
patches, just added for consistency and to be CFBot compatible).  If done it
could be a separate commit to make the 0002 patch backport cleaner of course.

 I'm not sure what should be done for psql. It's probably not a good idea to
 change tab completion, that'd just make it appear the database is gone. 
 But \l
 could probably show dropped databases more prominently?
>>> 
>>> I have not done that. I wonder if this is something that should be done in 
>>> the
>>> back branches?
>> 
>> Possibly, I'm not sure where we usually stand on changing the output format 
>> of
>> \ commands in psql in minor revisions.
> 
> I'd normally be quite careful, people do script psql.
> 
> While breaking things when encountering an invalid database doesn't actually
> sound like a bad thing, I don't think it fits into any of the existing column
> output by psql for \l.

Agreed, it doesn't, it would have to be a new column.

>> +errhint("Use DROP DATABASE to drop invalid databases"));
>> Should end with a period as a complete sentence?
> 
> I get confused about this every time. It's not helped by this example in
> sources.sgml:
> 
> 
> Primary:could not create shared memory segment: %m
> Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
> Hint:   the addendum
> 
> 
> Which notably does not use punctuation for the hint. But indeed, later we say:
>   
>Detail and hint messages: Use complete sentences, and end each with
>a period.  Capitalize the first word of sentences.  Put two spaces after
>the period if another sentence follows (for English text; might be
>inappropriate in other languages).
>   

That's not a very helpful example, and one which may give the wrong impression
unless the entire page is read.  I've raised this with a small diff to improve
it on -docs.

> Updated patches attached.

This version of the patchset LGTM.

> Does anybody have an opinion about whether we should add a dedicated field to
> pg_database for representing invalid databases in HEAD? I'm inclined to think
> that it's not really worth the cross-version complexity at this point, and
> it's not that bad a misuse to use pg_database.datconnlimit.

FWIW I think we should use pg_database.datconnlimit for this, it doesn't seem
like a common enough problem to warrant the added complexity and cost.

--
Daniel Gustafsson



v4-0003-pg_upgrade-Extend-check-for-database-not-allowing.patch
Description: Binary data


v4-0002-Handle-DROP-DATABASE-getting-interrupted.patch
Description: Binary data


v4-0001-Release-lock-after-encountering-bogs-row-in-vac_t.patch
Description: Binary data


Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-07-12 Thread Michael Banck
Hi,

On Mon, Jul 10, 2023 at 02:37:24PM -0700, Nikolay Samokhvalov wrote:
> On Mon, Jul 10, 2023 at 2:02 PM Michael Banck  wrote:
> > > +++ b/doc/src/sgml/ref/pgupgrade.sgml
> > > @@ -380,22 +380,28 @@ NET STOP postgresql-
> > >  
> > >
> > >  
> > > - Streaming replication and log-shipping standby servers can
> > > + Streaming replication and log-shipping standby servers must
> > >   remain running until a later step.
> > >  
> > > 
> > >
> > > -   
> > > +   
> > >
> > >  
> > > - If you are upgrading standby servers using methods outlined in
> > section  > > - linkend="pgupgrade-step-replicas"/>, verify that the old standby
> > > - servers are caught up by running
> > pg_controldata
> > > - against the old primary and standby clusters.  Verify that the
> > > - Latest checkpoint location values match in all
> > clusters.
> > > - (There will be a mismatch if old standby servers were shut down
> > > - before the old primary or if the old standby servers are still
> > running.)
> > > + If you are upgrading standby servers using methods outlined in
> > > + ,
> >
> > You dropped the "section" before the xref, I think that should be kept
> > around.
> 
> Seems to be a problem in discussing source code that looks quite different
> than the final result.
> 
> In the result – the docs – we currently have "section Step 9", looking
> weird. I still think it's good to remove it. We also have "in Step 17
> below" (without the extra word "section") in a different place on the same
> page.

Ok.
 
> > > +ensure that they were
> > running when
> > > + you shut down the primaries in the previous step, so all the
> > latest changes
> >
> > You talk of primaries in plural here, that is a bit weird for PostgreSQL
> > documentation.
> 
> The same docs already discuss two primaries ("8. Stop both primaries"), but
> I agree it might look confusing if you read only a part of the doc, jumping
> into middle of it, like I do all the time when using the docs in "check the
> reference" style.

[...]

> > I think this should be something like "ensure both that the primary is
> > shut down and that the standbys have received all the changes".
> 
> Well, we have two primary servers – old and new. I tried to clarify it in
> the new version.

Yeah, sorry about that, I think I should have first had coffee and/or
slept over this review before sending it.

Maybe one reason why I was confused is because I consider a "primary"
more like a full server/VM, not necessarily a database instance (though
one could of course have a primary/secondary pair on the same host).
Possibly "primary instances" or something might be clearer, but I think
I should re-read the whole section first before commenting further.


Michael




Re: Clean up some signal usage mainly related to Windows

2023-07-12 Thread Peter Eisentraut

On 06.07.23 22:43, Tristan Partin wrote:

/* Finish incomplete line on stdout */
-   puts("");
-   exit(1);
+   write(STDOUT_FILENO, "", 1);
+   _exit(1);


puts() writes a newline, so it should probably be something like

write(STDOUT_FILENO, "\n", 1);





Re: Support to define custom wait events for extensions

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 04:52:38PM +0900, Masahiro Ikeda wrote:
> In my understanding, the first column of the row for WaitEventExtension in
> wait_event_names.txt can be any value and the above code should not die.
> But if I use the following input, it fails on the last line.
> 
>  # wait_event_names.txt
>  Section: ClassName - WaitEventExtension
> 
>  WAIT_EVENT_EXTENSION "Extension" "Waiting in an extension."
>  Extension"Extension" "Waiting in an extension."
>  EXTENSION"Extension" "Waiting in an extension."
> 
> If the behavior is unexpected, we need to change the current code.
> I have created a patch for the areas that I felt needed to be changed.
> - 0001-change-the-die-condition-in-generate-wait_event_type.patch
>  (In addition to the above, "$continue = ",\n";" doesn't appear to be
> necessary.)

die "wait event names must start with 'WAIT_EVENT_'"
  if ( $trimmedwaiteventname eq $waiteventenumname
-   && $waiteventenumname !~ /^LWLock/
-   && $waiteventenumname !~ /^Lock/);
-   $continue = ",\n";
+   && $waitclassname !~ /^WaitEventLWLock$/
+   && $waitclassname !~ /^WaitEventLock$/);

Indeed, this looks wrong as-is.  $waiteventenumname refers to the
names of the enum elements, so we could just apply a filter based on
the class names in full.  The second check for the generation of
the c/h files uses the class names.
--
Michael


signature.asc
Description: PGP signature


Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-12 Thread Masahiko Sawada
On Wed, Jul 12, 2023 at 3:52 AM Andres Freund  wrote:
>
> Hi,
>
> On 2023-07-03 11:55:13 +0900, Masahiko Sawada wrote:
> > While testing PG16, I observed that in PG16 there is a big performance
> > degradation in concurrent COPY into a single relation with 2 - 16
> > clients in my environment. I've attached a test script that measures
> > the execution time of COPYing 5GB data in total to the single relation
> > while changing the number of concurrent insertions, in PG16 and PG15.
> > Here are the results on my environment (EC2 instance, RHEL 8.6, 128
> > vCPUs, 512GB RAM):
> >
> > * PG15 (4b15868b69)
> > PG15: nclients = 1, execution time = 14.181
> >
> > * PG16 (c24e9ef330)
> > PG16: nclients = 1, execution time = 17.112
>
> > The relevant commit is 00d1e02be2 "hio: Use ExtendBufferedRelBy() to
> > extend tables more efficiently". With commit 1cbbee0338 (the previous
> > commit of 00d1e02be2), I got a better numbers, it didn't have a better
> > scalability, though:
> >
> > PG16: nclients = 1, execution time = 17.444
>
> I think the single client case is indicative of an independent regression, or
> rather regressions - it can't have anything to do with the fallocate() issue
> and reproduces before that too in your numbers.

Right.

>
> 1) COPY got slower, due to:
> 9f8377f7a27 Add a DEFAULT option to COPY  FROM
>
> This added a new palloc()/free() to every call to NextCopyFrom(). It's not at
> all clear to me why this needs to happen in NextCopyFrom(), particularly
> because it's already stored in CopyFromState?

Yeah, it seems to me that we can palloc the bool array once and use it
for the entire COPY FROM. With the attached small patch, the
performance becomes much better:

15:
14.70500 sec

16:
17.42900 sec

16 w/ patch:
14.85600 sec
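The shape of that fix can be sketched in standalone C (names are illustrative, not the actual CopyFromState layout; the real change moves the per-row palloc out of NextCopyFrom() into the long-lived per-COPY state):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for CopyFromState: the per-row "defaults"
 * scratch array is allocated once per COPY, not once per row. */
typedef struct
{
    int   nfields;
    bool *defaults;     /* which columns use their DEFAULT this row */
} DemoCopyState;

static void demo_begin_copy(DemoCopyState *cs, int nfields)
{
    cs->nfields = nfields;
    cs->defaults = calloc(nfields, sizeof(bool));   /* once per COPY */
}

/* Per row: previously this did the equivalent of a palloc0()/pfree()
 * pair on every call; now it only clears the reused array. */
static bool *demo_next_row(DemoCopyState *cs)
{
    memset(cs->defaults, 0, cs->nfields * sizeof(bool));
    return cs->defaults;
}
```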

>
>
> 2) pg_strtoint32_safe() got substantially slower, mainly due
>to
> faff8f8e47f Allow underscores in integer and numeric constants.
> 6fcda9aba83 Non-decimal integer literals

Agreed.

>
> pinned to one cpu, turbo mode disabled, I get the following best-of-three 
> times for
>   copy test from '/tmp/tmp_4.data'
> (too impatient to use the larger file every time)
>
> 15:
> 6281.107 ms
>
> HEAD:
> 7000.469 ms
>
> backing out 9f8377f7a27:
> 6433.516 ms
>
> also backing out faff8f8e47f, 6fcda9aba83:
> 6235.453 ms
>
>
> I suspect 1) can relatively easily be fixed properly. But 2) seems much
> harder. The changes increased the number of branches substantially, that's
> gonna cost in something as (previously) tight as pg_strtoint32().

I'll look at how to fix 2).
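As a standalone illustration (not the actual pg_strtoint32_safe() code) of where the extra cost comes from, consider a simplified parser that must now handle 0x/0o/0b prefixes and '_' digit separators: each of these checks is an extra branch in what used to be a tight digits-only loop. INT_MIN handling is omitted for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified sketch of an int32 parser with the new syntax rules:
 * optional 0x/0o/0b base prefix and '_' separators between digits. */
static bool parse_int32(const char *s, int *out)
{
    long long   val = 0;
    int         base = 10;
    bool        neg = false;
    bool        last_was_digit = false;

    if (*s == '-') { neg = true; s++; }
    else if (*s == '+') s++;

    /* branch: non-decimal prefixes (6fcda9aba83) */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) { base = 16; s += 2; }
    else if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O')) { base = 8; s += 2; }
    else if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) { base = 2; s += 2; }

    for (; *s; s++)
    {
        int     d;

        /* branch: underscores must separate digits (faff8f8e47f) */
        if (*s == '_')
        {
            if (!last_was_digit)
                return false;
            last_was_digit = false;
            continue;
        }
        if (*s >= '0' && *s <= '9') d = *s - '0';
        else if (*s >= 'a' && *s <= 'f') d = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F') d = *s - 'A' + 10;
        else return false;
        if (d >= base)
            return false;
        val = val * base + d;
        if (val > INT_MAX)
            return false;
        last_was_digit = true;
    }
    if (!last_was_digit)            /* empty input or trailing '_' */
        return false;
    *out = neg ? -(int) val : (int) val;
    return true;
}
```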

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_COPY_DEFAULT.patch
Description: Binary data


Re: Support to define custom wait events for extensions

2023-07-12 Thread Masahiro Ikeda

On 2023-07-12 09:36, Andres Freund wrote:

Hi,

On 2023-07-11 16:45:26 +0900, Michael Paquier wrote:

+/* --
+ * Wait Events - Extension
+ *
+ * Use this category when the server process is waiting for some condition
+ * defined by an extension module.
+ *
+ * Extensions can define custom wait events.  First, call
+ * WaitEventExtensionNewTranche() just once to obtain a new wait event
+ * tranche.  The ID is allocated from a shared counter.  Next, each
+ * individual process using the tranche should call
+ * WaitEventExtensionRegisterTranche() to associate that wait event with
+ * a name.

What does "tranche" mean here? For LWLocks it makes some sense, it's used for
a set of lwlocks, not an individual one. But here that doesn't really seem to
apply?


Thanks for the useful comments.
OK, I will change to WaitEventExtensionNewId() and
WaitEventExtensionRegisterName().


+ * It may seem strange that each process using the tranche must register it
+ * separately, but dynamic shared memory segments aren't guaranteed to be
+ * mapped at the same address in all coordinating backends, so storing the
+ * registration in the main shared memory segment wouldn't work for that case.
+ */

I don't really see how this applies to wait events? There's no pointers
here...


Yes, I'll fix the comments.




+typedef enum
+{
+   WAIT_EVENT_EXTENSION = PG_WAIT_EXTENSION,
+   WAIT_EVENT_EXTENSION_FIRST_USER_DEFINED
+} WaitEventExtension;
+
+extern void WaitEventExtensionShmemInit(void);
+extern Size WaitEventExtensionShmemSize(void);
+
+extern uint32 WaitEventExtensionNewTranche(void);
+extern void WaitEventExtensionRegisterTranche(uint32 wait_event_info,



-slock_t*ShmemLock; /* spinlock for shared memory and LWLock
+slock_t*ShmemLock; /* spinlock for shared memory, LWLock
+* allocation, and named extension wait event
 * allocation */

I'm doubtful that it's a good idea to reuse the spinlock further. Given that
the patch adds WaitEventExtensionShmemInit(), why not just have a lock in
there?


OK, I'll create a new spinlock for the purpose.



+/*
+ * Allocate a new event ID and return the wait event info.
+ */
+uint32
+WaitEventExtensionNewTranche(void)
+{
+   uint16  eventId;
+
+   SpinLockAcquire(ShmemLock);
+   eventId = (*WaitEventExtensionCounter)++;
+   SpinLockRelease(ShmemLock);
+
+   return PG_WAIT_EXTENSION | eventId;
+}


It seems quite possible to run out of space in WaitEventExtensionCounter, so
this should error out in that case.


OK, I'll do so.
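A minimal standalone sketch of that error-out behavior (the class bit value, names, and the plain exit() are assumptions for illustration; the real counter lives in shared memory under a spinlock, and the failure would be an ereport(ERROR, ...)):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PG_WAIT_EXTENSION 0x07000000U   /* assumed class bits, illustration only */

/* In the real patch this counter lives in shared memory; a static
 * suffices for the sketch.  ID 0 is the built-in "Extension" event. */
static uint16_t wait_event_extension_counter = 1;

/* Allocate the next extension wait event ID, failing loudly instead of
 * silently wrapping the 16-bit event ID space. */
static uint32_t wait_event_extension_new(void)
{
    if (wait_event_extension_counter == UINT16_MAX)
    {
        /* in PostgreSQL this would be ereport(ERROR, ...) */
        fprintf(stderr, "out of extension wait event IDs\n");
        exit(1);
    }
    return PG_WAIT_EXTENSION | wait_event_extension_counter++;
}
```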



+/*
+ * Register a dynamic tranche name in the lookup table of the current process.
+ *
+ * This routine will save a pointer to the wait event tranche name passed
+ * as an argument, so the name should be allocated in a backend-lifetime context
+ * (shared memory, TopMemoryContext, static constant, or similar).
+ *
+ * The "wait_event_name" will be user-visible as a wait event name, so try to
+ * use a name that fits the style for those.
+ */
+void
+WaitEventExtensionRegisterTranche(uint32 wait_event_info,
+                                  const char *wait_event_name)
+{
+   uint16  eventId;
+
+   /* Check wait event class. */
+   Assert((wait_event_info & 0xFF000000) == PG_WAIT_EXTENSION);
+
+   eventId = wait_event_info & 0x0000FFFF;
+
+   /* This should only be called for user-defined tranches. */
+   if (eventId < NUM_BUILTIN_WAIT_EVENT_EXTENSION)
+   return;


Why not assert out in that case then?


OK, I'll add the assertion for eventID.



+/*
+ * Return the name of an Extension wait event ID.
+ */
+static const char *
+GetWaitEventExtensionIdentifier(uint16 eventId)
+{
+   /* Build-in tranche? */


*built


I will fix it.



+   if (eventId < NUM_BUILTIN_WAIT_EVENT_EXTENSION)
+       return "Extension";
+
+   /*
+    * It's an extension tranche, so look in WaitEventExtensionTrancheNames[].
+    * However, it's possible that the tranche has never been registered by
+    * calling WaitEventExtensionRegisterTranche() in the current process, in
+    * which case give up and return "Extension".
+    */
+   eventId -= NUM_BUILTIN_WAIT_EVENT_EXTENSION;
+
+   if (eventId >= WaitEventExtensionTrancheNamesAllocated ||
+       WaitEventExtensionTrancheNames[eventId] == NULL)
+       return "Extension";

I'd return something different here, otherwise something that's effectively a
bug is not distinguishable from the built-in event.

It is a good idea. It would be good to name it "unknown wait event", the same
as pgstat_get_wait_activity(), etc.
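A standalone sketch of the lookup with the suggested fallback name (array size, names, and the single built-in event are illustrative assumptions, not the actual patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUILTIN_WAIT_EVENT_EXTENSION 1

/* Per-process registration table (sketch); unregistered slots stay NULL. */
static const char *tranche_names[4];
static int tranche_names_allocated = 4;

/* Name lookup with three outcomes: "Extension" for the built-in event,
 * the registered name, and a distinguishable "unknown wait event" when
 * an extension forgot to register the name in this process. */
static const char *get_wait_event_extension_name(uint16_t event_id)
{
    if (event_id < NUM_BUILTIN_WAIT_EVENT_EXTENSION)
        return "Extension";
    event_id -= NUM_BUILTIN_WAIT_EVENT_EXTENSION;
    if (event_id >= tranche_names_allocated || tranche_names[event_id] == NULL)
        return "unknown wait event";
    return tranche_names[event_id];
}
```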



+++ b/src/test/modules/test_custom_wait_events/t/001_basic.pl
@@ -0,0 +1,34 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+

Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 11:38:05AM +0530, Dilip Kumar wrote:
> On Wed, Jul 12, 2023 at 11:12 AM Michael Paquier  wrote:
>>
>> On Wed, Jul 12, 2023 at 09:38:41AM +0900, Michael Paquier wrote:
>> > While working recently on what has led to cfc43ae and fc55c7f, I
>> > really got the feeling that there could be some command sequences that
>> > lacked some CCIs (or CommandCounterIncrement calls) to make sure that
>> > the catalog updates are visible in any follow-up steps in the same
>> > transaction.
>>
>> Wait a minute.  The validation of a partitioned index uses a copy of
>> the pg_index tuple from the relcache, which may be out of date:
>>newtup = heap_copytuple(partedIdx->rd_indextuple);
>>((Form_pg_index) GETSTRUCT(newtup))->indisvalid = true;
> 
> But why is the relcache entry outdated? Does that mean that in the
> previous command we missed registering the invalidation for the
> relcache entry?

Yes, something's still a bit off here, even if switching a partitioned
index to become valid should use a fresh tuple copy from the syscache.

Taking the test case of upthread, from what I can see, the ALTER TABLE
.. REPLICA IDENTITY registers two relcache invalidations for pk_foo
(via RegisterRelcacheInvalidation), which is the relcache entry whose
stuff is messed up.  I would have expected a refresh of the cache of
pk_foo to happen when doing the ALTER INDEX .. ATTACH PARTITION, but
for some reason it does not happen when running the whole in a
transaction block.  I cannot put my finger on what's wrong for the
moment, but based on my current impressions the inval requests are
correctly registered when switching the replica identity, but nothing
about pk_foo is updated when attaching a partition to it in the last
step where the invisible update happens :/ 
--
Michael


signature.asc
Description: PGP signature


Re: SQL:2011 application time

2023-07-12 Thread Peter Eisentraut

On 07.07.23 03:03, Paul A Jungwirth wrote:

Here are some new patch files based on discussions from PGCon.


Here are a few fixup patches to get things building without warnings and 
errors.


The last patch (your 0005) fails the regression test for me and it 
didn't appear to be a trivial problem, so please take another look at 
that sometime.  (Since it's the last patch, it's obviously lower priority.)
From 6694912b0fdd8742e583ff2a524f32284c330711 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 12 Jul 2023 09:45:56 +0200
Subject: [PATCH v12] fixup! Add temporal PRIMARY KEY and UNIQUE constraints

---
 src/bin/pg_dump/pg_dump.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a95e1d3696..33ad34ad66 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -7029,16 +7029,14 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 							 "(SELECT pg_catalog.array_agg(attstattarget ORDER BY attnum) "
 							 "  FROM pg_catalog.pg_attribute "
 							 "  WHERE attrelid = i.indexrelid AND "
-							 "attstattarget >= 0) AS indstatvals, "
-							 "c.conexclop IS NOT NULL AS withoutoverlaps, ");
+							 "attstattarget >= 0) AS indstatvals, ");
 	else
 		appendPQExpBufferStr(query,
 							 "0 AS parentidx, "
 							 "i.indnatts AS indnkeyatts, "
 							 "i.indnatts AS indnatts, "
 							 "'' AS indstatcols, "
-							 "'' AS indstatvals, "
-							 "null AS withoutoverlaps, ");
+							 "'' AS indstatvals, ");
 
 	if (fout->remoteVersion >= 15)
 		appendPQExpBufferStr(query,
@@ -7047,6 +7045,13 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 		appendPQExpBufferStr(query,
 							 "false AS indnullsnotdistinct, ");
 
+	if (fout->remoteVersion >= 17)
+		appendPQExpBufferStr(query,
+							 "c.conexclop IS NOT NULL AS withoutoverlaps ");
+	else
+		appendPQExpBufferStr(query,
+							 "null AS withoutoverlaps ");
+
 	/*
 	 * The point of the messy-looking outer join is to find a constraint that
 	 * is related by an internal dependency link to the index. If we find one,
-- 
2.41.0

From 49932adb8e626036b8d8edf26ed1465f39db82bd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 12 Jul 2023 10:16:53 +0200
Subject: [PATCH v12] fixup! Add UPDATE/DELETE FOR PORTION OF

---
 doc/src/sgml/ref/delete.sgml | 2 ++
 doc/src/sgml/ref/update.sgml | 2 ++
 src/backend/parser/analyze.c | 5 +++--
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml
index 868cf0d1f9..aec593239b 100644
--- a/doc/src/sgml/ref/delete.sgml
+++ b/doc/src/sgml/ref/delete.sgml
@@ -55,6 +55,7 @@ Description
circumstances.
   
 
+  
 
   
The optional RETURNING clause causes DELETE
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index f2042e0b25..62e9e0e1f0 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -52,6 +52,7 @@ Description
circumstances.
   
 
+  
 
   
The optional RETURNING clause causes UPDATE
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 0b2109d1bb..c6d2b7e1d1 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1247,6 +1247,7 @@ transformForPortionOfClause(ParseState *pstate,
char *range_name = forPortionOf->range_name;
char *range_type_name = NULL;
int range_attno = InvalidAttrNumber;
+   Form_pg_attribute attr;
ForPortionOfExpr *result;
List *targetList;
Node *target_start, *target_end;
@@ -1264,7 +1265,7 @@ transformForPortionOfClause(ParseState *pstate,
 									 range_name,
 									 RelationGetRelationName(targetrel)),
 				 parser_errposition(pstate, forPortionOf->range_name_location)));
-	Form_pg_attribute attr = TupleDescAttr(targetrel->rd_att, range_attno - 1);
+	attr = TupleDescAttr(targetrel->rd_att, range_attno

Re: Support to define custom wait events for extensions

2023-07-12 Thread Masahiro Ikeda

On 2023-07-12 02:39, Tristan Partin wrote:

From bf06b8100cb747031959fe81a2d19baabc4838cf Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 16 Jun 2023 11:53:29 +0900
Subject: [PATCH 1/2] Support custom wait events for extensions.


+ * This is indexed by event ID minus NUM_BUILTIN_WAIT_EVENT_EXTENSION, and
+ * stores the names of all dynamically-created event ID known to the current
+ * process.  Any unused entries in the array will contain NULL.


The second ID should be plural.


Thanks for reviewing. Yes, I'll fix it.


+   /* If necessary, create or enlarge array. */
+   if (eventId >= ExtensionWaitEventTrancheNamesAllocated)
+   {
+   int newalloc;
+
+   newalloc = pg_nextpower2_32(Max(8, eventId + 1));


Given the context of our last conversation, I assume this code was
copied from somewhere else. Since this is new code, I think it would
make more sense if newalloc was a uint16 or size_t.


As Michael-san said, I used LWLockRegisterTranche() as a reference.
I think it is a good idea to fix the current master. I'll modify the
above code accordingly.
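As a point of reference, the enlarge-on-demand pattern being discussed looks roughly like this in standalone C. This is a malloc-based illustration only — the server code would allocate in a memory context and handle allocation failure, both of which are omitted here:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char **TrancheNames = NULL;
static size_t TrancheNamesAllocated = 0;

/* round up to the next power of two, like pg_nextpower2_32() */
static size_t
next_pow2(size_t n)
{
	size_t		p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static void
register_tranche_name(size_t eventId, const char *name)
{
	/* If necessary, create or enlarge the array. */
	if (eventId >= TrancheNamesAllocated)
	{
		size_t		newalloc = next_pow2(eventId + 1 > 8 ? eventId + 1 : 8);

		TrancheNames = realloc(TrancheNames, newalloc * sizeof(char *));
		/* zero the fresh tail so unregistered IDs read back as NULL */
		memset(TrancheNames + TrancheNamesAllocated, 0,
			   (newalloc - TrancheNamesAllocated) * sizeof(char *));
		TrancheNamesAllocated = newalloc;
	}
	TrancheNames[eventId] = name;
}
```

Note that because `eventId` is a size here, an unsigned type is the natural fit, which is the substance of the `uint16`/`size_t` suggestion above.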


From what I understand, Neon differs from upstream in some way related
to this patch. I am trying to ascertain how that is. I hope to provide
more feedback when I learn more about it.


Oh, it was unexpected for me. Thanks for researching the reason.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Testing autovacuum wraparound (including failsafe)

2023-07-12 Thread Masahiko Sawada
On Thu, Apr 27, 2023 at 11:12 PM Daniel Gustafsson  wrote:
>
> > On 27 Apr 2023, at 16:06, Masahiko Sawada  wrote:
> > On Fri, Apr 21, 2023 at 12:02 PM John Naylor
> >  wrote:
>
> >> ...that call to background_psql doesn't look like other ones that have 
> >> "key => value". Is there something I'm missing?
> >
> > Thanks for reporting. I think that the patch needs to be updated since
> > commit 664d757531e1 changed background psql TAP functions. I've
> > attached the updated patch.
>
> Is there a risk that the background psql will time out on slow systems during
> the consumption of 2B xid's?  Since you mainly want to hold it open for the
> duration of testing you might want to bump it to avoid false negatives on slow
> test systems.

Agreed. The timeout can be set manually via PG_TEST_TIMEOUT_DEFAULT, but
I bumped it to 10 min by default. The test also now requires PG_TEST_EXTRA
to be set in order to run.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v4-0001-Add-tests-for-XID-wraparound.patch
Description: Binary data


Re: Support to define custom wait events for extensions

2023-07-12 Thread Masahiro Ikeda

On 2023-07-11 16:45, Michael Paquier wrote:

On Fri, Jun 23, 2023 at 05:56:26PM +0900, Masahiro Ikeda wrote:

I updated the patches to handle the warning mentioned
by PostgreSQL Patch Tester, and removed unnecessary spaces.


I have begun hacking on that, and the API layer inspired from the
LWLocks is sound.  I have been playing with it in my own extensions
and it is nice to be able to plug in custom wait events into
pg_stat_activity, particularly for bgworkers.  Finally.


Great!


The patch needed a rebase after the recent commit that introduced the
automatic generation of docs and code for wait events.  It requires
two tweaks in generate-wait_event_types.pl, feel free to double-check
them.


Thanks for rebasing. I confirmed it works with the current master.

I know this is a little off-topic from what we're talking about here,
but I'm curious about generate-wait_event_types.pl.

 # generate-wait_event_types.pl
 -  # An exception is required for LWLock and Lock as these don't require
 -  # any C and header files generated.
 +	# An exception is required for Extension, LWLock and Lock as these don't
 +	# require any C and header files generated.
die "wait event names must start with 'WAIT_EVENT_'"
  if ( $trimmedwaiteventname eq $waiteventenumname
 +  && $waiteventenumname !~ /^Extension/
&& $waiteventenumname !~ /^LWLock/
&& $waiteventenumname !~ /^Lock/);

In my understanding, the first column of the row for WaitEventExtension in
wait_event_names.txt can be any value, and the above code should not die.
But if I use the following input, it dies on the last line.

 # wait_event_names.txt
 Section: ClassName - WaitEventExtension

 WAIT_EVENT_EXTENSION   "Extension"   "Waiting in an extension."
 Extension  "Extension"   "Waiting in an extension."
 EXTENSION  "Extension"   "Waiting in an extension."

If the behavior is unexpected, we need to change the current code.
I have created a patch for the areas that I felt needed to be changed.
- 0001-change-the-die-condition-in-generate-wait_event_type.patch
 (In addition to the above, "$continue = ",\n";" doesn't appear to be necessary.)




Some of the new structures and routine names don't quite reflect the
fact that we have wait events for extensions, so I have taken a stab
at that.


Sorry. I confirmed the change.



Note that the test module test_custom_wait_events would crash if
attempting to launch a worker when not loaded in
shared_preload_libraries, so we'd better have some protection in
wait_worker_launch() (this function should be renamed as well).


OK, I will handle it. Since Andres gave me other comments for the
test module, I'll think about what is best.



Attached is a rebased patch that I have begun tweaking here and
there.  For now, the patch is moved as waiting on author.  I have
merged the test module with the main patch for the moment, for
simplicity.  A split is straight-forward as the code paths touched are
different.

Another and *very* important thing is that we are going to require
some documentation in xfunc.sgml to explain how to use these routines
and what to expect from them.  Ikeda-san, could you write some?  You
could look at the part about shmem and LWLock to get some
inspiration.


OK. Yes, I planned to write documentation.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From 5198fc6302a3bf4232252b23b45d39d987e736bc Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Wed, 12 Jul 2023 16:28:27 +0900
Subject: [PATCH] change the die condition in generate-wait_event_types.pl

---
 src/backend/utils/activity/generate-wait_event_types.pl | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 6d1a2af42a..fbe011039a 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -89,9 +89,8 @@ foreach my $line (@lines_sorted)
 	# any C and header files generated.
 	die "wait event names must start with 'WAIT_EVENT_'"
 	  if ( $trimmedwaiteventname eq $waiteventenumname
-		&& $waiteventenumname !~ /^LWLock/
-		&& $waiteventenumname !~ /^Lock/);
-	$continue = ",\n";
+		&& $waitclassname !~ /^WaitEventLWLock$/
+		&& $waitclassname !~ /^WaitEventLock$/);
 	push(@{ $hashwe{$waitclassname} }, @waiteventlist);
 }
 
-- 
2.25.1



Better help output for pgbench -I init_steps

2023-07-12 Thread Gurjeet Singh
These patches were created during an unrelated discussion about pgbench.

Please see emails [1] - [6] linked below, for the past discussion.

In brief:

> $ pgbench -i -I dtGvp -s 500

The init-steps are severely under-documented in pgbench --help output.
I think at least a pointer to the pgbench docs should be mentioned
in the pgbench --help output; an average user may not rush to read the
code to find the explanation, but a hint to where to find more details
about what the letters in --init-steps mean, would save them a lot of
time.

Please see attached 4 variants of the patch. Variant 1 simply tells
the reader to consult pgbench documentation. The second variant
provides a description for each of the letters, as the documentation
does. The committer can pick the one they find suitable.

The text ", in the specified order" is an important detail, that
should be included irrespective of the rest of the patch.

My preference would be to use the first variant, since the second one
feels too wordy for --help output. I believe we'll have to choose
between these two; the alternatives will not make anyone happy.

These two variants show the two extremes; bare minimum vs. everything
but the kitchen sink. So one may feel the need to find a middle ground
and provide a "sufficient, but not too much" variant. I have attempted
that in variants 3 and 4; also attached.

The third variant does away with the list of steps, and uses a
paragraph to describe the letters. And the fourth variant makes that
paragraph terse.

In the order of preference I'd choose variant 1, then 2. Variants 3
and 4 feel like a significant degradation from variant 2.

Attached samples.txt shows the snippets of --help output of each of
the variants/patches, for comparison.

In [6] below, Tristan showed preference for the second variant.

[1] My complaint about -I initi_steps being severely under-documented
in --help output
https://www.postgresql.org/message-id/CABwTF4XMdHTxemhskad41Vj_hp2nPgifjwegOqR52_8-wEbv2Q%40mail.gmail.com

[2] Tristan Partin agreeing with the complaint, and suggesting a patch
would be welcome
https://www.postgresql.org/message-id/CT8BC7RXT33R.3CHYIXGD5NVHK%40gonk

[3] Greg Smith agreeing and saying he'd welcome a few more words about
the init_steps in --help output
https://www.postgresql.org/message-id/CAHLJuCUp5_VUo%2BRJ%2BpSnxeiiZfcstRtTubRP8%2Bu8NEqmrbp4aw%40mail.gmail.com

[4] First set of patches
https://www.postgresql.org/message-id/CABwTF4UKv43ZftJadsxs8%3Da07BmA1U4RU3W1qbmDAguVKJAmZw%40mail.gmail.com

[5] Second set of patches
https://www.postgresql.org/message-id/CABwTF4Ww42arY1Q88_iaraVLxqSU%2B8Yb4oKiTT5gD1sineog9w%40mail.gmail.com

[6] Tristan showing preference for the second variant
https://www.postgresql.org/message-id/CTBN5E2K2YSJ.3QYXGZ09JZXIW%40gonk

+CC Tristan Partin and Greg Smith, since they were involved in the
initial thread.

Best regards,
Gurjeet
http://Gurje.et
 variant-1-brief-pointer-to-docs.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order
   see pgbench documentation for a description of these steps
  -F, --fillfactor=NUM set fill factor

 variant-2-detailed-description.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order
   d: Drop any existing pgbench tables
   t: Create the tables used by the standard pgbench scenario
   g: Generate data, client-side
   G: Generate data, server-side
   v: Invoke VACUUM on the standard tables
   p: Create primary key indexes on the standard tables
   f: Create foreign key constraints between the standard tables
  -F, --fillfactor=NUM set fill factor

 variant-3-details-not-list.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order.
   These steps operate on standard tables used by pgbench
   'd' drops the tables, 't' creates the tables, 'g' generates
   the data on client-side, 'G' generates data on server-side,
   'v' vaccums the tables, 'p' creates primary key constraints,
   and 'f' creates foreign key constraints
  -F, --fillfactor=NUM set fill factor

 variant-4-terse-details-not-list.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order.

Re: Exclusion constraints on partitioned tables

2023-07-12 Thread Peter Eisentraut

On 11.07.23 07:52, Paul A Jungwirth wrote:

On Mon, Jul 10, 2023 at 8:06 AM Paul A Jungwirth
 wrote:


On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut  wrote:

I'm not sure what value we would get from testing this with btree_gist,
but if we wanted to do that, then adding a new test file to the
btree_gist sql/ directory would seem reasonable to me.

(I would make the test a little bit bigger than you had shown, like
insert a few values.)

If you want to do that, please send another patch.  Otherwise, I'm ok to
commit this one.


I can get you a patch tonight or tomorrow. I think it's worth it since
btree_gist uses different strategy numbers than ordinary gist.


Patch attached.


Looks good, committed.

I had some second thoughts about the use of get_attname().  It seems the 
previous code used the dominant style of extracting the attribute name 
from the open relation handle, so I kept it that way.  That's also more 
efficient than going via the syscache.






RE: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-12 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for checking my patch! The patch can be available at [1].

> > > ==
> > > .../subscription/t/032_subscribe_use_index.pl
> > >
> > > 11.
> > > AFAIK there is no test to verify that the leftmost index field must be
> > > a column (e.g. cannot be an expression). Yes, there are some tests for
> > > *only* expressions like (expr, expr, expr), but IMO there should be
> > > another test for an index like (expr, expr, column) which should also
> > > fail because the column is not the leftmost field.
> >
> > I'm not sure they should be fixed in this patch. You have raised these points
> > in Sawada-san's thread [1], and I'm not sure he has addressed them.
> > Furthermore, these points are not related to our initial motivation.
> > It should be forked off, or at least done in another patch.
> >
> 
> I think it is reasonable to discuss the existing code-related
> improvements in a separate thread rather than trying to change this
> patch. 

OK, so I will not touch it in this thread.

> I have some other comments about your patch:
> 
> 1.
> + * 1) Other indexes do not have a fixed set of strategy numbers.
> + * In build_replindex_scan_key(), the operator which corresponds to "equality
> + * comparison" is searched by using the strategy number, but it is difficult
> + * for such indexes. For example, GiST index operator classes for
> + * two-dimensional geometric objects have a comparison "same", but its
> strategy
> + * number is different from the gist_int4_ops, which is a GiST index operator
> + * class that implements the B-tree equivalent.
> + *
> ...
> + */
> +int
> +get_equal_strategy_number_for_am(Oid am)
> ...
> 
> I think this comment is slightly difficult to understand. Can we make
> it a bit generic by saying something like: "The index access methods
> other than BTREE and HASH don't have a fixed strategy for equality
> operation. Note that in other index access methods, the support
> routines of each operator class interpret the strategy numbers
> according to the operator class's definition. So, we return
> InvalidStrategy number for index access methods other than BTREE and
> HASH."

Sounds better. Fixed with some adjustments.
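For readers following along, the fixed-versus-variable strategy situation described in that comment reduces to a small mapping. Here is a standalone sketch, not the server code: the strategy values mirror PostgreSQL's stratnum.h and the AM OIDs mirror pg_am.dat, but the types are simplified stand-ins:

```c
#include <assert.h>

typedef unsigned int Oid;

/* OIDs as assigned in pg_am.dat */
#define BTREE_AM_OID	403
#define HASH_AM_OID		405
#define GIST_AM_OID		783

/* strategy numbers as defined in stratnum.h */
#define InvalidStrategy			0
#define HTEqualStrategyNumber	1
#define BTEqualStrategyNumber	3

/*
 * Only BTREE and HASH have a fixed strategy number for equality; for any
 * other AM the operator classes assign strategy numbers as they see fit,
 * so InvalidStrategy is returned.
 */
static int
get_equal_strategy_number_for_am(Oid am)
{
	switch (am)
	{
		case BTREE_AM_OID:
			return BTEqualStrategyNumber;
		case HASH_AM_OID:
			return HTEqualStrategyNumber;
		default:
			return InvalidStrategy;
	}
}
```

A caller can then treat InvalidStrategy as "this index is not usable for REPLICA IDENTITY FULL lookups", which is how the patch gates the index choice.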

> 2.
> + * 2) Moreover, BRIN and GIN indexes do not implement "amgettuple".
> + * RelationFindReplTupleByIndex() assumes that tuples will be fetched one by
> + * one via index_getnext_slot(), but this currently requires the "amgetuple"
> + * function. To make it use index_getbitmap() must be used, which fetches all
> + * the tuples at once.
> + */
> +int
> +get_equal_strategy_number_for_am(Oid am)
> {
> ..
> 
> I don't think this is a good place for such a comment. We can probably
> move this atop IsIndexUsableForReplicaIdentityFull(). I think you need
> to mention two reasons in IsIndexUsableForReplicaIdentityFull() why we
> support only BTREE and HASH index access methods (a) Refer to comments
> atop get_equal_strategy_number_for_am(); (b) mention the reason
> related to tuples_equal() as discussed in email [1]. Then you can say
> that we also need to ensure that the index access methods that we
> support must have an implementation "amgettuple" as later while
> searching tuple via RelationFindReplTupleByIndex, we need the same.

Fixed, and based on that I modified the commit message accordingly.
What do you think?

> We can probably have an assert for this as well.

Added.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866B4F938ADD7088379633CF536A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL

2023-07-12 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for giving comment.

> 
> 1.
> FYI, this patch also needs some minor PG docs updates [1] because
> section "31.1 Publication" currently refers to only btree - e.g.
> "Candidate indexes must be btree, non-partial, and have..."
> 
> (this may also clash with Sawada-san's other thread as previously mentioned)

Fixed that, but I could not find any other points. Do you have others in mind?
I checked related commits like 89e46d and adedf5, but only that part was changed.

> Commit message
> 
> 2.
> Moreover, BRIN and GIN indexes do not implement "amgettuple".
> RelationFindReplTupleByIndex()
> assumes that tuples will be fetched one by one via
> index_getnext_slot(), but this
> currently requires the "amgetuple" function.
> 
> ~
> 
> Typo:
> /"amgetuple"/"amgettuple"/

Fixed.

> src/backend/executor/execReplication.c
> 
> 3.
> + * 2) Moreover, BRIN and GIN indexes do not implement "amgettuple".
> + * RelationFindReplTupleByIndex() assumes that tuples will be fetched one by
> + * one via index_getnext_slot(), but this currently requires the "amgetuple"
> + * function. To make it use index_getbitmap() must be used, which fetches all
> + * the tuples at once.
> + */
> +int
> 
> 3a.
> Typo:
> /"amgetuple"/"amgettuple"/

Per suggestion from Amit [1], the paragraph was removed.


> 3b.
> I think my previous comment about this comment was misunderstood. I
> was questioning why that last sentence ("To make it...") about
> "index_getbitmap()" is even needed in this patch. I thought it may be
> overreach to describe details how to make some future enhancement that
> might never happen.

Removed.
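In effect, the "amgettuple" requirement discussed above is just a NULL check on the access method's callback table. A standalone sketch, where the struct is an illustrative stand-in for the relevant slice of IndexAmRoutine rather than the real definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* illustrative stand-in for part of IndexAmRoutine */
typedef struct AmRoutine
{
	bool		(*amgettuple) (void *scan);	/* NULL when one-by-one fetch is absent */
	int			(*amgetbitmap) (void *scan);	/* bitmap fetch */
} AmRoutine;

static bool
dummy_gettuple(void *scan)
{
	(void) scan;
	return false;
}

/*
 * A RelationFindReplTupleByIndex-style scan fetches tuples one at a time,
 * so the AM must provide amgettuple; BRIN and GIN, for example, only
 * provide the bitmap interface.
 */
static bool
am_supports_tuple_fetch(const AmRoutine *am)
{
	return am != NULL && am->amgettuple != NULL;
}
```

This is why an assert on the callback's presence (as suggested earlier in the thread) is cheap to add next to the strategy-number check.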

> src/backend/replication/logical/relation.c
> 
> 4. IsIndexUsableForReplicaIdentityFull
> 
>  IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
>  {
> - bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> - bool is_partial = (indexInfo->ii_Predicate != NIL);
> - bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> + bool is_suitable_type;
> + bool is_partial;
> + bool is_only_on_expression;
> 
> - return is_btree && !is_partial && !is_only_on_expression;
> + /* Check whether the index is supported or not */
> + is_suitable_type = ((get_equal_strategy_number_for_am(indexInfo->ii_Am)
> + != InvalidStrategy));
> +
> + is_partial = (indexInfo->ii_Predicate != NIL);
> + is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> +
> + return is_suitable_type && !is_partial && !is_only_on_expression;
> 
> 4a.
> The code can be written more optimally, because if it is deemed
> already that 'is_suitable_type' is false, then there is no point to
> continue to do the other checks and function calls.

Right, is_suitable_type is now removed.

> 4b.
> 
> + /* Check whether the index is supported or not */
> + is_suitable_type = ((get_equal_strategy_number_for_am(indexInfo->ii_Am)
> + != InvalidStrategy));
> 
> The above statement has an extra set of nested parentheses for some reason.

This part was removed per above comment.

> src/backend/utils/cache/lsyscache.c
> 
> 5.
> /*
>  * get_opclass_method
>  *
>  * Returns the OID of the index access method operator class is for.
>  */
> Oid
> get_opclass_method(Oid opclass)
> 
> IMO the comment should be worded more like the previous one in this source 
> file.
> 
> SUGGESTION
> Returns the OID of the index access method the opclass belongs to.

Fixed.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1%2B%2BR3WSfsGH0yFR9WEbkDfF__OccADR7qXDnHGTww%2BkvQ%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v5-0001-Allow-the-use-Hash-index-when-REPLICA-IDENTITY-FU.patch
Description:  v5-0001-Allow-the-use-Hash-index-when-REPLICA-IDENTITY-FU.patch


Re: Support to define custom wait events for extensions

2023-07-12 Thread Michael Paquier
On Tue, Jul 11, 2023 at 05:36:47PM -0700, Andres Freund wrote:
> On 2023-07-11 16:45:26 +0900, Michael Paquier wrote:
>> +$node->init;
>> +$node->append_conf(
>> +'postgresql.conf',
>> +"shared_preload_libraries = 'test_custom_wait_events'"
>> +);
>> +$node->start;
> 
> I think this should also test registering a wait event later.

Yup, agreed that the coverage is not sufficient.

> > @@ -0,0 +1,188 @@
> > +/*--
> > + *
> > + * test_custom_wait_events.c
> > + * Code for testing custom wait events
> > + *
> > + * This code initializes a custom wait event in shmem_request_hook() and
> > + * provide a function to launch a background worker waiting forever
> > + * with the custom wait event.
> 
> Isn't this vast overkill?  Why not just have a simple C function that waits
> with a custom wait event, until cancelled? That'd be maybe 1/10th of the LOC.

Hmm.  You mean in the shape of a TAP test where a backend registers a
wait event by itself in a SQL function that waits for a certain amount
of time with a WaitLatch(), then we use a second poll_query_until()
that checks if the wait event is stored in pg_stat_activity?  With
something like what's done at the end of 001_stream_rep.pl, that
should be stable, I guess..
--
Michael


signature.asc
Description: PGP signature


Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-12 Thread Masahiko Sawada
On Tue, Jul 11, 2023 at 5:31 PM Peter Smith  wrote:
>
> Here are my comments for v4.
>
> ==
>
> Docs/Comments:
>
> All the docs and updated comments LTGM, except I felt one sentence
> might be written differently to avoid nested parentheses.
>
> BEFORE
> ...used for REPLICA IDENTITY FULL table (see
> FindUsableIndexForReplicaIdentityFull() for details).
>
> AFTER
> ...used for REPLICA IDENTITY FULL table. See
> FindUsableIndexForReplicaIdentityFull() for details.
>
> 

Agreed. I've attached the updated patch. I'll push it barring any objections.

>
> Logic:
>
> What was the decision about the earlier question [1] of
> removing/merging the function IsIndexOnlyOnExpression()?
>

I don't think we have concluded any action for it. I agree that
IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
index fields actually. I've attached a draft patch. It removes
IsIndexOnlyOnExpression() and merges
RemoteRelContainsLeftMostColumnOnIdx() to
FindUsableIndexForReplicaIdentityFull. One concern is that we no
longer do the assertion check with
IsIndexUsableForReplicaIdentityFull(). What do you think?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v5-0001-Doc-clarify-the-conditions-of-usable-indexes-for-.patch
Description: Binary data


remove_redundant_check.patch
Description: Binary data


Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

2023-07-12 Thread Dilip Kumar
On Wed, Jul 12, 2023 at 11:12 AM Michael Paquier  wrote:
>
> On Wed, Jul 12, 2023 at 09:38:41AM +0900, Michael Paquier wrote:
> > While working recently on what has led to cfc43ae and fc55c7f, I
> > really got the feeling that there could be some command sequences that
> > lacked some CCIs (or CommandCounterIncrement calls) to make sure that
> > the catalog updates are visible in any follow-up steps in the same
> > transaction.
>
> Wait a minute.  The validation of a partitioned index uses a copy of
> the pg_index tuple from the relcache, which may be out of date:
>newtup = heap_copytuple(partedIdx->rd_indextuple);
>((Form_pg_index) GETSTRUCT(newtup))->indisvalid = true;

But why is the relcache entry outdated? Does that mean that in the
previous command we missed registering the invalidation for the
relcache entry?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: CI and test improvements

2023-07-12 Thread Michael Paquier
On Wed, Jul 12, 2023 at 12:56:17AM -0500, Justin Pryzby wrote:
> And, since 681d9e462:
> 
> security is missing from contrib/seg/meson.build

Ugh.  Good catch!
--
Michael


signature.asc
Description: PGP signature

