Re: Assert while autovacuum was executing

2023-07-05 Thread Amit Kapila
On Fri, Jun 30, 2023 at 8:26 AM Amit Kapila  wrote:
>
> On Wed, Jun 28, 2023 at 7:26 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
>
> Thanks for the verification. Unless someone has any further comments
> or suggestions, I'll push this next week sometime.
>

Pushed, but I forgot to run pgindent, which led to a BF failure [1]. I'll take
care of it.

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=koel&dt=2023-07-06%2005%3A19%3A03

-- 
With Regards,
Amit Kapila.




Re: test_extensions: fix inconsistency between meson.build and Makefile

2023-07-05 Thread Jeff Davis
On Thu, 2023-07-06 at 11:41 +0900, Michael Paquier wrote:
> Why is the addition of --encoding necessary for test_extensions?  Its
> Makefile has a NO_LOCALE, but it has no ENCODING set.

I think that was an oversight -- as you point out, the Makefile doesn't
set ENCODING, so the meson.build does not need to, either.

Regards,
Jeff Davis





Re: Add index scan progress to pg_stat_progress_vacuum

2023-07-05 Thread Masahiko Sawada
On Thu, Jul 6, 2023 at 11:15 AM Michael Paquier  wrote:
>
> On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch. It looks good to me too. I've
> > updated the commit message.
>
> Thanks.  I was planning to review this patch today and/or tomorrow now
> that my stack of things to do is getting slightly lower (registered my
> name as committer as well a few weeks ago to not forget).
>
> One thing I was planning to do is to move the new message processing
> API for the incremental updates into its own commit for clarity, as
> that's a separate concept from the actual feature, useful on its own.

+1. I had the same idea. Please find the attached patches.

> Anyway, would you prefer taking care of it?

I can take care of it if you're okay with that.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v31-0002-Report-index-vacuum-progress.patch
Description: Binary data


v31-0001-Add-new-parallel-message-type-to-progress-report.patch
Description: Binary data


Re: MERGE ... RETURNING

2023-07-05 Thread Gurjeet Singh
On Sat, Jul 1, 2023 at 4:08 AM Dean Rasheed  wrote:
>
> On Mon, 13 Mar 2023 at 13:36, Dean Rasheed  wrote:
> >
> > And another rebase.
> >
>
> I ran out of cycles to pursue the MERGE patches in v16, but hopefully
> I can make more progress in v17.
>
> Looking at this one with fresh eyes, it looks mostly in good shape.

+1

Most of the review was done with v6 of the patch, minus the doc
build step. The additional changes in v7 of the patch were eyeballed
and tested with `make check`.

> To
> recap, this adds support for the RETURNING clause in MERGE, together
> with new support functions pg_merge_action() and
> pg_merge_when_clause() that can be used in the RETURNING list of MERGE

nit: s/can be used in/can be used only in/

> to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index
> of the WHEN clause executed for each row merged. In addition,
> RETURNING support allows MERGE to be used as the source query in COPY
> TO and WITH queries.
>
> One minor annoyance is that, due to the way that pg_merge_action() and
> pg_merge_when_clause() require access to the MergeActionState, they
> only work if they appear directly in the RETURNING list. They can't,
> for example, appear in a subquery in the RETURNING list, and I don't
> see an easy way round that limitation.

I believe that's a serious limitation, and would be a blocker for the feature.

> Attached is an updated patch with some cosmetic updates, plus updated
> ruleutils support.

With each iteration of your patch, it is becoming increasingly clear
that this will be a documentation-heavy patch :-)

I think the name of the function pg_merge_when_clause() can be improved.
How about pg_merge_when_clause_ordinal()?

In the doc page of MERGE, you've moved the 'with_query' from the
bottom of the Parameters section to the top of that section. Any reason
for this? Since the Parameters section is quite large, for a moment I
thought the patch was _adding_ the description of 'with_query'.


+/*
+ * Merge support functions should only be called directly from a MERGE
+ * command, and need access to the parent ModifyTableState.  The parser
+ * should have checked that such functions only appear in the RETURNING
+ * list of a MERGE, so this should never fail.
+ */
+if (IsMergeSupportFunction(funcid))
+{
+if (!state->parent ||
+!IsA(state->parent, ModifyTableState) ||
+((ModifyTableState *) state->parent)->operation != CMD_MERGE)
+elog(ERROR, "merge support function called in non-merge context");

As the comment says, this is an unexpected condition, and should have
been caught and reported by the parser. So I think it'd be better to
use an Assert() here. Moreover, an ERROR is already raised by the
pg_merge_action() and pg_merge_when_clause() functions themselves when
they detect that the calling context is not appropriate.
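
For illustration, a minimal sketch of what that Assert-based check could
look like (this just mirrors the condition from the quoted hunk; it is not
the actual patch code):

if (IsMergeSupportFunction(funcid))
{
    /* the parser should have already rejected misplaced calls */
    Assert(state->parent &&
           IsA(state->parent, ModifyTableState) &&
           ((ModifyTableState *) state->parent)->operation == CMD_MERGE);
}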

I found it very innovative to allow these functions to be called only
in a certain part of a certain SQL command. I don't think there's a
precedent for this in Postgres; I'd be glad to learn if there are
other functions that Postgres exposes this way.

-EXPR_KIND_RETURNING,/* RETURNING */
+EXPR_KIND_RETURNING,/* RETURNING in INSERT/UPDATE/DELETE */
+EXPR_KIND_MERGE_RETURNING,  /* RETURNING in MERGE */

Having to invent a whole new ParseExprKind enum value feels like a bit of
overkill. My only objection is that this is exactly like
EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt
with in exactly as many places. But I don't have any alternative
proposals.

--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
+/* Is this a merge support function?  (Requires fmgroids.h) */
+#define IsMergeSupportFunction(funcid) \
+((funcid) == F_PG_MERGE_ACTION || \
+ (funcid) == F_PG_MERGE_WHEN_CLAUSE)

If it doesn't cause recursion or other complications, I think we
should simply include fmgroids.h in pg_proc.h. I understand that
not all .c files/consumers that include pg_proc.h may need fmgroids.h,
but #include'ing it will eliminate the need to keep the "Requires..."
note above, and avoid confusion, too.

--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out

-WHEN MATCHED AND tid > 2 THEN
+WHEN MATCHED AND tid >= 2 THEN

This change can be treated as a bug fix :-)

+-- COPY (MERGE ... RETURNING) TO ...
+BEGIN;
+COPY (
+MERGE INTO sq_target t
+USING v
+ON tid = sid
+WHEN MATCHED AND tid > 2 THEN

For consistency, the check should be tid >= 2, like you've fixed in
the command referenced above.

+BEGIN;
+COPY (
+MERGE INTO sq_target t
+USING v
+ON tid = sid
+WHEN MATCHED AND tid > 2 THEN
+UPDATE SET balance = t.balance + delta
+WHEN NOT MATCHED THEN
+INSERT (balance, tid) VALUES (balance + delta, sid)
+WHEN MATCHED AND tid < 2 THEN
+DELETE
+RETURNING pg_merge_action(), t.*
+) TO stdout;
+DELETE  1   100
+ROLLBACK;

I expected 

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-05 Thread Amit Kapila
On Wed, Jul 5, 2023 at 1:48 AM Melih Mutlu  wrote:
>
> Hayato Kuroda (Fujitsu) , 4 Tem 2023 Sal,
> 08:42 tarihinde şunu yazdı:
> > > > But in the later patch the tablesync worker tries to reuse the slot
> > > > during the synchronization, so in this case the application_name should
> > > > be same as slotname.
> > > >
> > >
> > > Fair enough. I am slightly afraid that if we can't show the benefits
> > > with later patches then we may need to drop them but at this stage I
> > > feel we need to investigate why those are not helping?
> >
> > Agreed. Now I'm planning to do performance testing independently. We can 
> > discuss
> > based on that or Melih's one.
>
> Here I attached what I use for performance testing of this patch.
>
> I only benchmarked the patch set with reusing connections very roughly
> so far, but it seems like it improves things quite significantly. For example,
> it took 611 ms to sync 100 empty tables, while it was 1782 ms without
> reusing connections.
> First 3 patches from the set actually bring a good amount of
> improvement, but not sure about the later patches yet.
>

I suggest we should then focus first on those 3, get them committed,
and then look at the remaining ones.

> Amit Kapila , 3 Tem 2023 Pzt, 08:59 tarihinde
> şunu yazdı:
> > On thinking about this, I think the primary benefit we were expecting
> > was from saving network round trips for slot drop/create, but now that we
> > anyway need an extra round trip to establish a snapshot, such a
> > benefit was not visible.
> > it. The another idea as discussed before [1] could be to try copying
> > multiple tables in a single transaction. Now, keeping a transaction
> > open for a longer time could have side-effects on the publisher node.
> > So, we probably need to ensure that we don't perform multiple large
> > syncs and even for smaller tables (and later sequences) perform it
> > only for some threshold number of tables which we can figure out by
> > some tests. Also, the other safety-check could be that anytime we need
> > to perform streaming (sync with apply worker), we won't copy more
> > tables in same transaction.
> >
> > Thoughts?
>
> Yeah, maybe going to the publisher for creating a slot or only a
> snapshot does not really make enough difference. I was hoping that
> creating only a snapshot using an existing replication slot would help the
> performance. I guess I was either wrong or am missing something in the
> implementation.
>
> The tricky bit with keeping a long transaction to copy multiple tables
> is deciding how many tables one transaction can copy.
>

Yeah, I was thinking that we should not allow copying more than some
threshold of data in one transaction. After every copy, we will check the
size of the table and add it to the size of the tables already copied in
the same transaction. Once the total crosses a certain threshold, we will
end the transaction. This may not be a very good scheme, but if it helps
then it would be much simpler than the create-only-snapshot approach.
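
Roughly, the idea could look like this (a hypothetical sketch only -- none
of these helper names or the threshold variable exist in the patch; they
just illustrate the size-threshold batching):

static void
sync_tables_in_batches(int64 copy_size_threshold)
{
    int64       copied_size = 0;
    Relation    rel;

    StartTransactionCommand();
    while ((rel = get_next_table_to_sync()) != NULL)
    {
        copy_table(rel);
        copied_size += get_table_size(rel);

        /* End the transaction once the copied size crosses the threshold. */
        if (copied_size >= copy_size_threshold)
        {
            CommitTransactionCommand();
            StartTransactionCommand();
            copied_size = 0;
        }
    }
    CommitTransactionCommand();
}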

-- 
With Regards,
Amit Kapila.




Re: Should we remove db_user_namespace?

2023-07-05 Thread Nathan Bossart
On Thu, Jul 06, 2023 at 08:21:18AM +0900, Michael Paquier wrote:
> Removing the GUC from this table is kind of annoying.  Cannot this be
> handled like default_with_oids or ssl_renegotiation_limit to avoid any
> kind of issues with the reload of dump files and the like?

Ah, good catch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From ba8f57f2e15bcf9c147c25496f5ea7dba211fefb Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 30 Jun 2023 12:46:08 -0700
Subject: [PATCH v4 1/1] remove db_user_namespace

---
 doc/src/sgml/client-auth.sgml |  5 --
 doc/src/sgml/config.sgml  | 52 ---
 src/backend/commands/variable.c   | 15 ++
 src/backend/libpq/auth.c  |  5 --
 src/backend/libpq/hba.c   | 12 -
 src/backend/postmaster/postmaster.c   | 19 ---
 src/backend/utils/misc/guc_tables.c   | 16 --
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/libpq/pqcomm.h|  2 -
 src/include/utils/guc_hooks.h |  1 +
 .../unsafe_tests/expected/guc_privs.out   |  4 ++
 .../modules/unsafe_tests/sql/guc_privs.sql|  3 ++
 12 files changed, 35 insertions(+), 100 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 204d09df67..6c95f0df1e 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -1253,11 +1253,6 @@ omicron bryanh  guest1
attacks.
   
 
-  
-   The md5 method cannot be used with
-   the  feature.
-  
-
   
To ease transition from the md5 method to the newer
SCRAM method, if md5 is specified as a method
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..e6cea8ddfc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1188,58 +1188,6 @@ include_dir 'conf.d'

   
  
-
- 
-  db_user_namespace (boolean)
-  
-   db_user_namespace configuration parameter
-  
-  
-  
-   
-This parameter enables per-database user names.  It is off by default.
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
-   
-
-   
-If this is on, you should create users as username@dbname.
-When username is passed by a connecting client,
-@ and the database name are appended to the user
-name and that database-specific user name is looked up by the
-server. Note that when you create users with names containing
-@ within the SQL environment, you will need to
-quote the user name.
-   
-
-   
-With this parameter enabled, you can still create ordinary global
-users.  Simply append @ when specifying the user
-name in the client, e.g., joe@.  The @
-will be stripped off before the user name is looked up by the
-server.
-   
-
-   
-db_user_namespace causes the client's and
-server's user name representation to differ.
-Authentication checks are always done with the server's user name
-so authentication methods must be configured for the
-server's user name, not the client's.  Because
-md5 uses the user name as salt on both the
-client and server, md5 cannot be used with
-db_user_namespace.
-   
-
-   
-
- This feature is intended as a temporary measure until a
- complete solution is found.  At that time, this option will
- be removed.
-
-   
-  
- 
  
  
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index f0f2e07655..b6a2fa2512 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1157,6 +1157,21 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 	return true;
 }
 
+bool
+check_db_user_namespace(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/* check the GUC's definition for an explanation */
+		GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
+		GUC_check_errmsg("db_user_namespace is not supported");
+
+		return false;
+	}
+
+	return true;
+}
+
 bool
 check_default_with_oids(bool *newval, void **extra, GucSource source)
 {
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index a98b934a8e..65d452f099 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -873,11 +873,6 @@ CheckMD5Auth(Port *port, char *shadow_pass, const char **logdetail)
 	char	   *passwd;
 	int			result;
 
-	if (Db_user_namespace)
-		ereport(FATAL,
-(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
- errmsg("MD5 authentication is not supported when \"db_user_namespace\" is enabled")));
-
 	/* include the salt to use for computing the response */
 	if (!pg_strong_random(md5Salt, 4))
 

Re: contrib/pg_freespacemap first check input argument, then relation_open.

2023-07-05 Thread Julien Rouhaud
Hi,

On Thu, Jul 06, 2023 at 10:14:46AM +0800, jian he wrote:
>
> In:
> https://git.postgresql.org/cgit/postgresql.git/tree/contrib/pg_freespacemap/pg_freespacemap.c
>
> rel = relation_open(relid, AccessShareLock);
>
> if (blkno < 0 || blkno > MaxBlockNumber)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("invalid block number")));
>
> 
> should it first check input arguments, then relation_open?

It would probably be a slightly better approach but wouldn't really change much
in practice so I'm not sure it's worth changing now.
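
For reference, the suggested ordering would look roughly like this (the same
code as in the linked file, just with the argument check hoisted before the
relation is opened and locked):

/* Validate the block number before opening (and locking) the relation. */
if (blkno < 0 || blkno > MaxBlockNumber)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("invalid block number")));

rel = relation_open(relid, AccessShareLock);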

> Does ereport automatically unlock the relation?

Yes, locks, lwlocks, memory contexts and everything else are properly cleaned
up / released in case of error.




Re: test_extensions: fix inconsistency between meson.build and Makefile

2023-07-05 Thread Michael Paquier
On Sat, Jun 17, 2023 at 07:40:18AM -0700, Gurjeet Singh wrote:
> So attached is updated patch that makes the order consistent across
> all 3 occurrences.

There is no need to update unaccent since 44e73a4.

--- a/src/test/modules/test_extensions/meson.build
+++ b/src/test/modules/test_extensions/meson.build
@@ -47,5 +47,6 @@ tests += {
   'test_extensions',
   'test_extdepend',
 ],
+'regress_args': ['--no-locale', '--encoding=UTF8'],

Why is the addition of --encoding necessary for test_extensions?  Its
Makefile has a NO_LOCALE, but it has no ENCODING set.
--
Michael


signature.asc
Description: PGP signature


Re: Add index scan progress to pg_stat_progress_vacuum

2023-07-05 Thread Michael Paquier
On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. It looks good to me too. I've
> updated the commit message.

Thanks.  I was planning to review this patch today and/or tomorrow now
that my stack of things to do is getting slightly lower (registered my
name as committer as well a few weeks ago to not forget).

One thing I was planning to do is to move the new message processing
API for the incremental updates into its own commit for clarity, as
that's a separate concept from the actual feature, useful on its own.

Anyway, would you prefer taking care of it?
--
Michael


signature.asc
Description: PGP signature


contrib/pg_freespacemap first check input argument, then relation_open.

2023-07-05 Thread jian he
Hi.

In:
https://git.postgresql.org/cgit/postgresql.git/tree/contrib/pg_freespacemap/pg_freespacemap.c

rel = relation_open(relid, AccessShareLock);

if (blkno < 0 || blkno > MaxBlockNumber)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid block number")));


should it first check input arguments, then relation_open?
Does ereport automatically unlock the relation?


Re: Add index scan progress to pg_stat_progress_vacuum

2023-07-05 Thread Masahiko Sawada
On Wed, Apr 12, 2023 at 9:22 PM Imseih (AWS), Sami  wrote:
>
> > This should be OK (also checked the code paths where the reports are
> > added). Note that the patch needed a few adjustments for its
> > indentation.
>
> Thanks for the formatting corrections! This looks good to me.

Thank you for updating the patch. It looks good to me too. I've
updated the commit message.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From f92929f21d4052cfbc3b068ca5b8be954741da3d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 12 Apr 2023 13:45:59 +0900
Subject: [PATCH v30] Report index vacuum progress.

This commit adds two columns: indexes_total and indexes_processed, to
pg_stat_progress_vacuum system view to show the index vacuum
progress. These numbers are reported in the "vacuuming indexes" and
"cleaning up indexes" phases.

It also adds a new type of parallel message, 'P', which is used to
convey the progress updates made by parallel workers to the leader
process. The parallel workers' progress updates are therefore
reflected in an asynchronous manner. Currently it supports only
incremental progress reporting, but it's possible to allow for
supporting other backend progress APIs in the future.

Authors: Sami Imseih
Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
---
 doc/src/sgml/monitoring.sgml  | 23 ++
 src/backend/access/heap/vacuumlazy.c  | 70 ---
 src/backend/access/transam/parallel.c | 18 +
 src/backend/catalog/system_views.sql  |  3 +-
 src/backend/commands/vacuumparallel.c |  9 ++-
 src/backend/utils/activity/backend_progress.c | 32 +
 src/include/commands/progress.h   |  2 +
 src/include/utils/backend_progress.h  |  1 +
 src/test/regress/expected/rules.out   |  4 +-
 9 files changed, 150 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 506aeaa879..588b720f57 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6110,6 +6110,29 @@ FROM pg_stat_get_backend_idset() AS backendid;
Number of dead tuples collected since the last index vacuum cycle.
   
  
+
+ 
+  
+   indexes_total bigint
+  
+  
+   Total number of indexes that will be vacuumed or cleaned up. This
+   number is reported at the beginning of the
+   vacuuming indexes phase or the
+   cleaning up indexes phase.
+  
+ 
+
+ 
+  
+   indexes_processed bigint
+  
+  
+   Number of indexes processed. This counter only advances when the
+   phase is vacuuming indexes or
+   cleaning up indexes.
+  
+ 
 

   
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4eb953f904..6a41ee635d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2319,6 +2319,17 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 {
 	bool		allindexes = true;
 	double		old_live_tuples = vacrel->rel->rd_rel->reltuples;
+	const int	progress_start_index[] = {
+		PROGRESS_VACUUM_PHASE,
+		PROGRESS_VACUUM_INDEXES_TOTAL
+	};
+	const int	progress_end_index[] = {
+		PROGRESS_VACUUM_INDEXES_TOTAL,
+		PROGRESS_VACUUM_INDEXES_PROCESSED,
+		PROGRESS_VACUUM_NUM_INDEX_VACUUMS
+	};
+	int64		progress_start_val[2];
+	int64		progress_end_val[3];
 
 	Assert(vacrel->nindexes > 0);
 	Assert(vacrel->do_index_vacuuming);
@@ -2331,9 +2342,13 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 		return false;
 	}
 
-	/* Report that we are now vacuuming indexes */
-	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
+	/*
+	 * Report that we are now vacuuming indexes and the number of indexes to
+	 * vacuum.
+	 */
+	progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
+	progress_start_val[1] = vacrel->nindexes;
+	pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
 
 	if (!ParallelVacuumIsActive(vacrel))
 	{
@@ -2346,6 +2361,10 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 		  old_live_tuples,
 		  vacrel);
 
+			/* Report the number of indexes vacuumed */
+			pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED,
+		 idx + 1);
+
 			if (lazy_check_wraparound_failsafe(vacrel))
 			{
 /* Wraparound emergency -- end current index scan */
@@ -2380,14 +2399,17 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 	Assert(allindexes || VacuumFailsafeActive);
 
 	/*
-	 * Increase and report the number of index scans.
+	 * Increase and report the number of index scans.  Also, we reset
+	 * PROGRESS_VACUUM_INDEXES_TOTAL and PROGRESS_VACUUM_INDEXES_PROCESSED.
 	 *
 	 * We deliberately include the case where we started a round of bulk
 	 * deletes that 

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-05 Thread Masahiko Sawada
On Thu, Jul 6, 2023 at 7:58 AM Peter Smith  wrote:
>
> On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada  wrote:
> >
> > On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila  wrote:
> > >
> > > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith  wrote:
> > > >
> > > > Hi. Here are some review comments for this patch.
> > > >
> > > > +1 for the patch idea.
> > > >
> > > > --
> > > >
> > > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > > should also be adjusted."
> >
> > As for IsIndexOnlyOnExpression(), what part do you think we need to
> > adjust? It says:
> >
> > /*
> >  * Returns true if the given index consists only of expressions such as:
> >  *  CREATE INDEX idx ON table(foo(col));
> >  *
> >  * Returns false even if there is one column reference:
> >  *   CREATE INDEX idx ON table(foo(col), col_2);
> >  */
> >
> > and it seems to me that the function doesn't check if the leftmost
> > index column is a non-expression.
> >
>
> TBH, this IsIndexOnlyOnExpression() function seemed redundant to me;
> otherwise, there can be some indexes that are first considered
> "usable" but then fail the subsequent leftmost check. That does not
> seem right.

I see your point. IsIndexUsableForReplicaIdentityFull(), the sole user
of IsIndexOnlyOnExpression(), is also called by
RelationFindReplTupleByIndex() in an assertion build. I thought this
was the reason why we have a separate IsIndexOnlyOnExpression() (and
IsIndexUsableForReplicaIdentityFull()). But this assertion doesn't
check if the leftmost index column exists on the remote relation. What
are we doing this check for? If it's not necessary, we can remove this
assertion and merge both IsIndexOnlyOnExpression() and
IsIndexUsableForReplicaIdentityFull() into
FindUsableIndexForReplicaIdentityFull().

>
> > > > doc/src/sgml/logical-replication.sgml
> > > >
> > > > 1.
> > > > the key.  When replica identity FULL is 
> > > > specified,
> > > > indexes can be used on the subscriber side for searching the rows.
> > > > Candidate
> > > > indexes must be btree, non-partial, and have at least one column 
> > > > reference
> > > > -   (i.e. cannot consist of only expressions).  These restrictions
> > > > -   on the non-unique index properties adhere to some of the 
> > > > restrictions that
> > > > -   are enforced for primary keys.  If there are no such suitable 
> > > > indexes,
> > > > +   at the leftmost column indexes (i.e. cannot consist of only
> > > > expressions).  These
> > > > +   restrictions on the non-unique index properties adhere to some of
> > > > the restrictions
> > > > +   that are enforced for primary keys.  If there are no such suitable 
> > > > indexes,
> > > > the search on the subscriber side can be very inefficient, therefore
> > > > replica identity FULL should only be used as a
> > > > fallback if no other solution is possible.  If a replica identity 
> > > > other
> > > >
> > > > Isn't this using the word "indexes" with different meanings in the
> > > > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > > > ordinal number of the index fields.
> >
> > That was my mistake, it should be " at the leftmost column".
>
> IIUC the subscriber-side table can have more columns than the
> publisher-side table, so just describing in the docs that the leftmost
> INDEX field must be a column may not be quite enough; it also needs to
> say that the column has to exist on the publisher table, doesn't it?

Right. I've updated the patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


doc_update_v2.patch
Description: Binary data


Re: Add PQsendSyncMessage() to libpq

2023-07-05 Thread Anton Kirilov

Hello,

On 05/07/2023 21:45, Daniel Gustafsson wrote:

Please rebase and send an updated version.

Here it is (including the warning fix).

Thanks,
Anton Kirilov
From 3c2e064a151568830e4d9e4bf238739b458350b4 Mon Sep 17 00:00:00 2001
From: Anton Kirilov 
Date: Wed, 22 Mar 2023 20:39:57 +
Subject: [PATCH v4] Add PQpipelinePutSync() to libpq

This new function is equivalent to PQpipelineSync(),
except that it lets the caller choose whether anything
is flushed to the server. Its purpose is to reduce the
system call overhead of pipeline mode.
---
 doc/src/sgml/libpq.sgml   | 52 ---
 src/interfaces/libpq/exports.txt  |  1 +
 src/interfaces/libpq/fe-exec.c| 30 ---
 src/interfaces/libpq/libpq-fe.h   |  6 +++
 .../modules/libpq_pipeline/libpq_pipeline.c   | 37 +
 .../traces/multi_pipelines.trace  | 11 
 6 files changed, 121 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index b6f5aba1b1..0bc88b1569 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3547,8 +3547,9 @@ ExecStatusType PQresultStatus(const PGresult *res);
   

 The PGresult represents a
-synchronization point in pipeline mode, requested by
-.
+synchronization point in pipeline mode, requested by either
+ or
+.
 This status occurs only when pipeline mode has been selected.

   
@@ -5122,7 +5123,8 @@ int PQsendClosePortal(PGconn *conn, const char *portalName);
,
,
,
-   , or
+   ,
+   , or

call, and returns it.
A null pointer is returned when the command is complete and there
@@ -5490,7 +5492,8 @@ int PQflush(PGconn *conn);
  or its prepared-query sibling
  .
  These requests are queued on the client-side until flushed to the server;
- this occurs when  is used to
+ this occurs when  (or optionally
+ ) is used to
  establish a synchronization point in the pipeline,
  or when  is called.
  The functions ,
@@ -5506,8 +5509,9 @@ int PQflush(PGconn *conn);
  client sends them.  The server will begin executing the commands in the
  pipeline immediately, not waiting for the end of the pipeline.
  Note that results are buffered on the server side; the server flushes
- that buffer when a synchronization point is established with
- PQpipelineSync, or when
+ that buffer when a synchronization point is established with either
+ PQpipelineSync or
+ PQpipelinePutSync, or when
  PQsendFlushRequest is called.
  If any statement encounters an error, the server aborts the current
  transaction and does not execute any subsequent command in the queue
@@ -5564,7 +5568,8 @@ int PQflush(PGconn *conn);
  PGresult types PGRES_PIPELINE_SYNC
  and PGRES_PIPELINE_ABORTED.
  PGRES_PIPELINE_SYNC is reported exactly once for each
- PQpipelineSync at the corresponding point
+ PQpipelineSync or
+ PQpipelinePutSync at the corresponding point
  in the pipeline.
  PGRES_PIPELINE_ABORTED is emitted in place of a normal
  query result for the first error and all subsequent results
@@ -5602,7 +5607,8 @@ int PQflush(PGconn *conn);
  PQresultStatus will report a
  PGRES_PIPELINE_ABORTED result for each remaining queued
  operation in an aborted pipeline. The result for
- PQpipelineSync is reported as
+ PQpipelineSync or
+ PQpipelinePutSync is reported as
  PGRES_PIPELINE_SYNC to signal the end of the aborted pipeline
  and resumption of normal result processing.
 
@@ -5834,6 +5840,36 @@ int PQsendFlushRequest(PGconn *conn);

   
  
+
+
+ PQpipelinePutSyncPQpipelinePutSync
+
+ 
+  
+   Marks a synchronization point in a pipeline by sending a
+   sync message.
+   This serves as the delimiter of an implicit transaction and
+   an error recovery point; see .
+
+
+int PQpipelinePutSync(PGconn *conn, int flags);
+
+  
+  
+   Returns 0 for success. Returns -1 if the connection is not in
+   pipeline mode or sending a
+   sync message
+   failed. Returns 1 if it was unable to send all the data in
+   the send queue yet (this case can only occur if the connection
+   is nonblocking and output data is flushed to the server).
+   The flags argument is a bitwise OR of
+   several flags. PG_PIPELINEPUTSYNC_FLUSH
+   specifies that any queued output data is flushed to the
+   server. Otherwise, nothing is flushed automatically;
+   use PQflush if necessary.
+  
+ 
+

   
 
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 850734ac96..18e7b31539 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -191,3 

Re: Autogenerate some wait events code and documentation

2023-07-05 Thread Michael Paquier
On Wed, Jul 05, 2023 at 02:59:39PM -0700, Andres Freund wrote:
> Rebasing a patch over this I was a bit confused because I got a bunch of
> ""unable to parse wait_event_names.txt" errors. Took me a while to figure out
> that that was just because I didn't include a trailing . in the description.
> Perhaps that could be turned into a more meaningful error?
> 
>   die "unable to parse wait_event_names.txt"
> unless $line =~ /^(\w+)\t+(\w+)\t+("\w+")\t+("\w.*\.")$/;

Agreed that we could at least add the $line to the error message
generated, to help with debugging.

> I also do wonder if we should invest in generating the lwlock names as
> well. Except for a few abbreviations, the second column is always the
> camel-cased version of what follows WAIT_EVENT_. Feels pretty tedious to write
> that out.

And you mean getting rid of lwlocknames.txt?  The impact on dtrace or
other similar tools is uncertain to me because we have free numbers on
this list, and stuff like GetLWLockIdentifier() relies on the input ID
from lwlocknames.txt.

> Perhaps we should just change the case of the upper-cased names (DSM, SSL,
> WAL, ...) to follow the other names?

So you mean renaming the existing events like WalSenderWaitForWAL to
WalSenderWaitForWal?  The impact on existing monitoring queries is not
zero because any changes would be silent, and that's the part that
worried me the most even if it can remove one column in the txt file.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade and cross-library upgrades

2023-07-05 Thread Michael Paquier
On Wed, Jul 05, 2023 at 07:03:56AM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > 002_pg_upgrade.pl can test for presence or absence of pgcrypto by
> > grepping pg_config --configure for --with-ssl or --with-openssl.  If the
> > old cluster has it but the new doesn't, we must drop the
> > contrib_regression_pgcrypto database.
> 
> Hmm, but you'd also need code to handle meson builds no?

Yes.  It is worth noting that pg_config (or its SQL function) would
report the switches for ./configure in its CONFIGURE field, but, err...
we report nothing under meson.  That's a problem.

> Could it
> be easier to look for the pgcrypto library in the new installation?

If all the contrib/ modules are handled, we'd need mapping rules for
everything listed in contrib/Makefile.

> Not entirely sure this is worth the effort.

I am not sure either..  Anyway, the buildfarm code does similar things
already, see around $bad_module in TestUpgradeXversion.pm, for
instance.  So this kind of workaround exists already.  It seems to me
that we should try to pull that out of the buildfarm code and have
that in the core module instead as a routine that would be used by the
in-core TAP tests of pg_upgrade and the buildfarm code.
--
Michael


signature.asc
Description: PGP signature


Re: Consider \v to the list of whitespace characters in the parser

2023-07-05 Thread Michael Paquier
On Tue, Jul 04, 2023 at 09:28:21AM +0900, Michael Paquier wrote:
> Yeah, thanks.

I have looked at that again this morning, and did not notice any
missing spots, so I have applied it.  Let's see how it goes.
--
Michael


signature.asc
Description: PGP signature


Re: Should we remove db_user_namespace?

2023-07-05 Thread Michael Paquier
On Wed, Jul 05, 2023 at 02:29:27PM -0700, Nathan Bossart wrote:
>   },
> - {
> - {"db_user_namespace", PGC_SIGHUP, CONN_AUTH_AUTH,
> - gettext_noop("Enables per-database user names."),
> - NULL
> - },
> - &Db_user_namespace,
> - false,
> - NULL, NULL, NULL
> - },
>   {

Removing the GUC from this table is kind of annoying.  Cannot this be
handled like default_with_oids or ssl_renegotiation_limit to avoid any
> kind of issues with the reload of dump files and the like?
--
Michael


signature.asc
Description: PGP signature


Re: Wrong syntax in feature description

2023-07-05 Thread Peter Smith
On Wed, Jul 5, 2023 at 5:37 PM Daniel Gustafsson  wrote:
>
> > On 4 Jun 2023, at 18:48, Amit Kapila  wrote:
> >
> > On Fri, Jun 2, 2023 at 7:01 PM Peter Smith  wrote:
> >>
> >> Hi, I noticed a feature description [1] referring to a command example:
> >>
> >> CREATE PUBLICATION ... FOR ALL TABLES IN SCHEMA 
> >>
> >> ~~
> >>
> >> AFAIK that should say "FOR TABLES IN SCHEMA" (without the "ALL", see [2])
> >
> > Right, this should be changed.
>
> Agreed, so I've fixed this in the featurematrix on the site.  I will mark this
> CF entry as committed even though there is nothing to commit (the
> featurematrix is stored in the postgresql.org django instance) since there
> was a change performed.
>
> Thanks for the report!

Thanks for (not) pushing ;-)

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-05 Thread Peter Smith
On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada  wrote:
>
> On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila  wrote:
> >
> > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith  wrote:
> > >
> > > Hi. Here are some review comments for this patch.
> > >
> > > +1 for the patch idea.
> > >
> > > --
> > >
> > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > should also be adjusted."
>
> As for IsIndexOnlyOnExpression(), what part do you think we need to
> adjust? It says:
>
> /*
>  * Returns true if the given index consists only of expressions such as:
>  *  CREATE INDEX idx ON table(foo(col));
>  *
>  * Returns false even if there is one column reference:
>  *   CREATE INDEX idx ON table(foo(col), col_2);
>  */
>
> and it seems to me that the function doesn't check if the leftmost
> index column is a non-expression.
>

TBH, this IsIndexOnlyOnExpression() function seemed redundant to me;
otherwise, there can be some indexes that are first considered
"usable" but then fail the subsequent leftmost check. That does not
seem right.

> > > doc/src/sgml/logical-replication.sgml
> > >
> > > 1.
> > > the key.  When replica identity FULL is specified,
> > > indexes can be used on the subscriber side for searching the rows.
> > > Candidate
> > > indexes must be btree, non-partial, and have at least one column 
> > > reference
> > > -   (i.e. cannot consist of only expressions).  These restrictions
> > > -   on the non-unique index properties adhere to some of the restrictions 
> > > that
> > > -   are enforced for primary keys.  If there are no such suitable indexes,
> > > +   at the leftmost column indexes (i.e. cannot consist of only
> > > expressions).  These
> > > +   restrictions on the non-unique index properties adhere to some of
> > > the restrictions
> > > +   that are enforced for primary keys.  If there are no such suitable 
> > > indexes,
> > > the search on the subscriber side can be very inefficient, therefore
> > > replica identity FULL should only be used as a
> > > fallback if no other solution is possible.  If a replica identity 
> > > other
> > >
> > > Isn't this using the word "indexes" with different meanings in the
> > > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > > ordinal number of the index fields.
>
> That was my mistake, it should be " at the leftmost column".

IIUC the subscriber-side table can have more columns than the
publisher-side table, so just describing in the docs that the leftmost
INDEX field must be a column may not be quite enough; it also needs to
say that the column has to exist on the publisher table, doesn't it?

Also, after you document this 'leftmost field restriction', it
already implies there *must* be a non-expression in the INDEX. So I
thought we could just omit the "(i.e. cannot consist of only
expressions)" part.

Anyway, I will wait to see the wording of the updated patch before
commenting further.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-07-05 Thread Thomas Munro
On Thu, Jul 6, 2023 at 9:00 AM Jacob Champion  wrote:
> My goal is to maintain compatibility with existing PQconnectPoll()
> applications, where the only way we get to communicate with the client
> is through the PQsocket() for the connection. Ideally, you shouldn't
> have to completely rewrite your application loop just to make use of
> OAuth. (I assume a requirement like that would be a major roadblock to
> committing this -- and if that's not a correct assumption, then I
> guess my job gets a lot easier?)

Ah, right, I get it.

I guess there are a couple of ways to do it if we give up the goal of
no-code-change-for-the-client:

1.  Generalised PQsocket(), so that a client can call something like:

int PQpollset(const PGConn *conn, struct pollfd fds[], int fds_size,
int *nfds, int *timeout_ms);

That way, libpq could tell you about which events it would like to
wait for on which fds, and when it would like you to call it back due
to timeout, and you can either pass that information directly to
poll() or WSAPoll() or some equivalent interface (we don't care, we
just gave you the info you need), or combine it in obvious ways with
whatever else you want to multiplex with in your client program.
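
For what it's worth, a hypothetical caller of that interface might look like
this (PQpollset() does not exist today; the signature is just the one
sketched above, and "conn" is assumed to be an existing PGconn *):

#include <poll.h>

struct pollfd fds[16];
int         nfds;
int         timeout_ms;

/* Ask libpq which fds/timeout it wants, wait for them, then drive it again. */
if (PQpollset(conn, fds, 16, &nfds, &timeout_ms) == 0)
{
    if (poll(fds, nfds, timeout_ms) >= 0)
        (void) PQconnectPoll(conn);
}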

2.  Convert those events into new libpq events like 'I want you to
call me back in 100ms', and 'call me back when socket #42 has data',
and let clients handle that by managing their own poll set etc.  (This
is something I've speculated about to support more efficient
postgres_fdw shard query multiplexing; gotta figure out how to get
multiple connections' events into one WaitEventSet...)

I guess there is a practical middle ground where client code on
systems that have epoll/kqueue can use OAUTHBEARER without any code
change, and the feature is available on other systems too but you'll
have to change your client code to use one of those interfaces or else
you get an error 'coz we just can't do it.  Or, more likely in the
first version, you just can't do it at all...  Doesn't seem that bad
to me.

BTW I will happily do the epoll->kqueue port work if necessary.




Re: pgsql: Clean up role created in new subscription test.

2023-07-05 Thread Daniel Gustafsson
> On 16 May 2023, at 11:17, Daniel Gustafsson  wrote:

> Parked in the July CF for now.

Rebased to fix a trivial conflict highlighted by the CFBot.

--
Daniel Gustafsson



v3-0001-pg_regress-Add-database-verification-test.patch
Description: Binary data


Re: Autogenerate some wait events code and documentation

2023-07-05 Thread Andres Freund
Hi,

On 2023-07-05 10:57:19 +0900, Michael Paquier wrote:
> With all that in place, VPATH builds, the CI, meson, configure/make
> and the various cleanup targets were working fine, so I have applied
> it.  Now let's see what the buildfarm tells.
>
> The final --stat number is like that:
>  22 files changed, 757 insertions(+), 2111 deletions(-)

That's pretty nice!

Rebasing a patch over this I was a bit confused because I got a bunch of
""unable to parse wait_event_names.txt" errors. Took me a while to figure out
that that was just because I didn't include a trailing . in the description.
Perhaps that could be turned into a more meaningful error?

die "unable to parse wait_event_names.txt"
  unless $line =~ /^(\w+)\t+(\w+)\t+("\w+")\t+("\w.*\.")$/;

It's not helped by the fact that the regex in the error actually doesn't match
any lines, because it's not operating on the input but on
push(@lines, $section_name . "\t" . $_);


I also do wonder if we should invest in generating the lwlock names as
well. Except for a few abbreviations, the second column is always the
camel-cased version of what follows WAIT_EVENT_. Feels pretty tedious to write
that out.

Perhaps we should just change the case of the upper-cased names (DSM, SSL,
WAL, ...) to follow the other names?


Greetings,

Andres Freund




[BUG] Fix DETACH with FK pointing to a partitioned table fails

2023-07-05 Thread Jehan-Guillaume de Rorthais
Hi,

(patch proposal below).

Consider a table with a FK pointing to a partitioned table.

  CREATE TABLE p ( id bigint PRIMARY KEY )
PARTITION BY list (id);
  CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);

  CREATE TABLE r_1 (
id   bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
  );

Now, attach this table "r_1" as a partition of another one having the same FK:

  CREATE TABLE r (
id   bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
  ) PARTITION BY list (id);

  ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1); 

The old sub-FKs (18289 below) created for this table to enforce the action
triggers on the referenced partitions are not deleted when the table becomes a
partition. Because of this, we have additional and useless triggers on the
referenced partitions, and we cannot DETACH this partition on the referencing
side anymore:

  => ALTER TABLE r DETACH PARTITION r_1;
  ERROR:  could not find ON INSERT check triggers of foreign key
  constraint 18289

  => SELECT c.oid, conparentid, 
   conrelid::regclass, 
   confrelid::regclass, 
   t.tgfoid::regproc
 FROM pg_constraint c 
 JOIN pg_trigger t ON t.tgconstraint = c.oid
 WHERE confrelid::regclass = 'p_1'::regclass;
oid  │ conparentid │ conrelid │ confrelid │ tgfoid 
  ───┼─┼──┼───┼
   18289 │   18286 │ r_1  │ p_1   │ "RI_FKey_noaction_del"
   18289 │   18286 │ r_1  │ p_1   │ "RI_FKey_noaction_upd"
   18302 │   18299 │ r│ p_1   │ "RI_FKey_noaction_del"
   18302 │   18299 │ r│ p_1   │ "RI_FKey_noaction_upd"
  (4 rows)

The legitimate constraint and triggers here are 18302. The old sub-FK
18289 having 18286 as parent should have gone during the ATTACH PARTITION.

Please find attached a patch dropping the old "sub-FKs" during the ATTACH
PARTITION command and adding a regression test for it. At the very least, it
helps in understanding the problem and sketches a possible solution.

Regards,
>From 5fc7997b9f9a17ee5a31f059c18e6c01fd716c04 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Wed, 5 Jul 2023 19:19:40 +0200
Subject: [PATCH v1] Remove useless parted-FK constraints when attaching a
 partition

When a FK references a partitioned table, this FK parents as many
"parted-FK" constraints as the referenced table has partitions.
Each of these sub-constraints enforces only the UPDATE/DELETE
triggers on its referenced partition, but not the check triggers on
the referencing side, as those are already handled by the parent
constraint. These parted-FKs are half-baked on purpose.

However, when attaching such a standalone table to a partitioned table
having the same FK definition, these sub-constraints are not
removed. This leaves a useless action trigger on each referenced
partition, the legitimate one already checking against the root of
the referencing side.

Moreover, when the partition is later detached,
DetachPartitionFinalize() looks for all existing FKs on the relation
and tries to detach them from the parent triggers and constraints.
But because these parted-FKs are half-baked, calling
GetForeignKeyCheckTriggers() to later detach the check triggers
raises an ERROR:

  ERROR:  could not find ON INSERT check triggers of foreign key
  constraint N.
---
 src/backend/commands/tablecmds.c  | 57 +--
 src/test/regress/expected/foreign_key.out | 22 +
 src/test/regress/sql/foreign_key.sql  | 27 +++
 3 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fce5e6f220..7c1aa5d395 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10566,9 +10566,11 @@ tryAttachPartitionForeignKey(ForeignKeyCacheInfo *fk,
 	Form_pg_constraint parentConstr;
 	HeapTuple	partcontup;
 	Form_pg_constraint partConstr;
-	ScanKeyData key;
+	ScanKeyData key[3];
 	SysScanDesc scan;
 	HeapTuple	trigtup;
+	HeapTuple	contup;
+	Relation	pg_constraint;
 	Oid			insertTriggerOid,
 updateTriggerOid;
 
@@ -10631,12 +10633,12 @@ tryAttachPartitionForeignKey(ForeignKeyCacheInfo *fk,
 	 * in the partition.  We identify them because they have our constraint
 	 * OID, as well as being on the referenced rel.
 	 */
-	ScanKeyInit(&key,
+	ScanKeyInit(key,
 Anum_pg_trigger_tgconstraint,
 BTEqualStrategyNumber, F_OIDEQ,
 ObjectIdGetDatum(fk->conoid));
 	scan = systable_beginscan(trigrel, TriggerConstraintIndexId, true,
-			  NULL, 1, &key);
+			  NULL, 1, key);
 	while ((trigtup = systable_getnext(scan)) != NULL)
 	{
 		Form_pg_trigger trgform = (Form_pg_trigger) GETSTRUCT(trigtup);
@@ -10684,6 +10686,55 @@ tryAttachPartitionForeignKey(ForeignKeyCacheInfo *fk,
 	TriggerSetParentTrigger(trigrel, updateTriggerOid, parentUpdTrigger,
 		

Re: Should we remove db_user_namespace?

2023-07-05 Thread Nathan Bossart
On Mon, Jul 03, 2023 at 04:20:39PM +0900, Michael Paquier wrote:
> I am on the side of +1'ing for the removal.

Here is a rebased version of the patch.  So far no one has responded to the
pgsql-general thread [0], and no one here has argued for keeping this
parameter.  I'm planning to bump the pgsql-general thread next week to give
folks one more opportunity to object.

[0] https://postgr.es/m/20230630215608.GD2941194%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 3d46751ec7fa55d2ab776a9cb47533fe77ab0739 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 30 Jun 2023 12:46:08 -0700
Subject: [PATCH v3 1/1] remove db_user_namespace

---
 doc/src/sgml/client-auth.sgml |  5 --
 doc/src/sgml/config.sgml  | 52 ---
 src/backend/libpq/auth.c  |  5 --
 src/backend/libpq/hba.c   | 12 -
 src/backend/postmaster/postmaster.c   | 19 ---
 src/backend/utils/misc/guc_tables.c   |  9 
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/libpq/pqcomm.h|  2 -
 8 files changed, 105 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 204d09df67..6c95f0df1e 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -1253,11 +1253,6 @@ omicron bryanh  guest1
attacks.
   
 
-  
-   The md5 method cannot be used with
-   the  feature.
-  
-
   
To ease transition from the md5 method to the newer
SCRAM method, if md5 is specified as a method
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..e6cea8ddfc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1188,58 +1188,6 @@ include_dir 'conf.d'

   
  
-
- 
-  db_user_namespace (boolean)
-  
-   db_user_namespace configuration parameter
-  
-  
-  
-   
-This parameter enables per-database user names.  It is off by default.
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
-   
-
-   
-If this is on, you should create users as username@dbname.
-When username is passed by a connecting client,
-@ and the database name are appended to the user
-name and that database-specific user name is looked up by the
-server. Note that when you create users with names containing
-@ within the SQL environment, you will need to
-quote the user name.
-   
-
-   
-With this parameter enabled, you can still create ordinary global
-users.  Simply append @ when specifying the user
-name in the client, e.g., joe@.  The @
-will be stripped off before the user name is looked up by the
-server.
-   
-
-   
-db_user_namespace causes the client's and
-server's user name representation to differ.
-Authentication checks are always done with the server's user name
-so authentication methods must be configured for the
-server's user name, not the client's.  Because
-md5 uses the user name as salt on both the
-client and server, md5 cannot be used with
-db_user_namespace.
-   
-
-   
-
- This feature is intended as a temporary measure until a
- complete solution is found.  At that time, this option will
- be removed.
-
-   
-  
- 
  
  
 
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index a98b934a8e..65d452f099 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -873,11 +873,6 @@ CheckMD5Auth(Port *port, char *shadow_pass, const char **logdetail)
 	char	   *passwd;
 	int			result;
 
-	if (Db_user_namespace)
-		ereport(FATAL,
-(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
- errmsg("MD5 authentication is not supported when \"db_user_namespace\" is enabled")));
-
 	/* include the salt to use for computing the response */
 	if (!pg_strong_random(md5Salt, 4))
 	{
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index f89f138f3c..5d4ddbb04d 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1741,19 +1741,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 	else if (strcmp(token->string, "reject") == 0)
 		parsedline->auth_method = uaReject;
 	else if (strcmp(token->string, "md5") == 0)
-	{
-		if (Db_user_namespace)
-		{
-			ereport(elevel,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-	 errmsg("MD5 authentication is not supported when \"db_user_namespace\" is enabled"),
-	 errcontext("line %d of configuration file \"%s\"",
-line_num, file_name)));
-			*err_msg = "MD5 authentication is not supported when \"db_user_namespace\" is enabled";
-			return NULL;
-		}
 		

Re: Fix last unitialized memory warning

2023-07-05 Thread Tristan Partin
On Mon Jul 3, 2023 at 1:19 AM CDT, Peter Eisentraut wrote:
> On 07.06.23 16:31, Tristan Partin wrote:
> > This patch is really not necessary from a functional point of view. It
> > is only necessary if we want to silence a compiler warning.
> > 
> > Tested on `gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2)`.
> > 
> > After silencing this warning, all I am left with (given my build
> > configuration) is:
> > 
> > [1667/2280] Compiling C object src/pl/plpgsql/src/plpgsql.so.p/pl_exec.c.o
> > In file included from ../src/include/access/htup_details.h:22,
> >   from ../src/pl/plpgsql/src/pl_exec.c:21:
> > In function ‘assign_simple_var’,
> >  inlined from ‘exec_set_found’ at 
> > ../src/pl/plpgsql/src/pl_exec.c:8349:2:
> > ../src/include/varatt.h:230:36: warning: array subscript 0 is outside array 
> > bounds of ‘char[0]’ [-Warray-bounds=]
> >230 | (((varattrib_1b_e *) (PTR))->va_tag)
> >|^
> > ../src/include/varatt.h:94:12: note: in definition of macro 
> > ‘VARTAG_IS_EXPANDED’
> > 94 | (((tag) & ~1) == VARTAG_EXPANDED_RO)
> >|^~~
> > ../src/include/varatt.h:284:57: note: in expansion of macro ‘VARTAG_1B_E’
> >284 | #define VARTAG_EXTERNAL(PTR)
> > VARTAG_1B_E(PTR)
> >| ^~~
> > ../src/include/varatt.h:301:57: note: in expansion of macro 
> > ‘VARTAG_EXTERNAL’
> >301 | (VARATT_IS_EXTERNAL(PTR) && 
> > !VARTAG_IS_EXPANDED(VARTAG_EXTERNAL(PTR)))
> >| 
> > ^~~
> > ../src/pl/plpgsql/src/pl_exec.c:8537:17: note: in expansion of macro 
> > ‘VARATT_IS_EXTERNAL_NON_EXPANDED’
> >   8537 | 
> > VARATT_IS_EXTERNAL_NON_EXPANDED(DatumGetPointer(newvalue)))
> >| ^~~
> > 
> >  From my perspective, this warning definitely seems like a false
> > positive, but I don't know the code well-enough to say that for certain.
>
> I cannot reproduce this warning with gcc-13.  Are you using any 
> non-standard optimization options.  Could you give your full configure 
> and build commands and the OS?

Thanks for following up. My system is Fedora 38. I can confirm this is
still happening on master.

$ gcc --version
gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ meson setup build --buildtype=release
$ ninja -C build

-- 
Tristan Partin
Neon (https://neon.tech)




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-07-05 Thread Jacob Champion
On Fri, Jun 30, 2023 at 9:29 PM Thomas Munro  wrote:
>
> On Sat, May 20, 2023 at 10:01 AM Jacob Champion  
> wrote:
> > - The client implementation is currently epoll-/Linux-specific. I think
> > kqueue shouldn't be too much trouble for the BSDs, but it's even more
> > code to maintain.
>
> I guess you also need a fallback that uses plain old POSIX poll()?

The use of the epoll API here is to combine several sockets into one,
not to actually call epoll_wait() itself. kqueue descriptors should
let us do the same, IIUC.
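
To make that concrete, a rough sketch of the aggregation (a hypothetical
helper, not the actual patch code; "epfd" would come from
epoll_create1(EPOLL_CLOEXEC)):

#include <sys/epoll.h>

/*
 * Register one of the underlying sockets in the aggregating epoll fd.
 * The epoll fd itself is what gets handed back through PQsocket(), so
 * the client keeps waiting on a single descriptor.
 */
static int
pollset_add_socket(int epfd, int sock)
{
    struct epoll_event ev = {0};

    ev.events = EPOLLIN;
    ev.data.fd = sock;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);
}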

> I see you're not just using epoll but also timerfd.  Could that be
> converted to plain old timeout bookkeeping?  That should be enough to
> get every other Unix and *possibly* also Windows to work with the same
> code path.

I might be misunderstanding your suggestion, but I think our internal
bookkeeping is orthogonal to that. The use of timerfd here allows us
to forward libcurl's timeout requirements up to the top-level
PQsocket(). As an example, libcurl is free to tell us to call it again
in ten milliseconds, and we have to make sure a nonblocking client
calls us again after that elapses; otherwise they might hang waiting
for data that's not coming.
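
And a similar sketch for the timeout side (again hypothetical names, not the
patch itself): arm a one-shot timerfd for the timeout libcurl requested and
add it to the same epoll set, so the descriptor returned by PQsocket() also
becomes readable when the timeout expires.

#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>

static int
pollset_arm_timeout(int epfd, long timeout_ms)
{
    int     tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    struct itimerspec its = {0};
    struct epoll_event ev = {0};

    if (tfd < 0)
        return -1;

    /* one-shot timer for libcurl's requested timeout */
    its.it_value.tv_sec = timeout_ms / 1000;
    its.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;
    if (timerfd_settime(tfd, 0, &its, NULL) < 0)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = tfd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);
}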

> > - Unless someone is aware of some amazing Winsock magic, I'm pretty sure
> > the multiplexed-socket approach is dead in the water on Windows. I think
> > the strategy there probably has to be a background thread plus a fake
> > "self-pipe" (loopback socket) for polling... which may be controversial?
>
> I am not a Windows user or hacker, but there are certainly several
> ways to multiplex sockets.  First there is the WSAEventSelect() +
> WaitForMultipleObjects() approach that latch.c uses.

I don't think that strategy plays well with select() clients, though
-- it requires a handle array, and we've just got the one socket.

My goal is to maintain compatibility with existing PQconnectPoll()
applications, where the only way we get to communicate with the client
is through the PQsocket() for the connection. Ideally, you shouldn't
have to completely rewrite your application loop just to make use of
OAuth. (I assume a requirement like that would be a major roadblock to
committing this -- and if that's not a correct assumption, then I
guess my job gets a lot easier?)

> It's a shame to write modern code using select(), but you can find
> lots of shouting all over the internet about WSAPoll()'s defects, most
> famously the cURL guys[1] whose blog is widely cited, so people still
> do it.

Right -- that's basically the root of my concern. I can't guarantee
that existing Windows clients out there are all using
WaitForMultipleObjects(). From what I can tell, whatever we hand up
through PQsocket() has to be fully Winsock-/select-compatible.

> Another thing people
> complain about is the lack of socketpair() or similar in winsock which
> means you unfortunately can't easily make anonymous
> select/poll-compatible local sockets, but that doesn't seem to be
> needed here.

For the background-thread implementation, it probably would be. I've
been looking at libevent (BSD-licensed) and its socketpair hack for
Windows...

Thanks!
--Jacob




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-05 Thread Tristan Partin
Someday I will learn...

Attached is the v2.

-- 
Tristan Partin
Neon (https://neon.tech)
From 5688bc2b2c6331f437a72b6a429199c5416ccd76 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 09:31:04 -0500
Subject: [PATCH v2 1/3] Skip checking for uselocale on Windows

Windows doesn't have uselocale, so skip it.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index fbec997947..df76f10822 100644
--- a/meson.build
+++ b/meson.build
@@ -2439,7 +2439,7 @@ func_checks = [
   ['strsignal'],
   ['sync_file_range'],
   ['syncfs'],
-  ['uselocale'],
+  ['uselocale', {'skip': host_system == 'windows'}],
   ['wcstombs_l'],
 ]
 
-- 
Tristan Partin
Neon (https://neon.tech)

From d52ab7851e9638e36cd99859014d7aa31154e600 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 11:13:29 -0500
Subject: [PATCH v2 2/3] Add locale_is_c function

In some places throughout the codebase, there are string comparisons for
locales matching C or POSIX. Encapsulate this logic into a single
function and use it.
---
 src/backend/utils/adt/pg_locale.c | 37 ++-
 src/backend/utils/init/postinit.c |  4 +---
 src/backend/utils/mb/mbutils.c|  5 ++---
 src/include/utils/pg_locale.h |  1 +
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 514d4a9eeb..b455ef3aff 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -154,6 +154,21 @@ static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 		 UErrorCode *status);
 #endif
 
+/*
+ * Check if a locale is the C locale
+ *
+ * POSIX is an alias for C. Passing ignore_case as true will use
+ * pg_strcasecmp() instead of strcmp().
+ */
+bool
+locale_is_c(const char *locale, bool ignore_case)
+{
+	if (!ignore_case)
+		return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
+
+	return pg_strcasecmp(locale, "C") == 0 || pg_strcasecmp(locale, "POSIX") == 0;
+}
+
 /*
  * pg_perm_setlocale
  *
@@ -1239,10 +1254,8 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
 			collctype = TextDatumGetCString(datum);
 
-			cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) ||
-		 (strcmp(collcollate, "POSIX") == 0));
-			cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) ||
-	   (strcmp(collctype, "POSIX") == 0));
+			cache_entry->collate_is_c = locale_is_c(collcollate, false);
+			cache_entry->ctype_is_c = locale_is_c(collctype, false);
 		}
 		else
 		{
@@ -1290,12 +1303,8 @@ lc_collate_is_c(Oid collation)
 		if (!localeptr)
 			elog(ERROR, "invalid LC_COLLATE setting");
 
-		if (strcmp(localeptr, "C") == 0)
-			result = true;
-		else if (strcmp(localeptr, "POSIX") == 0)
-			result = true;
-		else
-			result = false;
+		result = locale_is_c(localeptr, false);
+
 		return (bool) result;
 	}
 
@@ -1343,12 +1352,8 @@ lc_ctype_is_c(Oid collation)
 		if (!localeptr)
 			elog(ERROR, "invalid LC_CTYPE setting");
 
-		if (strcmp(localeptr, "C") == 0)
-			result = true;
-		else if (strcmp(localeptr, "POSIX") == 0)
-			result = true;
-		else
-			result = false;
+		result = locale_is_c(localeptr, false);
+
 		return (bool) result;
 	}
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0f9b92b32e..a92a0c438f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -419,9 +419,7 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 		   " which is not recognized by setlocale().", ctype),
  errhint("Recreate the database with another locale or install the missing locale.")));
 
-	if (strcmp(ctype, "C") == 0 ||
-		strcmp(ctype, "POSIX") == 0)
-		database_ctype_is_c = true;
+	database_ctype_is_c = locale_is_c(ctype, false);
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67a1ab2ab2..997156515c 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,6 +39,7 @@
 #include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/pg_locale.h"
 #include "utils/syscache.h"
 #include "varatt.h"
 
@@ -1237,9 +1238,7 @@ pg_bind_textdomain_codeset(const char *domainname)
 	int			new_msgenc;
 
 #ifndef WIN32
-	const char *ctype = setlocale(LC_CTYPE, NULL);
-
-	if (pg_strcasecmp(ctype, "C") == 0 || pg_strcasecmp(ctype, "POSIX") == 0)
+	if (locale_is_c(locale_ctype, true))
 #endif
 		if (encoding != PG_SQL_ASCII &&
 			raw_pg_bind_textdomain_codeset(domainname, encoding))
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243542..872bef798a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -54,6 +54,7 @@ extern 

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-05 Thread Tristan Partin
On Mon Jul 3, 2023 at 9:42 AM CDT, Tristan Partin wrote:
> Thanks for your patience. Attached is a patch that should cover all the
> problematic use cases of setlocale(). There are some setlocale() calls in
> tests, initdb, and ecpg left. I plan to get to ecpglib before the final
> version of this patch after I abstract over Windows not having
> uselocale(). I think leaving initdb and tests as is would be fine, but I
> am also happy to just permanently purge setlocale() from the codebase
> if people see value in that. We could also poison[0] setlocale() at that
> point.
>
> [0]: https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html

Here is a v2 with best effort Windows support. My patch currently
assumes that you either have uselocale() or are Windows. I dropped the
environment variable hacks, but could bring them back if we didn't like
this requirement.

I tried to add an email[0] to discuss this with hackers, but failed to add
the CC. Let's discuss here instead given my complete inability to manage
mailing lists :).

[0]: https://www.postgresql.org/message-id/CTUJ604ZWHI1.3PFZK152XCWLX%40gonk

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Add PQsendSyncMessage() to libpq

2023-07-05 Thread Daniel Gustafsson
> On 21 May 2023, at 19:17, Anton Kirilov  wrote:

> .. here is an updated version of the patch 

This hunk here:

-   if (PQflush(conn) < 0)
+   const int ret = flags & PG_PIPELINEPUTSYNC_FLUSH ? PQflush(conn) : 0;
+
+   if (ret < 0)

..is causing this compiler warning:

fe-exec.c: In function ‘PQpipelinePutSync’:
fe-exec.c:3203:2: error: ISO C90 forbids mixed declarations and code 
[-Werror=declaration-after-statement]
3203 | const int ret = flags & PG_PIPELINEPUTSYNC_FLUSH ? PQflush(conn) : 0;
 | ^
cc1: all warnings being treated as errors
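
The usual fix is to hoist the declaration above the first statement of the
block, roughly like this (untested; PG_PIPELINEPUTSYNC_FLUSH is the flag from
your patch):

	int			ret;

	ret = (flags & PG_PIPELINEPUTSYNC_FLUSH) ? PQflush(conn) : 0;
	if (ret < 0)
	{
		/* ... existing error handling ... */
	}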

Also, the patch no longer applies. Please rebase and send an updated version.

--
Daniel Gustafsson





Re: Including a sample Table Access Method with core code

2023-07-05 Thread Fabrízio de Royes Mello
>
> > Included Mark Dilger directly to this mail as he mentioned he has a
> > Perl script that makes a functional copy of heap AM that can be
> > compiled as installed as custom AM.
>
> Similar discussion has happened in 640c198 related to the creation of
> dummy_index_am, where the argument is that such a module needs to
> provide value in testing some of the core internals.  dummy_index_am
> did so for reloptions on indexes because there was not much coverage
> for that part of the system.
>
> > @mark - maybe you can create 3 boilerplate Table AMs for the above
> > named `mem_am`, `overlay_am` and `py3_am` and we could put them
> > somewhere for interested parties to play with ?
>
> Not sure if that's worth counting, but I also have a table AM template
> stored in my plugin repo:
> https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
>

And based on your `blackhole_am` I've sent a patch [1] to add a
`dummy_table_am` for testing purposes.

Regards,

[1]
https://www.postgresql.org/message-id/CAFcNs+pcU2ib=jvjnznbod+m2tho+vd77c_yzj2rsgr0tp3...@mail.gmail.com

--
Fabrízio de Royes Mello


Re: [PATCH] Honor PG_TEST_NOCLEAN for tempdirs

2023-07-05 Thread Jacob Champion
On Sun, Jul 2, 2023 at 6:17 PM Michael Paquier  wrote:
> Adjusted that as well, on top of an extra comment.

Thanks all!

--Jacob




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Laurenz Albe
On Wed, 2023-07-05 at 12:02 +0100, Thom Brown wrote:
> On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent 
>  wrote:
> > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > Heap-Only Tuple (HOT) updates are a significant performance
> > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > comes with a caveat: it means that if we have lots of available space
> > > earlier on in the relation, it can only be used for new tuples or in
> > > cases where there's insufficient space on a page for an UPDATE to use
> > > HOT.
> > > 
> > > Considering these trade-offs, I'd like to propose an option to allow
> > > superusers to disable HOT on tables. The intent is to trade some
> > > performance benefits for the ability to reduce the size of a table
> > > without the typical locking associated with it.
> > 
> > Interesting use case, but I think that disabling HOT would be missing
> > the forest for the trees. I think that a feature that disables
> > block-local updates for pages > some offset would be a better solution
> > to your issue: Normal updates also prefer the new tuple to be stored
> > in the same pages as the old tuple if at all possible, so disabling
> > HOT wouldn't solve the issue of tuples residing in the tail of your
> > table - at least not while there is still empty space in those pages.
> 
> Hmm... I see your point.  It's when an UPDATE isn't going to land on
> the same page that it relocates to the earlier available page.  So I
> guess I'm after whatever mechanism would allow that to happen reliably
> and predictably.
> 
> So $subject should really be "Allow forcing UPDATEs off the same page".

I've been thinking about the same thing - an option that changes the update
strategy to always use the lowest block with enough free space.

That would allow us to consolidate bloated tables with no downtime.

Yours,
Laurenz Albe




Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Jelte Fennema
On Wed, 5 Jul 2023 at 20:09, Thom Brown  wrote:
> Okay, understood.  In that case, please remember to write changes to
> the pg_basebackup docs page explaining how the dbname value is ignored


I updated the wording in the docs for pg_basebackup and pg_receivewal.
They now clarify that Postgres itself doesn't care if there's a
database name in the connection string, but that a proxy might.


v2-0001-Allow-specifying-a-dbname-in-pg_basebackup-connec.patch
Description: Binary data


Re: brininsert optimization opportunity

2023-07-05 Thread Soumyadeep Chakraborty
On Wed, Jul 5, 2023 at 3:16 AM Tomas Vondra
 wrote:

> > I'll try this out and introduce a couple of new index AM callbacks. I
> > think it's best to do it before releasing the locks - otherwise it
> > might be weird
> > to manipulate buffers of an index relation, without having some sort of 
> > lock on
> > it. I'll think about it some more.
> >
>
> I don't understand why would this need more than just a callback to
> release the cache.

We wouldn't. I thought that it would be slightly cleaner and slightly more
performant if we moved the (if !state) branches out of the XXXinsert()
functions.
But I guess, let's minimize the changes here. One cleanup callback is enough.

> > PS: It should be possible to make GIN and GiST use the new index AM APIs
> > as well.
> >
>
> Why should GIN/GiST use the new API? I think it's perfectly sensible to
> only require the "cleanup callback" when just pfree() is not enough.

Yeah no need.

Attached v3 of the patch w/ a single index AM callback.

Regards,
Soumyadeep (VMware)
From 563b905da5c2602113044689e37f8385cbfbde64 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Wed, 5 Jul 2023 10:59:11 -0700
Subject: [PATCH v3 1/1] Reuse revmap and brin desc in brininsert

brininsert() used to have code that performed per-tuple initialization
of the revmap. That had some overhead.

To mitigate, we introduce a new AM callback to clean up state at the end
of all inserts, and perform the revmap cleanup there.
---
 contrib/bloom/blutils.c  |  1 +
 doc/src/sgml/indexam.sgml| 14 +-
 src/backend/access/brin/brin.c   | 74 +++-
 src/backend/access/gin/ginutil.c |  1 +
 src/backend/access/gist/gist.c   |  1 +
 src/backend/access/hash/hash.c   |  1 +
 src/backend/access/index/indexam.c   | 15 ++
 src/backend/access/nbtree/nbtree.c   |  1 +
 src/backend/access/spgist/spgutils.c |  1 +
 src/backend/executor/execIndexing.c  |  5 ++
 src/include/access/amapi.h   |  3 ++
 src/include/access/brin_internal.h   |  1 +
 src/include/access/genam.h   |  2 +
 13 files changed, 108 insertions(+), 12 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index d935ed8fbdf..9720f69de09 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -131,6 +131,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = blbuild;
 	amroutine->ambuildempty = blbuildempty;
 	amroutine->aminsert = blinsert;
+	amroutine->aminsertcleanup = NULL;
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index e813e2b620a..17f7d6597f1 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -139,6 +139,7 @@ typedef struct IndexAmRoutine
 ambuild_function ambuild;
 ambuildempty_function ambuildempty;
 aminsert_function aminsert;
+aminsertcleanup_function aminsertcleanup;
 ambulkdelete_function ambulkdelete;
 amvacuumcleanup_function amvacuumcleanup;
 amcanreturn_function amcanreturn;   /* can be NULL */
@@ -355,11 +356,22 @@ aminsert (Relation indexRelation,
within an SQL statement, it can allocate space
in indexInfo-ii_Context and store a pointer to the
data in indexInfo-ii_AmCache (which will be NULL
-   initially).
+   initially). If the data cached contains structures that can be simply pfree'd,
+   they will implicitly be pfree'd. But, if it contains more complex data, such
+   as Buffers or TupleDescs, additional cleanup is necessary. This additional
+   cleanup can be performed in indexinsertcleanup.
   
 
   
 
+bool
+aminsertcleanup (IndexInfo *indexInfo);
+
+   Clean up state that was maintained across successive inserts in
+   indexInfo-ii_AmCache. This is useful if the data
+   contained is complex - like Buffers or TupleDescs which need additional
+   cleanup, unlike simple structures that will be implicitly pfree'd.
+
 IndexBulkDeleteResult *
 ambulkdelete (IndexVacuumInfo *info,
   IndexBulkDeleteResult *stats,
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3c6a956eaa3..2da59f2ab03 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -58,6 +58,17 @@ typedef struct BrinBuildState
 	BrinMemTuple *bs_dtuple;
 } BrinBuildState;
 
+/*
+ * We use a BrinInsertState to capture running state spanning multiple
+ * brininsert invocations, within the same command.
+ */
+typedef struct BrinInsertState
+{
+	BrinRevmap *bs_rmAccess;
+	BrinDesc   *bs_desc;
+	BlockNumber bs_pages_per_range;
+}			BrinInsertState;
+
 /*
  * Struct used as "opaque" during index scans
  */
@@ -72,6 +83,7 @@ typedef struct BrinOpaque
 
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
   BrinRevmap *revmap, BlockNumber pagesPerRange);
+static BrinInsertState *initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo);
 

Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 16:50, Jelte Fennema  wrote:
>
> On Wed, 5 Jul 2023 at 16:01, Euler Taveira  wrote:
> > One of the PgBouncer's missions is to be a transparent proxy.
> >
> > Sometimes you cannot reach out the database directly due to a security 
> > policy.
>
> Indeed the transparent proxy use case is where replication through
> pgbouncer makes sense. There's quite some reasons to set up PgBouncer
> like such a proxy apart from security policies. Some others that come
> to mind are:
> - load balancer layer of pgbouncers
> - transparent failovers
> - transparent database moves
>
> And in all of those cases its nice for a user to use a single
> connection string/hostname. Instead of having to think: Oh yeah, for
> backups, I need to use this other one.

Okay, understood.  In that case, please remember to write changes to
the pg_basebackup docs page explaining how the dbname value is ignored
under normal usage.

Thom




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 14:39, Thom Brown  wrote:
> >
> > On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
> >  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 13:03, Thom Brown  wrote:
> > > >
> > > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> > > >  wrote:
> > > > >
> > > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > > > comes with a caveat: it means that if we have lots of available 
> > > > > > space
> > > > > > earlier on in the relation, it can only be used for new tuples or in
> > > > > > cases where there's insufficient space on a page for an UPDATE to 
> > > > > > use
> > > > > > HOT.
> > > > > >
> > > > > > This mechanism limits our options for condensing tables, forcing us 
> > > > > > to
> > > > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > > > tools like pg_repack. These either require exclusive locks (which 
> > > > > > will
> > > > > > be a deal-breaker on large tables on a production system), or 
> > > > > > there's
> > > > > > risks involved. Of course we can always flood pages with new 
> > > > > > versions
> > > > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > > > necessary.
> > > > > >
> > > > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > > > superusers to disable HOT on tables. The intent is to trade some
> > > > > > performance benefits for the ability to reduce the size of a table
> > > > > > without the typical locking associated with it.
> > > > >
> > > > > Interesting use case, but I think that disabling HOT would be missing
> > > > > the forest for the trees. I think that a feature that disables
> > > > > block-local updates for pages > some offset would be a better solution
> > > > > to your issue: Normal updates also prefer the new tuple to be stored
> > > > > in the same pages as the old tuple if at all possible, so disabling
> > > > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > > > table - at least not while there is still empty space in those pages.
> > > >
> > > > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > > > the same page that it relocates to the earlier available page.  So I
> > > > guess I'm after whatever mechanism would allow that to happen reliably
> > > > and predictably.
> > > >
> > > > So $subject should really be "Allow forcing UPDATEs off the same page".
> > >
> > > You'd probably want to do that only for a certain range of the table -
> > > for a table with 1GB of data and 3GB of bloat there is no good reason
> > > to force page-crossing updates in the first 1GB of the table - all
> > > tuples of the table will eventually reside there, so why would you
> > > take a performance penalty and move the tuples from inside that range
> > > to inside that same range?
> >
> > I'm thinking more of a case of:
> >
> > 
> >
> > UPDATE bigtable
> > SET primary key = primary key
> > WHERE ctid IN (
> > SELECT ctid
> > FROM bigtable
> > ORDER BY ctid DESC
> > LIMIT 10);
>
> So what were you thinking of? A session GUC? A table option?

Both.

> The benefit of a table option is that it is retained across sessions
> and thus allows tables that get enough updates to eventually get to a
> cleaner state. The main downside of such a table option is that it
> requires a temporary table-level lock to update the parameter.

Yes, but the maintenance window to make such a change would be extremely brief.

> The benefit of a session GUC is that you can set it without impacting
> other sessions, but the downside is that you need to do the
> maintenance in that session, and risk that cascading updates to other
> tables (e.g. through AFTER UPDATE triggers) are also impacted by this
> non-local update GUC.
>
> > > Something else to note: Indexes would suffer some (large?) amount of
> > > bloat in this process, as you would be updating a lot of tuples
> > > without the HOT optimization, thus increasing the work to be done by
> > > VACUUM.
> > > This may result in more bloat in indexes than what you get back from
> > > shrinking the table.
> >
> > This could be the case, but I guess indexes are expendable to an
> > extent, unlike tables.
>
> I don't think that's accurate - index rebuilds are quite expensive.
> But, that's besides the point of this thread.
>
> Somewhat related: did you consider using pg_repack instead of this
> potential feature?

pg_repack isn't exactly innocuous, and can potentially leave the
database in an irrecoverable state.  Plus, if disk space is an issue, it
doesn't help.

Thom




Re: logicalrep_message_type throws an error

2023-07-05 Thread Alvaro Herrera
On 2023-Jul-05, Alvaro Herrera wrote:

> On 2023-Jul-05, Euler Taveira wrote:
> 
> > Isn't this numerical value already exposed in the error message (X = 88)?
> > In this example, it is:
> > 
> > ERROR:  invalid logical replication message type "X"
> > CONTEXT:  processing remote data for replication origin "pg_16638" during 
> > message type "??? (88)" in transaction 796, finished at 0/1626698
> > 
> > IMO it could be confusing if we provide two representations of the same 
> > data (X
> > and 88). Since we already provide "X" I don't think we need "88".
> 
> The CONTEXT message could theoretically appear in other error throws,
> not just in "invalid logical replication message".  So the duplicity is
> not really a problem.

Ah, but you're thinking that if the message type is invalid, then it
will have been rejected in the "invalid logical replication message"
stage, so no invalid message type will be reported.  I guess there's a
point to that argument as well.

However, I don't see that having the numerical ASCII value there causes
any harm, even if the char value is already exposed in the other
message.  (I'm sure you'll agree that this is quite a minor issue.)

I doubt that showing the enum value in two different formats in these
two messages is confusing.  Yes, the enum is defined in terms of
char literals, but if an actually invalid message shows up with a
uint32 value outside the printable ASCII range, the printout might
be ugly or useless.

> > Another option, is to remove "X" from apply_dispatch() and add "???
> > (88)" to logicalrep_message_type().

Now *this* would be an actively bad idea IMO.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Matthias van de Meent
On Wed, 5 Jul 2023 at 14:39, Thom Brown  wrote:
>
> On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
>  wrote:
> >
> > On Wed, 5 Jul 2023 at 13:03, Thom Brown  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> > >  wrote:
> > > >
> > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > > comes with a caveat: it means that if we have lots of available space
> > > > > earlier on in the relation, it can only be used for new tuples or in
> > > > > cases where there's insufficient space on a page for an UPDATE to use
> > > > > HOT.
> > > > >
> > > > > This mechanism limits our options for condensing tables, forcing us to
> > > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > > tools like pg_repack. These either require exclusive locks (which will
> > > > > be a deal-breaker on large tables on a production system), or there's
> > > > > risks involved. Of course we can always flood pages with new versions
> > > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > > necessary.
> > > > >
> > > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > > superusers to disable HOT on tables. The intent is to trade some
> > > > > performance benefits for the ability to reduce the size of a table
> > > > > without the typical locking associated with it.
> > > >
> > > > Interesting use case, but I think that disabling HOT would be missing
> > > > the forest for the trees. I think that a feature that disables
> > > > block-local updates for pages > some offset would be a better solution
> > > > to your issue: Normal updates also prefer the new tuple to be stored
> > > > in the same pages as the old tuple if at all possible, so disabling
> > > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > > table - at least not while there is still empty space in those pages.
> > >
> > > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > > the same page that it relocates to the earlier available page.  So I
> > > guess I'm after whatever mechanism would allow that to happen reliably
> > > and predictably.
> > >
> > > So $subject should really be "Allow forcing UPDATEs off the same page".
> >
> > You'd probably want to do that only for a certain range of the table -
> > for a table with 1GB of data and 3GB of bloat there is no good reason
> > to force page-crossing updates in the first 1GB of the table - all
> > tuples of the table will eventually reside there, so why would you
> > take a performance penalty and move the tuples from inside that range
> > to inside that same range?
>
> I'm thinking more of a case of:
>
> 
>
> UPDATE bigtable
> SET primary key = primary key
> WHERE ctid IN (
> SELECT ctid
> FROM bigtable
> ORDER BY ctid DESC
> LIMIT 10);

So what were you thinking of? A session GUC? A table option?

The benefit of a table option is that it is retained across sessions
and thus allows tables that get enough updates to eventually get to a
cleaner state. The main downside of such a table option is that it
requires a temporary table-level lock to update the parameter.

The benefit of a session GUC is that you can set it without impacting
other sessions, but the downside is that you need to do the
maintenance in that session, and risk that cascading updates to other
tables (e.g. through AFTER UPDATE triggers) are also impacted by this
non-local update GUC.

> > Something else to note: Indexes would suffer some (large?) amount of
> > bloat in this process, as you would be updating a lot of tuples
> > without the HOT optimization, thus increasing the work to be done by
> > VACUUM.
> > This may result in more bloat in indexes than what you get back from
> > shrinking the table.
>
> This could be the case, but I guess indexes are expendable to an
> extent, unlike tables.

I don't think that's accurate - index rebuilds are quite expensive.
But, that's besides the point of this thread.

Somewhat related: did you consider using pg_repack instead of this
potential feature?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: logicalrep_message_type throws an error

2023-07-05 Thread Alvaro Herrera
On 2023-Jul-05, Euler Taveira wrote:

> Isn't this numerical value already exposed in the error message (X = 88)?
> In this example, it is:
> 
> ERROR:  invalid logical replication message type "X"
> CONTEXT:  processing remote data for replication origin "pg_16638" during 
> message type "??? (88)" in transaction 796, finished at 0/1626698
> 
> IMO it could be confusing if we provide two representations of the same data 
> (X
> and 88). Since we already provide "X" I don't think we need "88". Another
> option, is to remove "X" from apply_dispatch() and add "??? (88)" to 
> logicalrep_message_type().
> 
> Opinions?

The CONTEXT message could theoretically appear in other error throws,
not just in "invalid logical replication message".  So the duplicity is
not really a problem.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."




Re: [PATCH] Add GitLab CI to PostgreSQL

2023-07-05 Thread Matthias van de Meent
On Wed, 5 Jul 2023 at 15:22, Andrew Dunstan  wrote:
>
>
> On 2023-07-04 Tu 19:44, Newhouse, Robin wrote:
>
> > Hello!
> >
> > I propose the attached patch to be applied on the 'master' branch
> > of PostgreSQL to add GitLab CI automation alongside Cirrus CI in the 
> > PostgreSQL repository.

Can you configure GitLab to use a .gitlab-ci.yml file that is not in
the root directory?

> > It is not intended to be a replacement for Cirrus CI, but simply suggestion 
> > for the
> > PostgreSQL project to host centrally a Gitlab CI definition for those who 
> > prefer to use
> > it while developing/testing PostgreSQL.
> >
> > The intent is to facilitate collaboration among GitLab users, promote 
> > standardization
> > and consistency, and ultimately, improve testing and code quality.
> >
> This seems very RedHat-centric, which I'm not sure is a good idea. Also, 
> shouldn't at least some of these recipes call dnf and dnf-builddep instead of 
> yum and yum-build-dep?

I don't think it's bad to add an automated test suite for redhat-based images?

> If we're going to do this then arguably we should also be supporting GitHub 
> Actions and who knows what other CI frameworks. There is a case for us 
> special casing Cirrus CI because it's used for the cfbot.

I think there's a good case for _not_ using Cirrus CI, namely that the
license may be prohibitive, self-management of the build system
(storage of artifacts, UI, database) is missing for Cirrus CI, and it
also seems to be unable to run automated CI on repositories that
aren't hosted on Github.
I think these are good arguments for adding a GitLab CI configuration.

Unless the cfbot is entirely under management of the PostgreSQL
project (which it doesn't appear to be, as far as I know the URL is
still cfbot.cputube.org indicating some amount of external control)
the only special casing for Cirrus CI within the project seems to be
"people have experience with this tool", which is good, but not
exclusive to Cirrus CI - clearly there are also people here who have
experience with (or are interested in) maintaining GitLab CI
configurations.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Jelte Fennema
On Wed, 5 Jul 2023 at 16:01, Euler Taveira  wrote:
> One of the PgBouncer's missions is to be a transparent proxy.
>
> Sometimes you cannot reach out the database directly due to a security policy.

Indeed the transparent proxy use case is where replication through
pgbouncer makes sense. There are quite a few reasons to set up PgBouncer
like such a proxy apart from security policies. Some others that come
to mind are:
- load balancer layer of pgbouncers
- transparent failovers
- transparent database moves

And in all of those cases it's nice for a user to use a single
connection string/hostname. Instead of having to think: Oh yeah, for
backups, I need to use this other one.




Re: logical decoding and replication of sequences, take 2

2023-07-05 Thread Ashutosh Bapat
And the last patch 0008.

@@ -1180,6 +1194,13 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
... snip ...
+if (IsSet(opts.specified_opts, SUBOPT_SEQUENCES))
+{
+values[Anum_pg_subscription_subsequences - 1] =
+BoolGetDatum(opts.sequences);
+replaces[Anum_pg_subscription_subsequences - 1] = true;
+}
+

The list of allowed options set a few lines above this code does not contain
"sequences". Is this option missing there or this code is unnecessary? If we
intend to add "sequence" at a later time after a subscription is created, will
the sequences be synced after ALTER SUBSCRIPTION?

+/*
+ * ignore sequences when not requested
+ *
+ * XXX Maybe we should differentiate between "callbacks not defined" or
+ * "subscriber disabled sequence replication" and "subscriber does not
+ * know about sequence replication" (e.g. old subscriber version).
+ *
+ * For the first two it'd be fine to bail out here, but for the last it

It's not clear which two you are talking about. Maybe that's because the
paragraph above is ambiguous: it is in the form "A or B and C", so it's not
clear which cases we are differentiating between: (A, B, C), ((A or B) and C),
(A or (B and C)), or something else.

+ * might be better to continue and error out only when the sequence
+ * would be replicated (e.g. as part of the publication). We don't know
+ * that here, unfortunately.

Please see comments on changes to pgoutput_startup() below. We may
want to change the paragraph accordingly.

@@ -298,6 +298,20 @@ StartupDecodingContext(List *output_plugin_options,
  */
 ctx->reorder->update_progress_txn = update_progress_txn_cb_wrapper;

+/*
+ * To support logical decoding of sequences, we require the sequence
+ * callback. We decide it here, but only check it later in the wrappers.
+ *
+ * XXX Isn't it wrong to define only one of those callbacks? Say we
+ * only define the stream_sequence_cb() - that may get strange results
+ * depending on what gets streamed. Either none or both?

I don't think the current condition is correct; it will treat sequence
changes as streamable even when sequence_cb is not defined, and then not
actually send them. sequence_cb is needed to send sequence changes irrespective
of whether transaction streaming is supported, but stream_sequence_cb is
additionally required when the other streaming callbacks are available.
Something like

if (ctx->callbacks.sequence_cb)
{
    if (ctx->streaming)
    {
        if (ctx->callbacks.stream_sequence_cb == NULL)
            ctx->sequences = false;
        else
            ctx->sequences = true;
    }
    else
        ctx->sequences = true;
}
else
    ctx->sequences = false;
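
Or, folded into a single assignment (equivalent to the above):

ctx->sequences = ctx->callbacks.sequence_cb != NULL &&
    (!ctx->streaming || ctx->callbacks.stream_sequence_cb != NULL);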

+ *
+ * XXX Shouldn't sequence be defined at slot creation time, similar
+ * to two_phase? Probably not.

I don't know why two_phase is defined at the slot creation time, so can't
comment on this. But looks like something we need to answer before committing
the patches.

+/*
+ * We allow decoding of sequences when the option is given at the streaming
+ * start, provided the plugin supports all the callbacks for two-phase.

s/two-phase/sequences/

+ *
+ * XXX Similar behavior to the two-phase block below.

I think we need to describe the sequence-specific behaviour instead of pointing
to the two-phase one. two-phase is part of the replication slot's on-disk
specification, but sequence is not. Given that it's XXX, I think you are
planning to do that.

+ *
+ * XXX Shouldn't this error out if the callbacks are not defined?

Isn't this already being done in pgoutput_startup()? Should we remove this XXX.

+/*
+ * Here, we just check whether the sequences decoding option is passed
+ * by plugin and decide whether to enable it at later point of time. It
+ * remains enabled if the previous start-up has done so. But we only
+ * allow the option to be passed in with sufficient version of the
+ * protocol, and when the output plugin supports it.
+ */
+if (!data->sequences)
+ctx->sequences_opt_given = false;
+else if (data->protocol_version <
LOGICALREP_PROTO_SEQUENCES_VERSION_NUM)
+ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("requested proto_version=%d does not
support sequences, need %d or higher",
+data->protocol_version,
LOGICALREP_PROTO_SEQUENCES_VERSION_NUM)));
+else if (!ctx->sequences)
+ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("sequences requested, but not supported
by output plugin")));

If a given output plugin doesn't implement the callbacks but subscription
specifies sequences, the code will throw an error whether or not 

Re: logical decoding and replication of sequences, take 2

2023-07-05 Thread Ashutosh Bapat
0005, 0006 and 0007 are all related to the initial sequence sync. [3]
resulted in 0007 and I think we need it. That leaves 0005 and 0006 to
be reviewed in this response.

I followed the discussion starting [1] till [2]. The second one
mentions the interlock mechanism which has been implemented in 0005
and 0006. While I don't have an objection to allowing LOCKing a
sequence using the LOCK command, I am not sure whether it will
actually work or is even needed.

The problem described in [1] seems to be the same as the problem
described in [2]. In both cases we see the sequence moving backwards
during CATCHUP. At the end of catchup the sequence is in the right
state in both the cases. [2] actually deems this behaviour OK. I also
agree that the behaviour is ok. So I am not sure whether we have actually
solved anything with the interlocking, or whether it's really needed.

I see that the idea of using an LSN to decide whether or not to apply
a change to sequence started in [4]. In [5] Tomas proposed to use page
LSN. Looking at [6], it actually seems like a good idea. In [7] Tomas
agreed that LSN won't be sufficient. But I don't understand why. There
are three LSNs in the picture - restart LSN of sync slot,
confirmed_flush LSN of sync slot and page LSN of the sequence page
from where we read the initial state of the sequence. I think they can
be used with the following rules:
1. The publisher will not send any changes with LSN less than
confirmed_flush so we are good there.
2. Any non-transactional changes that happened between confirmed_flush
and page LSN should be discarded while syncing. They are already
visible to SELECT.
3. Any transactional changes with commit LSN between confirmed_flush
and page LSN should be discarded while syncing. They are already
visible to SELECT.
4. A DDL acquires a lock on sequence. Thus no other change to that
sequence can have an LSN between the LSN of the change made by DDL and
the commit LSN of that transaction. Only DDL changes to sequence are
transactional. Hence any transactional changes with commit LSN beyond
page LSN would not have been seen by the SELECT otherwise SELECT would
see the page LSN committed by that transaction. so they need to be
applied while syncing.
5. Any non-transactional changes beyond page LSN should be applied.
They are not seen by SELECT.
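
In other words, the sync-time filter would boil down to something like this
pseudocode (just restating the rules above):

/* decide whether a decoded sequence change is applied during sync */
relevant_lsn = is_transactional ? commit_lsn : change_lsn;
apply_during_sync = (relevant_lsn > page_lsn);
/* rule 1: anything below confirmed_flush is never sent by the publisher */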

Am I missing something?

I don't have an idea how to get page LSN via a SQL query (while also
fetching data on that page). That may or may not be a challenge.
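For what it's worth, the pageinspect extension can at least read the page LSN,
though not atomically with reading the sequence data, which is exactly the part
I'm unsure about. The names below are only illustrative:

CREATE EXTENSION IF NOT EXISTS pageinspect;
-- a sequence is a single-page relation, so block 0 is the interesting one
SELECT lsn FROM page_header(get_raw_page('my_sequence', 0));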

[1] 
https://www.postgresql.org/message-id/c2799362-9098-c7bf-c315-4d7975acafa3%40enterprisedb.com
[2] 
https://www.postgresql.org/message-id/2d4bee7b-31be-8b36-2847-a21a5d56e04f%40enterprisedb.com
[3] 
https://www.postgresql.org/message-id/f5a9d63d-a6fe-59a9-d1ed-38f6a5582c13%40enterprisedb.com
[4] 
https://www.postgresql.org/message-id/CAA4eK1KUYrXFq25xyjBKU1UDh7Dkzw74RXN1d3UAYhd4NzDcsg%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/CAA4eK1LiA8nV_ZT7gNHShgtFVpoiOvwoxNsmP_fryP%3DPsYPvmA%40mail.gmail.com
[6] https://www.postgresql.org/docs/current/storage-page-layout.html

--
Best Wishes,
Ashutosh Bapat




Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value

2023-07-05 Thread Nikita Malakhov
Hi!

I like the idea of having a standard function which shows a TOAST value ID
for a row. I've used my own to handle TOAST errors. Just, maybe, a more
correct name would be "...value_id", because you actually retrieve the valueid
field from the TOAST pointer, and the chunk ID consists of valueid + chunk_seq.
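
For illustration (the toast table name is just a placeholder), the chunks of
one out-of-line value share a chunk_id -- the valueid -- and are ordered by
chunk_seq:

SELECT chunk_id, chunk_seq, length(chunk_data)
FROM pg_toast.pg_toast_16384   -- placeholder: the table's own TOAST relation
ORDER BY chunk_id, chunk_seq;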

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2023-07-05 Thread Jakub Wartak
On Tue, Jun 13, 2023 at 10:20 AM John Naylor 
wrote:

Hi John,

v3 is attached for review.

> >
> >- 
> >+ see note below on TOAST
>
> Maybe:
> "further limited by the number of TOAST-ed values; see note below"

Fixed.

> > I've wrongly put it, I've meant that pg_largeobject also consume OID
> > and as such are subject to 32TB limit.
> No, OID has nothing to do with the table size limit, they have to do with
the max number of LOs in a DB.

Clearly I needed more coffee back then...

> Also, perhaps the LO entries should be split into a separate patch. Since
they are a special case and documented elsewhere, it's not clear their
limits fit well here. Maybe they could.

Well, but those are *limits* of the engine, and large objects seem to be pretty
widely chosen especially in migration scenarios (because they are the only
way to store values over 1GB). I think we should warn about the dangers of
using pg_largeobject.

> > + 
> > +  For every TOAST-ed column (that is for field values wider than
TOAST_TUPLE_TARGET
> > +  [2040 bytes by default]), due to internal PostgreSQL implementation
of using one
> > +  shared global OID counter - today you cannot have more than
4,294,967,296 out-of-line
> > +  values in a single table.
> > + 
> > +
> > + 

> "column" != "field value". (..)"Today" is irrelevant. Needs polish.

Fixed.

> Also the shared counter is the cause of the slowdown, but not the reason
for the numeric limit.

Isn't it both? typedef Oid is an unsigned int, so 2^32 values, and according
to the GetNewOidWithIndex() logic, if we exhaust the whole OID space it will
hang indefinitely, which has the same semantics as "being impossible" (a
permanent hang)?

> +  out-of-line value (The search for free OIDs won't stop until it finds
the free OID).

> Still too much detail, and not very illuminating. If it *did* stop, how
does that make it any less of a problem?

OK I see your point - so it's removed. As for the question: well, maybe we
could document that one day in known-performance-cliffs.sgml (or via Wiki)
instead of limits.sgml.

> +  Therefore, the OID shortages will eventually cause slowdowns to the
> +  INSERTs/UPDATE/COPY statements.

> Maybe this whole sentence is better as "This will eventually cause
slowdowns for INSERT, UPDATE, and COPY statements."

Yes, it flows much better that way.

> > > +  Please note that that the limit of 2^32
> > > +  out-of-line TOAST values applies to the sum of both visible and
invisible tuples.
> > >
> > > We didn't feel the need to mention this for normal tuples...
> >
> > Right, but this somewhat points reader to the queue-like scenario
> > mentioned by Nikita.

> That seems to be in response to you mentioning "especially to steer
people away from designing very wide non-partitioned tables". In any case,
I'm now thinking that everything in this sentence and after doesn't belong
here. We don't need to tell people to vacuum, and we don't need to tell
them about partitioning as a workaround -- it's a workaround for the table
size limit, too, but we are just documenting the limits here.

OK, I've removed the visible/invisible fragments and workaround techniques.

-J.


v3-0001-doc-Add-some-OID-TOAST-related-limitations-to-the.patch
Description: Binary data


Re: Parallel CREATE INDEX for BRIN indexes

2023-07-05 Thread Matthias van de Meent
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra  wrote:
>
>
>
> On 7/4/23 23:53, Matthias van de Meent wrote:
> > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra  
> > wrote:
> >>
> >> Hi,
> >>
> >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> >> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> >> (i.e. copied from that and massaged a bit to call brin stuff).
> >
> > Nice work.
> >
> >> In both cases _brin_end_parallel then reads the summaries from worker
> >> files, and adds them into the index. In 0001 this is fairly simple,
> >> although we could do one more improvement and sort the ranges by range
> >> start to make the index nicer (and possibly a bit more efficient). This
> >> should be simple, because the per-worker results are already sorted like
> >> that (so a merge sort in _brin_end_parallel would be enough).
> >
> > I see that you manually built the passing and sorting of tuples
> > between workers, but can't we use the parallel tuplesort
> > infrastructure for that? It already has similar features in place and
> > improves code commonality.
> >
>
> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
> do, and the little I knew I managed to forget since I wrote this patch.
> Which similar features do you have in mind?

I was referring to the feature that is "emitting a single sorted run
of tuples at the leader backend based on data gathered in parallel
worker backends". It manages the sort state, on-disk runs etc. so that
you don't have to manage that yourself.

Adding a new storage format for what is effectively a logical tape
(logtape.{c,h}) and manually merging it seems like a lot of changes if
that functionality is readily available, standardized and optimized in
sortsupport; and adds an additional place to manually go through for
disk-related changes like TDE.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)




Re: Removing unneeded self joins

2023-07-05 Thread Andrey Lepikhov

Hi,

During the significant code revision in v41 I lost some replacement
operations. Here is the fix, plus extra tests to catch this in the future.
Also, Tom added the JoinDomain structure five months ago, so I added
code to replace relids in that place too.
One more thing: I found out that we didn't remove self-joins that are defined
only by baserestrictinfos, i.e. when no self-join clause exists for the join.
That is now fixed, and a test has been added.

To understand changes readily, see the delta file in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional
From 8f5a432f6fbbcad1fd2937f33af09e9328690b6b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 4 Jul 2023 16:07:50 +0700
Subject: [PATCH] Add lost arrangements of relids and varnos. Add the test to
 check it. Add one more cleaning procedure on JoinDomain relids which was
 introduced recently with commit 3bef56e. Fix the corner case when we haven't
 removed SJ if the selfjoinquals list was empty.

---
 src/backend/optimizer/plan/analyzejoins.c | 15 ++-
 src/test/regress/expected/join.out| 26 ---
 src/test/regress/expected/updatable_views.out | 17 +---
 src/test/regress/sql/join.sql |  9 +++
 4 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index a93e4ce05c..15234b7a3b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -424,6 +424,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, 
ojrelid, subst);
sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, 
ojrelid, subst);
sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, 
ojrelid, subst);
+
+   replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
 
/*
@@ -465,6 +467,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
/* ph_needed might or might not become empty */
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, 
subst);
+   phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, 
relid, subst);
+   phinfo->ph_var->phrels = 
replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to 
adjust */
}
@@ -1545,6 +1549,7 @@ update_eclass(EquivalenceClass *ec, int from, int to)
}
 
em->em_relids = replace_relid(em->em_relids, from, to);
+   em->em_jdomain->jd_relids = 
replace_relid(em->em_jdomain->jd_relids, from, to);
 
/* We only process inner joins */
replace_varno((Node *) em->em_expr, from, to);
@@ -2101,7 +2106,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids 
relids)
 */
restrictlist = generate_join_implied_equalities(root, 
joinrelids,

inner->relids,
-   
outer, 0);
+   
outer, NULL);
 
/*
 * Process restrictlist to seperate the self join quals 
out of
@@ -2111,6 +2116,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids 
relids)
split_selfjoin_quals(root, restrictlist, ,
 
, inner->relid, outer->relid);
 
+   /*
+* To enable SJE for the only degenerate case without 
any self join
+* clauses at all, add baserestrictinfo to this list.
+* Degenerate case works only if both sides have the 
same clause. So
+* doesn't matter which side to add.
+*/
+   selfjoinquals = list_concat(selfjoinquals, 
outer->baserestrictinfo);
+
/*
 * Determine if the inner table can duplicate outer 
rows.  We must
 * bypass the unique rel cache here since we're 
possibly using a
diff --git a/src/test/regress/expected/join.out 
b/src/test/regress/expected/join.out
index b1f43f6ff8..027c356bcc 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,11 +5807,13 @@ explain (costs off)
 select 

Re: logicalrep_message_type throws an error

2023-07-05 Thread Ashutosh Bapat
On Wed, Jul 5, 2023 at 7:07 PM Euler Taveira  wrote:
>
> On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote:
>
> On 2023-Jul-05, Amit Kapila wrote:
>
> > I think after returning "???" from logicalrep_message_type(), the
> > above is possible when we get the error: "invalid logical replication
> > message type "X"" from apply_dispatch(), right? If so, then what about
> > the case when we forgot to handle some message in
> > logicalrep_message_type() but handled it in apply_dispatch()?

apply_dispatch() has a default case in its switch(), so it can
theoretically forget to handle some message type. I think we should
avoid the default case in that function so that missing message types
are caught there. But if logicalrep_message_type() doesn't use a default
case, it cannot forget to handle a known message type. So what you are
suggesting is not possible.

It might happen that the upstream sends an unknown message type
that neither apply_dispatch() nor logicalrep_message_type() can
handle.

> ERROR:  invalid logical replication message type "X"
> CONTEXT:  processing remote data for replication origin "pg_16638" during 
> message type "??? (88)" in transaction 796, finished at 0/1626698
>
> IMO it could be confusing if we provide two representations of the same data 
> (X
> and 88). Since we already provide "X" I don't think we need "88". Another
> option, is to remove "X" from apply_dispatch() and add "??? (88)" to
> logicalrep_message_type().

I don't think the message type needs to be mentioned in the context for
an error about an invalid message type. I think what needs to be done is
to set apply_error_callback_arg.command = 0 before calling ereport in the
default case of apply_dispatch(). apply_error_callback() will then just
return without providing a context. If we need a context and have all the
other necessary fields, we can improve apply_error_callback() to provide
context when apply_error_callback_arg.command = 0.
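
That is, roughly (untested, and the errcode/message are just from memory of
what apply_dispatch() uses today):

default:
    /* reset the callback arg so apply_error_callback() emits no context */
    apply_error_callback_arg.command = 0;
    ereport(ERROR,
            (errcode(ERRCODE_PROTOCOL_VIOLATION),
             errmsg("invalid logical replication message type \"%c\"", action)));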


--
Best Wishes,
Ashutosh Bapat




Re: Prevent psql \watch from running queries that return no rows

2023-07-05 Thread Greg Sabino Mullane
Thanks for the feedback!

On Wed, Jul 5, 2023 at 5:51 AM Daniel Gustafsson  wrote:

>
> The comment on ExecQueryAndProcessResults() needs to be updated with an
> explanation of what this parameter is.
>

I added a comment in the place where min_rows is used, but not sure what
you mean by adding it to the main comment at the top of the function?  None
of the other args are explained there, even the non-intuitive ones  (e.g.
svpt_gone_p)

-   return cancel_pressed ? 0 : success ? 1 : -1;
> +   return (cancel_pressed || return_early) ? 0 : success ? 1 : -1;
>
> I think this is getting tangled up enough that it should be replaced with
> separate if() statements for the various cases.
>

Would like to hear others weigh in, I think it's still only three states
plus a default, so I'm not convinced it warrants multiple statements yet. :)
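
For reference, the unrolled version being suggested would look roughly like:

if (cancel_pressed || return_early)
    return 0;
if (success)
    return 1;
return -1;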

+   HELP0("  \\watch [[i=]SEC] [c=N] [m=ROW]\n");
> +   HELP0("  execute query every SEC seconds,
> up to N times\n");
> +   HELP0("  stop if less than ROW minimum
> rows are rerturned\n");
>
> "less than ROW minimum rows" reads a bit awkward IMO, how about calling it
> [m=MIN] and describe as "stop if less than MIN rows are returned"?  Also,
> there
> is a typo: s/rerturned/returned/.
>

Great idea: changed and will attach a new patch

Cheers,
Greg


psql_watch_exit_on_zero_rows_v3.patch
Description: Binary data


Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Euler Taveira
On Wed, Jul 5, 2023, at 9:43 AM, Thom Brown wrote:
> I guess my immediate question is, should backups be taken through
> PgBouncer?  It seems beyond PgBouncer's remit.

One of PgBouncer's missions is to be a transparent proxy.

Sometimes you cannot reach the database directly due to a security policy.
I've heard this backup question a few times. IMO if dbname doesn't matter for
reaching the server directly, I don't see a problem relaxing this restriction
to support this use case. We just need to document that dbname will be ignored
if specified. Other connection poolers might also benefit from it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: pg_decode_message vs skip_empty_xacts and xact_wrote_changes

2023-07-05 Thread Ashutosh Bapat
On Wed, Jul 5, 2023 at 2:29 PM Amit Kapila  wrote:
>
> On Wed, Jul 5, 2023 at 2:28 PM Amit Kapila  wrote:
> >
> > On Mon, Jul 3, 2023 at 4:49 PM vignesh C  wrote:
> > >
> > > +1 for the first version patch, I also felt the first version is
> > > easily understandable.
> > >
> >
> > Okay, please find the slightly updated version (changed a comment and
> > commit message). Unless there are more comments, I'll push this in a
> > day or two.
> >
>
> oops, forgot to attach the patch.

I still think that we need to do something so that a new call to
pg_output_begin() automatically takes care of the conditions under
which it should be called. Otherwise, we will introduce a similar
problem in some other place in future.

-- 
Best Wishes,
Ashutosh Bapat




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-07-05 Thread Aleksander Alekseev
Hi,

> Reviewing this now. I think it's almost ready to be committed.
>
> There's another big effort going on to move SLRUs to the regular buffer
> cache (https://commitfest.postgresql.org/43/3514/). I wonder how moving
> to 64 bit page numbers will affect that. BlockNumber is still 32 bits,
> after all.

Somehow I didn't pay too much attention to this effort, thanks. I will
familiarize myself with the patch. Intuitively I don't think that the
patchsets should block each other.

> This patch makes the filenames always 12 characters wide (for SLRUs that
> opt-in to the new naming). That's actually not enough for the full range
> that a 64 bit page number can represent. Should we make it 16 characters
> now, to avoid repeating the same mistake we made earlier? Or make it
> more configurable, on a per-SLRU basis. One reason I don't want to just
> make it 16 characters is that WAL segment filenames are also 16 hex
> characters, which could cause confusion.

Good catch. I propose the following solution:

```
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
 {
 if (ctl->long_segment_names)
-return snprintf(path, MAXPGPATH, "%s/%012llX", ctl->Dir,
+/*
+ * We could use 16 characters here but the disadvantage would be that
+ * the SLRU segments will be hard to distinguish from WAL segments.
+ *
+ * For this reason we use 15 characters. It is enough but also means
+ * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT easily.
+ */
+return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir,
 (long long) segno);
 else
 return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir,
```
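
For reference, a quick back-of-the-envelope check of why 15 characters
are sufficient (assuming SLRU_PAGES_PER_SEGMENT stays at 32):

/*
 * segno = pageno / SLRU_PAGES_PER_SEGMENT
 *
 * With 64-bit page numbers and 32 pages per segment:
 *     max segno < 2^64 / 2^5 = 2^59
 * 15 hex digits cover values up to 2^60 - 1, so "%015llX" is enough,
 * while the previous 12 digits only covered segno < 2^48
 * (i.e. pageno < 2^53).
 */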

PFE the corrected patchset v58.

--
Best regards,
Aleksander Alekseev


v58-0003-Make-use-FullTransactionId-in-2PC-filenames.patch
Description: Binary data


v58-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch
Description: Binary data


v58-0002-Use-larger-segment-file-names-for-pg_notify.patch
Description: Binary data


Re: logicalrep_message_type throws an error

2023-07-05 Thread Euler Taveira
On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote:
> On 2023-Jul-05, Amit Kapila wrote:
> 
> > I think after returning "???" from logicalrep_message_type(), the
> > above is possible when we get the error: "invalid logical replication
> > message type "X"" from apply_dispatch(), right? If so, then what about
> > the case when we forgot to handle some message in
> > logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
> > better to return the 'action' from the function
> > logicalrep_message_type() for unknown type? That way the information
> > could be a bit better and we can easily catch the code bug as well.
> 
> Are you suggesting that logicalrep_message_type should include the
> numerical value of 'action' in the ??? message? Something like this:
> 
> ERROR:  invalid logical replication message type "X"
> CONTEXT:  processing remote data for replication origin "pg_16638" during 
> message type "??? (123)" in transaction 796, finished at 0/16266F8

Isn't this numerical value already exposed in the error message (X = 88)?
In this example, it is:

ERROR:  invalid logical replication message type "X"
CONTEXT:  processing remote data for replication origin "pg_16638" during 
message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88". Another
option is to remove "X" from apply_dispatch() and add "??? (88)" to 
logicalrep_message_type().
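
For illustration, the default branch could look roughly like this (a
sketch only -- the switch is abridged and this is not the committed code):

char *
logicalrep_message_type(LogicalRepMsgType action)
{
	static char err_unknown[20];

	switch (action)
	{
		case LOGICAL_REP_MSG_BEGIN:
			return "BEGIN";
		/* ... all other known message types ... */
		default:
			/* Return "??? (N)" instead of raising an error here. */
			snprintf(err_unknown, sizeof(err_unknown), "??? (%d)", action);
			return err_unknown;
	}
}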

Opinions?


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: [PATCH] Add GitLab CI to PostgreSQL

2023-07-05 Thread Andrew Dunstan


On 2023-07-04 Tu 19:44, Newhouse, Robin wrote:


Hello!

I propose the attached patch to be applied on the 'master' branch of
PostgreSQL to add GitLab CI automation alongside Cirrus CI in the
PostgreSQL repository.

It is not intended to be a replacement for Cirrus CI, but simply a
suggestion for the PostgreSQL project to host centrally a GitLab CI
definition for those who prefer to use it while developing/testing
PostgreSQL.

The intent is to facilitate collaboration among GitLab users, promote
standardization and consistency, and ultimately, improve testing and
code quality.



This seems very RedHat-centric, which I'm not sure is a good idea. Also, 
shouldn't at least some of these recipes call dnf and dnf-builddep 
instead of yum and yum-build-dep?


If we're going to do this then arguably we should also be supporting 
GitHub Actions and who knows what other CI frameworks. There is a case 
for us special casing Cirrus CI because it's used for the cfbot.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pg_basebackup check vs Windows file path limits

2023-07-05 Thread Andrew Dunstan


On 2023-07-04 Tu 16:54, Daniel Gustafsson wrote:

On 4 Jul 2023, at 20:19, Andrew Dunstan  wrote:
But sadly we're kinda back where we started. fairywren is failing on REL_16_STABLE. 
Before the changes the failure occurred because the test script was unable to create 
the file with a path > 255. Now that we have a way to create the file the test for 
pg_basebackup to reject files with names > 100 fails, I presume because the server 
can't actually see the file. At this stage I'm thinking the best thing would be to 
skip the test altogether on windows if the path is longer than 255.

That does sound like a fairly large hammer for a nail small enough that we
should be able to fix it, but I don't have any other good ideas off the cuff.



Not sure it's such a big hammer. Here's a patch.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e0009c8531..ee0d03512c 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -311,20 +311,22 @@ $node->command_fails(
 	'-T with invalid format fails');
 
 # Tar format doesn't support filenames longer than 100 bytes.
-# Create the test file via a short name directory so it doesn't blow the
-# Windows path limit.
-my $lftmp = PostgreSQL::Test::Utils::tempdir_short;
-dir_symlink "$pgdata", "$lftmp/pgdata";
-my $superlongname = "superlongname_" . ("x" x 100);
-my $superlongpath = "$lftmp/pgdata/$superlongname";
+SKIP:
+{
+	my $superlongname = "superlongname_" . ("x" x 100);
+	my $superlongpath = "$pgdata/$superlongname";
 
-open my $file, '>', "$superlongpath"
-  or die "unable to create file $superlongpath";
-close $file;
-$node->command_fails(
-	[ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l1", '-Ft' ],
-	'pg_basebackup tar with long name fails');
-unlink "$superlongpath";
+	skip "File path too long", 1
+	  if $windows_os && length($superlongpath) > 255;
+
+	open my $file, '>', "$superlongpath"
+	  or die "unable to create file $superlongpath";
+	close $file;
+	$node->command_fails(
+		[ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l1", '-Ft' ],
+		'pg_basebackup tar with long name fails');
+	unlink "$superlongpath";
+}
 
 # The following tests are for symlinks.
 


Re: logicalrep_message_type throws an error

2023-07-05 Thread Amit Kapila
On Wed, Jul 5, 2023 at 4:26 PM Alvaro Herrera  wrote:
>
> On 2023-Jul-05, Amit Kapila wrote:
>
> > I think after returning "???" from logicalrep_message_type(), the
> > above is possible when we get the error: "invalid logical replication
> > message type "X"" from apply_dispatch(), right? If so, then what about
> > the case when we forgot to handle some message in
> > logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
> > better to return the 'action' from the function
> > logicalrep_message_type() for unknown type? That way the information
> > could be a bit better and we can easily catch the code bug as well.
>
> Are you suggesting that logicalrep_message_type should include the
> numerical value of 'action' in the ??? message? Something like this:
>
>  ERROR:  invalid logical replication message type "X"
>  CONTEXT:  processing remote data for replication origin "pg_16638" during 
> message type "??? (123)" in transaction 796, finished at 0/16266F8
>

Yes, something like that would be better.

-- 
With Regards,
Amit Kapila.




Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Thom Brown
On Mon, 3 Jul 2023 at 13:23, Jelte Fennema  wrote:
>
> Normally it doesn't really matter which dbname is used in the connection
> string that pg_basebackup and other physical replication CLI tools use.
> The reason being, that physical replication does not work at the
> database level, but instead at the server level. So you will always get
> the data for all databases.
>
> However, when there's a proxy, such as PgBouncer, in between the client
> and the server, then it might very well matter. Because this proxy might
> want to route the connection to a different server depending on the
> dbname parameter in the startup packet.
>
> This patch changes the creation of the connection string key value
> pairs, so that the following command will actually include
> dbname=postgres in the startup packet to the server:
>
> pg_basebackup --dbname 'dbname=postgres port=6432' -D dump

I guess my immediate question is, should backups be taken through
PgBouncer?  It seems beyond PgBouncer's remit.

Thom




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 13:03, Thom Brown  wrote:
> >
> > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> >  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > comes with a caveat: it means that if we have lots of available space
> > > > earlier on in the relation, it can only be used for new tuples or in
> > > > cases where there's insufficient space on a page for an UPDATE to use
> > > > HOT.
> > > >
> > > > This mechanism limits our options for condensing tables, forcing us to
> > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > tools like pg_repack. These either require exclusive locks (which will
> > > > be a deal-breaker on large tables on a production system), or there's
> > > > risks involved. Of course we can always flood pages with new versions
> > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > necessary.
> > > >
> > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > superusers to disable HOT on tables. The intent is to trade some
> > > > performance benefits for the ability to reduce the size of a table
> > > > without the typical locking associated with it.
> > >
> > > Interesting use case, but I think that disabling HOT would be missing
> > > the forest for the trees. I think that a feature that disables
> > > block-local updates for pages > some offset would be a better solution
> > > to your issue: Normal updates also prefer the new tuple to be stored
> > > in the same pages as the old tuple if at all possible, so disabling
> > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > table - at least not while there is still empty space in those pages.
> >
> > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > the same page that it relocates to the earlier available page.  So I
> > guess I'm after whatever mechanism would allow that to happen reliably
> > and predictably.
> >
> > So $subject should really be "Allow forcing UPDATEs off the same page".
>
> You'd probably want to do that only for a certain range of the table -
> for a table with 1GB of data and 3GB of bloat there is no good reason
> to force page-crossing updates in the first 1GB of the table - all
> tuples of the table will eventually reside there, so why would you
> take a performance penalty and move the tuples from inside that range
> to inside that same range?

I'm thinking more of a case of:

UPDATE bigtable
SET pk_col = pk_col   -- i.e. the primary key column set to itself
WHERE ctid IN (
    SELECT ctid
    FROM bigtable
    ORDER BY ctid DESC
    LIMIT 10);

> Something else to note: Indexes would suffer some (large?) amount of
> bloat in this process, as you would be updating a lot of tuples
> without the HOT optimization, thus increasing the work to be done by
> VACUUM.
> This may result in more bloat in indexes than what you get back from
> shrinking the table.

This could be the case, but I guess indexes are expendable to an
extent, unlike tables.

Thom




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-07-05 Thread Heikki Linnakangas

On 09/03/2023 17:21, Aleksander Alekseev wrote:

v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch


Reviewing this now. I think it's almost ready to be committed.

There's another big effort going on to move SLRUs to the regular buffer 
cache (https://commitfest.postgresql.org/43/3514/). I wonder how moving 
to 64 bit page numbers will affect that. BlockNumber is still 32 bits, 
after all.



+/*
+ * An internal function used by SlruScanDirectory().
+ *
+ * Returns true if a file with a name of a given length may be a correct
+ * SLRU segment.
+ */
+static inline bool
+SlruCorrectSegmentFilenameLength(SlruCtl ctl, size_t len)
+{
+	if (ctl->long_segment_names)
+		return (len == 12);
+	else
+
+		/*
+		 * XXX Should we still consider 5 and 6 to be a correct length? It
+		 * looks like it was previously allowed but now SlruFileName() can't
+		 * return such a name.
+		 */
+		return (len == 4 || len == 5 || len == 6);
+}


Huh, I didn't realize that 5 and 6 character SLRU segment names are 
possible. But indeed they are. Commit 638cf09e76d allowed 5-character 
lengths:



commit 638cf09e76d70dd83d8123e7079be6c0532102d2
Author: Alvaro Herrera 
Date:   Thu Jan 2 18:17:29 2014 -0300

Handle 5-char filenames in SlruScanDirectory

Original users of slru.c were all producing 4-digit filenames, so that
was all that that code was prepared to handle.  Changes to multixact.c
in the course of commit 0ac5ad5134f made pg_multixact/members create
5-digit filenames once a certain threshold was reached, which
SlruScanDirectory wasn't prepared to deal with; in particular,
5-digit-name files were not removed during truncation.  Change that
routine to make it aware of those files, and have it process them just
like any others.

Right now, some pg_multixact/members directories will contain a mixture
of 4-char and 5-char filenames.  A future commit is expected fix things
so that each slru.c user declares the correct maximum width for the
files it produces, to avoid such unsightly mixtures.

Noticed while investigating bug #8673 reported by Serge Negodyuck.




And later commit 73c986adde5, which introduced commit_ts, allowed 6 
characters. With 8192 block size, pg_commit_ts segments are indeed 5 
chars wide, and with 512 block size, 6 chars are needed.


This patch makes the filenames always 12 characters wide (for SLRUs that 
opt-in to the new naming). That's actually not enough for the full range 
that a 64 bit page number can represent. Should we make it 16 characters 
now, to avoid repeating the same mistake we made earlier? Or make it 
more configurable, on a per-SLRU basis. One reason I don't want to just 
make it 16 characters is that WAL segment filenames are also 16 hex 
characters, which could cause confusion.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-05 Thread Nitin Jadhav
+1 for the idea. It's going to be more useful to test and understand
the buffer management of PostgreSQL and it can be used to explicitly
free up the buffers if there are any such requirements.

I had a quick look over the patch. Following are the comments.

First, TryInvalidateBuffer() tries to flush the buffer if it is dirty and
then tries to invalidate it if it meets the requirements. Instead of
doing this unconditionally, can we provide an option for the caller to
say whether dirty buffers should be invalidated or not? For example,
TryInvalidateBuffer(Buffer bufnum, bool force): if force is set to
FALSE, then skip invalidating dirty buffers; otherwise, flush the dirty
buffer and try to invalidate it.
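
A rough sketch of that suggestion (locking simplified, the flush path
only indicated by a comment; this is not the submitted patch):

bool
TryInvalidateBuffer(Buffer bufnum, bool force)
{
	BufferDesc *desc = GetBufferDescriptor(bufnum - 1);
	uint32		buf_state = LockBufHdr(desc);

	/* Skip buffers that are pinned by somebody. */
	if (BUF_STATE_GET_REFCOUNT(buf_state) > 0)
	{
		UnlockBufHdr(desc, buf_state);
		return false;
	}

	if (buf_state & BM_DIRTY)
	{
		if (!force)
		{
			/* Caller asked us to leave dirty buffers alone. */
			UnlockBufHdr(desc, buf_state);
			return false;
		}

		/*
		 * With force = true, a real implementation would pin the buffer,
		 * take the content lock, FlushBuffer() it, and re-check the state
		 * before invalidating; those steps are omitted in this sketch.
		 */
		UnlockBufHdr(desc, buf_state);
		return false;
	}

	/* Clean and unpinned: invalidate it (releases the header spinlock). */
	InvalidateBuffer(desc);
	return true;
}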

Second, in TryInvalidateBuffer(), the code first checks whether the
reference count is greater than zero and only then checks for a dirty
buffer. Will there be a scenario where the buffer is dirty and its
reference count is zero? Please provide more information on this or
adjust the code accordingly.

> +/*
> +Try Invalidating a buffer using bufnum.
> +If the buffer is invalid, the function returns false.
> +The function checks for dirty buffer and flushes the dirty buffer before 
> invalidating.
> +If the buffer is still dirty it returns false.
> +*/
> +bool

The star(*) and space are missing here. Please refer to the style of
function comments and change accordingly.

Thanks & Regards,
Nitin Jadhav

On Fri, Jun 30, 2023 at 4:17 PM Palak Chaturvedi
 wrote:
>
> I hope this email finds you well. I am excited to share that I have
> extended the functionality of the `pg_buffercache` extension by
> implementing buffer invalidation capability, as requested by some
> PostgreSQL contributors for improved testing scenarios.
>
> This marks my first time submitting a patch to pgsql-hackers, and I am
> eager to receive your expert feedback on the changes made. Your
> insights are invaluable, and any review or comments you provide will
> be greatly appreciated.
>
> The primary objective of this enhancement is to enable explicit buffer
> invalidation within the `pg_buffercache` extension. By doing so, we
> can simulate scenarios where buffers are invalidated and observe the
> resulting behavior in PostgreSQL.
>
> As part of this patch, a new function or mechanism has been introduced
> to facilitate buffer invalidation. I would like to hear your thoughts
> on whether this approach provides a good user interface for this
> functionality. Additionally, I seek your evaluation of the buffer
> locking protocol employed in the extension to ensure its correctness
> and efficiency.
>
> Please note that I plan to add comprehensive documentation once the
> details of this enhancement are agreed upon. This documentation will
> serve as a valuable resource for users and contributors alike. I
> believe that your expertise will help uncover any potential issues and
> opportunities for further improvement.
>
> I have attached the patch file to this email for your convenience.
> Your valuable time and consideration in reviewing this extension are
> sincerely appreciated.
>
> Thank you for your continued support and guidance. I am looking
> forward to your feedback and collaboration in enhancing the PostgreSQL
> ecosystem.
>
> The working of the extension:
>
> 1. Creating the extension pg_buffercache and then call select query on
> a table and note the buffer to be cleared.
> pgbench=# create extension pg_buffercache;
> CREATE EXTENSION
> pgbench=# select count(*) from pgbench_accounts;
>  count
> 
>  10
> (1 row)
>
> pgbench=# SELECT *
> FROM pg_buffercache
> WHERE relfilenode = pg_relation_filenode('pgbench_accounts'::regclass);
>  bufferid | relfilenode | reltablespace | reldatabase | relforknumber
> | relblocknumber | isdirty | usagecount | pinning_backends
> --+-+---+-+---++-++--
>   233 |   16397 |  1663 |   16384 | 0
> |  0 | f   |  1 |0
>   234 |   16397 |  1663 |   16384 | 0
> |  1 | f   |  1 |0
>   235 |   16397 |  1663 |   16384 | 0
> |  2 | f   |  1 |0
>   236 |   16397 |  1663 |   16384 | 0
> |  3 | f   |  1 |0
>   237 |   16397 |  1663 |   16384 | 0
> |  4 | f   |  1 |0
>
>
> 2. Clearing a single buffer by entering the bufferid.
> pgbench=# SELECT count(*)
> FROM pg_buffercache
> WHERE relfilenode = pg_relation_filenode('pgbench_accounts'::regclass);
>  count
> ---
>   1660
> (1 row)
>
> pgbench=# select pg_buffercache_invalidate(233);
>  pg_buffercache_invalidate
> ---
>  t
> (1 row)
>
> pgbench=# SELECT count(*)
> FROM 

Re: Disabling Heap-Only Tuples

2023-07-05 Thread Matthias van de Meent
On Wed, 5 Jul 2023 at 13:03, Thom Brown  wrote:
>
> On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
>  wrote:
> >
> > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > Heap-Only Tuple (HOT) updates are a significant performance
> > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > comes with a caveat: it means that if we have lots of available space
> > > earlier on in the relation, it can only be used for new tuples or in
> > > cases where there's insufficient space on a page for an UPDATE to use
> > > HOT.
> > >
> > > This mechanism limits our options for condensing tables, forcing us to
> > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > tools like pg_repack. These either require exclusive locks (which will
> > > be a deal-breaker on large tables on a production system), or there's
> > > risks involved. Of course we can always flood pages with new versions
> > > of a row until it's forced onto an early page, but that shouldn't be
> > > necessary.
> > >
> > > Considering these trade-offs, I'd like to propose an option to allow
> > > superusers to disable HOT on tables. The intent is to trade some
> > > performance benefits for the ability to reduce the size of a table
> > > without the typical locking associated with it.
> >
> > Interesting use case, but I think that disabling HOT would be missing
> > the forest for the trees. I think that a feature that disables
> > block-local updates for pages > some offset would be a better solution
> > to your issue: Normal updates also prefer the new tuple to be stored
> > in the same pages as the old tuple if at all possible, so disabling
> > HOT wouldn't solve the issue of tuples residing in the tail of your
> > table - at least not while there is still empty space in those pages.
>
> Hmm... I see your point.  It's when an UPDATE isn't going to land on
> the same page that it relocates to the earlier available page.  So I
> guess I'm after whatever mechanism would allow that to happen reliably
> and predictably.
>
> So $subject should really be "Allow forcing UPDATEs off the same page".

You'd probably want to do that only for a certain range of the table -
for a table with 1GB of data and 3GB of bloat there is no good reason
to force page-crossing updates in the first 1GB of the table - all
tuples of the table will eventually reside there, so why would you
take a performance penalty and move the tuples from inside that range
to inside that same range?

Something else to note: Indexes would suffer some (large?) amount of
bloat in this process, as you would be updating a lot of tuples
without the HOT optimization, thus increasing the work to be done by
VACUUM.
This may result in more bloat in indexes than what you get back from
shrinking the table.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-07-05 Thread John Naylor
On Tue, Jul 4, 2023 at 12:49 PM Masahiko Sawada 
wrote:
>
> As you mentioned, the 1-byte value is embedded into 8 byte so 7 bytes
> are unused, but we use less memory since we use less slab contexts and
> save fragmentations.

Thanks for testing. This tree is sparse enough that most of the space is
taken up by small inner nodes, and not by leaves. So, it's encouraging to
see a small space savings even here.

> I've also tested some large value cases (e.g. the value is 80-bytes)
> and got a similar result.

Interesting. With a separate allocation per value the overhead would be 8
bytes, or 10% here. It's plausible that savings elsewhere can hide that,
globally.

> Regarding the codes, there are many todo and fixme comments so it
> seems to me that your recent work is still in-progress. What is the
> current status? Can I start reviewing the code or should I wait for a
> while until your recent work completes?

Well, it's going to be a bit of a mess until I can demonstrate it working
(and working well) with bitmap heap scan. Fixing that now is just going to
create conflicts. I do have a couple small older patches laying around that
were quick experiments -- I think at least some of them should give a
performance boost in loading speed, but haven't had time to test. Would you
like to take a look?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Parallel CREATE INDEX for BRIN indexes

2023-07-05 Thread Tomas Vondra



On 7/5/23 10:44, Matthias van de Meent wrote:
> On Wed, 5 Jul 2023 at 00:08, Tomas Vondra  
> wrote:
>>
>>
>>
>> On 7/4/23 23:53, Matthias van de Meent wrote:
>>> On Thu, 8 Jun 2023 at 14:55, Tomas Vondra  
>>> wrote:

 Hi,

 Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
 infrastructure (starting workers etc.) is "inspired" by the BTREE code
 (i.e. copied from that and massaged a bit to call brin stuff).
>>>
>>> Nice work.
>>>
 In both cases _brin_end_parallel then reads the summaries from worker
 files, and adds them into the index. In 0001 this is fairly simple,
 although we could do one more improvement and sort the ranges by range
 start to make the index nicer (and possibly a bit more efficient). This
 should be simple, because the per-worker results are already sorted like
 that (so a merge sort in _brin_end_parallel would be enough).
>>>
>>> I see that you manually built the passing and sorting of tuples
>>> between workers, but can't we use the parallel tuplesort
>>> infrastructure for that? It already has similar features in place and
>>> improves code commonality.
>>>
>>
>> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
>> do, and the little I knew I managed to forget since I wrote this patch.
>> Which similar features do you have in mind?
>>
>> The workers are producing the results in "start_block" order, so if they
>> pass that to the leader, it probably can do the usual merge sort.
>>
 For 0002 it's a bit more complicated, because with a single parallel
 scan brinbuildCallbackParallel can't decide if a range is assigned to a
 different worker or empty. And we want to generate summaries for empty
 ranges in the index. We could either skip such range during index build,
 and then add empty summaries in _brin_end_parallel (if needed), or add
 them and then merge them using "union".


 I just realized there's a third option to do this - we could just do
 regular parallel scan (with no particular regard to pagesPerRange), and
 then do "union" when merging results from workers. It doesn't require
 the sequence of TID scans, and the union would also handle the empty
 ranges. The per-worker results might be much larger, though, because
 each worker might produce up to the "full" BRIN index.
>>>
>>> Would it be too much effort to add a 'min_chunk_size' argument to
>>> table_beginscan_parallel (or ParallelTableScanDesc) that defines the
>>> minimum granularity of block ranges to be assigned to each process? I
>>> think that would be the most elegant solution that would require
>>> relatively little effort: table_block_parallelscan_nextpage already
>>> does parallel management of multiple chunk sizes, and I think this
>>> modification would fit quite well in that code.
>>>
>>
>> I'm confused. Isn't that pretty much exactly what 0002 does? I mean,
>> that passes pagesPerRange to table_parallelscan_initialize(), so that
>> each pagesPerRange is assigned to a single worker.
> 
> Huh, I overlooked that one... Sorry for that.
> 
>> The trouble I described above is that the scan returns tuples, and the
>> consumer has no idea what was the chunk size or how many other workers
>> are there. Imagine you get a tuple from block 1, and then a tuple from
>> block 1000. Does that mean that the blocks in between are empty or that
>> they were processed by some other worker?
> 
> If the unit of work for parallel table scans is the index's
> pages_per_range, then I think we can just fill in expected-but-missing
> ranges as 'empty' in the parallel leader during index loading, like
> the first of the two solutions you proposed.
> 

Right, I think that's the right solution.

Or rather the only solution, because the other idea (generating the
empty ranges in workers) relies on the workers knowing when to generate
that. But I don't think the workers have the necessary information.
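
For illustration, the leader-side merge could fill the gaps roughly like
this (read_next_worker_tuple, brin_insert_empty_range and
brin_insert_range are invented helper names; only the gap-filling logic
is the point):

	BlockNumber next_expected = 0;
	BrinTuple  *tup;

	/* per-worker results are already sorted by range start block */
	while ((tup = read_next_worker_tuple(state)) != NULL)
	{
		/* any range nobody summarized must have been empty */
		while (next_expected < tup->bt_blkno)
		{
			brin_insert_empty_range(idxrel, revmap, next_expected);
			next_expected += pagesPerRange;
		}

		brin_insert_range(idxrel, revmap, tup);
		next_expected = tup->bt_blkno + pagesPerRange;
	}

	/* ... and so is anything after the last summarized range */
	while (next_expected < RelationGetNumberOfBlocks(heaprel))
	{
		brin_insert_empty_range(idxrel, revmap, next_expected);
		next_expected += pagesPerRange;
	}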


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_upgrade and cross-library upgrades

2023-07-05 Thread Tom Lane
Alvaro Herrera  writes:
> 002_pg_upgrade.pl can test for presence or absence of pgcrypto by
> grepping pg_config --configure for --with-ssl or --with-openssl.  If the
> old cluster has it but the new doesn't, we must drop the
> contrib_regression_pgcrypto database.

Hmm, but you'd also need code to handle meson builds no?  Could it
be easier to look for the pgcrypto library in the new installation?

Not entirely sure this is worth the effort.

regards, tom lane




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > Heap-Only Tuple (HOT) updates are a significant performance
> > enhancement, as they prevent unnecessary page writes. However, HOT
> > comes with a caveat: it means that if we have lots of available space
> > earlier on in the relation, it can only be used for new tuples or in
> > cases where there's insufficient space on a page for an UPDATE to use
> > HOT.
> >
> > This mechanism limits our options for condensing tables, forcing us to
> > resort to methods like running VACUUM FULL/CLUSTER or using external
> > tools like pg_repack. These either require exclusive locks (which will
> > be a deal-breaker on large tables on a production system), or there's
> > risks involved. Of course we can always flood pages with new versions
> > of a row until it's forced onto an early page, but that shouldn't be
> > necessary.
> >
> > Considering these trade-offs, I'd like to propose an option to allow
> > superusers to disable HOT on tables. The intent is to trade some
> > performance benefits for the ability to reduce the size of a table
> > without the typical locking associated with it.
>
> Interesting use case, but I think that disabling HOT would be missing
> the forest for the trees. I think that a feature that disables
> block-local updates for pages > some offset would be a better solution
> to your issue: Normal updates also prefer the new tuple to be stored
> in the same pages as the old tuple if at all possible, so disabling
> HOT wouldn't solve the issue of tuples residing in the tail of your
> table - at least not while there is still empty space in those pages.

Hmm... I see your point.  It's when an UPDATE isn't going to land on
the same page that it relocates to the earlier available page.  So I
guess I'm after whatever mechanism would allow that to happen reliably
and predictably.

So $subject should really be "Allow forcing UPDATEs off the same page".

Thom




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Matthias van de Meent
On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> Heap-Only Tuple (HOT) updates are a significant performance
> enhancement, as they prevent unnecessary page writes. However, HOT
> comes with a caveat: it means that if we have lots of available space
> earlier on in the relation, it can only be used for new tuples or in
> cases where there's insufficient space on a page for an UPDATE to use
> HOT.
>
> This mechanism limits our options for condensing tables, forcing us to
> resort to methods like running VACUUM FULL/CLUSTER or using external
> tools like pg_repack. These either require exclusive locks (which will
> be a deal-breaker on large tables on a production system), or there's
> risks involved. Of course we can always flood pages with new versions
> of a row until it's forced onto an early page, but that shouldn't be
> necessary.
>
> Considering these trade-offs, I'd like to propose an option to allow
> superusers to disable HOT on tables. The intent is to trade some
> performance benefits for the ability to reduce the size of a table
> without the typical locking associated with it.

Interesting use case, but I think that disabling HOT would be missing
the forest for the trees. I think that a feature that disables
block-local updates for pages > some offset would be a better solution
to your issue: Normal updates also prefer the new tuple to be stored
in the same pages as the old tuple if at all possible, so disabling
HOT wouldn't solve the issue of tuples residing in the tail of your
table - at least not while there is still empty space in those pages.
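
As a purely conceptual sketch (nothing below is existing PostgreSQL
code, and RelationGetLocalUpdateLimit is an invented reloption
accessor), the check could look something like:

static bool
page_usable_for_local_update(Relation rel, BlockNumber blkno,
							 Size newtupsize, Size pagefree)
{
	BlockNumber limit = RelationGetLocalUpdateLimit(rel);

	/* Beyond the configured block threshold, force the update off-page. */
	if (limit != InvalidBlockNumber && blkno >= limit)
		return false;

	/* Otherwise the usual rule applies: does the new version fit here? */
	return newtupsize <= pagefree;
}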

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)




Re: logicalrep_message_type throws an error

2023-07-05 Thread Alvaro Herrera
On 2023-Jul-05, Amit Kapila wrote:

> I think after returning "???" from logicalrep_message_type(), the
> above is possible when we get the error: "invalid logical replication
> message type "X"" from apply_dispatch(), right? If so, then what about
> the case when we forgot to handle some message in
> logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
> better to return the 'action' from the function
> logicalrep_message_type() for unknown type? That way the information
> could be a bit better and we can easily catch the code bug as well.

Are you suggesting that logicalrep_message_type should include the
numerical value of 'action' in the ??? message? Something like this:

 ERROR:  invalid logical replication message type "X"
 CONTEXT:  processing remote data for replication origin "pg_16638" during 
message type "??? (123)" in transaction 796, finished at 0/16266F8

I don't see why not -- seems easy enough, and might help somebody.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Tom: There seems to be something broken here.
Teodor: I'm in sackcloth and ashes...  Fixed.
   http://postgr.es/m/482d1632.8010...@sigaev.ru




Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
Hi,

Heap-Only Tuple (HOT) updates are a significant performance
enhancement, as they prevent unnecessary page writes. However, HOT
comes with a caveat: it means that if we have lots of available space
earlier on in the relation, it can only be used for new tuples or in
cases where there's insufficient space on a page for an UPDATE to use
HOT.

This mechanism limits our options for condensing tables, forcing us to
resort to methods like running VACUUM FULL/CLUSTER or using external
tools like pg_repack. These either require exclusive locks (which will
be a deal-breaker on large tables on a production system), or there's
risks involved. Of course we can always flood pages with new versions
of a row until it's forced onto an early page, but that shouldn't be
necessary.

Considering these trade-offs, I'd like to propose an option to allow
superusers to disable HOT on tables. The intent is to trade some
performance benefits for the ability to reduce the size of a table
without the typical locking associated with it.

This feature could be used to shrink tables in one of two ways:
temporarily disabling HOT until DML operations have compacted the data
into a smaller area, or performing a mass update on later rows to
relocate them to an earlier location, probably in stages. Of course,
this would need to be used in conjunction with a VACUUM operation.

Admittedly this isn't ideal, and it would be better if we had an
operation that could do this (e.g. VACUUM COMPACT ), or an
option that causes some operations to avoid HOT when it detects an
amount of free space over a threshold, but in lieu of those, I thought
this would at least allow users to help themselves when running into
disk space issues.

Thoughts?

Thom




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-05 Thread Amit Kapila
On Wed, Jul 5, 2023 at 12:02 PM Masahiko Sawada  wrote:
>
> On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila  wrote:
> >
> > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith  wrote:
> > >
> > > Hi. Here are some review comments for this patch.
> > >
> > > +1 for the patch idea.
> > >
> > > --
> > >
> > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > should also be adjusted."
>
> As for IsIndexOnlyOnExpression(), what part do you think we need to
> adjust? It says:
>
> /*
>  * Returns true if the given index consists only of expressions such as:
>  *  CREATE INDEX idx ON table(foo(col));
>  *
>  * Returns false even if there is one column reference:
>  *   CREATE INDEX idx ON table(foo(col), col_2);
>  */
>
> and it seems to me that the function doesn't check if the leftmost
> index column is a non-expression.
>

Right, so, we can leave this as is.

-- 
With Regards,
Amit Kapila.




Re: Missing llvm_leave_fatal_on_oom() call

2023-07-05 Thread Heikki Linnakangas

On 04/07/2023 19:33, Daniel Gustafsson wrote:

On 21 Feb 2023, at 15:50, Heikki Linnakangas  wrote:

llvm_release_context() calls llvm_enter_fatal_on_oom(), but it never calls 
llvm_leave_fatal_on_oom(). Isn't that a clear leak?


Not sure how much of a leak it is since IIUC LLVM just stores a function
pointer to our error handler, but I can't see a reason not clean it up here.
The attached fix LGTM and passes make check with jit_above_cost set to zero.


Pushed to all live branches, thanks for the review!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: brininsert optimization opportunity

2023-07-05 Thread Tomas Vondra



On 7/5/23 02:35, Soumyadeep Chakraborty wrote:
> On Tue, Jul 4, 2023 at 2:54 PM Tomas Vondra
>  wrote:
> 
>> I'm not sure if memory context callbacks are the right way to rely on
>> for this purpose. The primary purpose of memory contexts is to track
>> memory, so using them for this seems a bit weird.
> 
> Yeah, this just kept getting dirtier and dirtier.
> 
>> There are cases that do something similar, like expandendrecord.c where
>> we track refcounted tuple slot, but IMHO there's a big difference
>> between tracking one slot allocated right there, and unknown number of
>> buffers allocated much later.
> 
> Yeah, the following code in ER_mc_callback is there I think to prevent 
> double
> freeing the tupdesc (since it might be freed in 
> ResourceOwnerReleaseInternal())
> (The part about: /* Ditto for tupdesc references */).
> 
> if (tupdesc->tdrefcount > 0)
> {
> if (--tupdesc->tdrefcount == 0)
> FreeTupleDesc(tupdesc);
> }
> Plus the above code doesn't try anything with Resource owner stuff, whereas
> releasing a buffer means:
> ReleaseBuffer() -> UnpinBuffer() ->
> ResourceOwnerForgetBuffer(CurrentResourceOwner, b);
> 

Well, my understanding is the expandedrecord.c code increments the
refcount exactly to prevent the resource owner from releasing the slot
too early. My assumption is we'd need to do something similar for the revmap
buffers by calling IncrBufferRefCount or something. But that's going to
be messy, because the buffers are read elsewhere.

>> The fact that even with the extra context is still doesn't handle query
>> cancellations is another argument against that approach (I wonder how
>> expandedrecord.c handles that, but I haven't checked).
>>
>>>
>>> Maybe there is a better way of doing our cleanup? I'm not sure. Would
>>> love your input!
>>>
>>> The other alternative for all this is to introduce new AM callbacks for
>>> insert_begin and insert_end. That might be a tougher sell?
>>>
>>
>> That's the approach I wanted to suggest, more or less - to do the
>> cleanup from ExecCloseIndices() before index_close(). I wonder if it's
>> even correct to do that later, once we release the locks etc.
> 
> I'll try this out and introduce a couple of new index AM callbacks. I
> think it's best to do it before releasing the locks - otherwise it
> might be weird
> to manipulate buffers of an index relation, without having some sort of lock 
> on
> it. I'll think about it some more.
> 

I don't understand why this would need more than just a callback to
release the cache.
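
For illustration, such a callback might look roughly like this
(BrinInsertState and bis_rmAccess are assumed names, not the submitted
patch):

static void
brininsertcleanup(IndexInfo *indexInfo)
{
	BrinInsertState *bistate = (BrinInsertState *) indexInfo->ii_AmCache;

	if (bistate == NULL)
		return;

	/* Unpins the revmap buffer held since the first brininsert() call. */
	brinRevmapTerminate(bistate->bis_rmAccess);

	/* The rest lives in ii_Context and goes away with it. */
	indexInfo->ii_AmCache = NULL;
}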

>> I don't think ii_AmCache was intended for stuff like this - GIN and GiST
>> only use it to cache stuff that can be just pfree-d, but for buffers
>> that's no enough. It's not surprising we need to improve this.
> 
> Hmmm, yes, although the docs state:
> "If the index AM wishes to cache data across successive index insertions 
> within
> an SQL statement, it can allocate space in indexInfo->ii_Context and
> store a pointer
> to the data in indexInfo->ii_AmCache (which will be NULL initially)."
> they don't mention anything about buffer usage. Well we will fix it!
> 
> PS: It should be possible to make GIN and GiST use the new index AM APIs
> as well.
> 

Why should GIN/GiST use the new API? I think it's perfectly sensible to
only require the "cleanup callback" when just pfree() is not enough.

>> FWIW while debugging this (breakpoint on MemoryContextDelete) I was
>> rather annoyed the COPY keeps dropping and recreating the two BRIN
>> contexts - brininsert cxt / brin dtuple. I wonder if we could keep and
>> reuse those too, but I don't know how much it'd help.
>>
> 
> Interesting, I will investigate that.
> 
>>> Now, to finally answer your question about the speedup without
>>> generate_series(). We do see an even higher speedup!
>>>
>>> seq 1 2 > /tmp/data.csv
>>> \timing
>>> DROP TABLE heap;
>>> CREATE TABLE heap(i int);
>>> CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
>>> COPY heap FROM '/tmp/data.csv';
>>>
>>> -- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
>>> COPY 2
>>> Time: 205072.444 ms (03:25.072)
>>> Time: 215380.369 ms (03:35.380)
>>> Time: 203492.347 ms (03:23.492)
>>>
>>> -- 3 runs (branch v2)
>>>
>>> COPY 2
>>> Time: 135052.752 ms (02:15.053)
>>> Time: 135093.131 ms (02:15.093)
>>> Time: 138737.048 ms (02:18.737)
>>>
>>
>> That's nice, but it still doesn't say how much of that is reading the
>> data. If you do just copy into a table without any indexes, how long
>> does it take?
> 
> So, I loaded the same heap table without any indexes and at the same
> scale. I got:
> 
> postgres=# COPY heap FROM '/tmp/data.csv';
> COPY 2
> Time: 116161.545 ms (01:56.162)
> Time: 114182.745 ms (01:54.183)
> Time: 114975.368 ms (01:54.975)
> 

OK, so the baseline is 115 seconds. With the current code, an index
means +100 seconds. With the optimization, it's just +20. Nice.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Prevent psql \watch from running queries that return no rows

2023-07-05 Thread Daniel Gustafsson
> On 4 Jun 2023, at 20:55, Greg Sabino Mullane  wrote:
> 
> On Sat, Jun 3, 2023 at 5:58 PM Michael Paquier  wrote:
> 
> Wouldn't something like a target_rows be more flexible?  You could use
> this parameter with a target number of rows to expect, zero being one
> choice in that.
> 
> Thank you! That does feel better to me. Please see attached a new v2 patch 
> that uses 
> a min_rows=X syntax (defaults to 0). Also added some help.c changes.

This is a feature I've wanted on several occasions so definitely a +1 on this
suggestion.  I've yet to test it out and do a full review, but a few comments
from skimming the patch:

- bool is_watch,
+ bool is_watch, int min_rows,

The comment on ExecQueryAndProcessResults() needs to be updated with an
explanation of what this parameter is.


-   return cancel_pressed ? 0 : success ? 1 : -1;
+   return (cancel_pressed || return_early) ? 0 : success ? 1 : -1;

I think this is getting tangled up enough that it should be replaced with
separate if() statements for the various cases.
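
For illustration (sketch only), the separate statements could read:

	if (cancel_pressed)
		return 0;
	if (return_early)
		return 0;
	if (!success)
		return -1;
	return 1;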


-   HELP0("  \\watch [[i=]SEC] [c=N] execute query every SEC seconds, up to 
N times\n");
+   HELP0("  \\watch [[i=]SEC] [c=N] [m=ROW]\n");
+   HELP0("  execute query every SEC seconds, up to 
N times\n");
+   HELP0("  stop if less than ROW minimum rows are 
rerturned\n");

"less than ROW minimum rows" reads a bit awkward IMO, how about calling it
[m=MIN] and describe as "stop if less than MIN rows are returned"?  Also, there
is a typo: s/rerturned/returned/.

--
Daniel Gustafsson





Re: Speed up transaction completion faster after many relations are accessed in a transaction

2023-07-05 Thread Heikki Linnakangas

On 10/02/2023 04:51, David Rowley wrote:

I've attached another set of patches. I do need to spend longer
looking at this. I'm mainly attaching these as CI seems to be
highlighting a problem that I'm unable to recreate locally and I
wanted to see if the attached fixes it.


I like this patch's approach.


index 296dc82d2ee..edb8b6026e5 100644
--- a/src/backend/commands/discard.c
+++ b/src/backend/commands/discard.c
@@ -71,7 +71,7 @@ DiscardAll(bool isTopLevel)
Async_UnlistenAll();
-   LockReleaseAll(USER_LOCKMETHOD, true);
+   LockReleaseSession(USER_LOCKMETHOD);
ResetPlanCache();


This assumes that there are no transaction-level advisory locks. I think 
that's OK. It took me a while to convince myself of that, though. I 
think we need a high level comment somewhere that explains what 
assumptions we make on which locks can be held in session mode and which 
in transaction mode.



@@ -3224,14 +3206,6 @@ PostPrepare_Locks(TransactionId xid)
 		Assert(lock->nGranted <= lock->nRequested);
 		Assert((proclock->holdMask & ~lock->grantMask) == 0);
 
-		/* Ignore it if nothing to release (must be a session lock) */
-		if (proclock->releaseMask == 0)
-			continue;
-
-		/* Else we should be releasing all locks */
-		if (proclock->releaseMask != proclock->holdMask)
-			elog(PANIC, "we seem to have dropped a bit somewhere");
-
 		/*
 		 * We cannot simply modify proclock->tag.myProc to reassign
 		 * ownership of the lock, because that's part of the hash key and


This looks wrong. If you prepare a transaction that is holding any 
session locks, we will now transfer them to the prepared transaction. 
And its locallock entry will be out of sync. To fix, I think we could 
keep around the hash table that CheckForSessionAndXactLocks() builds, 
and use that here.
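
For illustration, such a lookup might look roughly like this (keeping
the hash table alive and its entry layout are assumptions for the
sketch):

	PerLockTagEntry *hentry;

	hentry = (PerLockTagEntry *) hash_search(sessionAndXactLocks,
											 &lock->tag, HASH_FIND, NULL);
	if (hentry != NULL && hentry->sess_lock && !hentry->xact_lock)
		continue;		/* pure session lock: leave it with the backend */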


--
Heikki Linnakangas
Neon (https://neon.tech)





CHECK Constraint Deferrable

2023-07-05 Thread Himanshu Upadhyaya
Hi,

Currently, there is no support for a DEFERRABLE CHECK constraint in a
CREATE TABLE statement.
The SQL standard specifies that CHECK constraints can be defined as
DEFERRABLE.

The attached patch implements a deferrable CHECK constraint, as below:

‘postgres[579453]=#’CREATE TABLE t1 (i int CHECK(i<>0) DEFERRABLE, t text);
CREATE TABLE
‘postgres[579453]=#’\d t1
 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 i  | integer |   |  |
 t  | text|   |  |
Check constraints:
"t1_i_check" CHECK (i <> 0) DEFERRABLE

Now we can have a deferrable CHECK constraint, and we can defer the
constraint validation:

‘postgres[579453]=#’BEGIN;
BEGIN
‘postgres[579453]=#*’SET CONSTRAINTS t1_i_check DEFERRED;
SET CONSTRAINTS
‘postgres[579453]=#*’INSERT INTO t1 VALUES (0, 'one'); -- should succeed
INSERT 0 1
‘postgres[579453]=#*’UPDATE t1 SET i = 1 WHERE t = 'one';
UPDATE 1
‘postgres[579453]=#*’COMMIT; -- should succeed
COMMIT

Attaching the initial patch; I will add documentation in the next
version of the patch.

thoughts?



-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
From 1eb72b14a3a6914e893854508a071ae835d23245 Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya 
Date: Wed, 5 Jul 2023 14:54:37 +0530
Subject: [PATCH v1] Implementation of "CHECK Constraint" to make it
 Deferrable.

---
 src/backend/access/common/tupdesc.c   |   2 +
 src/backend/catalog/heap.c|  50 --
 src/backend/commands/constraint.c | 116 ++
 src/backend/commands/copyfrom.c   |  10 +-
 src/backend/commands/tablecmds.c  |   8 ++
 src/backend/commands/trigger.c|  43 ++--
 src/backend/executor/execMain.c   |  33 +-
 src/backend/executor/execReplication.c|  10 +-
 src/backend/executor/nodeModifyTable.c|  29 +++---
 src/backend/parser/gram.y |   2 +-
 src/backend/parser/parse_utilcmd.c|   9 +-
 src/backend/utils/cache/relcache.c|   2 +
 src/include/access/tupdesc.h  |   2 +
 src/include/catalog/heap.h|   2 +
 src/include/catalog/pg_proc.dat   |   5 +
 src/include/commands/trigger.h|   2 +
 src/include/executor/executor.h   |  42 +++-
 src/test/regress/expected/constraints.out |  98 ++
 src/test/regress/sql/constraints.sql  |  99 ++
 19 files changed, 518 insertions(+), 46 deletions(-)

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 7c5c390503..098cb27932 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -204,6 +204,8 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
 cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin);
 cpy->check[i].ccvalid = constr->check[i].ccvalid;
 cpy->check[i].ccnoinherit = constr->check[i].ccnoinherit;
+cpy->check[i].ccdeferrable = constr->check[i].ccdeferrable;
+cpy->check[i].ccdeferred = constr->check[i].ccdeferred;
 			}
 		}
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 2a0d82aedd..8a1e8e266f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -52,17 +52,20 @@
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
 #include "commands/tablecmds.h"
 #include "commands/typecmds.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
+#include "parser/parser.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
 #include "pgstat.h"
@@ -102,7 +105,8 @@ static ObjectAddress AddNewRelationType(const char *typeName,
 static void RelationRemoveInheritance(Oid relid);
 static Oid	StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 		  bool is_validated, bool is_local, int inhcount,
-		  bool is_no_inherit, bool is_internal);
+		  bool is_no_inherit, bool is_internal,
+		  bool is_deferrable, bool initdeferred);
 static void StoreConstraints(Relation rel, List *cooked_constraints,
 			 bool is_internal);
 static bool MergeWithExistingConstraint(Relation rel, const char *ccname, Node *expr,
@@ -2063,13 +2067,15 @@ SetAttrMissing(Oid relid, char *attname, char *value)
 static Oid
 StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 			  bool is_validated, bool is_local, int inhcount,
-			  bool is_no_inherit, bool is_internal)
+			  bool is_no_inherit, bool is_internal,
+			  bool is_deferrable, bool initdeferred)
 {
 	char	   *ccbin;
 	List	   *varList;
 	int			

Re: pg_waldump: add test for coverage

2023-07-05 Thread Peter Eisentraut

On 29.06.23 21:16, Tristen Raab wrote:

I've reviewed your latest v3 patches on Ubuntu 23.04. Both patches apply 
correctly and all the tests run and pass as they should. Execution time was 
normal for me, I didn't notice any significant latency when compared to other 
tests. The only other feedback I can provide would be to add test coverage to 
some of the other options that aren't currently covered (ie. --bkp-details, 
--end, --follow, --path, etc.) for completeness. Other than that, this looks 
like a great patch.


Committed.

I added a test for the --quiet option.  --end and --path are covered.

The only options not covered now are

  -b, --bkp-details  output detailed information about backup blocks
  -f, --follow   keep retrying after reaching end of WAL
  -t, --timeline=TLI timeline from which to read WAL records
  -x, --xid=XID  only show records with transaction ID XID

--follow is a bit tricky to test because you need to leave pg_waldump 
running in the background for a while, or something like that. 
--timeline and --xid can be tested but would need some work on the 
underlying test data (such as creating more than one timeline).  I don't 
know much about --bkp-details, so I don't have a good idea how to test 
it.  So I'll leave those as projects for the future.






Re: pg_decode_message vs skip_empty_xacts and xact_wrote_changes

2023-07-05 Thread Amit Kapila
On Wed, Jul 5, 2023 at 2:28 PM Amit Kapila  wrote:
>
> On Mon, Jul 3, 2023 at 4:49 PM vignesh C  wrote:
> >
> > +1 for the first version patch, I also felt the first version is
> > easily understandable.
> >
>
> Okay, please find the slightly updated version (changed a comment and
> commit message). Unless there are more comments, I'll push this in a
> day or two.
>

oops, forgot to attach the patch.

-- 
With Regards,
Amit Kapila.


v3-0001-Add-BEGIN-COMMIT-for-transactional-messages-durin.patch
Description: Binary data


Re: pg_decode_message vs skip_empty_xacts and xact_wrote_changes

2023-07-05 Thread Amit Kapila
On Mon, Jul 3, 2023 at 4:49 PM vignesh C  wrote:
>
> +1 for the first version patch, I also felt the first version is
> easily understandable.
>

Okay, please find the slightly updated version (changed a comment and
commit message). Unless there are more comments, I'll push this in a
day or two.

-- 
With Regards,
Amit Kapila.




Re: pg_upgrade and cross-library upgrades

2023-07-05 Thread Alvaro Herrera
On 2023-Jul-05, Michael Paquier wrote:

> The same thing as HEAD could be done on its back-branches by removing
> --with-openssl and bring more consistency, but pg_upgrade has never
> been good at handling upgrades with different library requirements.
> Something I am wondering is if AdjustUpgrade.pm could gain more
> knowledge in this area, though I am unsure how much could be achieved
> as this module has only object-level knowledge.

Hmm, let's explore the AdjustUpgrade.pm path a bit more:

002_pg_upgrade.pl can test for presence or absence of pgcrypto by
grepping pg_config --configure for --with-ssl or --with-openssl.  If the
old cluster has it but the new doesn't, we must drop the
contrib_regression_pgcrypto database.  I think we would need a new
function in AdjustUpgrade.pm (or an API change to
adjust_database_contents) so that we can add the DROP DATABASE command
conditionally.

This seems easily extended to contrib/xml2 also.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)




Re: Parallel CREATE INDEX for BRIN indexes

2023-07-05 Thread Matthias van de Meent
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra  wrote:
>
>
>
> On 7/4/23 23:53, Matthias van de Meent wrote:
> > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra  
> > wrote:
> >>
> >> Hi,
> >>
> >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> >> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> >> (i.e. copied from that and massaged a bit to call brin stuff).
> >
> > Nice work.
> >
> >> In both cases _brin_end_parallel then reads the summaries from worker
> >> files, and adds them into the index. In 0001 this is fairly simple,
> >> although we could do one more improvement and sort the ranges by range
> >> start to make the index nicer (and possibly a bit more efficient). This
> >> should be simple, because the per-worker results are already sorted like
> >> that (so a merge sort in _brin_end_parallel would be enough).
> >
> > I see that you manually built the passing and sorting of tuples
> > between workers, but can't we use the parallel tuplesort
> > infrastructure for that? It already has similar features in place and
> > improves code commonality.
> >
>
> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
> do, and the little I knew I managed to forget since I wrote this patch.
> Which similar features do you have in mind?
>
> The workers are producing the results in "start_block" order, so if they
> pass that to the leader, it probably can do the usual merge sort.
>
> >> For 0002 it's a bit more complicated, because with a single parallel
> >> scan brinbuildCallbackParallel can't decide if a range is assigned to a
> >> different worker or empty. And we want to generate summaries for empty
> >> ranges in the index. We could either skip such range during index build,
> >> and then add empty summaries in _brin_end_parallel (if needed), or add
> >> them and then merge them using "union".
> >>
> >>
> >> I just realized there's a third option to do this - we could just do
> >> regular parallel scan (with no particular regard to pagesPerRange), and
> >> then do "union" when merging results from workers. It doesn't require
> >> the sequence of TID scans, and the union would also handle the empty
> >> ranges. The per-worker results might be much larger, though, because
> >> each worker might produce up to the "full" BRIN index.
> >
> > Would it be too much effort to add a 'min_chunk_size' argument to
> > table_beginscan_parallel (or ParallelTableScanDesc) that defines the
> > minimum granularity of block ranges to be assigned to each process? I
> > think that would be the most elegant solution that would require
> > relatively little effort: table_block_parallelscan_nextpage already
> > does parallel management of multiple chunk sizes, and I think this
> > modification would fit quite well in that code.
> >
>
> I'm confused. Isn't that pretty much exactly what 0002 does? I mean,
> that passes pagesPerRange to table_parallelscan_initialize(), so that
> each pagesPerRange is assigned to a single worker.

Huh, I overlooked that one... Sorry for that.

> The trouble I described above is that the scan returns tuples, and the
> consumer has no idea what was the chunk size or how many other workers
> are there. Imagine you get a tuple from block 1, and then a tuple from
> block 1000. Does that mean that the blocks in between are empty or that
> they were processed by some other worker?

If the unit of work for parallel table scans is the index's
pages_per_range, then I think we can just fill in expected-but-missing
ranges as 'empty' in the parallel leader during index loading, like
the first of the two solutions you proposed.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)




Re: Wrong syntax in feature description

2023-07-05 Thread Daniel Gustafsson
> On 4 Jun 2023, at 18:48, Amit Kapila  wrote:
> 
> On Fri, Jun 2, 2023 at 7:01 PM Peter Smith  wrote:
>> 
>> Hi, I noticed a feature description [1] referring to a command example:
>> 
>> CREATE PUBLICATION ... FOR ALL TABLES IN SCHEMA 
>> 
>> ~~
>> 
>> AFAIK that should say "FOR TABLES IN SCHEMA" (without the "ALL", see [2])
> 
> Right, this should be changed.

Agreed, so I've fixed this in the featurematrix on the site.  I will mark this
CF entry as committed even though there is nothing to commit (the featurematrix
is stored in the postgresql.org django instance) since there was a change
performed.

Thanks for the report!

--
Daniel Gustafsson





Re: evtcache: EventTriggerCache vs Event Trigger Cache

2023-07-05 Thread Daniel Gustafsson
> On 8 May 2023, at 10:39, Daniel Gustafsson  wrote:
>> On 4 May 2023, at 14:18, Daniel Gustafsson  wrote:
>>> On 4 May 2023, at 14:09, Tom Lane  wrote:

>>> How about naming
>>> the hash "EventTriggerCacheHash" or so?
>> 
>> I think the level is the indicator here, but I have no strong opinions,
>> EventTriggerCacheHash is fine by me.
> 
> The attached trivial diff does that, parking this in the next CF.

Pushed, thanks!

--
Daniel Gustafsson





Re: [PATCH] Add loongarch native checksum implementation.

2023-07-05 Thread John Naylor
On Wed, Jul 5, 2023 at 9:16 AM YANG Xudong  wrote:
>
> Is there any other comment?

It's only been a few weeks since the last patch, and this is not an urgent
bugfix, so there is no reason to ping the thread. Feature freeze will
likely be in April of next year.

Also, please don't top-post (which means: quoting an entire message, with
new text at the top) -- it clutters our archives.

Before I look at this again: are there any objections to adding another CRC
implementation on the grounds that it has no buildfarm member?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: On /*----- comments

2023-07-05 Thread Heikki Linnakangas

On 04/07/2023 21:36, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> Pushed a patch to remove the end-guard from the example in the pgindent
>> README. And fixed the bogus end-guard in walsender.c.
>
> I don't see any actual push?

Forgot it after all. Pushed now, thanks.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-07-05 Thread Masahiko Sawada
On Mon, Jun 19, 2023 at 12:37 PM Amit Kapila  wrote:
>
> On Mon, Jun 19, 2023 at 6:50 AM Masahiko Sawada  wrote:
> >
> > On Sat, Jun 17, 2023 at 6:45 PM Amit Kapila  wrote:
> > >
> > > On Tue, May 16, 2023 at 8:00 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, May 11, 2023 at 5:12 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > >
> > > > After thinking more about it, I realized that this is not a problem
> > > > specific to HEAD. ISTM the problem is that by commit 7b64e4b3, we drop
> > > > the stats entry of subscription that is not associated with a
> > > > replication slot for apply worker, but we missed the case where the
> > > > subscription is not associated with both replication slots for apply
> > > > and tablesync. So IIUC we should backpatch it down to 15.
> > > >
> > >
> > > I agree that it should be backpatched to 15.
> > >
> > > > Since in pg15 we don't create the subscription stats at CREATE
> > > > SUBSCRIPTION time but only when the first error is reported,
> > > >
> > >
> > > AFAICS, the call to pgstat_create_subscription() is present in
> > > CreateSubscription() in 15 as well, so, I don't get your point.
> >
> > IIUC in 15, pgstat_create_subscription() doesn't create the stats
> > entry. See commit e0b01429590.
> >
>
> Thanks for the clarification. Your changes look good to me though I
> haven't tested them.

Thanks for reviewing the patch. Pushed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_upgrade and cross-library upgrades

2023-07-05 Thread Michael Paquier
On Wed, Jul 05, 2023 at 06:00:25PM +1200, Thomas Munro wrote:
> If this were my animal and if the hardware couldn't be
> upgraded to a modern distro for technical reasons like a de-supported
> architecture, I would now disable HEAD, or more likely give the whole
> machine a respectful send-off ceremony and move on.

Amen.
--
Michael




Re: Add more sanity checks around callers of changeDependencyFor()

2023-07-05 Thread Michael Paquier
On Thu, Jun 29, 2023 at 10:06:35AM +0300, Heikki Linnakangas wrote:
> The error messages like "failed to change schema dependency for extension"
> don't conform to the usual error message style. "could not change schema
> dependency for extension" would be more conformant. I see that you
> copy-pasted that from existing messages, and we have a bunch of other
> "failed to" messages in the repository too, so I'm OK with leaving it as it
> is for now. Or maybe change the wording of all the changeDependencyFor()
> callers now, and consider all the other "failed to" messages separately
> later.

I'm OK with changing the messages for all changeDependencyFor() callers
now that these are being touched.  I count 7 of them.

> If changeDependencyFor() returns >= 2, the message is a bit misleading.
> That's what the existing callers did too, so maybe that's fine.
> 
> I can hit the above error with the attached test case. That seems wrong,
> although I don't know if it means that the check is wrong or it exposed a
> long-standing bug.

Coming back to this one, I think that my check and your test case have
uncovered an old bug in AlterExtensionNamespace(), where the sequence of
objects manipulated breaks the namespace OIDs used to change the normal
dependency of the extension when calling changeDependencyFor().  The
check I have added actually looks correct to me, because there should
always be exactly one 'n' pg_depend entry to change between the extension
and its schema, and we should always change it.

A little bit of debugging shows me that at the stage of "ALTER
EXTENSION test_ext_req_schema1 SET SCHEMA test_func_dep3;", oldNspOid
is set to the OID of test_func_dep2, and nspOid is the OID of
test_func_dep3.  So the new OID is correct, but the old one points to
the schema test_func_dep2 used by the function, because that is the
first object picked up while scanning pg_depend, and not the schema
test_func_dep1 used by the extension.  This causes the command to fail
to update the schema dependency between the schema and the extension.
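
To make the failing sequence easier to follow, here is a rough SQL sketch
of it (schema names are the ones used above; the exact definition of the
test extension lives in the regression tests, and the function name
dep_req1() is only a guess at the object it ships):

CREATE SCHEMA test_func_dep1;
CREATE SCHEMA test_func_dep2;
CREATE SCHEMA test_func_dep3;
CREATE EXTENSION test_ext_req_schema1 SCHEMA test_func_dep1;
-- Move one member function out of the extension's schema.
ALTER FUNCTION test_func_dep1.dep_req1() SET SCHEMA test_func_dep2;
-- The first namespace dependency found while scanning pg_depend now
-- belongs to the relocated function, so oldNspOid ends up pointing at
-- test_func_dep2 and the extension's own 'n' entry is never updated.
ALTER EXTENSION test_ext_req_schema1 SET SCHEMA test_func_dep3;
-- Still reports test_func_dep1 instead of test_func_dep3:
SELECT refobjid::regnamespace
  FROM pg_depend
  WHERE classid = 'pg_extension'::regclass
    AND objid = (SELECT oid FROM pg_extension
                 WHERE extname = 'test_ext_req_schema1')
    AND refclassid = 'pg_namespace'::regclass
    AND deptype = 'n';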

The origin of the confusion comes from the handling of oldNspOid, in my
opinion.  I don't quite see why it is necessary to save the old namespace
OID from the objects scanned when we already know the previous schema
used by the extension thanks to its pg_extension entry.

Also, note that there is a check in AlterExtensionNamespace() to
prevent the command from happening if an object is not in the same
schema as the extension, but it fails to trigger here.  I have written
a couple of extra queries to show the difference.

Please find attached a patch to fix this issue with ALTER EXTENSION
.. SET SCHEMA, and the rest.  The patch does everything discussed, but
it had better be split into two patches for different branches.  Here
are my thoughts:
- Fix and backpatch the ALTER EXTENSION business, *without* the new
sanity check for changeDependencyFor() in AlterExtensionNamespace(),
with its regression test.
- Add all the sanity checks and reword the error messages related to
changeDependencyFor() only on HEAD.

Thoughts?
--
Michael
From c2ae83420f8bc4b9b80bfbad313b240bc406a60c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 5 Jul 2023 15:30:01 +0900
Subject: [PATCH v2] Fix dependency handling with ALTER EXTENSION .. SET SCHEMA

The error message rewordings related to changeDependencyFor() should be
split into their own patch..
---
 src/backend/commands/alter.c  |  6 ++-
 src/backend/commands/cluster.c|  4 +-
 src/backend/commands/extension.c  | 17 
 src/backend/commands/functioncmds.c   | 10 +++--
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/commands/typecmds.c   |  2 +-
 .../expected/test_extensions.out  | 43 +++
 .../test_extensions/sql/test_extensions.sql   | 32 ++
 8 files changed, 98 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index e95dc31bde..e12db795b4 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -848,8 +848,10 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 	pfree(replaces);
 
 	/* update dependencies to point to the new schema */
-	changeDependencyFor(classId, objid,
-		NamespaceRelationId, oldNspOid, nspOid);
+	if (changeDependencyFor(classId, objid,
+			NamespaceRelationId, oldNspOid, nspOid) != 1)
+		elog(ERROR, "could not change schema dependency for object %u",
+			 objid);
 
 	InvokeObjectPostAlterHook(classId, objid, 0);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 03b24ab90f..4d10cc0483 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1273,7 +1273,7 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 AccessMethodRelationId,
 relam1,
 relam2) != 1)
-			elog(ERROR, "failed to change access method dependency 

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-05 Thread Masahiko Sawada
On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila  wrote:
>
> On Wed, Jul 5, 2023 at 9:01 AM Peter Smith  wrote:
> >
> > Hi. Here are some review comments for this patch.
> >
> > +1 for the patch idea.
> >
> > --
> >
> > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > "Accordingly, the comments atop build_replindex_scan_key(),
> > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > should also be adjusted."

As for IsIndexOnlyOnExpression(), what part do you think we need to
adjust? It says:

/*
 * Returns true if the given index consists only of expressions such as:
 *  CREATE INDEX idx ON table(foo(col));
 *
 * Returns false even if there is one column reference:
 *   CREATE INDEX idx ON table(foo(col), col_2);
 */

and it seems to me that the function doesn't check if the leftmost
index column is a non-expression.
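
To illustrate the gap with a made-up example (table and column names are
not from the tests):

-- Mixed: contains a column reference, so IsIndexOnlyOnExpression()
-- returns false, even though the leftmost key is an expression.
CREATE INDEX idx_mixed ON tbl (lower(col1), col2);

-- Expressions only: the function returns true for this one.
CREATE INDEX idx_expr ON tbl (lower(col1));

So the "leftmost key must be a plain column" requirement is effectively
enforced by the ii_IndexAttrNumbers[0] check in
RemoteRelContainsLeftMostColumnOnIdx() quoted further down, not by this
function.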

> > doc/src/sgml/logical-replication.sgml
> >
> > 1.
> > the key.  When replica identity FULL is specified,
> > indexes can be used on the subscriber side for searching the rows.
> > Candidate
> > indexes must be btree, non-partial, and have at least one column 
> > reference
> > -   (i.e. cannot consist of only expressions).  These restrictions
> > -   on the non-unique index properties adhere to some of the restrictions 
> > that
> > -   are enforced for primary keys.  If there are no such suitable indexes,
> > +   at the leftmost column indexes (i.e. cannot consist of only
> > expressions).  These
> > +   restrictions on the non-unique index properties adhere to some of
> > the restrictions
> > +   that are enforced for primary keys.  If there are no such suitable 
> > indexes,
> > the search on the subscriber side can be very inefficient, therefore
> > replica identity FULL should only be used as a
> > fallback if no other solution is possible.  If a replica identity other
> >
> > Isn't this using the word "indexes" with different meanings in the
> > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > ordinal number of the index fields.

That was my mistake, it should be " at the leftmost column".

>
> I don't know if this suggestion is what the code is actually doing. In
> function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
> checks:
> ==
> keycol = indexInfo->ii_IndexAttrNumbers[0];
> if (!AttributeNumberIsValid(keycol))
> return false;
>
> if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
> return false;
>
> return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
> ==
>
> The first of these checks indicates that the leftmost column of the
> index should be a non-expression; the second and third indicate what you
> suggest in your wording. We can also think of what you wrote as, in a
> way, a superset of "leftmost index column is a non-expression" and
> "leftmost index column should be present in remote rel", but I feel it
> would be better to be explicit about the first part as it is easier for
> users to understand, at least in the docs.

+1

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Incremental sort for access method with ordered scan support (amcanorderbyop)

2023-07-05 Thread Richard Guo
On Tue, Jul 4, 2023 at 7:15 PM David Rowley  wrote:

> On Tue, 4 Jul 2023 at 20:12, Richard Guo  wrote:
> > The v4 patch looks good to me (maybe some cosmetic tweaks are still
> > needed for the comments).  I think it's now 'Ready for Committer'.
>
> I agree. I went and hit the comments with a large hammer and while
> there also adjusted the regression tests. I didn't think having "t" as
> a table name was a good idea as it seems like a name with a high risk
> of conflicting with a concurrently running test. Also, there didn't
> seem to be much need to insert data into that table as the tests
> didn't query any of it.
>
> The only other small tweak I made was to not call list_copy_head()
> when the list does not need to be shortened. It's likely not that
> important, but if the majority of cases are not partial matches, then
> we'd otherwise be needlessly making copies of the list.
>
> I pushed the adjusted patch.


The adjustments improve the patch a lot.  Thanks for adjusting and
pushing the patch.

Thanks
Richard


Dumping policy on a table belonging to an extension is expected?

2023-07-05 Thread Amul Sul
Hi,

I have a ROW LEVEL SECURITY policy on a table that is part of an extension,
and dumping the database where that extension is installed dumps the policy
of that table too, even though the table itself is not dumped.

Here is a quick test, where I have added the following SQL to the
adminpack--1.0.sql extension file:

CREATE TABLE public.foo (i int CHECK(i > 0));
ALTER TABLE public.foo ENABLE ROW LEVEL SECURITY;
CREATE POLICY foo_policy ON public.foo USING (true);

After installing and creating this extension, the dump output contains the
policy but not the table:

--
-- Name: foo; Type: ROW SECURITY; Schema: public; Owner: amul
--

ALTER TABLE public.foo ENABLE ROW LEVEL SECURITY;

--
-- Name: foo foo_policy; Type: POLICY; Schema: public; Owner: amul
--

CREATE POLICY foo_policy ON public.foo USING (true);


I am not sure if that is expected behaviour. The code comment in
checkExtensionMembership() suggests it is intentional:

 * In 9.6 and above, mark the member object to have any non-initial ACL,
 * policies, and security labels dumped.

The question is: why are we doing this? Shouldn't we skip this policy if it
is part of the create-extension script?

Also, if you try to drop this policy, it gets dropped without any warning or
error, unlike tables or other objects, which are not allowed to be dropped
at all.
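
A quick way to see the asymmetry (using the objects from the example above;
the error wording is approximate):

DROP TABLE public.foo;                 -- rejected, because the table belongs to the extension
DROP POLICY foo_policy ON public.foo;  -- dropped silently, no error or warning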

-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


Re: pg_upgrade and cross-library upgrades

2023-07-05 Thread Thomas Munro
On Wed, Jul 5, 2023 at 4:29 PM Michael Paquier  wrote:
> Thoughts?

I am grateful for all the bug discoveries that these Debian 7 animals
provided in their time, but at this point we're unlikely to learn
things that are useful to our mission from them.  It costs our
community time to talk about each of these issues, re-discovering old
GCC bugs etc.  If this were my animal and if the hardware couldn't be
upgraded to a modern distro for technical reasons like a de-supported
architecture, I would now disable HEAD, or more likely give the whole
machine a respectful send-off ceremony and move on.