Re: persist logical slots to disk during shutdown checkpoint

2023-09-12 Thread Michael Paquier
On Tue, Sep 12, 2023 at 03:15:44PM +0530, Amit Kapila wrote:
> I don't think it will become more responsive in any way; not sure what
> made you think that. The key idea is that after restart we want
> to ensure that all the WAL data up to the shutdown checkpoint record
> is sent to downstream. As mentioned in the commit message, this will
> help in ensuring that upgrades don't miss any data and then there is
> another small advantage as mentioned in the commit message.

Good thing I did not use the term "responsive" in the previous patch I
posted.  My apologies if you found that confusing.  Let's say, "to
prevent unnecessary retreat", then ;)

>> - Not sure that there is any point to mention the other code paths in
>> the tree where ReplicationSlotSave() can be called, and a slot can be
>> saved in other processes than just WAL senders (like slot
>> manipulations in normal backends, for one).  This was the last
>> sentence in v10.
> 
> The point was that the earlier sentence is no longer true and keeping it
> as it is could be wrong or at least misleading. For example, earlier it
> was okay to say, "This needn't actually be part of a checkpoint, ..."
> but now that is no longer true, as we want to invoke this at the time
> of shutdown checkpoint for correctness. If we want to be precise, we

How so?  This is just a reference to the fact that using a
checkpoint path for this stuff is useful.  A shutdown checkpoint is
still a checkpoint, done by the checkpointer.  The background writer
is not concerned by that.

> can say, "It is convenient to flush dirty replication slots at the
> time of checkpoint. Additionally, .."

Okay by me to reword the top comment of CheckPointReplicationSlots()
to use these terms, if you feel strongly about it.

>> +   if (s->data.invalidated == RS_INVAL_NONE &&
>> +   s->data.confirmed_flush != s->last_saved_confirmed_flush)
>>
>> Actually this is incorrect, no?  Shouldn't we make sure that the
>> confirmed_flush is strictly higher than the last saved LSN?
> 
> I can't see why it is incorrect. Do you see how (in what scenario) it
> could go wrong? As per my understanding, confirmed_flush LSN will
> always be greater than or equal to last_saved_confirmed_flush, but we
> don't want to ensure that point here: if the latest value is not the
> same, we should simply mark the slot dirty and flush it, as that will
> be the location we have ensured to update before walsender shutdown.
> I think it is better to add an assert if you are worried about any
> such case. We had thought of adding it as well but didn't, because we
> don't have matching asserts to ensure that we never assign a prior
> LSN value to confirmed_flush LSN.

Because that's just safer in the long run, and I don't see why we
cannot just do that?  Imagine, for instance, that a bug doing an
incorrect manipulation of a logical slot's data does an incorrect
computation of this field, and that we finish with in-memory data
that's older than what was previously saved.  The code may cause a
flush at an incorrect past position.  That's just an assumption from
my side, of course.

> I don't want to argue about such a point because it is a little bit a
> matter of personal choice, but I find this comment unclear. It reads
> as though confirmed_flush LSN were the position at which we flushed
> the slot's data, and that is not true. I prefer the last comment in
> the patch I sent: "This value tracks the last confirmed_flush
> LSN flushed which is used during a shutdown checkpoint to decide if
> logical's slot data should be forcibly flushed or not." which I feel
> we agreed is better.

Okay, fine by me here.
--
Michael


signature.asc
Description: PGP signature


Re: Row pattern recognition

2023-09-12 Thread Tatsuo Ishii
> 

I was looking for this but I only found ISO/IEC 19075-5:2021.
https://www.iso.org/standard/78936.html

Maybe 19075-5:2021 is the latest one?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: POC: GROUP BY optimization

2023-09-12 Thread Andrey Lepikhov

Hi,
Here is the patch rebased on the current master. Also, I fixed some 
minor slips and one static analyzer warning.
This is just to add the patch to the next commitfest and to keep work 
on it moving.


One extra difference in the newly added postgres_fdw tests is caused by 
this patch - see the changes in the query plan in the attachment.
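
As a quick way to see the idea in action, here is a hypothetical
illustration of the reordering described in the commit message below
(the table and data are my own, not from the patch's regression tests):

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT i, i % 10 FROM generate_series(1, 100000) i;
ANALYZE t;
-- b has only 10 distinct values while a has 100000; with the patch,
-- the planner may sort on a before b for a Group Aggregate even
-- though the query lists b first.
EXPLAIN (COSTS OFF) SELECT b, a, count(*) FROM t GROUP BY b, a;
-- the new GUC disables the optimization for comparison
SET enable_group_by_reordering = off;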


--
regards,
Andrey Lepikhov
Postgres Professional
From 33953655c9ac3f9ec64b80c9f2a2ff38bd178745 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared when
building the groups. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative orderings of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys
and leave the rest to the optimizer. But that might get very expensive, so we
try to pick only a couple of interesting orderings based on both local and
global information.

When planning the grouping path, we explore statistics (number of distinct
values, cost of the comparison function) for the keys and reorder them
to minimize comparison costs. Intuitively, it may be better to perform more
expensive comparisons (for complex data types, etc.) last, because the cheaper
comparisons may turn out to be enough. Similarly, the higher the cardinality
of a key, the lower the probability we'll need to compare more keys. The patch
generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch generates orderings that minimize the comparison cost (for various
path keys), picks the cheapest ones, and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally, it
always keeps
the ordering specified in the query, assuming the user might have additional
insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  36 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 602 ++
 src/backend/optimizer/plan/planner.c  | 465 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1771 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT 
DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER 
= SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
[remainder of patch truncated]

Re: JSON Path and GIN Questions

2023-09-12 Thread Erik Rijkers

On 9/13/23 at 03:00, Erik Wienhold wrote:

> Hi David,
>
> On 13/09/2023 02:16 CEST David E. Wheeler  wrote:
>
>> CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
>> \copy movies(movie) from PROGRAM 'curl -s 
>> https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c 
>> ".[]" | sed "s|||g"';
>> create index on movies using gin (movie);
>> analyze movies;
>>
>> I have been confused as to the difference between @@ vs @?: Why do these
>> return different results?
>>
>> david=# select id from movies where movie @@ '$ ?(@.title == "New Life 
>> Rescue")';
>>  id
>> 
>> (0 rows)
>>
>> david=# select id from movies where movie @? '$ ?(@.title == "New Life 
>> Rescue")';
>>  id
>> 
>>  10
>> (1 row)
>>
>> I posted this question on Stack Overflow 
>> (https://stackoverflow.com/q/77046554/79202),
>> and from the suggestion I got there, it seems that @@ expects a boolean to be
>> returned by the path query, while @? wraps it in an implicit exists(). Is that
>> right?
>
> That's also my understanding.  We had a discussion about the docs on @@, @?, and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.
>
>> If so, I’d like to submit a patch to the docs talking about this, and
>> suggesting the use of jsonb_path_query() to test paths to see if they return
>> a boolean or not.
>
> +1
>
> [1] 
> https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
>
> --
> Erik



"All use of json*() functions preclude index usage."

That sentence is missing from the documentation.


Erik Rijkers








Re: subscription TAP test has unused $result

2023-09-12 Thread Amit Kapila
On Wed, Sep 13, 2023 at 8:43 AM Peter Smith  wrote:
>
> Yesterday noticed a TAP test assignment to an unused $result.
>
> PSA patch to remove that.
>

Though it is harmless, I think we can clean it up. Your patch looks good to me.


-- 
With Regards,
Amit Kapila.




Re: Jumble the CALL command in pg_stat_statements

2023-09-12 Thread Michael Paquier
On Wed, Sep 13, 2023 at 12:48:48AM +, Imseih (AWS), Sami wrote:
> The patch also modifies existing test cases for CALL handling in 
> pg_stat_statements
> and adds additional tests which prove that a CALL to an overloaded procedure
> will generate a different query_id.

+CALL overload(1);
+CALL overload('A');
[...]
+ 1 |0 | CALL overload($1)
+ 1 |0 | CALL overload($1)

That's not surprising to me.  We've historically relied on the
function OID in the jumbling of a FuncExpr, so I'm OK with that.  This
may look a bit surprising though if you have a schema that enforces
the same function name for several data types.  Having a DEFAULT does
this:
CREATE OR REPLACE PROCEDURE overload(i text, j bool DEFAULT true) AS
$$ DECLARE
  r text;
BEGIN
  SELECT i::text INTO r;
END; $$ LANGUAGE plpgsql;

Then with these three, and a jumbling based on the OID gives:
+CALL overload(1);
+CALL overload('A');
+CALL overload('A', false);
[...]
- 1 |0 | CALL overload($1)
+ 2 |0 | CALL overload($1)

Still, this grouping is much better than having thousands of entries
with different values.  I am not sure if we should bother improving
that more than what you suggest, especially as FuncExpr->args can
itself include Const nodes as far as I recall.

> As far as the SET command mentioned in [1] is concerned, it is a bit
> more complex, as it requires us to deal with A_Constants, which is not
> very straightforward. We can surely deal with SET currently by applying
> custom query jumbling logic to VariableSetStmt, but this can be dealt
> with in a separate discussion.

As VariableSetStmt is the top-most node structure for SET/RESET
commands, using a custom implementation may be wise in this case,
particularly for the args made of A_Const.  I don't really want to go
down to the internals of A_Const outside its internal implementation,
as these can be used for some queries where there are multiple
keywords separated by whitespaces for one single A_Const, like
isolation level values in transaction commands.  This would lead to
appending the dollar-based variables in weird ways for some patterns.
Cough.

/* transformed output-argument expressions */
-   List   *outargs pg_node_attr(query_jumble_ignore);
+   List   *outargs;

This choice is a bit surprising.  How does it influence the jumbling?
For example, if I add a query_jumble_ignore to it, the regression
tests of pg_stat_statements still pass.  This is going to require more
test coverage to prove that this addition is useful.
--
Michael


signature.asc
Description: PGP signature


Re: Query execution in Perl TAP tests needs work

2023-09-12 Thread Andres Freund
Hi,

On 2023-08-28 17:29:56 +1200, Thomas Munro wrote:
> As an experiment, I hacked up a not-good-enough-to-share experiment
> where $node->safe_psql() would automatically cache a BackgroundPsql
> object and reuse it, and the times for that test dropped ~51 -> ~9s on
> Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
> (well it's certainly non-ideal the way I hacked it up anyway...).  I
> suppose there are quite a few ways we could do better:
> 
> 1.  Don't fork anything at all: open (and cache) a connection directly
> from Perl.
> 1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> popular Perl xsub library?
> 1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
> of some existing one.
> 2.  Use long-lived psql sessions.
> 2a.  Something building on BackgroundPsql.
> 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> protocol that is more fun to talk to from Perl/machines?

While we can't easily use plain long-lived psql everywhere, due to tests
depending on no additional connections being present, we could at least
partially address that by adding a \disconnect to psql.  Based on your numbers,
the subprocess that IPC::Run wraps around psql on Windows is a substantial
part of the overhead.  Even if we default to reconnecting after every
->psql(), just saving the fork of the wrapper process and psql should reduce
costs substantially.

Famous last words, but it seems like it should be quite doable to add
that to psql and use it in Cluster->{psql,safe_psql,poll_query_until}? There
might be a few cases that expect the full connection error message, but I
can't imagine that to be too many?

Greetings,

Andres Freund




Re: Make psql's query canceling test simple by using signal() routine of IPC::Run

2023-09-12 Thread Yugo NAGATA
On Wed, 13 Sep 2023 10:20:23 +0900
Michael Paquier  wrote:

> On Tue, Sep 12, 2023 at 03:18:05PM +0900, Michael Paquier wrote:
> > On Wed, Sep 06, 2023 at 12:45:24AM +0900, Yugo NAGATA wrote:
> > > I attached the update patch. I removed the incorrect comments and
> > > unnecessary lines. Also,  I rewrote the test to use "skip_all" instead
> > > of SKIP because we skip the whole test rather than a part of it.
> > 
> > Thanks for checking how IPC::Run behaves in this case on Windows!
> > 
> > Right.  This test is currently setting up a node for nothing, so let's
> > skip this test entirely under $windows_os and move on.  I'll backpatch
> > that down to 15 once the embargo on REL_16_STABLE is lifted with the
> > 16.0 tag.
> 
> At the end, I have split this change into two:
> - One to disable the test to run on Windows, skipping the wasted node
> initialization, and applied that down to 15.
> - One to switch to signal(), only for HEAD to see what happens in the
> buildfarm once the test is able to run on platforms that do not
> support PPID.  I am wondering as well how IPC::Run::signal is stable,
> as it is the first time we would use it, AFAIK.

Thank you for pushing them.
I agree with the suspicion about IPC::Run::signal, so I'll also
check the buildfarm results.

Regards,
Yugo Nagata

> --
> Michael


-- 
Yugo NAGATA 




Re: Add 'worker_type' to pg_stat_subscription

2023-09-12 Thread Michael Paquier
On Tue, Sep 12, 2023 at 07:00:14PM +1000, Peter Smith wrote:
> Right. I found just a single test currently using pg_stat_subscription
> catalog. I added a worker_type check for that.

This looks enough to me, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-12 Thread Michael Paquier
On Wed, Sep 13, 2023 at 09:59:22AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 12 Sep 2023 09:03:27 +0900, Michael Paquier  
> wrote in 
> > On Mon, Sep 11, 2023 at 12:15:49PM -0700, Jeff Davis wrote:
> > > That's fine with me.
> > 
> > Okay.  Then, please find attached a v4 for HEAD and REL_16_STABLE.
> 
> For example, they result in the following message:
> 
> ERROR:  unsupported collprovider (pg_strcoll): i
> 
> Even if it is an elog message, I believe we can make it clearer. The
> pg_strcoll seems like a collation provider at first glance. Not sure
> about others, though; I would spell it as follows instead:
> 
> ERROR:  unsupported collprovider in pg_strcoll: i
> ERROR:  unsupported collprovider in pg_strcoll(): i

Hmm.  I see your point, one could mistake the function name for the
provider with this wording.  How about this instead:
 ERROR:  unsupported collprovider for %s: %c

I've hidden that in a macro that uses __func__ as Jeff has suggested.

What do you think?
--
Michael
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index aa9da99308..d5003da417 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,10 @@
 #include 
 #endif
 
+/* Error triggered for locale-sensitive subroutines */
+#define		PGLOCALE_SUPPORT_ERROR(provider) \
+	elog(ERROR, "unsupported collprovider for %s: %c", __func__, provider)
+
 /*
  * This should be large enough that most strings will fit, but small enough
  * that we feel comfortable putting it on the stack
@@ -2031,7 +2035,7 @@ pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 #endif
 	else
 		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return result;
 }
@@ -2067,7 +2071,7 @@ pg_strncoll(const char *arg1, size_t len1, const char *arg2, size_t len2,
 #endif
 	else
 		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return result;
 }
@@ -2086,7 +2090,7 @@ pg_strxfrm_libc(char *dest, const char *src, size_t destsize,
 		return strxfrm(dest, src, destsize);
 #else
 	/* shouldn't happen */
-	elog(ERROR, "unsupported collprovider: %c", locale->provider);
+	PGLOCALE_SUPPORT_ERROR(locale->provider);
 	return 0;	/* keep compiler quiet */
 #endif
 }
@@ -2282,7 +2286,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 		return true;
 	else
 		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return false;/* keep compiler quiet */
 }
@@ -2314,7 +2318,7 @@ pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 #endif
 	else
 		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return result;
 }
@@ -2351,7 +2355,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, size_t srclen,
 #endif
 	else
 		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return result;
 }
@@ -2369,7 +2373,7 @@ pg_strxfrm_prefix_enabled(pg_locale_t locale)
 		return true;
 	else
 		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return false;/* keep compiler quiet */
 }
@@ -2393,16 +2397,14 @@ pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 {
 	size_t		result = 0;		/* keep compiler quiet */
 
-	if (!locale || locale->provider == COLLPROVIDER_LIBC)
-		elog(ERROR, "collprovider '%c' does not support pg_strxfrm_prefix()",
-			 locale->provider);
+	if (!locale)
+		PGLOCALE_SUPPORT_ERROR(COLLPROVIDER_LIBC);
 #ifdef USE_ICU
 	else if (locale->provider == COLLPROVIDER_ICU)
 		result = pg_strnxfrm_prefix_icu(dest, src, -1, destsize, locale);
 #endif
 	else
-		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return result;
 }
@@ -2430,16 +2432,14 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 {
 	size_t		result = 0;		/* keep compiler quiet */
 
-	if (!locale || locale->provider == COLLPROVIDER_LIBC)
-		elog(ERROR, "collprovider '%c' does not support pg_strnxfrm_prefix()",
-			 locale->provider);
+	if (!locale)
+		PGLOCALE_SUPPORT_ERROR(COLLPROVIDER_LIBC);
 #ifdef USE_ICU
 	else if (locale->provider == COLLPROVIDER_ICU)
 		result = pg_strnxfrm_prefix_icu(dest, src, -1, destsize, locale);
 #endif
 	else
-		/* shouldn't happen */
-		elog(ERROR, "unsupported collprovider: %c", locale->provider);
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
 
 	return result;
 }


signature.asc
Description: PGP signature


Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-12 Thread Michael Paquier
On Tue, Sep 12, 2023 at 09:40:04PM -0300, Ranier Vilela wrote:
> I think that is not a good idea.

Hm?  We already use __func__ across the tree even on Windows and
nobody has complained about that.  Using a macro for the elog()
generated would be slightly more elegant, actually.
--
Michael


signature.asc
Description: PGP signature


Re: PSQL error: total cell count of XXX exceeded

2023-09-12 Thread Hongxu Ma
Thanks for pointing that out; I did miss some other "ncols * nrows"
places. Uploaded a v3 patch to fix them.

As for Windows, I didn't test it before, but I think it should also
have the issue (and it is more likely to happen there, since
`cellsadded` is also a long type).
My fix idea is simple: define a common 64-bit type for it.
I checked MSDN: only `LONGLONG` and `LONG64` are 64 bits. I assume
Postgres should already have a similar type, but I only found
`typedef long int int64` in src/include/c.h, which does not look like
a proper choice.
@Michael Paquier, could you give some advice here (which type should
be used? or should we define a new one?). Thank you very much.


From: Tom Lane 
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier 
Cc: Hongxu Ma ; Jelte Fennema ; David 
G. Johnston ; PostgreSQL Hackers 

Subject: Re: PSQL error: total cell count of XXX exceeded

Michael Paquier  writes:
> On Tue, Sep 12, 2023 at 02:39:55AM +, Hongxu Ma wrote:
> +   long total_cells;

> long is 4 bytes on Windows, and 8 bytes basically elsewhere.  So you
> would still have the same problem on Windows, no?

More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.

I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical.  The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.

regards, tom lane


v3-0001-Using-long-type-in-printTableAddCell.patch
Description: v3-0001-Using-long-type-in-printTableAddCell.patch


subscription TAP test has unused $result

2023-09-12 Thread Peter Smith
Yesterday noticed a TAP test assignment to an unused $result.

PSA patch to remove that.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-remove-redundant-result-assignment.patch
Description: Binary data


Re: Is the member name of hashctl inappropriate?

2023-09-12 Thread Tom Lane
ywgrit  writes:
> According to the description, the keycopy function only copies the key, but
> in reality it copies the entire entry, i.e., the key and the value,

On what grounds do you claim that?  dynahash.c only ever passes "keysize"
as the size parameter.

regards, tom lane




Re: Make psql's query canceling test simple by using signal() routine of IPC::Run

2023-09-12 Thread Michael Paquier
On Tue, Sep 12, 2023 at 03:18:05PM +0900, Michael Paquier wrote:
> On Wed, Sep 06, 2023 at 12:45:24AM +0900, Yugo NAGATA wrote:
> > I attached the update patch. I removed the incorrect comments and
> > unnecessary lines. Also,  I rewrote the test to use "skip_all" instead
> > of SKIP because we skip the whole test rather than a part of it.
> 
> Thanks for checking how IPC::Run behaves in this case on Windows!
> 
> Right.  This test is currently setting up a node for nothing, so let's
> skip this test entirely under $windows_os and move on.  I'll backpatch
> that down to 15 once the embargo on REL_16_STABLE is lifted with the
> 16.0 tag.

At the end, I have split this change into two:
- One to disable the test to run on Windows, skipping the wasted node
initialization, and applied that down to 15.
- One to switch to signal(), only for HEAD to see what happens in the
buildfarm once the test is able to run on platforms that do not
support PPID.  I am wondering as well how IPC::Run::signal is stable,
as it is the first time we would use it, AFAIK.
--
Michael


signature.asc
Description: PGP signature


Tab completion for ATTACH PARTITION

2023-09-12 Thread bt23nguyent

Hi,

Currently, psql's tab completion does not properly support ATTACH 
PARTITION. When the Tab key is typed after "ALTER TABLE <name> ATTACH 
PARTITION ", all possible table names should be displayed; however, 
foreign table names are not. So I created a patch that addresses this 
issue by ensuring that psql displays not only normal table names but 
also foreign table names in this case.
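
To make the scenario concrete, here is a minimal sketch of the case the
patch addresses (object names are my own):

CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE EXTENSION postgres_fdw;
CREATE SERVER remote FOREIGN DATA WRAPPER postgres_fdw;
CREATE FOREIGN TABLE part1 (id int) SERVER remote;
-- tab completion after ATTACH PARTITION should now offer part1 too
ALTER TABLE parent ATTACH PARTITION part1 FOR VALUES FROM (1) TO (100);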


Any kind of feedback is appreciated.

Best,
Tung Nguyen

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 779fdc90cb..174cf5e1dd 100644 
--- a/src/bin/psql/tab-complete.c   
+++ b/src/bin/psql/tab-complete.c   
@@ -865,6 +865,18 @@ static const SchemaQuery Query_for_list_of_truncatables = {
.result = "c.relname",  
 }; 

+/* Relations supporting ATTACH PARTITION */
+static const SchemaQuery Query_for_list_of_attachables = { 
+   .catname = "pg_catalog.pg_class c", 
+   .selcondition = 
+   "c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+   CppAsString2(RELKIND_FOREIGN_TABLE) ", "
+   CppAsString2(RELKIND_PARTITIONED_TABLE) ")",
+   .viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
+   .namespace = "c.relnamespace",  
+   .result = "c.relname",  
+}; 
+   
 /* Relations supporting GRANT are currently same as those supporting SELECT */ 
 #define Query_for_list_of_grantables Query_for_list_of_selectables 

@@ -2567,7 +2579,7 @@ psql_completion(const char *text, int start, int end) 
 * tables.  
 */ 
else if (Matches("ALTER", "TABLE", MatchAny, "ATTACH", "PARTITION"))  

Is the member name of hashctl inappropriate?

2023-09-12 Thread ywgrit
The definition of hashctl is shown below

typedef struct HASHCTL
{
    long        num_partitions; /* # partitions (must be power of 2) */
    long        ssize;          /* segment size */
    long        dsize;          /* (initial) directory size */
    long        max_dsize;      /* limit to dsize if dir size is limited */
    long        ffactor;        /* fill factor */
    Size        keysize;        /* hash key length in bytes */
    Size        entrysize;      /* total user element size in bytes */
    HashValueFunc hash;         /* hash function */
    HashCompareFunc match;      /* key comparison function */
    HashCopyFunc keycopy;       /* key copying function */
    HashAllocFunc alloc;        /* memory allocator */
    MemoryContext hcxt;         /* memory context to use for allocations */
    HASHHDR    *hctl;           /* location of header in shared mem */
} HASHCTL;


/*
* Key copying functions must have this signature. The return value is not
* used. (The definition is set up to allow memcpy() and strlcpy() to be
* used directly.)
*/
typedef void *(*HashCopyFunc) (void *dest, const void *src, Size keysize);

According to the description, the keycopy function only copies the key,
but in reality it copies the entire entry, i.e., the key and the value.
Is the name wrong? This may make a developer pass in an inappropriate
keycopy parameter when creating the htab.

Thanks


Re: JSON Path and GIN Questions

2023-09-12 Thread Erik Wienhold
Hi David,

On 13/09/2023 02:16 CEST David E. Wheeler  wrote:

> CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
> \copy movies(movie) from PROGRAM 'curl -s 
> https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json
>  | jq -c ".[]" | sed "s|||g"';
> create index on movies using gin (movie);
> analyze movies;
>
> I have been confused as to the difference between @@ vs @?: Why do these
> return different results?
>
> david=# select id from movies where movie @@ '$ ?(@.title == "New Life 
> Rescue")';
>  id
> 
> (0 rows)
>
> david=# select id from movies where movie @? '$ ?(@.title == "New Life 
> Rescue")';
>  id
> 
>  10
> (1 row)
>
> I posted this question on Stack Overflow 
> (https://stackoverflow.com/q/77046554/79202),
> and from the suggestion I got there, it seems that @@ expects a boolean to be
> returned by the path query, while @? wraps it in an implicit exists(). Is that
> right?

That's also my understanding.  We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1].  Maybe it's useful also.

> If so, I’d like to submit a patch to the docs talking about this, and
> suggesting the use of jsonb_path_query() to test paths to see if they return
> a boolean or not.

+1

[1] 
https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com

--
Erik




Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-12 Thread Kyotaro Horiguchi
At Tue, 12 Sep 2023 09:03:27 +0900, Michael Paquier  wrote 
in 
> On Mon, Sep 11, 2023 at 12:15:49PM -0700, Jeff Davis wrote:
> > That's fine with me.
> 
> Okay.  Then, please find attached a v4 for HEAD and REL_16_STABLE.

For example, they result in the following message:

ERROR:  unsupported collprovider (pg_strcoll): i

Even if it is an elog message, I believe we can make it clearer. The
pg_strcoll seems like a collation provider at first glance. Not sure
about others, though; I would spell it as follows instead:

ERROR:  unsupported collprovider in pg_strcoll: i
ERROR:  unsupported collprovider in pg_strcoll(): i

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Jumble the CALL command in pg_stat_statements

2023-09-12 Thread Imseih (AWS), Sami
Hi,

The proposal by Bertrand in CC to jumble CALL and SET in [1] was
rejected at the time in favor of a more robust solution to jumble DDL.

Michael also in CC made this possible with commit 3db72ebcbe.

The attached patch takes advantage of the jumbling infrastructure
added in the above mentioned commit and jumbles the CALL statement
in pg_stat_statements.

The patch also modifies existing test cases for CALL handling in 
pg_stat_statements
and adds additional tests which prove that a CALL to an overloaded procedure
will generate a different query_id.
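
For example, a minimal sketch of the overloaded case (the procedure
bodies here are my own):

CREATE PROCEDURE overload(i int) LANGUAGE plpgsql AS
$$ BEGIN NULL; END; $$;
CREATE PROCEDURE overload(i text) LANGUAGE plpgsql AS
$$ BEGIN NULL; END; $$;
CALL overload(1);    -- jumbled as CALL overload($1)
CALL overload('A');  -- also CALL overload($1), but a distinct query_id
                     -- because the jumble includes the procedure's OID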

As far as the SET command mentioned in [1] is concerned, it is a bit
more complex, as it requires us to deal with A_Constants, which is not
very straightforward. We can surely deal with SET currently by applying
custom query jumbling logic to VariableSetStmt, but this can be dealt
with in a separate discussion.


Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] 
https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com


0001-v1-Jumble-the-CALL-command-in-pg_stat_statements.patch
Description: 0001-v1-Jumble-the-CALL-command-in-pg_stat_statements.patch


Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-12 Thread Ranier Vilela
Em ter., 12 de set. de 2023 às 17:51, Jeff Davis 
escreveu:

> On Tue, 2023-09-12 at 09:03 +0900, Michael Paquier wrote:
> > On Mon, Sep 11, 2023 at 12:15:49PM -0700, Jeff Davis wrote:
> > > That's fine with me.
> >
> > Okay.  Then, please find attached a v4 for HEAD and REL_16_STABLE.
>
> One question: would it make sense to use __func__?
>
According to the msvc documentation, __func__ requires C++11.
https://learn.microsoft.com/en-us/cpp/cpp/func?view=msvc-170

I think that is not a good idea.

best regards,
Ranier Vilela


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-12 Thread Peter Smith
Hi Kuroda-san. Here are my review comments for patch v36-0002.

==
doc/src/sgml/ref/pgupgrade.sgml

1.
Configure the servers for log shipping.  (You do not need to run
pg_backup_start() and
pg_backup_stop()
or take a file system backup as the standbys are still synchronized
-   with the primary.)  Replication slots are not copied and must
-   be recreated.
+   with the primary.) Only logical slots on the primary are copied to the
+   new standby, and other other slots on the old standby must be recreated
+   as they are not copied.
   

IMO this text still needs some minor changes, as shown below. Anyway,
there is a typo: /other other/

SUGGESTION
Only logical slots on the primary are copied to the new standby, but
other slots on the old standby are not copied so must be recreated
manually.

==
src/bin/pg_upgrade/server.c

2.
+ *
+ * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+ * checkpointer process.  If WALs required by logical replication slots are
+ * removed, the slots are unusable.  The setting ensures that such WAL
+ * records have remained so that invalidation of slots would be avoided
+ * during the upgrade.

The comment already explains that the reason for the setting is to
prevent removal of the needed WAL records, so I felt there is no need
for the last sentence to repeat the same information.

BEFORE
The setting ensures that such WAL records have remained so that
invalidation of slots would be avoided during the upgrade.

SUGGESTION
This setting prevents the invalidation of slots during the upgrade.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-09-12 Thread Gurjeet Singh
On Fri, Sep 8, 2023 at 7:52 AM Bruce Momjian  wrote:
>
> On Thu, Sep  7, 2023 at 09:21:07PM -0700, Nathan Bossart wrote:
> > On Thu, Sep 07, 2023 at 07:13:44PM -0400, Bruce Momjian wrote:
> > > On Thu, Sep  7, 2023 at 02:54:13PM -0700, Nathan Bossart wrote:
> > >> IMO the phrase "open a port" is kind of nonstandard.  I think we should 
> > >> say
> > >> something along the lines of
> > >>
> > >>If listen_addresses is not empty, the server will start only if it can
> > >>listen on at least one of the specified addresses.  A warning will be
> > >>emitted for any addresses that the server cannot listen on.
> > >
> > > Good idea, updated patch attached.
> >
> > I still think we should say "listen on an address" instead of "open a
> > port," but otherwise it LGTM.
>
> Agreed, I never liked the "port" mention.  I couldn't figure how to get
> "open" out of the warning sentence though.  Updated patch attached.

LGTM.

Best regards,
Gurjeet
http://Gurje.et




JSON Path and GIN Questions

2023-09-12 Thread David E. Wheeler
Greetings Hackers,

Been a while! I’m working on some experiments with JSONB columns and GIN 
indexes, and have operated on the assumption that JSON Path operations would 
take advantage of GIN indexes, with json_path_ops as a nice optimization. But 
I’ve run into what appear to be some inconsistencies and oddities I’m hoping to 
figure out with your help.

For the examples in this email, I’m using this simple table:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s 
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json 
| jq -c ".[]" | sed "s|||g"';
create index on movies using gin (movie);
analyze movies;

That gives me a simple table with around 3600 rows. Not a lot of data, but 
hopefully enough to demonstrate the issues.

Issue 1: @@ vs @?
-

I have been confused as to the difference between @@ vs @?: Why do these return 
different results?

david=# select id from movies where movie @@ '$ ?(@.title == "New Life 
Rescue")';
 id

(0 rows)

david=# select id from movies where movie @? '$ ?(@.title == "New Life 
Rescue")';
 id

 10
(1 row)

I posted this question on Stack Overflow 
(https://stackoverflow.com/q/77046554/79202), and from the suggestion I got 
there, it seems that @@ expects a boolean to be returned by the path query, 
while @? wraps it in an implicit exists(). Is that right?

If so, I’d like to submit a patch to the docs talking about this, and 
suggesting the use of jsonb_path_query() to test paths to see if they return a 
boolean or not.
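
If that's right, jsonb_path_query() should make the difference easy to 
see; for instance, on a standalone value rather than the movies table:

david=# select jsonb_path_query('{"title": "New Life Rescue"}',
'$ ?(@.title == "New Life Rescue")');
-- returns the matching item, not a boolean, so @@ has nothing to test
david=# select jsonb_path_query('{"title": "New Life Rescue"}',
'exists($ ?(@.title == "New Life Rescue"))');
-- returns true, a boolean, which is what @@ expects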


Issue 2: @? Index Use
-

From Oleg’s (happy belated birthday!) notes 
(https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):


> Operators @? and @@ are interchangeable:
> 
> js @? '$.a' <=> js @@ 'exists($.a)’
> js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’

For the purposes of the above example, this appears to hold true: if I wrap the 
path query in exists(), @@ returns a result:

david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life 
Rescue"))';
 id

 10
(1 row)

Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN 
index while @? does not.

Or, no, fiddling with it again just now, I think I have still been confusing 
these operators! @@ was using the index with an explicit exists(), but @? 
was not…because I was still using an explicit exists.

In other words:

* @@ 'exists($ ?($.year == 1944))'  Uses the index
* @? '$ ?(@.year == 1944)'  Uses the index
* @? 'exists($ ?($.year == 1944))'  Does not use the index

That last one presumably doesn’t work because there is an implicit exists() 
around the exists(), making it `exists(exists($ ?($.year == 1944)))`, which 
returns true for every row (true and false both exist)!
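
As a sanity check, the index usage above should be easy to verify (a 
sketch; actual plans depend on the statistics):

explain (costs off) select id from movies
  where movie @@ 'exists($ ?($.year == 1944))';
explain (costs off) select id from movies
  where movie @? '$ ?(@.year == 1944)';
-- both should show a bitmap scan on the GIN index, while the
-- @? 'exists(...)' form should fall back to a seq scan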

Anyway, if I have this right, I’d like to flesh out the docs a bit.

Issue 3: Index Use for Comparison
-

From the docs 
(https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I 
had assumed any JSON Path query would be able to use the GIN index. However, 
while the use of the == JSON Path operator is able to take advantage of the GIN 
index, apparently the >= operator cannot:

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 
2023)';
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual 
time=34.815..36.259 rows=192 loops=1)
   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
   Rows Removed by Filter: 36081
 Planning Time: 1.864 ms
 Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with json_path_ops, which I can understand 
not working, since it stores hashes of paths, which would allow only exact 
matches. But a plain old GIN index doesn’t appear to work, either. Should it? 
Is there perhaps some other op class that would allow it to work? Or would I 
have to create a separate BTREE index on `movie -> 'year'`?
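
For what it's worth, the expression-index workaround I have in mind 
would be something like this (my own formulation, untested):

create index on movies (((movie ->> 'year')::int));
explain select id from movies where (movie ->> 'year')::int >= 2023;
-- a btree on the extracted value supports range quals, at the cost of
-- rewriting the predicate to match the indexed expression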

Thanks for your patience with my questions!

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

2023-09-12 Thread Andres Freund
Hi,

On 2023-08-30 10:57:10 +0200, Daniel Gustafsson wrote:
> > On 28 Aug 2023, at 14:32, Daniel Gustafsson  wrote:
> 
> > Attached is a patch with a quick PoC for using PQPing instead of using psql 
> > for
> > connection checks in pg_regress.
> 
> The attached v2 fixes a silly mistake which led to a compiler warning.

Still seems like a good idea to me. To see what impact it has, I measured the
time running the pg_regress tests that take less than 6s on my machine - I
excluded the slower ones (like the main regression tests) because they'd hide
any overall difference.

ninja && m test --suite setup --no-rebuild && \
  tests=$(m test --no-rebuild --list | grep -E '/regress' | \
    grep -vE '(regress|postgres_fdw|test_integerset|intarray|amcheck|test_decoding)/regress' | \
    cut -d' ' -f 3) && \
  time m test --no-rebuild $tests

Time for:


master:

cassert:
real0m5.265s
user0m8.422s
sys 0m8.381s

optimized:
real0m4.926s
user0m6.356s
sys 0m8.263s


my patch (probing every 100ms with psql):

cassert:
real0m3.465s
user0m8.827s
sys 0m8.579s

optimized:
real0m2.932s
user0m6.596s
sys 0m8.458s


Daniel's (probing every 50ms with PQping()):

cassert:
real0m3.347s
user0m8.373s
sys 0m8.354s

optimized:
real0m2.527s
user0m6.156s
sys 0m8.315s


My patch increased user/sys time a bit (likely due to a higher number of
futile psql forks), but Daniel's doesn't. And it does show a nice overall wall
clock time saving.

Greetings,

Andres Freund




Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-09-12 Thread Jeff Davis
On Tue, 2023-09-05 at 12:08 -0700, Jeff Davis wrote:
> OK, so we could have a built-in FDW called pg_connection that would
> do
> the right kinds of validation; and then also allow other FDWs but the
> subscription would have to do its own validation.

While working on this, I found a minor bug and there's another
discussion happening here:

https://www.postgresql.org/message-id/e5892973ae2a80a1a3e0266806640dae3c428100.camel%40j-davis.com

It looks like that's going in the direction of checking for the
presence of a password in the connection string at connection time.

Ashutosh, that's compatible with your suggestion that CREATE
SUBSCRIPTION ... SERVER works for any FDW that supplies the right
information, because we need to validate it at connection time anyway.
I'll wait to see how that discussion gets resolved, and then I'll post
the next version.

Regards,
Jeff Davis





Pre-proposal: unicode normalized text

2023-09-12 Thread Jeff Davis


One of the frustrations with using the "C" locale (or any deterministic
locale) is that the following returns false:

  SELECT 'á' = 'á'; -- false

because those are the unicode sequences U&'\0061\0301' and U&'\00E1',
respectively, so memcmp() returns non-zero. But it's really the same
character with just a different representation, and if you normalize
them they are equal:

  SELECT normalize('á') = normalize('á'); -- true
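
Spelled with explicit Unicode escapes (assuming a UTF-8 database, so
that U& literals are accepted), that is:

  SELECT U&'\0061\0301' = U&'\00E1';                 -- false
  SELECT normalize(U&'\0061\0301', NFC) = U&'\00E1'; -- true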

The idea is to have a new data type, say "UTEXT", that normalizes the
input so that it can have an improved notion of equality while still
using memcmp().

Unicode guarantees that "the results of normalizing a string on one
version will always be the same as normalizing it on any other version,
as long as the string contains only assigned characters according to
both versions"[1]. It also guarantees that it "will not reallocate,
remove, or reassign" characters[2]. That means that we can normalize in
a forward-compatible way as long as we don't allow the use of
unassigned code points.

I looked at the standard to see what it had to say, and it discusses
normalization, but a standard UCS string with an unassigned code point
is not an error. Without a data type to enforce the constraint that
there are no unassigned code points, we can't guarantee forward
compatibility. Some other systems support NVARCHAR, but I didn't see
any guarantee of normalization or blocking unassigned code points
there, either.

UTEXT benefits:
  * slightly better natural language semantics than TEXT with
deterministic collation
  * still deterministic=true
  * fast memcmp()-based comparisons
  * no breaking semantic changes as unicode evolves

TEXT allows unassigned code points, and generally returns the same byte
sequences that were originally entered; therefore UTEXT is not a
replacement for TEXT.

UTEXT could be built-in or it could be an extension or in contrib. If
an extension, we'd probably want to at least expose a function that can
detect unassigned code points, so that it's easy to be consistent with
the auto-generated unicode tables. I also notice that there already is
an unassigned code points table in saslprep.c, but it seems to be
frozen as of Unicode 3.2, and I'm not sure why.

Questions:

 * Would this be useful enough to justify a new data type? Would it be
confusing about when to choose one versus the other?
 * Would cross-type comparisons between TEXT and UTEXT become a major
problem that would reduce the utility?
 * Should "some_utext_value = some_text_value" coerce the LHS to TEXT
or the RHS to UTEXT?
 * Other comments or am I missing something?

Regards,
Jeff Davis


[1] https://unicode.org/reports/tr15/
[2] https://www.unicode.org/policies/stability_policy.html




Re: Row pattern recognition

2023-09-12 Thread Jacob Champion
On Mon, Sep 11, 2023 at 11:18 PM Tatsuo Ishii  wrote:
> What I am not sure about is, you and Vik mentioned that the
> traditional NFA is superior that POSIX NFA in terms of performance.
> But how "lexicographic ordering" is related to performance?

I think they're only tangentially related. POSIX NFAs have to fully
backtrack even after the first match is found, so that's where the
performance difference comes in. (We would be introducing new ways to
catastrophically backtrack if we used that approach.) But since you
don't visit every possible path through the graph with a traditional
NFA, it makes sense to define an order in which you visit the nodes,
so that you can reason about which string is actually going to be
matched in the end.

> BTW, attched is the v6 patch. The differences from v5 include:
>
> - Now aggregates can be used with RPR. Below is an example from the
>   regression test cases, which is added by v6 patch.

Great, thank you!

--Jacob




Re: Faster "SET search_path"

2023-09-12 Thread Jeff Davis
On Mon, 2023-08-07 at 15:39 -0700, Nathan Bossart wrote:
> 0003 is looking pretty good, too, but I think we
> should get some more eyes on it, given the complexity.

Attached rebased version of 0003.
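
As a reference for reviewers, the pattern this cache targets is roughly
the following sketch (object names are mine):

  CREATE SCHEMA s1;
  CREATE FUNCTION f() RETURNS int LANGUAGE sql
      SET search_path = s1, pg_catalog
      AS 'SELECT 1';
  -- each call switches search_path in and out; without the cache the
  -- namespace list is recomputed every time
  SELECT count(f()) FROM generate_series(1, 100000);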

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From f5b055aea1bf08928de1bffcfd7b202e28847595 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 26 Jul 2023 12:55:09 -0700
Subject: [PATCH v5 3/3] Add cache for recomputeNamespacePath().

When search_path is changed to something that was previously set, and
no invalidation happened in between, use the cached list of namespace
OIDs rather than recomputing them. This avoids syscache lookups and
ACL checks.

Important when the search_path changes frequently, such as when set in
proconfig.

Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com
---
 src/backend/catalog/namespace.c | 349 ++--
 1 file changed, 286 insertions(+), 63 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 4ceae038ec..8b97b3b529 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "common/hashfn.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -109,11 +110,12 @@
  * activeSearchPath is always the actually active path; it points to
  * to baseSearchPath which is the list derived from namespace_search_path.
  *
- * If baseSearchPathValid is false, then baseSearchPath (and other
- * derived variables) need to be recomputed from namespace_search_path.
- * We mark it invalid upon an assignment to namespace_search_path or receipt
- * of a syscache invalidation event for pg_namespace.  The recomputation
- * is done during the next lookup attempt.
+ * If baseSearchPathValid is false, then baseSearchPath (and other derived
+ * variables) need to be recomputed from namespace_search_path, or retrieved
+ * from the search path cache if there haven't been any syscache
+ * invalidations.  We mark it invalid upon an assignment to
+ * namespace_search_path or receipt of a syscache invalidation event for
+ * pg_namespace.  The recomputation is done during the next lookup attempt.
  *
  * Any namespaces mentioned in namespace_search_path that are not readable
  * by the current user ID are simply left out of baseSearchPath; so
@@ -153,6 +155,35 @@ static Oid	namespaceUser = InvalidOid;
 
 /* The above four values are valid only if baseSearchPathValid */
 static bool baseSearchPathValid = true;
+static bool searchPathCacheValid = false;
+static MemoryContext SearchPathCacheContext = NULL;
+
+typedef struct SearchPathCacheKey
+{
+	const char	*searchPath;
+	Oid			 roleid;
+} SearchPathCacheKey;
+
+typedef struct SearchPathCacheEntry
+{
+	SearchPathCacheKey	*key;
+	List*oidlist;	/* namespace OIDs that pass ACL checks */
+	List*finalPath;	/* cached final computed search path */
+	Oid	 firstNS;	/* first explicitly-listed namespace */
+	bool temp_missing;
+
+	/*
+	 * If true, some namespaces were previously filtered out of finalPath by a
+	 * namespace search hook. The next time it's read from the cache, we need
+	 * to recreate finalPath from the oidlist.
+	 */
+	bool hook_filtered;
+
+	/* needed for simplehash */
+	char status;
+} SearchPathCacheEntry;
+
+static SearchPathCacheEntry *CurrentSearchPathCacheEntry = NULL;
 
 /*
  * myTempNamespace is InvalidOid until and unless a TEMP namespace is set up
@@ -193,6 +224,45 @@ static bool MatchNamedCall(HeapTuple proctup, int nargs, List *argnames,
 		   bool include_out_arguments, int pronargs,
 		   int **argnumbers);
 
+/*
+ * Recomputing the namespace path can be costly when done frequently, such as
+ * when a function has search_path set in proconfig. Add a cache that can be
+ * used when the search_path changes to a value that was previously set, and
+ * no invalidation intervened.
+ *
+ * The cache is a simple hash table in its own memory context, with a key of
+ * search_path and role ID.
+ */
+
+static inline uint32
+nspcachekey_hash(SearchPathCacheKey *key)
+{
+	const unsigned char	*bytes = (const unsigned char *)key->searchPath;
+	int	 blen  = strlen(key->searchPath);
+
+	return hash_combine(hash_bytes(bytes, blen),
+		hash_uint32(key->roleid));
+}
+
+static inline bool
+nspcachekey_equal(SearchPathCacheKey *a, SearchPathCacheKey *b)
+{
+	return a->roleid == b->roleid &&
+		strcmp(a->searchPath, b->searchPath) == 0;
+}
+
+#define SH_PREFIX		nsphash
+#define SH_ELEMENT_TYPE	SearchPathCacheEntry
+#define SH_KEY_TYPE		SearchPathCacheKey *
+#define	SH_KEY			key
+#define SH_HASH_KEY(tb, key)   	nspcachekey_hash(key)
+#define SH_EQUAL(tb, a, b)		nspcachekey_equal(a, b)
+#define	SH_SCOPE		static inline
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+static nsphash_hash *SearchPathCache = NULL;
 
 /*
 

Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-12 Thread Jeff Davis
On Tue, 2023-09-12 at 09:03 +0900, Michael Paquier wrote:
> On Mon, Sep 11, 2023 at 12:15:49PM -0700, Jeff Davis wrote:
> > That's fine with me.
> 
> Okay.  Then, please find attached a v4 for HEAD and REL_16_STABLE.

One question: would it make sense to use __func__?

Other than that, looks good. Thank you.

Regards,
Jeff Davis





Re: sslinfo extension - add notbefore and notafter timestamps

2023-09-12 Thread Jacob Champion
Hello,

On 7/25/23 07:21, Daniel Gustafsson wrote:
> The attached version passes ssl tests for me on 1.0.2 through OpenSSL Git 
> HEAD.

Tests pass for me too, including LibreSSL 3.8.

> +   /* Calculate the diff from the epoch to the certificat timestamp */

"certificate"

> + ssl_client_get_notbefore() returns text
> ...
> + ssl_client_get_notafter() returns text

I think this should say timestamptz rather than text? Ditto for the
pg_stat_ssl documentation.

Speaking of which: is the use of `timestamp` rather than `timestamptz`
in pg_proc.dat intentional? Will that cause problems with comparisons?

--

I haven't been able to poke any holes in the ASN1_TIME_to_timestamp()
implementations themselves. I went down a rabbit hole trying to find out
whether leap seconds could cause problems for us when we switch to
`struct tm` in the future, but it turns out OpenSSL rejects leap seconds
in the Validity fields. That seems weird -- as far as I can tell, RFC
5280 defers to ASN.1 which defers to ISO 8601 which appears to allow
leap seconds -- but I don't plan to worry about it anymore. (I do idly
wonder whether some CA, somewhere, has ever had a really Unhappy New
Year due to that.)

Thanks,
--Jacob




Re: GenBKI emits useless open;close for catalogs without rows

2023-09-12 Thread Matthias van de Meent
On Fri, 1 Sept 2023 at 19:52, Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > On 2023-Sep-01, Matthias van de Meent wrote:
> >> A potential addition to the patch would to stop manually closing
> >> relations: initdb and check-world succeed without manual 'close'
> >> operations because the 'open' command auto-closes the previous open
> >> relation (in boot_openrel). Testing also suggests that the last opened
> >> relation apparently doesn't need closing - check-world succeeds
> >> without issues (incl. with TAP enabled). That is therefore implemented
> >> in attached patch 2 - it removes the 'close' syntax in its entirety.
>
> > Hmm, what happens with the last relation in the bootstrap process?  Is
> > closerel() called via some other path for that one?
>
> Taking a quick census of existing closerel() callers: there is
> cleanup() in bootstrap.c, but it's called uncomfortably late
> and outside any transaction, so I misdoubt that it works
> properly if asked to actually shoulder any responsibility.
> (A little code reshuffling could fix that.)
> There are also a couple of low-level elog warnings in CREATE
> that would likely get triggered, though I suppose we could just
> remove those elogs.

Yes, that should be easy to fix.

> I guess my reaction to this patch is "why bother?".  It seems
> unlikely to yield any measurable benefit, though of course
> that guess could be wrong.

There is a small but measurable decrease in size of the generated bki
(2 kB with both patches, on an initial 945 kB), and there is some
related code that can be eliminated. If that's not worth bothering,
then I can drop the patch. Otherwise, I can update the patch to do the
cleanup that was within the transaction boundaries at the end of
boot_yyparse.

If decreasing the size of postgres.bki is not worth the effort, I'll
drop any effort on doing so, but considering that it is about 1MB of
our uncompressed distributables, I'd say decreases in size are worth
the effort, most of the time.

Kind regards,

Matthias van de Meent




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-09-12 Thread Tom Lane
Matthias van de Meent  writes:
> Hmm, maybe we should measure the overhead of serializing the tuples instead.
> The difference between your patch and "serializing the tuples, but not
> sending them" is that serializing also does the detoasting, but also
> includes any time spent in the serialization functions of the type. So
> an option "SERIALIZE" which measures all the time the server spent on
> the query (except the final step of sending the bytes to the client)
> would likely be more useful than "just" detoasting.

+1, that was my immediate reaction to the proposal as well.  Some
output functions are far from cheap.  Doing only the detoast part
seems like it's still misleading.

Do we need to go as far as offering both text-output and binary-output
options?
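
To make that concrete, usage might look like this (syntax entirely
hypothetical at this point):

EXPLAIN (ANALYZE, SERIALIZE) SELECT * FROM big_table;
-- runtime would then include detoasting plus output-function time,
-- stopping just short of sending the bytes to the client

and perhaps a SERIALIZE BINARY flavor if we want to measure binary
output functions as well.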

regards, tom lane




Re: Document that PG_TRY block cannot have a return statement

2023-09-12 Thread Tom Lane
Serpent  writes:
> I'm talking about this part:

> PG_TRY();
> {
>   ... code that might throw ereport(ERROR) ...
> }

Ah.  Your phrasing needs work for clarity then.  Also, "return"
is hardly the only way to break it; break, continue, or goto
leading out of the PG_TRY are other possibilities.  Maybe more
like "The XXX code must exit normally (by control reaching
the end) if it does not throw ereport(ERROR)."  Not quite sure
what to use for XXX.

regards, tom lane




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-09-12 Thread stepan rutz

Hi Matthias,

thanks for your feedback.

I wasn't sure about the memory contexts. I was actually also unsure
whether the array

  TupleTableSlot.tts_isnull

is set up correctly by the previous call to slot_getallattrs(slot).
This would allow getting rid of even more code from this patch, namely
the code in the loop that determines whether a field is null or not. I
switched to using tts_isnull from TupleTableSlot now; it seems to be OK
and the docs say it is safe to use.
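
For reference, a minimal sketch of that pattern (the function wrapper is
illustrative): slot_getallattrs() deforms the tuple and fills both
tts_values and tts_isnull for every attribute of the slot's descriptor.

```c
#include "postgres.h"
#include "executor/tuptable.h"

static void
scan_slot_attributes(TupleTableSlot *slot)
{
	/* Deform the tuple: populates tts_values[] and tts_isnull[]. */
	slot_getallattrs(slot);

	for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
	{
		if (slot->tts_isnull[i])
			continue;			/* NULL column: nothing to detoast */

		Datum		val = slot->tts_values[i];	/* may be a toasted datum */

		(void) val;				/* ... detoast/process the value here ... */
	}
}
```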

Also, I now reuse the MemoryContext throughout the lifetime of the receiver.
Not sure if that makes a difference. One more thing I noticed: in
explain.c the DestReceiver's destroy callback was never invoked. I added
a line to do that, however I am unsure whether this is the right place
or a good idea in the first place. This potentially affects plain
analyze calls as well, though it seems to behave nicely, even when I
explain (analyze) select * into 

This is the call I am talking about, which was missing from explain.c:

  dest->rDestroy(dest);


Attached a new patch. Hoping for feedback,

Greetings, Stepan


On 12.09.23 14:25, Matthias van de Meent wrote:

On Tue, 12 Sept 2023 at 12:56, stepan rutz  wrote:

Hi,

I have fallen into this trap and others have too. If you run
EXPLAIN(ANALYZE) no de-toasting happens. This makes query-runtimes
differ a lot. The bigger point is that the average user expects more
from EXPLAIN(ANALYZE) than what it provides. This can be surprising. You
can force detoasting during explain with explicit calls to length(), but
that is tedious. Those of us who are forced to work using Java stacks,
ORMs and still store mostly documents fall into this trap sooner or
later. I have already received some good feedback on this one, so this
is an issue that bothers quite a few people out there.

Yes, the lack of being able to see the impact of detoasting (amongst
others) in EXPLAIN (ANALYZE) can hide performance issues.


It would be great to get some feedback on the subject and how to address
this, maybe in totally different ways.

Hmm, maybe we should measure the overhead of serializing the tuples instead.
The difference between your patch and "serializing the tuples, but not
sending them" is that serializing also does the detoasting, but also
includes any time spent in the serialization functions of the type. So
an option "SERIALIZE" which measures all the time the server spent on
the query (except the final step of sending the bytes to the client)
would likely be more useful than "just" detoasting.


0001_explain_analyze_and_detoast.patch

I notice that this patch creates and destroys a memory context for
every tuple that the DestReceiver receives. I think that's quite
wasteful, as you should be able to create only one memory context and
reset it before (or after) each processed tuple. That also reduces the
differences in measurements between EXPLAIN and normal query
processing of the tuples - after all, we don't create new memory
contexts for every tuple in the normal DestRemote receiver either,
right?

Kind regards,

Matthias van de Meent
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..95a09dd4cb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -47,7 +47,6 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
 /* Hook for plugins to get control in explain_get_index_name() */
 explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
 
-
 /* OR-able flags for ExplainXMLTag() */
 #define X_OPENING 0
 #define X_CLOSING 1
@@ -155,6 +154,8 @@ static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
+static DestReceiver *CreateDetoastDestReceiver(void);
+
 
 /*
  * ExplainQuery -
@@ -192,6 +193,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 			es->settings = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "generic_plan") == 0)
 			es->generic = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "detoast") == 0)
+			es->detoast = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "timing") == 0)
 		{
 			timing_set = true;
@@ -244,6 +247,12 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("EXPLAIN option TIMING requires ANALYZE")));
 
+	/* check that detoast is used with EXPLAIN ANALYZE */
+	if (es->detoast && !es->analyze)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option DETOAST requires ANALYZE")));
+
 	/* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */
 	if (es->generic && es->analyze)
 		ereport(ERROR,
@@ -565,9 +574,13 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/*
 	 * Normally we discard the query's output, but if explaining CREATE TABLE
 	 * AS, we'd better use the appropriate tuple receiver.
+	 * Also if we choose to detoast during explain, we need to send the
+	 * tuples to a receiver 

Re: Adding a pg_get_owned_sequence function?

2023-09-12 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Peter Eisentraut  writes:
>> Would it work to just overload pg_get_serial_sequence with regclass 
>> argument types?
>
> Probably not; the parser would have no principled way to resolve
> pg_get_serial_sequence('foo', 'bar') as one or the other.  I'm
> not sure offhand if it would throw error or just choose one, but
> if it just chooses one it'd likely be the text variant.

That works fine, and it prefers the text version:

~=# create function pg_get_serial_sequence(tbl regclass, col name)
returns regclass strict stable parallel safe
return pg_get_serial_sequence(tbl::text, col::text)::regclass;
CREATE FUNCTION

~=# select pg_typeof(pg_get_serial_sequence('identest', 'id'));
┌───┐
│ pg_typeof │
├───┤
│ text  │
└───┘
(1 row)

~=# select pg_typeof(pg_get_serial_sequence('identest', 'id'::name));
┌───┐
│ pg_typeof │
├───┤
│ regclass  │
└───┘
(1 row)

> It's possible that we could get away with just summarily changing
> the argument type from text to regclass.  ISTR that we did exactly
> that with nextval() years ago, and didn't get too much push-back.
> But we couldn't do the same for the return type.  Also, this
> approach does nothing for the concern about the name being
> misleading.

Maybe we should go all the way the other way, and call it
pg_get_identity_sequence() and claim that "serial" is a legacy form of
identity columns?

- ilmari




Re: remaining sql/json patches

2023-09-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 06.09.23 17:01, Alvaro Herrera wrote:
>> Assert()ing that a pointer is not null, and in the next line
>> dereferencing that pointer, is useless: the process would crash anyway
>> at the time of dereference, so the Assert() adds no value.  Better to
>> leave the assert out.

> I don't think this is quite correct.  If you dereference a pointer, the 
> compiler may assume that it is not null and rearrange code accordingly. 
> So it might not crash.  Keeping the assertion would alter that assumption.

Uh ... only in assert-enabled builds.  If your claim is correct,
this'd result in different behavior in debug and production builds,
which would be even worse.  But I don't believe the claim.
I side with Alvaro's position here: such an assert is unhelpful.
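
For concreteness, a minimal sketch of the pattern under discussion (the
struct and function are hypothetical):

```c
#include "postgres.h"

typedef struct DemoNode
{
	int			tag;
} DemoNode;

static int
demo_tag(DemoNode *node)
{
	Assert(node != NULL);		/* compiled out in production builds */
	return node->tag;			/* dereference crashes on NULL anyway, so
								 * the Assert adds no protection here */
}
```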

regards, tom lane




Re: Adding a pg_get_owned_sequence function?

2023-09-12 Thread Tom Lane
Peter Eisentraut  writes:
> Would it work to just overload pg_get_serial_sequence with regclass 
> argument types?

Probably not; the parser would have no principled way to resolve
pg_get_serial_sequence('foo', 'bar') as one or the other.  I'm
not sure offhand if it would throw error or just choose one, but
if it just chooses one it'd likely be the text variant.

It's possible that we could get away with just summarily changing
the argument type from text to regclass.  ISTR that we did exactly
that with nextval() years ago, and didn't get too much push-back.
But we couldn't do the same for the return type.  Also, this
approach does nothing for the concern about the name being
misleading.

regards, tom lane




Re: Document that PG_TRY block cannot have a return statement

2023-09-12 Thread Tom Lane
Serpent  writes:
> I created a tiny patch that documents that the code block following
> PG_TRY() cannot have any return statement.

AFAIK, this is wrong.  The actual requirement is already stated
in the comment:

 * ... The error recovery code
 * can either do PG_RE_THROW to propagate the error outwards, or do a
 * (sub)transaction abort.

regards, tom lane




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-12 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thank you for reviewing!

> 1.
> 
>  #include "access/transam.h"
>  #include "catalog/pg_language_d.h"
> +#include "fe_utils/string_utils.h"
>  #include "pg_upgrade.h"
> 
> It seems we don't need this header file anymore.

Removed.

> 2.
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated
> during the upgrade.");
> 
> I think normally the first letter is lowercase, and we can avoid the period.

Right, fixed. Also, the period is removed based on that rule. Unlike other,
more detailed messages, this one just reports what happened.

```
if (nslots_on_old > max_replication_slots)
	pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
-			 "logical replication slots (%d) on the old cluster.",
+			 "logical replication slots (%d) on the old cluster",
			 max_replication_slots, nslots_on_old);
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: pg_rewind WAL segments deletion pitfall

2023-09-12 Thread Alexander Kukushkin
Hi,

Please find attached v5.
What changed:
1. Now we collect the files that should be kept in a separate hash table.
2. The decision whether to keep a file is made only when the file is actually
missing on the source. That is, remaining WAL files will be copied over as
they currently are, although that could be extremely inefficient and unnecessary.
3. Added a TAP test checking that at least one such file isn't removed.

Regards,
--
Alexander Kukushkin
From 3e1e6c9d968e9b829357b6eb0a7dfa366b550668 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin 
Date: Tue, 12 Sep 2023 14:09:47 +0200
Subject: [PATCH v5] Be more picky with WAL segment deletion in pg_rewind

Make pg_rewind a bit wiser in terms of creating the filemap:
preserve on the target all WAL segments that contain records between the
last common checkpoint and the point of divergence.

Co-authored-by: Polina Bungina 
---
 src/bin/pg_rewind/filemap.c   | 62 ++-
 src/bin/pg_rewind/filemap.h   |  4 ++
 src/bin/pg_rewind/parsexlog.c | 22 +++
 src/bin/pg_rewind/pg_rewind.c |  3 +
 src/bin/pg_rewind/t/010_keep_recycled_wals.pl | 62 +++
 5 files changed, 151 insertions(+), 2 deletions(-)
 create mode 100644 src/bin/pg_rewind/t/010_keep_recycled_wals.pl

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 58280d9abc..d44d716d82 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -64,6 +64,27 @@ static file_entry_t *lookup_filehash_entry(const char *path);
 static int	final_filemap_cmp(const void *a, const void *b);
 static bool check_file_excluded(const char *path, bool is_source);
 
+typedef struct skipwal_t {
+	const char *path;
+	uint32		status;
+} skipwal_t;
+
+#define SH_PREFIX		keepwalhash
+#define SH_ELEMENT_TYPE	skipwal_t
+#define SH_KEY_TYPE		const char *
+#define	SH_KEY			path
+#define SH_HASH_KEY(tb, key)	hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
+#define	SH_SCOPE		static inline
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+static keepwalhash_hash *keepwalhash = NULL;
+
+static bool keepwalhash_entry_exists(const char *path);
+
 /*
  * Definition of one element part of an exclusion list, used to exclude
  * contents when rewinding.  "name" is the name of the file or path to
@@ -207,6 +228,35 @@ lookup_filehash_entry(const char *path)
 	return filehash_lookup(filehash, path);
 }
 
+/* Initialize a hash table to store WAL file names that must be kept */
+void
+keepwalhash_init(void)
+{
+	keepwalhash = keepwalhash_create(FILEHASH_INITIAL_SIZE, NULL);
+}
+
+/* Prevent deletion of a given file during rewind */
+void
+insert_keepwalhash_entry(const char *path)
+{
+	skipwal_t   *entry;
+	bool		found;
+
+	/* Should only be called with keepwalhash initialized */
+	Assert(keepwalhash);
+
+	entry = keepwalhash_insert(keepwalhash, path, &found);
+
+	if (!found)
+		entry->path = pg_strdup(path);
+}
+
+static bool
+keepwalhash_entry_exists(const char *path)
+{
+	return keepwalhash_lookup(keepwalhash, path) != NULL;
+}
+
 /*
  * Callback for processing source file list.
  *
@@ -682,7 +732,16 @@ decide_file_action(file_entry_t *entry)
 	}
 	else if (entry->target_exists && !entry->source_exists)
 	{
-		/* File exists in target, but not source. Remove it. */
+		/* File exists in target, but not source. */
+
+		if (keepwalhash_entry_exists(path))
+		{
+			/* This is a WAL file that should be kept. */
+			pg_log_debug("Not removing %s because it is required for recovery", path);
+			return FILE_ACTION_NONE;
+		}
+
+		/* Otherwise remove an unexpected file. */
 		return FILE_ACTION_REMOVE;
 	}
 	else if (!entry->target_exists && !entry->source_exists)
@@ -818,7 +877,6 @@ decide_file_actions(void)
 	return filemap;
 }
 
-
 /*
  * Helper function for filemap hash table.
  */
diff --git a/src/bin/pg_rewind/filemap.h b/src/bin/pg_rewind/filemap.h
index 48f240dff1..9a0d4bbc54 100644
--- a/src/bin/pg_rewind/filemap.h
+++ b/src/bin/pg_rewind/filemap.h
@@ -109,5 +109,9 @@ extern void process_target_wal_block_change(ForkNumber forknum,
 extern filemap_t *decide_file_actions(void);
 extern void calculate_totals(filemap_t *filemap);
 extern void print_filemap(filemap_t *filemap);
+extern void preserve_file(char *filepath);
+
+extern void keepwalhash_init(void);
+extern void insert_keepwalhash_entry(const char *path);
 
 #endif			/* FILEMAP_H */
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 27782237d0..4c7f9bef14 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -176,6 +176,10 @@ findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex,
 	char	   *errormsg;
 	XLogPageReadPrivate private;
 
+	/* Track WAL segments opened while searching for a checkpoint */
+	XLogSegNo	segno = 0;
+	TimeLineID	tli = 0;
+
 	/*
 	 * The given fork pointer points to the end of the last common record,

RE: pg_upgrade and logical replication

2023-09-12 Thread Zhijie Hou (Fujitsu)
On Monday, September 11, 2023 6:32 PM vignesh C  wrote:
> 
> 
> The attached v7 patch has the changes for the same.

Thanks for updating the patch, here are a few comments:


1.

+/*
+ * binary_upgrade_sub_replication_origin_advance
+ *
+ * Update the remote_lsn for the subscriber's replication origin.
+ */
+Datum
+binary_upgrade_sub_replication_origin_advance(PG_FUNCTION_ARGS)
+{

Is there any usage apart from pg_upgrade for this function? If not, I think
we'd better move it to pg_upgrade_support.c. If yes, maybe it's better to
rename it to something more general.

2.

+ * Verify that all subscriptions have a valid remote_lsn and don't contain
+ * any table in srsubstate different than ready ('r').
+ */
+static void
+check_for_subscription_state(ClusterInfo *cluster)

I think we'd better follow the same style of
check_for_isn_and_int8_passing_mismatch() to record the invalid things in a
file.
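
For reference, a hedged sketch of that reporting pattern (the file name,
the loop body, and `subname` are placeholders, not the patch's code):
collect each offending object into a text file, then pg_fatal() pointing
at it.

```c
/* Sketch only: the actual check query and field names would differ. */
FILE	   *script = NULL;
char		output_path[MAXPGPATH];

snprintf(output_path, sizeof(output_path), "%s/%s",
		 log_opts.basedir, "subs_invalid_state.txt");

for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
	DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];

	/* ... run the per-database check query; for each offending row: ... */
	if (script == NULL &&
		(script = fopen_priv(output_path, "w")) == NULL)
		pg_fatal("could not open file \"%s\": %s",
				 output_path, strerror(errno));
	fprintf(script, "database: %s, subscription: %s\n",
			active_db->db_name, subname /* placeholder */ );
}

if (script)
{
	fclose(script);
	pg_fatal("Your installation contains subscriptions in an invalid state.\n"
			 "A list of them is in the file:\n"
			 "    %s", output_path);
}
```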


3.

+   if (fout->dopt->binary_upgrade && fout->remoteVersion >= 10)
+   {
+   appendPQExpBuffer(query,
+ "SELECT 
binary_upgrade_create_sub_rel_state('%s', %u, '%c'",
+ subrinfo->dobj.name,

I think we'd better consider using appendStringLiteral or a related function for
the dobj.name here to make sure the string conversion is safe.
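
For reference, a hedged sketch using pg_dump's appendStringLiteralAH()
wrapper (the trailing arguments are placeholders for whatever the patch
actually passes):

```c
/* Sketch only: quote the subscription name as a SQL literal instead of
 * splicing it into the format string. */
appendPQExpBufferStr(query,
					 "SELECT binary_upgrade_create_sub_rel_state(");
appendStringLiteralAH(query, subrinfo->dobj.name, fout);
appendPQExpBuffer(query, ", %u, '%c')",
				  relid, srsubstate);	/* placeholder arguments */
```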


4.

The following commit message may need update:
"binary_upgrade_create_sub_rel_state SQL function, and also provides an
additional LSN parameter for CREATE SUBSCRIPTION to restore the underlying
replication origin remote LSN. "

I think we have changed to another approach which doesn't provide new parameter
in DDL.


5. 
+   /* Fetch the existing tuple. */
+   tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
+                             CStringGetDatum(subname));

Since we don't modify the tuple here, SearchSysCache2 seems enough.


6. 
+   "LEFT 
JOIN pg_catalog.pg_database d"
+   "  ON 
d.oid = s.subdbid "
+   "WHERE 
coalesce(remote_lsn, '0/0') = '0/0'");

For the subscriptions that were just created and finished the table sync but
haven't applied any changes, their remote_lsn will also be 0/0. Do we
need to report ERROR in this case?

Best Regards,
Hou zj


Re: Support prepared statement invalidation when result types change

2023-09-12 Thread Jelte Fennema
When running the Postgres JDBC tests with this patchset I found a dumb
mistake in my last patch where I didn't initialize the contents of
orig_params correctly. This new patchset fixes that.


v4-0001-Support-prepared-statement-invalidation-when-resu.patch
Description: Binary data


v4-0002-Completely-remove-fixed_result-from-CachedPlanSou.patch
Description: Binary data


v4-0003-Support-changing-argument-types-of-prepared-state.patch
Description: Binary data


Document that PG_TRY block cannot have a return statement

2023-09-12 Thread Serpent
Hi,

I created a tiny patch that documents that the code block following
PG_TRY() cannot have any return statement.

Please CC me, as I'm not subscribed to this list.
commit 1968e53b9a649691fbedbdc059e2e933aa2b471b
Author: Serpent7776 
Date:   Tue Sep 12 14:38:09 2023 +0200

Document that PG_TRY block cannot have a return statement

Document the fact that the block of code that follows `PG_TRY()` cannot
have any form of return statement, because `PG_TRY()` modifies global
state that is later restored in `PG_CATCH()` or `PG_FINALLY()`.

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 4a9562fdaa..f81b94b054 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -352,6 +352,8 @@ extern PGDLLIMPORT ErrorContextCallback *error_context_stack;
  * You cannot use both PG_CATCH() and PG_FINALLY() in the same
  * PG_TRY()/PG_END_TRY() block.
  *
+ * There cannot be a return statement in the block of code following PG_TRY().
+ *
  * Note: while the system will correctly propagate any new ereport(ERROR)
  * occurring in the recovery section, there is a small limit on the number
  * of levels this will work for.  It's best to keep the error recovery


Re: Possibility to disable `ALTER SYSTEM`

2023-09-12 Thread Martín Marqués
Hi,

> Maybe in addition to making "ALTER SYSTEM" throw an error, the feature that 
> disables it should also disable reading postgresql.auto.conf? Maybe even 
> delete it and make it an error if it is present on startup (maybe even warn 
> if it shows up while the DB is running?).

The outcome being sought is that system GUCs that require a restart
or reload are not modified unless it's through some orchestration tool or
someone with physical access to the configuration files (yeah, we
still have COPY PROGRAM).

We shouldn't mix this with not reading postgresql.auto.conf, or even
worse, deleting it. I don't think it's a good idea to delete the file.
Ignoring it might be of interest, but completely outside the scope of
the intention I'm seeing from the k8s teams.

> Counterpoint: maybe the idea is to disable ALTER SYSTEM but still use 
> postgresql.auto.conf, maintained by an external program, to control the 
> instance's behaviour.

I believe that's the idea, although we have `include` and
`include_dir` which can be used the same way as `postgresql.auto.conf`
is automatically included.

Kind regards, Martín

-- 
Martín Marqués
It’s not that I have something to hide,
it’s that I have nothing I want you to see




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-09-12 Thread Matthias van de Meent
On Tue, 12 Sept 2023 at 12:56, stepan rutz  wrote:
>
> Hi,
>
> I have fallen into this trap and others have too. If you run
> EXPLAIN(ANALYZE) no de-toasting happens. This makes query-runtimes
> differ a lot. The bigger point is that the average user expects more
> from EXPLAIN(ANALYZE) than what it provides. This can be surprising. You
> can force detoasting during explain with explicit calls to length(), but
> that is tedious. Those of us who are forced to work using Java stacks,
> ORMs and still store mostly documents fall into this trap sooner or
> later. I have already received some good feedback on this one, so this
> is an issue that bothers quite a few people out there.

Yes, the lack of being able to see the impact of detoasting (amongst
others) in EXPLAIN (ANALYZE) can hide performance issues.

> It would be great to get some feedback on the subject and how to address
> this, maybe in totally different ways.

Hmm, maybe we should measure the overhead of serializing the tuples instead.
The difference between your patch and "serializing the tuples, but not
sending them" is that serializing also does the detoasting, but also
includes any time spent in the serialization functions of the type. So
an option "SERIALIZE" which measures all the time the server spent on
the query (except the final step of sending the bytes to the client)
would likely be more useful than "just" detoasting.

> 0001_explain_analyze_and_detoast.patch

I notice that this patch creates and destroys a memory context for
every tuple that the DestReceiver receives. I think that's quite
wasteful, as you should be able to create only one memory context and
reset it before (or after) each processed tuple. That also reduces the
differences in measurements between EXPLAIN and normal query
processing of the tuples - after all, we don't create new memory
contexts for every tuple in the normal DestRemote receiver either,
right?
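
For reference, a minimal sketch of that create-once/reset-per-tuple pattern
(myState and the callback placement are illustrative):

```c
#include "postgres.h"
#include "utils/memutils.h"

/* At receiver startup: one context for the receiver's whole lifetime. */
myState->tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
											"tuple processing context",
											ALLOCSET_DEFAULT_SIZES);

/* In the per-tuple receiveSlot callback: */
MemoryContext oldcxt = MemoryContextSwitchTo(myState->tmpcontext);

/* ... detoast / serialize the tuple's attributes here ... */

MemoryContextSwitchTo(oldcxt);
MemoryContextReset(myState->tmpcontext);	/* cheap; memory is reused */

/* At receiver shutdown: */
MemoryContextDelete(myState->tmpcontext);
```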

Kind regards,

Matthias van de Meent




Re: Removing unneeded self joins

2023-09-12 Thread Andrey Lepikhov

On 5/7/2023 21:28, Andrey Lepikhov wrote:

Hi,

During the significant code revision in v41 I lost some replacement
operations. Here is the fix, plus extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing: I found out that we didn't replace SJs defined by
baserestrictinfos if no self-join clause existed for the join.
Now it is fixed, and a test has been added.

To understand the changes readily, see the delta file in the attachment.
Here is a new patch in the attachment, rebased on current master with
some minor gaffes fixed.


--
regards,
Andrey Lepikhov
Postgres Professional
From 70bb5cf3d11b2797f1a9c7b04740435135229d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove self-joins.

The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if it is proved that the join can be replaced with
a scan without impact on the query result.
The self-joined inner relation is replaced with the outer one in the query,
equivalence classes, and planner info structures. Also, the inner restrictlist
moves to the outer one, with duplicated clauses removed. Thus, this
optimization reduces the length of the range table list (especially useful for
partitioned relations), reduces the number of restriction clauses and hence of
selectivity estimations, and can potentially improve the planner's overall
prediction for the query.

The SJE proof is based on the innerrel_is_unique machinery.

We can remove a self-join when, for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Add to the list above the baserestrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip this
pair of joining tables.
4. Check uniqueness proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, inner and outer clauses must match
exactly.

The relation replacement procedure is not trivial, and it is partly combined
with the one used to remove useless left joins.

Tests covering this feature are added to join.sql.
Some regression tests changed due to the self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1094 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  824 -
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/expected/updatable_views.out |   17 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  360 ++
 src/tools/pgindent/typedefs.list  |2 +
 14 files changed, 2375 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..43c07b0d3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class=
+     <varlistentry id="guc-enable-self-join-removal" xreflabel="enable_self_join_removal">
+      <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the query planner's optimization which analyses
+        the query tree and replaces self joins with semantically equivalent
+        single scans. Only plain tables are taken into consideration.
+        The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
       <term><varname>enable_seqscan</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+                                        exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-12 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.

> src/backend/replication/slot.c
> 
> 3. InvalidatePossiblyObsoleteSlot
> 
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the 
> upgrade.");
> 
> 3a.
> That comment didn't seem good. I think you mean like in the suggestion below.
> 
> SUGGESTION
> It should not be possible for logical replication slots to be
> invalidated because max_slot_wal_keep_size is set to -1 during the
> upgrade. The following is just for sanity-checking.

This part was updated in v35. Please tell me if current version is still bad...

> 3b.
> I wasn't sure if 'max_slot_wal_keep_size' GUC is accessible in this
> scope, but if it is available then maybe
> Assert(max_slot_wal_keep_size_mb == -1); should also be included in
> this sanity check.

IIUC, GUC parameters are visible from all the postgres processes.
Added.

> src/bin/pg_upgrade/check.c
> 
> 4. check_new_cluster_logical_replication_slots
> 
> + conn = connectToServer(_cluster, "template1");
> +
> + prep_status("Checking for logical replication slots");
> 
> There is some inconsistency with all the subsequent pg_fatals within
> this function -- some of them mention "New cluster" but most of them
> do not.
> 
> Meanwhile, Kuroda-san showed me sample output like:
> 
> Checking for presence of required libraries   ok
> Checking database user is the install userok
> Checking for prepared transactionsok
> Checking for new cluster tablespace directories   ok
> Checking for logical replication slots
> New cluster must not have logical replication slots but found 1 slot.
> Failure, exiting
> 
> So, I felt the log message title ("Checking...") should be changed to
> include the words "new cluster" just like the log preceding it:
> 
> "Checking for logical replication slots" ==> "Checking for new cluster
> logical replication slots"
> 
> Now all the subsequent pg_fatals clearly are for "new cluster"

Changed.

> 5. check_new_cluster_logical_replication_slots
> 
> + if (nslots_on_new)
> + pg_fatal(ngettext("New cluster must not have logical replication
> slots but found %d slot.",
> +   "New cluster must not have logical replication slots but found %d slots.",
> +   nslots_on_new),
> + nslots_on_new);
> 
> 5a.
> TBH, I didn't see why you go to unnecessary trouble to have a plural
> message here. The message could just be like:
> "New cluster must have 0 logical replication slots but found %d."
> 
> ~
> 
> 5b.
> However, now (from the previous review comment #4) if "New cluster" is
> already explicit in the log, the pg_fatal message can become just:
> "New cluster must have ..." ==> "Expected 0 logical replication slots
> but found %d."

Basically it's better, but the initial character should be lowercase and the
period is not needed. Modified accordingly.

> 9. get_old_cluster_logical_slot_infos
> 
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> + i_caught_up = PQfnumber(res, "caught_up");
> + i_invalid = PQfnumber(res, "conflicting");
> 
> IMO SQL should be using an alias for this column, so you can say:
> i_invalid = PQfnumber(res, "invalid")
> 
> which seems better than switching the wording in code.

Modified. The argument of PQfnumber() must be the same as the column name, so
"as invalid" was added to the SQL.

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 10. LogicalSlotInfo
> 
> +typedef struct
> +{
> + char*slotname; /* slot name */
> + char*plugin; /* plugin */
> + bool two_phase; /* can the slot decode 2PC? */
> + bool caught_up; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> + bool invalid; /* Is the slot usable? */
> +} LogicalSlotInfo;
> 
> ~
> 
> + bool invalid; /* Is the slot usable? */
> This field name and comment have opposite meanings. Invalid means NOT usable.
> 
> SUGGESTION
> /* If true, the slot is unusable. */

Fixed.

> src/bin/pg_upgrade/server.c
> 
> 11. start_postmaster
> 
>   * we only modify the new cluster, so only use it there.  If there is a
>   * crash, the new cluster has to be recreated anyway.  fsync=off is a big
>   * win on ext4.
> + *
> + * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
> + * required by logical slots. The setting could avoid the invalidation of
> + * slots during the upgrade.
>   */
> ~
> 
> IMO this comment "to prevent the WAL removal required by logical
> slots" is ambiguous about how it could be interpreted.  Needs
> rearranging for clarity.

The description was changed. What do you think?

> 12. start_postmaster
> 
>   (cluster == _cluster) ?
> - " -c synchronous_commit=off -c fsync=off -c 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-12 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing!

=
> Commit message
> 
> 1.
> Note that the pg_resetwal command would remove WAL files, which are required
> as
> restart_lsn. If WALs required by logical replication slots are removed, they 
> are
> unusable. Therefore, during the upgrade, slot restoration is done
> after the final
> pg_resetwal command. The workflow ensures that required WALs are remained.
> 
> ~
> 
> SUGGESTION (minor wording and /required as/required for/ and
> /remained/retained/)
> Note that the pg_resetwal command would remove WAL files, which are
> required for restart_lsn. If WALs required by logical replication
> slots are removed, the slots are unusable. Therefore, during the
> upgrade, slot restoration is done after the final pg_resetwal command.
> The workflow ensures that required WALs are retained.

Fixed.

> doc/src/sgml/ref/pgupgrade.sgml
> 
> 2.
> The SGML is mal-formed so I am unable to build PG DOCS. Please try
> building the docs before posting the patch.
> 
> ref/pgupgrade.sgml:446: parser error : Opening and ending tag
> mismatch: itemizedlist line 410 and listitem
>  
> ^

Fixed. Sorry for noise.

> 3.
> + 
> +  
> +   The new cluster must not have permanent logical replication slots, 
> i.e.,
> +   there are no slots whose
> +linkend="view-pg-replication-slots">pg_replication_slots.t
> emporary
> +   is false.
> +  
> + 
> 
> /there are no slots whose/there must be no slots where/

Fixed. 

> 4.
> or take a file system backup as the standbys are still synchronized
> -   with the primary.)  Replication slots are not copied and must
> -   be recreated.
> +   with the primary.) Only logical slots on the primary are migrated to 
> the
> +   new standby, and other slots on the old standby must be recreated as
> +   they are not copied.
>
> 
> Mixing the terms "migrated" and "copied" seems to complicate this.
> Does the following suggestion work better instead?
> 
> SUGGESTION (??)
> Only logical slots on the primary are migrated to the new standby. Any
> other slots present on the old standby must be recreated.

Hmm, I preferred to use "copied". What do you think?

> src/backend/replication/slot.c
> 
> 5. InvalidatePossiblyObsoleteSlot
> 
> + /*
> + * The logical replication slots shouldn't be invalidated as
> + * max_slot_wal_keep_size is set to -1 during the upgrade.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the 
> upgrade.");
> +
> 
> I felt the comment could have another sentence like "The following is
> just a sanity check."

Added.

> src/bin/pg_upgrade/function.c
> 
> 6. get_loadable_libraries
> 
> + array_size = totaltups + count_old_cluster_logical_slots();
> + os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) *
> (array_size));
>   totaltups = 0;
> 
> 6a.
> Maybe something like 'n_libinfos' would be a more meaningful name than
> 'array_size'?

Fixed. 

> 6b.
> + os_info.libraries = (LibraryInfo *) pg_malloc(sizeof(LibraryInfo) *
> (array_size));
> 
> Those extra parentheses around "(array_size)" seem overkill.

Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Server crash on RHEL 9/s390x platform against PG16

2023-09-12 Thread Suraj Kharage
Hi,

Found a server crash on the RHEL 9/s390x platform with the below test case -

*Machine details:*

[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)
[edb@9428da9d2137 postgres]$ lscpu
Architecture:          s390x
CPU op-mode(s):        32-bit, 64-bit
Address sizes:         39 bits physical, 48 bits virtual
Byte Order:            Big Endian

*Configure command:*
./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm
--with-perl --with-python --with-tcl --with-openssl --enable-nls
--with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu
--enable-debug --enable-cassert --with-pgport=5414


*Test case:*
CREATE TABLE rm32044_t1
(
pkey   integer,
val  text
);
CREATE TABLE rm32044_t2
(
pkey   integer,
label  text,
hidden boolean
);
CREATE TABLE rm32044_t3
(
pkey integer,
val integer
);
CREATE TABLE rm32044_t4
(
pkey integer
);
insert into rm32044_t1 values ( 1 , 'row1');
insert into rm32044_t1 values ( 2 , 'row2');
insert into rm32044_t2 values ( 1 , 'hidden', true);
insert into rm32044_t2 values ( 2 , 'visible', false);
insert into rm32044_t3 values (1 , 1);
insert into rm32044_t3 values (2 , 1);

postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
= rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

*backtrace:*
[edb@9428da9d2137 postgres]$ gdb bin/postgres
data/qemu_postgres_20230911-140628_65620.core
Core was generated by `postgres: edb postgres [local] SELECT  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x010a8366 in heap_compute_data_size
(tupleDesc=tupleDesc@entry=0x1ba3d10,
values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at
heaptuple.c:227
227 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
[Current thread is 1 (LWP 65597)]
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.34-60.el9.s390x libcap-2.48-8.el9.s390x
libedit-3.1-37.20210216cvs.el9.s390x libffi-3.4.2-7.el9.s390x
libgcc-11.3.1-4.3.el9.alma.s390x libgcrypt-1.10.0-10.el9_2.s390x
libgpg-error-1.42-5.el9.s390x libstdc++-11.3.1-4.3.el9.alma.s390x
libxml2-2.9.13-3.el9_2.1.s390x libzstd-1.5.1-2.el9.s390x
llvm-libs-15.0.7-1.el9.s390x lz4-libs-1.9.3-5.el9.s390x
ncurses-libs-6.2-8.20210508.el9.s390x openssl-libs-3.0.7-17.el9_2.s390x
systemd-libs-252-14.el9_2.3.s390x xz-libs-5.2.5-8.el9_0.s390x
(gdb) bt
#0  0x010a8366 in heap_compute_data_size
(tupleDesc=tupleDesc@entry=0x1ba3d10,
values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at
heaptuple.c:227
#1  0x010a9bb0 in heap_form_minimal_tuple
(tupleDescriptor=0x1ba3d10, values=0x1ba4168, isnull=0x1ba41a8) at
heaptuple.c:1484
#2  0x016553fa in ExecCopySlotMinimalTuple (slot=)
at ../../../../src/include/executor/tuptable.h:472
#3  tuplesort_puttupleslot (state=state@entry=0x1be4d18,
slot=slot@entry=0x1ba4120)
at tuplesortvariants.c:610
#4  0x012dc0e0 in ExecIncrementalSort (pstate=0x1acb4d8) at
nodeIncrementalSort.c:716
#5  0x012b32c6 in ExecProcNode (node=0x1acb4d8) at
../../../src/include/executor/executor.h:273
#6  ExecutePlan (execute_once=, dest=0x1ade698,
direction=, numberTuples=0, sendTuples=,
operation=CMD_SELECT, use_parallel_mode=,
planstate=0x1acb4d8, estate=0x1acb258) at execMain.c:1670
#7  standard_ExecutorRun (queryDesc=0x19ad338, direction=,
count=0, execute_once=) at execMain.c:365
#8  0x014a6ae2 in PortalRunSelect (portal=portal@entry=0x1a63558,
forward=forward@entry=true, count=0, count@entry=9223372036854775807,
dest=dest@entry=0x1ade698) at pquery.c:924
#9  0x014a84e0 in PortalRun (portal=portal@entry=0x1a63558,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x1ade698, altdest=0x1ade698,
qc=0x40007ff7b0) at pquery.c:768
#10 0x014a3c1c in exec_simple_query (
query_string=0x19ea0e8 "SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2
ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;")
at postgres.c:1274
#11 0x014a57aa in PostgresMain (dbname=,
username=) at postgres.c:4637
#12 0x013fdaf6 in BackendRun (port=0x1a132c0, port=0x1a132c0) at
postmaster.c:4464
#13 BackendStartup (port=0x1a132c0) at postmaster.c:4192
#14 ServerLoop () at postmaster.c:1782
#15 0x013fec34 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x19a59a0)
at postmaster.c:1466
#16 0x01096faa in main (argc=, argv=0x19a59a0) at
main.c:198

(gdb) p val
$1 = 0

Does anybody have any idea about this?

--
Thanks & Regards,
Suraj Kharage

Re: trying again to get incremental backup

2023-09-12 Thread Dilip Kumar
On Wed, Aug 30, 2023 at 9:20 PM Robert Haas  wrote:
>
> Meanwhile, here's a rebased set of patches. The somewhat-primitive
> attempts at writing tests are in 0009, but they don't work, for the
> reasons explained above. I think I'd probably like to go ahead and
> commit 0001 and 0002 soon if there are no objections, since I think
> those are good refactorings independently of the rest of this.
>

I have started reading the patch today; I haven't yet completed one
pass, but here are my comments on 0007.

1.

+ BlockNumber relative_block_numbers[RELSEG_SIZE];

This is close to 400kB of memory, so I think it is better to palloc it
instead of keeping it on the stack.
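
For reference, a minimal sketch of the suggested change (names from the
quoted hunk):

```c
/* Heap-allocate the array instead of keeping it on the stack. */
BlockNumber *relative_block_numbers;

relative_block_numbers = palloc(sizeof(BlockNumber) * RELSEG_SIZE);

/* ... fill and send the blocks ... */

pfree(relative_block_numbers);
```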

2.
  /*
  * Try to parse the directory name as an unsigned integer.
  *
- * Tablespace directories should be positive integers that can
- * be represented in 32 bits, with no leading zeroes or trailing
+ * Tablespace directories should be positive integers that can be
+ * represented in 32 bits, with no leading zeroes or trailing
  * garbage. If we come across a name that doesn't meet those
  * criteria, skip it.

Unrelated code refactoring hunk

3.
+typedef struct
+{
+ const char *filename;
+ pg_checksum_context *checksum_ctx;
+ bbsink*sink;
+ size_t bytes_sent;
+} FileChunkContext;

This structure is not used anywhere.

4.
+ * If the file is to be set incrementally, then num_incremental_blocks
+ * should be the number of blocks to be sent, and incremental_blocks

/If the file is to be set incrementally/If the file is to be sent incrementally

5.
- while (bytes_done < statbuf->st_size)
+ while (1)
  {
- size_t remaining = statbuf->st_size - bytes_done;
+ /*

I do not really like this change, because after removing this you have
put 2 independent checks for sending the full file[1] and sending it
incrementally[2].  Actually, for sending incrementally,
'statbuf->st_size' is computed from 'num_incremental_blocks'
itself, so why don't we keep this breaking condition in the while loop
itself, so that we can avoid these two separate conditions?

[1]
+ /*
+ * If we've read the required number of bytes, then it's time to
+ * stop.
+ */
+ if (bytes_done >= statbuf->st_size)
+ break;

[2]
+ /*
+ * If we've read all the blocks, then it's time to stop.
+ */
+ if (ibindex >= num_incremental_blocks)
+ break;
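
For concreteness, a hedged sketch of the single-condition shape being
suggested (details elided):

```c
/* Because statbuf->st_size is derived from num_incremental_blocks in
 * the incremental case, one loop condition can serve both modes. */
while (bytes_done < statbuf->st_size)
{
	size_t		remaining = statbuf->st_size - bytes_done;

	/* ... send the next raw chunk, or the next incremental block;
	 * either way, advance bytes_done ... */
}
```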


6.
+typedef struct
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+} backup_wal_range;
+
+typedef struct
+{
+ uint32 status;
+ const char *path;
+ size_t size;
+} backup_file_entry;

Better to add some comments for these structures.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: persist logical slots to disk during shutdown checkpoint

2023-09-12 Thread Amit Kapila
On Tue, Sep 12, 2023 at 10:55 AM Michael Paquier  wrote:
>
> On Mon, Sep 11, 2023 at 02:49:49PM +0530, Amit Kapila wrote:
>
> Please see the v11 attached, that rewords all the places of the patch
> that need clarifications IMO.  I've found that the comment additions
> in CheckPointReplicationSlots() to be overcomplicated:
> - The key point to force a flush of a slot if its confirmed_lsn has
> moved ahead of the last LSN where it was saved is to make the follow
> up restart more responsive.
>

I don't think it will become more responsive in any way, not sure what
made you think like that. The key idea is that after restart we want
to ensure that all the WAL data up to the shutdown checkpoint record
is sent to downstream. As mentioned in the commit message, this will
help in ensuring that upgrades don't miss any data and then there is
another small advantage as mentioned in the commit message.

> - Not sure that there is any point to mention the other code paths in
> the tree where ReplicationSlotSave() can be called, and a slot can be
> saved in other processes than just WAL senders (like slot
> manipulations in normal backends, for one).  This was the last
> sentence in v10.
>

The point was the earlier sentence is no longer true and keeping it as
it is could be wrong or at least misleading. For example, earlier it
is okay to say, "This needn't actually be part of a checkpoint, ..."
but now that is no longer true as we want to invoke this at the time
of shutdown checkpoint for correctness. If we want to be precise, we
can say, "It is convenient to flush dirty replication slots at the
time of checkpoint. Additionally, .."

>
> +   if (s->data.invalidated == RS_INVAL_NONE &&
> +   s->data.confirmed_flush != s->last_saved_confirmed_flush)
>
> Actually this is incorrect, no?  Shouldn't we make sure that the
> confirmed_flush is strictly higher than the last saved LSN?
>

I can't see why it is incorrect. Do you see how (in what scenario) it
could go wrong? As per my understanding, confirmed_flush LSN will
always be greater than equal to last_saved_confirmed_flush but we
don't want to ensure that point here because we just want if the
latest value is not the same then we should mark the slot dirty and
flush it as that will be location we have ensured to update before
walsender shutdown. I think it is better to add an assert if you are
worried about any such case and we had thought of adding it as well
but then didn't do it because we don't have matching asserts to ensure
that we never assign prior LSN value to consfirmed_flush LSN.

+ /*
+ * LSN used to track the last confirmed_flush LSN where the slot's data
+ * has been flushed to disk.
+ */
+ XLogRecPtr last_saved_confirmed_flush;

I don't want to argue on such a point because it is a little bit of a
matter of personal choice but I find this comment unclear. It seems to
read that confirmed_flush LSN is some LSN position which is where we
flushed the slot's data and that is not true. I found the last comment
in the patch sent by me: "This value tracks the last confirmed_flush
LSN flushed which is used during a shutdown checkpoint to decide if
logical's slot data should be forcibly flushed or not." which I feel
we agreed upon is better.

-- 
With Regards,
Amit Kapila.




Detoasting optionally to make Explain-Analyze less misleading

2023-09-12 Thread stepan rutz

Hi,

I have fallen into this trap and others have too. If you run
EXPLAIN(ANALYZE) no de-toasting happens. This makes query-runtimes
differ a lot. The bigger point is that the average user expects more
from EXPLAIN(ANALYZE) than what it provides. This can be surprising. You
can force detoasting during explain with explicit calls to length(), but
that is tedious. Those of us who are forced to work using Java stacks,
ORMs and still store mostly documents fall into this trap sooner or
later. I have already received some good feedback on this one, so this
is an issue that bothers quite a few people out there.

Attached is a patch addressing the issue in the form of adding another
parameter to EXPLAIN. I don't know if that is a good idea, but I got
some feedback that a solution to this problem would be appreciated by
some people out there. It would also be nice to reflect the detoasting
in the "buffers" option of explain as well. The change for detoasting is
only a few lines though.

So the idea was to allow this

EXPLAIN (ANALYZE, DETOAST) SELECT * FROM sometable;

and perform the detoasting step additionally during the explain. This
just gives a more realistic runtime and by playing around with the
parameter and comparing the execution-times of the query one even gets
an impression of the detoasting cost involved in a query. Since the
parameter is purely optional, it would not affect any existing measurements.

It is not uncommon for the runtime reported by explain-analyze to be
quite unrealistic in the real world, where people use PostgreSQL to store
larger and larger documents inside tables rather than using Large Objects.


Here is a video of the effect (in an exaggerated form):
https://www.stepanrutz.com/short_detoast_subtitles.mp4

It would be great to get some feedback on the subject and how to address
this, maybe in totally different ways.

Greetings from Cologne, Stepan


Stepan Rutz - IT Consultant, Cologne Germany, stepan.rutz AT gmx.de
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..50859e6a1e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -47,6 +47,8 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
 /* Hook for plugins to get control in explain_get_index_name() */
 explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
 
+/* Forward decl for detoasting analyze */
+static const DestReceiver detoastAnalyzeDR;
 
 /* OR-able flags for ExplainXMLTag() */
 #define X_OPENING 0
@@ -192,6 +194,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 			es->settings = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "generic_plan") == 0)
 			es->generic = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "detoast") == 0)
+			es->detoast = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "timing") == 0)
 		{
 			timing_set = true;
@@ -244,6 +248,12 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("EXPLAIN option TIMING requires ANALYZE")));
 
+	/* check that detoast is used with EXPLAIN ANALYZE */
+	if (es->detoast && !es->analyze)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option DETOAST requires ANALYZE")));
+
 	/* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */
 	if (es->generic && es->analyze)
 		ereport(ERROR,
@@ -565,9 +575,13 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/*
 	 * Normally we discard the query's output, but if explaining CREATE TABLE
 	 * AS, we'd better use the appropriate tuple receiver.
+	 * Also if we choose to detoast during explain, we need to send the 
+	 * tuples to a receiver that performs the detoasting
 	 */
 	if (into)
 		dest = CreateIntoRelDestReceiver(into);
+	else if (es->analyze && es->detoast)
+		dest = 
 	else
 		dest = None_Receiver;
 
@@ -5052,3 +5066,98 @@ escape_yaml(StringInfo buf, const char *str)
 {
 	escape_json(buf, str);
 }
+
+
+/* 
+ *		detoasting DestReceiver functions
+ * 
+ */
+/* the idea is that running  EXPLAIN (ANALYZE)  sometimes gives results that 
+ * are way unrealistic, since during explain there is no detoasting of 
+ * values. That means you get no idea what your actual query time is going
+ * to be, nor what the actual overhead of detoasting is.
+ * 
+ * if you run EXPLAIN (ANALYZE, DETOAST) then you get an impression of
+ * what is actually happening when you use postgresql as a document-database
+ * or have large values for some other reason.
+ */
+
+/* this method receives the tuples/records during explain (analyze, detoast)
+ * and detoasts them.
+ * this method is otherwise not called and its purpose is to detoast to
+ * make analyze optionally more realistic
+ */
+static bool
+detoastAnalyzeReceive(TupleTableSlot *slot, DestReceiver *self)
+{
+	MemoryContext cxt;
+	MemoryContext oldcxt;
+	int i;
+	bool shouldFree = true;
+	TupleDesc tupdesc = 

Re: CHECK Constraint Deferrable

2023-09-12 Thread vignesh C
On Thu, 7 Sept 2023 at 17:26, Himanshu Upadhyaya
 wrote:
>
> Attached is v2 of the patch, rebased against the latest HEAD.

Thanks for working on this, a few comments:
1) "CREATE TABLE check_constr_tbl (i int CHECK(i<>0) DEFERRABLE, t
text)" is crashing in windows, the same was noticed in CFBot too:
2023-09-11 08:11:36.585 UTC [58563][client backend]
[pg_regress/constraints][13/880:0] LOG:  statement: CREATE TABLE
check_constr_tbl (i int CHECK(i<>0) DEFERRABLE, t text);
2023-09-11 08:11:36.586 UTC [58560][client backend]
[pg_regress/inherit][15/391:0] LOG:  statement: drop table c1;
../src/backend/commands/trigger.c:220:26: runtime error: member access
within null pointer of type 'struct CreateTrigStmt'
==58563==Using libbacktrace symbolizer.

The details of CFBot failure can be seen at [1]

2) Altering a check constraint's deferrability is not handled; is this intentional?
CREATE TABLE check_constr_tbl (i int CHECK(i<>0) DEFERRABLE, t text);
postgres=# alter table check_constr_tbl alter constraint
check_constr_tbl_i_check not deferrable;
ERROR:  constraint "check_constr_tbl_i_check" of relation
"check_constr_tbl" is not a foreign key constraint

3) Should we handle this scenario for domains too:
CREATE DOMAIN c1_check AS INT CHECK(VALUE > 10);
create table test(c1 c1_check);
alter domain c1_check ADD check (VALUE > 20) DEFERRABLE INITIALLY DEFERRED;

begin;
-- should this be deferred
insert into test values(19);
ERROR:  value for domain c1_check violates check constraint "c1_check_check1"

4) There is one warning:
heap.c: In function ‘StoreRelCheck’:
heap.c:2178:24: warning: implicit declaration of function
‘CreateTrigger’ [-Wimplicit-function-declaration]
 2178 | (void) CreateTrigger(trigger, NULL,
RelationGetRelid(rel),
  |^

5) This should be added to typedefs.list file:
+typedef enum checkConstraintRecheck
+{
+	CHECK_RECHECK_DISABLED,	/* Recheck of CHECK constraint is disabled, so
+							 * DEFERRED CHECK constraint will be considered
+							 * as non-deferrable check constraint.  */
+	CHECK_RECHECK_ENABLED,	/* Recheck of CHECK constraint is enabled, so
+							 * CHECK constraint will be validated but error
+							 * will not be reported for deferred CHECK
+							 * constraint. */
+	CHECK_RECHECK_EXISTING	/* Recheck of existing violated CHECK
+							 * constraint, indicates that this is a deferred
+							 * recheck of a row that was reported as a
+							 * potential violation of CHECK CONSTRAINT */
+}			checkConstraintRecheck;

[1] - 
https://api.cirrus-ci.com/v1/artifact/task/4855966353588224/testrun/build-32/testrun/pg_upgrade/002_pg_upgrade/log/002_pg_upgrade_old_node.log

Regards,
Vignesh




Re: Infinite Interval

2023-09-12 Thread Dean Rasheed
On Thu, 24 Aug 2023 at 14:51, Ashutosh Bapat
 wrote:
>
> The patches still apply. But here's a rebased version with one white
> space error fixed. Also ran pgindent.
>

This needs another rebase, and it looks like the infinite interval
input code is broken.

I took a quick look, and had a couple of other review comments:

1). In interval_mul(), I think "result_sign" would be a more accurate
name than "result_is_inf" for the local variable.

2). interval_accum() and interval_accum_inv() don't work correctly
with infinite intervals. To make them work, they need to count the
number of infinities seen, to allow them to be subtracted off by the
inverse function (similar to the code in numeric.c, except for the
NaN-handling, which will need to be different). Consider, for example:

SELECT x, avg(x) OVER(ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
  FROM (VALUES ('1 day'::interval),
   ('3 days'::interval),
   ('infinity'::timestamptz - now()),
   ('4 days'::interval),
   ('6 days'::interval)) v(x);
ERROR:  interval out of range

as compared to:

SELECT x, avg(x) OVER(ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
  FROM (VALUES (1::numeric),
   (3::numeric),
   ('infinity'::numeric),
   (4::numeric),
   (6::numeric)) v(x);

x |avg
--+
1 | 2.
3 |   Infinity
 Infinity |   Infinity
4 | 5.
6 | 6.
(5 rows)
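
For concreteness, a hedged sketch of the counting approach (the state layout
and macro names are illustrative; the actual patch may differ). The idea,
mirroring numeric.c, is that the moving-aggregate state counts infinities so
the inverse transition can subtract them back out when the frame slides:

```c
/* Sketch only: moving-aggregate state that tracks infinities. */
typedef struct IntervalAggState
{
	int64		N;				/* number of finite values accumulated */
	Interval	sumX;			/* sum of finite values */
	int64		pInfcount;		/* count of +infinity inputs */
	int64		nInfcount;		/* count of -infinity inputs */
} IntervalAggState;

/* Forward transition: classify instead of overflowing. */
if (INTERVAL_IS_PINF(newval))		/* macro name illustrative */
	state->pInfcount++;
else if (INTERVAL_IS_NINF(newval))
	state->nInfcount++;
else
	finite_accum(state, newval);	/* placeholder for the finite path */

/* Inverse transition: decrement whichever bucket the removed value
 * fell into, so a frame that has slid past an infinity returns to
 * ordinary finite arithmetic. */
```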

Regards,
Dean




Re: SQL:2011 application time

2023-09-12 Thread jian he
Hi. Some trivial issues:

In src/backend/catalog/index.c:

/* System attributes are never null, so no need to check. */
if (attnum <= 0)

Since you already checked attnum == 0, here you can just check attnum < 0?
-
ERROR:  column "valid_at" named in WITHOUT OVERLAPS is not a range type

IMHO, "named" is unnecessary.
-
doc/src/sgml/catalogs.sgml
pg_constraint adds another attribute (column), contemporal, but there seems
to be no doc entry for it.

Also, the Oxford definition of "temporal" is "relating to time", while here
we deal with ranges. So maybe "temporal" is not that accurate?




Re: Add 'worker_type' to pg_stat_subscription

2023-09-12 Thread Peter Smith
On Tue, Sep 12, 2023 at 1:44 PM Michael Paquier  wrote:
>
> On Tue, Sep 12, 2023 at 01:07:51PM +1000, Peter Smith wrote:
> > The type is only assigned during worker process launch, and during
> > process cleanup [1]. It's only possible to be UNKNOWN after
> > logicalrep_worker_cleanup().
> >
> > AFAIK the stats can never see a worker with an UNKNOWN type, although
> > it was due to excessive caution against something unforeseen that my
> > original code did below instead of the elog.
> >
> > + case WORKERTYPE_UNKNOWN: /* should not be possible */
> > + nulls[9] = true;
> >
> > Adding "unknown" for something that is supposed to be impossible might
> > be slight overkill, but so long as there is no obligation to write
> > about "unknown" in the PG DOCS then I agree it is probably better to
> > do that,
>
> Using an elog() is OK IMO.  pg_stat_get_subscription() holds
> LogicalRepWorkerLock in shared mode, and the only code path setting a
> worker to WORKERTYPE_UNKNOWN requires that this same LWLock is hold in
> exclusive mode while resetting all the shmem fields of the
> subscription entry cleaned up, which is what
> pg_stat_get_subscription() uses to check if a subscription should be
> included in its SRF.
>
> Shouldn't this patch add or tweak some SQL queries in
> src/test/subscription/ to show some worker types, at least?

Right. I found just a single test currently using the pg_stat_subscription
catalog. I added a worker_type check for that.

PSA v5

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v5-0001-add-worker-type-to-pg_stat_subscription.patch
Description: Binary data


RE: pg_upgrade and logical replication

2023-09-12 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thank you for updating the patch! Here are some comments.

Sorry if there are duplicate comments - the thread was revived recently, so I
may have lost track of some details.

01. General

Is there a possibility that an apply worker on the old cluster connects to the
publisher during the upgrade? For pg_upgrade on the publisher side, we refuse
TCP/IP connections from remote hosts and the port number is also changed, so
we can assume that subscribers do not connect to it. But IIUC such settings do
not affect the connecting side, so an apply worker may still try to connect to
the publisher. Also, are there any hazards if that happens?

02. Upgrade functions

The two functions binary_upgrade_create_sub_rel_state and
binary_upgrade_sub_replication_origin_advance should be located in
pg_upgrade_support.c. Also, the CHECK_IS_BINARY_UPGRADE() macro can be used.

03. Parameter combinations

IIUC getSubscriptionTables() should exit quickly if --no-subscriptions is
specified, whereas binary_upgrade_create_sub_rel_state() should fail.


04. I failed my test

I executed attached script but failed to upgrade:

```
Restoring database schemas in the new cluster 
  postgres
*failure*

Consult the last few lines of 
"data_N3/pg_upgrade_output.d/20230912T054546.320/log/pg_upgrade_dump_5.log" for
the probable cause of the failure.
Failure, exiting
```

I checked the log and found that binary_upgrade_create_sub_rel_state() does not
support skipping the fourth argument:

```
pg_restore: from TOC entry 4059; 16384 16387 SUBSCRIPTION TABLE sub sub postgres
pg_restore: error: could not execute query: ERROR:  function 
binary_upgrade_create_sub_rel_state(unknown, integer, unknown) does not exist
LINE 1: SELECT binary_upgrade_create_sub_rel_state('sub', 16384, 'r'...
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
Command was: SELECT binary_upgrade_create_sub_rel_state('sub', 16384, 'r');
```

IIUC if we allow arguments to be skipped, we must define wrappers like
pg_copy_logical_replication_slot_*.
Another approach is that pg_dump always dumps srsublsn, even if it is NULL.
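
To illustrate the second approach: if pg_dump always emitted all four
arguments, the failing restore command above would instead be written
as follows (assuming the fourth argument is srsublsn, of type pg_lsn,
passed as NULL when unset):

SELECT binary_upgrade_create_sub_rel_state('sub', 16384, 'r', NULL::pg_lsn);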


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



test.sh
Description: test.sh


Re: Cleaning up array_in()

2023-09-12 Thread jian he
On Mon, Sep 11, 2023 at 8:00 PM Alexander Lakhin  wrote:
>
> I can confirm that all those anomalies are fixed now.
> But new version brings a warning when compiled with gcc:
> arrayfuncs.c:659:9: warning: variable 'prev_tok' is uninitialized when used 
> here [-Wuninitialized]
>  if (prev_tok == ATOK_DELIM || nest_level == 
> 0)
>  ^~~~
> arrayfuncs.c:628:3: note: variable 'prev_tok' is declared here
>  ArrayToken  prev_tok;
>  ^
> 1 warning generated.
>
> Also it looks like an updated comment needs fixing/improving:
>   /* No array dimensions, so first literal character should be oepn 
> curl-braces */
> (should be an opening brace?)
>

Fixed these two issues.

The query used for benchmarking:

SELECT ('{ ' || string_agg(chr((ascii('B') + round(random() * 25))::integer),
                           ', ') || ' }')::text[]
FROM generate_series(1, 1e6) \watch i=0.1 c=1

After applying the patch, the above query runs slightly faster.
From 332a8a7b65cf26f75f729dafe3446df5c8cba19e Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Tue, 4 Jul 2023 14:07:31 -0400
Subject: [PATCH v7 3/7] Re-indent ArrayCount().

This cleans up after the removal of the "while (!itemdone)" loop.
---
 src/backend/utils/adt/arrayfuncs.c | 214 ++---
 1 file changed, 106 insertions(+), 108 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 59bb0614..d1b5d48b 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -477,21 +477,21 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 	{
 		bool		new_element = false;
 
-			switch (*ptr)
-			{
-case '\0':
-	/* Signal a premature end of the string */
-	ereturn(escontext, -1,
-			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-			 errmsg("malformed array literal: \"%s\"", str),
-			 errdetail("Unexpected end of input.")));
-case '\\':
+		switch (*ptr)
+		{
+			case '\0':
+/* Signal a premature end of the string */
+ereturn(escontext, -1,
+		(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+		 errmsg("malformed array literal: \"%s\"", str),
+		 errdetail("Unexpected end of input.")));
+			case '\\':
 
-	/*
-	 * An escape must be after a level start, after an element
-	 * start, or after an element delimiter. In any case we
-	 * now must be past an element start.
-	 */
+/*
+ * An escape must be after a level start, after an element
+ * start, or after an element delimiter. In any case we now
+ * must be past an element start.
+ */
 switch (parse_state)
 {
 	case ARRAY_LEVEL_STARTED:
@@ -511,22 +511,22 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
  errdetail("Unexpected \"%c\" character.",
 		   '\\')));
 }
-	/* skip the escaped character */
-	if (*(ptr + 1))
-		ptr++;
-	else
-		ereturn(escontext, -1,
-(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("malformed array literal: \"%s\"", str),
- errdetail("Unexpected end of input.")));
-	break;
-case '"':
+/* skip the escaped character */
+if (*(ptr + 1))
+	ptr++;
+else
+	ereturn(escontext, -1,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("malformed array literal: \"%s\"", str),
+			 errdetail("Unexpected end of input.")));
+break;
+			case '"':
 
-	/*
-	 * A quote must be after a level start, after a quoted
-	 * element start, or after an element delimiter. In any
-	 * case we now must be past an element start.
-	 */
+/*
+ * A quote must be after a level start, after a quoted element
+ * start, or after an element delimiter. In any case we now
+ * must be past an element start.
+ */
 switch (parse_state)
 {
 	case ARRAY_LEVEL_STARTED:
@@ -549,17 +549,16 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
  errmsg("malformed array literal: \"%s\"", str),
  errdetail("Unexpected array element.")));
 }
-	break;
-case '{':
-	if (!in_quotes)
-	{
-		/*
-		 * A left brace can occur if no nesting has occurred
-		 * yet, after a level start, or after a level
-		 * delimiter.  If we see one after an element
-		 * delimiter, we have something like "{1,{2}}", so
-		 * complain about non-rectangularity.
-		 */
+break;
+			case '{':
+if (!in_quotes)
+{
+	/*
+	 * A left brace can occur if no nesting has occurred yet,
+	 * after a level start, or after a level delimiter.  If we
+	 * see one after an element delimiter, we have something
+	 * like "{1,{2}}", so complain about non-rectangularity.
+	 */
 	switch (parse_state)
 	{
 		case ARRAY_NO_LEVEL:
@@ -579,19 +578,19 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 	

Re: Row pattern recognition

2023-09-12 Thread Tatsuo Ishii
Regarding v6 patch:

> SELECT company, tdate, price,
>  first_value(price) OVER w,
>  last_value(price) OVER w,
>  max(price) OVER w,
>  min(price) OVER w,
>  sum(price) OVER w,
>  avg(price) OVER w,
>  count(price) OVER w
> FROM stock
> WINDOW w AS (
> PARTITION BY company
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (START UP+ DOWN+)
> DEFINE
> START AS TRUE,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
>  company  |   tdate    | price | first_value | last_value | max  | min | sum  |    avg    | count 
> ----------+------------+-------+-------------+------------+------+-----+------+-----------+-------
>  company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000 |     4
>  company1 | 07-02-2023 |   200 |             |            |      |     |      |           |
>  company1 | 07-03-2023 |   150 |             |            |      |     |      |           |
>  company1 | 07-04-2023 |   140 |             |            |      |     |      |           |
>  company1 | 07-05-2023 |   150 |             |            |      |     |      |           |
>  company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000 |     4
>  company1 | 07-07-2023 |   110 |             |            |      |     |      |           |
>  company1 | 07-08-2023 |   130 |             |            |      |     |      |           |
>  company1 | 07-09-2023 |   120 |             |            |      |     |      |           |
>  company1 | 07-10-2023 |   130 |             |            |      |     |      |           |
>  company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000 |     4
>  company2 | 07-02-2023 |  2000 |             |            |      |     |      |           |
>  company2 | 07-03-2023 |  1500 |             |            |      |     |      |           |
>  company2 | 07-04-2023 |  1400 |             |            |      |     |      |           |
>  company2 | 07-05-2023 |  1500 |             |            |      |     |      |           |
>  company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000 |     4
>  company2 | 07-07-2023 |  1100 |             |            |      |     |      |           |
>  company2 | 07-08-2023 |  1300 |             |            |      |     |      |           |
>  company2 | 07-09-2023 |  1200 |             |            |      |     |      |           |
>  company2 | 07-10-2023 |  1300 |             |            |      |     |      |           |
> (20 rows)

The count column for unmatched rows should have been 0 rather than NULL, i.e.:

 company  |   tdate    | price | first_value | last_value | max  | min | sum  |    avg    | count 
----------+------------+-------+-------------+------------+------+-----+------+-----------+-------
 company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000 |     4
 company1 | 07-02-2023 |   200 |             |            |      |     |      |           |
 company1 | 07-03-2023 |   150 |             |            |      |     |      |           |
 company1 | 07-04-2023 |   140 |             |            |      |     |      |           |
 company1 | 07-05-2023 |   150 |             |            |      |     |      |           |     0
 company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000 |     4
 company1 | 07-07-2023 |   110 |             |            |      |     |      |           |
 company1 | 07-08-2023 |   130 |             |            |      |     |      |           |
 company1 | 07-09-2023 |   120 |             |            |      |     |      |           |
 company1 | 07-10-2023 |   130 |             |            |      |     |      |           |     0
 company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000 |     4
 company2 | 07-02-2023 |  2000 |             |            |      |     |      |           |
 company2 | 07-03-2023 |  1500 |             |            |      |     |      |           |
 company2 | 07-04-2023 |  1400 |             |            |      |     |      |           |
 company2 | 07-05-2023 |  1500 |             |            |      |     |      |           |     0
 company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000 |     4
 company2 | 07-07-2023 |  1100 |             |            |      |     |      |           |
 company2 | 07-08-2023 |  1300 |             |            |      |     |      |           |
 company2 | 07-09-2023 |  1200 |             |            |      |     |      |           |
 company2 | 07-10-2023 |  1300 |             |            |      |     |      |           |     0
(20 rows)

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-12 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> On Tue, Sep 12, 2023 at 02:33:25AM +, Zhijie Hou (Fujitsu) wrote:
> > 2.
> > +   if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> > +   elog(ERROR, "Replication slots must not be invalidated
> during the upgrade.");
> >
> > I think normally the first letter is lowercase, and we can avoid the period.
> 
> Documentation is your friend:
> https://www.postgresql.org/docs/current/error-style-guide.html

Thank you for the information! It is quite helpful for me.
(Some fatal errors start with a capital letter, like "Your installation
contains...", but I regarded those as detail or hint messages.)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Create shorthand for including all extra tests

2023-09-12 Thread Peter Eisentraut

On 04.09.23 22:30, Tom Lane wrote:
> Noah Misch  writes:
>> On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote:
>>> On 4 Sep 2023, at 17:01, Tom Lane  wrote:
>>>> I think this is a seriously bad idea.  The entire point of not including
>>>> certain tests in check-world by default is that the omitted tests are
>>>> security hazards, so a developer or buildfarm owner should review each
>>>> one before deciding whether to activate it on their machine.
>
>> Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same hazard:
>> they treat the loopback interface as private, so anyone with access to
>> loopback interface ports can hijack the test.  I'd be fine with e.g.
>> PG_TEST_EXTRA=private-lo activating all of those.  We don't gain by inviting
>> the tester to review the tests to rediscover this common factor.
>
> Yeah, I could live with something like that from the security standpoint.
> Not sure if it helps Nazir's use-case though.  Maybe we could invent
> categories that can be used in place of individual test names?
> For now,
>
> PG_TEST_EXTRA="needs-private-lo slow"
>
> would cover the territory of "all", and I think it'd be very seldom
> that we'd have to invent new categories here (though maybe I lack
> imagination today).


At least the kerberos tests also appear to require a lot of randomness 
for their setup, and sometimes in VM environments they hang for minutes 
until they get that.  I suppose that would go under "slow".


Also, at least in my mind, when we added the kerberos and ldap tests, a 
partial reason for excluding them from the default run was "requires 
additional unusual software to be installed".  The additional kerberos 
and ldap server software used in those tests is not covered by 
configure/meson, so it's a bit more DIY.







Re: psql - pager support - using invisible chars for signalling end of report

2023-09-12 Thread Peter Eisentraut

On 06.09.23 05:07, Thomas Munro wrote:
> This sounds better than the QUERY_SEPARATOR hack from commit
> 664d757531e, and similar kludges elsewhere.  I think Pavel and David
> are right about NUL being impossible in psql query output, no?

Note:

  -z, --field-separator-zero
                           set field separator for unaligned output to zero byte

  -0, --record-separator-zero
                           set record separator for unaligned output to zero byte






Re: remaining sql/json patches

2023-09-12 Thread Peter Eisentraut

On 06.09.23 17:01, Alvaro Herrera wrote:
> Assert()ing that a pointer is not null, and in the next line
> dereferencing that pointer, is useless: the process would crash anyway
> at the time of dereference, so the Assert() adds no value.  Better to
> leave the assert out.


I don't think this is quite correct.  If you dereference a pointer, the 
compiler may assume that it is not null and rearrange code accordingly. 
So it might not crash.  Keeping the assertion would alter that assumption.
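
A minimal standalone illustration of that point (plain C, not
PostgreSQL code; use_value() is just a stand-in callee):

#include <assert.h>
#include <stddef.h>

typedef struct Node { int value; } Node;

extern void use_value(int v);

void
example(Node *node)
{
	assert(node != NULL);	/* fails loudly in assert-enabled builds */

	int v = node->value;	/* compiler may now assume node != NULL */

	if (node == NULL)		/* ...and may delete this entire branch */
		return;

	use_value(v);
}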






Re: proposal: psql: show current user in prompt

2023-09-12 Thread Peter Eisentraut

On 11.09.23 13:59, Jelte Fennema wrote:
> @Tom and @Robert, since you originally suggested extending the
> protocol for this, I think some input from you on the protocol design
> would be quite helpful. BTW, this protocol extension is the main
> reason I personally care for this patch, because it would allow
> PgBouncer to ask for updates on certain GUCs so that it can preserve
> session level SET commands even in transaction pooling mode.
> Right now PgBouncer can only do this for a handful of GUCs, but
> there's quite a few others that are useful for PgBouncer to preserve
> by default:
> - search_path
> - statement_timeout
> - lock_timeout


ISTM that for a purpose like pgbouncer, it would be simpler to add a new 
GUC "report these variables" and send that in the startup message?  That 
might not help with the psql use case, but it would be much simpler.
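
For context, this is how clients consume reported settings today: the
server sends a ParameterStatus message for every GUC_REPORT variable,
and libpq exposes the most recent value via PQparameterStatus().  A
"report these variables" GUC in the startup packet would simply widen
that set.  For example:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");

	if (PQstatus(conn) == CONNECTION_OK)
	{
		/* server_version is one of the always-reported parameters */
		printf("server_version = %s\n",
			   PQparameterStatus(conn, "server_version"));
	}
	PQfinish(conn);
	return 0;
}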





Re: Adding a pg_get_owned_sequence function?

2023-09-12 Thread Peter Eisentraut

On 09.06.23 21:19, Dagfinn Ilmari Mannsåker wrote:
> I've always been annoyed by the fact that pg_get_serial_sequence takes
> the table and returns the sequence as strings rather than regclass. And
> since identity columns were added, the name is misleading as well (which
> is even acknowledged in the docs, together with a suggestion for a
> better name).


If you are striving for less misleading terminology, note that the 
concept of an "owned sequence" does not exist for users of identity 
sequences, and ALTER SEQUENCE / OWNED BY cannot be used for such sequences.


Would it work to just overload pg_get_serial_sequence with regclass 
argument types?
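
As a sketch of what that overload could look like at the SQL level (a
user-defined wrapper purely to illustrate the signature; a builtin
would be implemented in C):

CREATE FUNCTION pg_get_serial_sequence(tbl regclass, col name)
RETURNS regclass
LANGUAGE sql STABLE
AS $$ SELECT pg_get_serial_sequence($1::text, $2::text)::regclass $$;

-- e.g., assuming a table "users" with an identity column "id":
-- SELECT pg_get_serial_sequence('users'::regclass, 'id');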






Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting

2023-09-12 Thread 쿼리트릭스
Thank you for letting me know more about the test method.
As you said, we applied the patch using git diff and created a test case
under src/test/regress/sql.
Considering your question, we think it is enough to assume just one
subpartition level, because, considering the common ways partitions are
configured, it is a rare case for subpartitions to contain further
subpartitions.
So we think it would be appropriate to show only level 1 of the
subpartitions when using \d+.
If a subpartition does contain subpartitions, the keyword 'CONTAINS
SUBPARTITIONS' is added next to the partition name to indicate that nested
subpartitions exist.
These sources were tested on versions 14.5, 15.2, and 16 RC, respectively.
If you have any other opinions on this, please let us know; we will
actively consider them.

Team Query Tricks
---
querytricks2023.gmail.com
Query Tricks (github.com) 



On Sat, Aug 26, 2023 at 6:01 AM, Cary Huang wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> Hello
>
> Thank you for the patch and the effort to enhance \d+ 's output on
> partitioned tables that contain sub-partitions. However, the patch does not
> apply, and I notice that this patch is generated as a diff between 2
> files, describe.c and describe_change.c. You should use git diff to
> generate a patch rather than maintaining 2 files yourself. Also I noticed
> that you include a "create_object.sql" file to illustrate the feature,
> which is not necessary. Instead, you should add them as regression test
> cases in the existing regression test suite under "src/test/regress", so
> these will get run as tests to illustrate the feature. This patch changes
> the output of \d+ and it could potentially break other test cases, so you
> should fix them in the patch in addition to providing the feature.
>
> Now, regarding the feature, I see that you intend to print the sub
> partitions' partitions in the output, which is okay in my opinion. However,
> a sub-partition can also contain another sub-partition, which contains
> another sub-partition and so on. So it is possible that sub-partitions can
> span very, very deep. Your example assumes only 1 level of sub-partitions.
> Are you going to print all of them out in \d+? If so, it would definitely
> clutter the output so much that it starts to become annoying. Are you
> planning to set a limit on how many levels of sub-partitions to print or
> just let it print as many as it needs?
>
> thank you
>
> Cary Huang
> ---
> Highgo Software Canada
> www.highgo.ca


subpartition_indentation.diff
Description: Binary data


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-12 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! Before posting new patch set, I want to respond some
comments.

> 
> ==
> 1. GENERAL -- Cluster Terminology
> 
> This is not really a problem of your patch, but during message review,
> I noticed the terms old/new cluster VERSUS source/target cluster and
> both were used many times:
> 
> For example.
> ".*new clusmter --> 44 occurences
> ".*old cluster --> 21 occurences
> ".*source cluster --> 6 occurences
> ".*target cluster --> 12 occurences
> 
> Perhaps there should be a new thread/patch to use consistent terms.
> 
> Thoughts?

I preferred the terms new/old because I could not find the terms source/target
in the pg_upgrade documentation. (IIUC I used new/old in my patch.)
Anyway, it should be discussed in another thread.

> 2. GENERAL - Error message cases
> 
> Just FYI, there are many inconsistent capitalising in these patch
> messages, but then the same is also true for the HEAD code. It's a bit
> messy, but generally, I think your capitalisation was aligned with
> what I saw in HEAD, so I didn't comment anywhere about it.

Yeah, the rule is broken even in HEAD. I settled on a rule in [1], which seems
consistent with other parts of the file.
Michael kindly pointed me to the error message formatting guide [2], and the
patch basically follows that style. (IIUC pg_fatal("Your installation...")
follows the "Detail and hint messages" rule.)

> ==
> src/bin/pg_upgrade/info.c
> 
> 7. get_db_rel_and_slot_infos
> 
> void
> get_db_rel_and_slot_infos(ClusterInfo *cluster)
> {
> int dbnum;
> 
> if (cluster->dbarr.dbs != NULL)
> free_db_and_rel_infos(&cluster->dbarr);
> 
> ~
> 
> Judging from the HEAD code this function was intended to be reentrant
> -- e.g. it does cleanup code free_db_and_rel_infos in case there was
> something there from before.
> 
> IIUC there is no such cleanup for the slot_arr. I forget why this was
> removed. Sure, you might be able to survive the memory leaks, but
> choosing NOT to clean up the slot_arr seems to contradict the
> intention of HEAD calling free_db_and_rel_infos.

free_db_and_rel_infos() is called if get_db_rel_and_slot_infos() is called
several times for the same cluster. The callers are:

* check_and_dump_old_cluster(), target is old_cluster
* check_new_cluster(), target is new_cluster
* create_new_objects(), target is new_cluster

And we require that the new cluster must not have logical slots; this
restriction cannot be relaxed. Therefore, there is no case in which slot_arr
must be free()'d, so I removed the cleanup (see the similar discussion in
[3]). I think we should not add no-op code.
In an older version there was an Assert() instead, but it was removed based on
the comment in [4].

> 8. get_db_infos
> 
> I noticed the pg_malloc0 is reverted in this function.
> 
> - dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
> + dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
> 
> IMO it is better to do pg_malloc0 here.
> 
> Sure, everything probably works OK for the current code,

Yes, it works well. No one checks slot_arr before
get_old_cluster_logical_slot_infos(). In the old version, it was checked with
(slot_arr == NULL) in free_db_and_rel_infos(), but that check was removed.

> but it seems
> unnecessarily risky to assume that functions will forever be called in
> a specific order. AFAICT if someone (e.g. for debugging) calls
> count_old_cluster_logical_slots() or calls print_slot_infos() then the
> behaviour is undefined because slot_arr.nslots remains uninitialized.


Hmm, I do not think such an assumption is needed. In the current code,
pg_malloc() is used in get_db_infos(), so there is already a possibility that
print_rel_infos() is executed for debugging. The behavior is undefined - the
same as you said - and that code has been alive for a long time. Based on
that, I think we can accept the risk and reduce the operations instead. If you
know of another example, please share it here...

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB586642D33208D190F67CDD7BF5F2A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: 
https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-GRAMMAR-PUNCTUATION
[3]: 
https://www.postgresql.org/message-id/TYAPR01MB5866732D30ABB976992BDECCF5789%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[4]: 
https://www.postgresql.org/message-id/OS0PR01MB5716670FE547BA87FDEF895E94EDA%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Make --help output fit within 80 columns per line

2023-09-12 Thread Peter Eisentraut
I like this work a lot.  It's good to give developers easy-to-verify
guidance about formatting the --help messages.


However, I think instead of just adding a bunch of line breaks to 
satisfy the test, we should really try to make the lines shorter by 
rewording.  Otherwise, chances are we are making the usability worse for 
many users, because it's more likely that part of the help will scroll 
off the screen.  For example, in many cases, we could replace "do not" 
by "don't", or we could decrease the indentation of the second column by 
a few spaces, or just reword altogether.


Also, it would be very useful if the TAP test function could print out 
the violating lines if a test fails.  (Similar to how is() and like() 
print the failing values.)  Maybe start with that, and then it's easier 
to play with different wording variants to try to make it fit better.
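
Something along these lines, perhaps (helper name and messages
hypothetical):

use strict;
use warnings;
use Test::More;

# Check that each --help line fits in 80 columns, and report the
# offending lines on failure, like is()/like() report failing values.
sub help_fits_in_80_columns
{
	my ($help_text, $test_name) = @_;
	my @wide = grep { length($_) > 80 } split /\n/, $help_text;
	ok(@wide == 0, $test_name)
	  or diag("lines wider than 80 columns:\n" . join("\n", @wide));
}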





Re: Row pattern recognition

2023-09-12 Thread Tatsuo Ishii
>> What about leaving this (reevaluation) for now? Because:
>>
>> 1) we don't have CLASSIFIER
>> 2) we don't allow passing CLASSIFIER to PREV as its argument
>>
>> so I think we don't need to worry about this for now.
> 
> Sure. I'm all for deferring features to make it easier to iterate; I
> just want to make sure the architecture doesn't hit a dead end. Or at
> least, not without being aware of it.

Ok, let's defer this issue. Currently the patch already exceeds 3k
lines. I am afraid that too big a patch cannot be reviewed by anyone,
which means it will never be committed.

> Also: is CLASSIFIER the only way to run into this issue?

Good question. I would like to know.

>> What if we don't follow the standard, and instead follow POSIX EREs?  I
>> think this is better for users unless RPR's REs have significant merit
>> for users.
> 
> Piggybacking off of what Vik wrote upthread, I think we would not be
> doing ourselves any favors by introducing a non-compliant
> implementation that performs worse than a traditional NFA. Those would
> be some awful bug reports.

What I am not sure about is: you and Vik mentioned that the
traditional NFA is superior to the POSIX NFA in terms of performance,
but how is "lexicographic ordering" related to performance?

>> I am not sure if we need to worry about this because of the reason I
>> mentioned above.
> 
> Even if we adopted POSIX NFA semantics, we'd still have to implement
> our own parser for the PATTERN part of the query. I don't think
> there's a good way for us to reuse the parser in src/backend/regex.

Ok.

>> > Does that seem like a workable approach? (Worst-case, my code is just
>> > horrible, and we throw it in the trash.)
>>
>> Yes, it seems workable. I think the first cut of RPR needs at
>> least the + quantifier with reasonable performance. The current naive
>> implementation seems to have issues because of the exhaustive search.
> 
> +1

BTW, attached is the v6 patch. The differences from v5 include:

- Now aggregates can be used with RPR. Below is an example from the
  regression test cases added by the v6 patch.

- Fix the assertion error pointed out by Erik.

SELECT company, tdate, price,
 first_value(price) OVER w,
 last_value(price) OVER w,
 max(price) OVER w,
 min(price) OVER w,
 sum(price) OVER w,
 avg(price) OVER w,
 count(price) OVER w
FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+ DOWN+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
 company  |   tdate    | price | first_value | last_value | max  | min | sum  |    avg    | count 
----------+------------+-------+-------------+------------+------+-----+------+-----------+-------
 company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000 |     4
 company1 | 07-02-2023 |   200 |             |            |      |     |      |           |
 company1 | 07-03-2023 |   150 |             |            |      |     |      |           |
 company1 | 07-04-2023 |   140 |             |            |      |     |      |           |
 company1 | 07-05-2023 |   150 |             |            |      |     |      |           |
 company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000 |     4
 company1 | 07-07-2023 |   110 |             |            |      |     |      |           |
 company1 | 07-08-2023 |   130 |             |            |      |     |      |           |
 company1 | 07-09-2023 |   120 |             |            |      |     |      |           |
 company1 | 07-10-2023 |   130 |             |            |      |     |      |           |
 company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000 |     4
 company2 | 07-02-2023 |  2000 |             |            |      |     |      |           |
 company2 | 07-03-2023 |  1500 |             |            |      |     |      |           |
 company2 | 07-04-2023 |  1400 |             |            |      |     |      |           |
 company2 | 07-05-2023 |  1500 |             |            |      |     |      |           |
 company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000 |     4
 company2 | 07-07-2023 |  1100 |             |            |      |     |      |           |
 company2 | 07-08-2023 |  1300 |             |            |      |     |      |           |
 company2 | 07-09-2023 |  1200 |             |            |      |     |      |           |
 company2 | 07-10-2023 |  1300 |             |            |      |     |      |           |
(20 rows)

Re: Make psql's query canceling test simple by using signal() routine of IPC::Run

2023-09-12 Thread Michael Paquier
On Wed, Sep 06, 2023 at 12:45:24AM +0900, Yugo NAGATA wrote:
> I attached the update patch. I removed the incorrect comments and
> unnecessary lines. Also,  I rewrote the test to use "skip_all" instead
> of SKIP because we skip the whole test rather than a part of it.

Thanks for checking how IPC::Run behaves in this case on Windows!

Right.  This test is currently setting up a node for nothing, so let's
skip this test entirely under $windows_os and move on.  I'll backpatch
that down to 15 once the embargo on REL_16_STABLE is lifted with the
16.0 tag.
--
Michael


signature.asc
Description: PGP signature


Re: pg_rewind with cascade standby doesn't work well

2023-09-12 Thread Michael Paquier
On Mon, Sep 11, 2023 at 07:04:30PM +0300, Aleksander Alekseev wrote:
> Many thanks for submitting the patch. I added it to the nearest open
> commitfest [1].
> 
> IMO a test is needed that makes sure no one is going to break this in
> the future.

You definitely need more complex test scenarios for that.  If you can
come up with new ways to make the TAP tests of pg_rewind more modular
in handling more complicated node setups, that would be a nice
addition, for example.

> [1]: https://commitfest.postgresql.org/45/4559/

@@ -7951,6 +7951,15 @@ xlog_redo(XLogReaderState *record)
 			ereport(PANIC,
 					(errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
 							xlrec.ThisTimeLineID, replayTLI)));
+		/*
+		 * Update the control file so that crash recovery can follow the
+		 * timeline changes to this point.
+		 */
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->minRecoveryPoint = lsn;
+		ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;

This patch is at least incorrect in its handling of crash recovery,
because these two should *never* be set in this case as we want to
replay up to the end of WAL.  For example, see xlog.c or the top of
xlogrecovery.c about the assumptions behind these variables:
/* crash recovery should always recover to the end of WAL */
ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
ControlFile->minRecoveryPointTLI = 0;

If an end-of-recovery record is replayed during crash recovery, these
assumptions are plain broken.

One thing that we could consider is to be more aggressive with
restartpoints when replaying this record for a standby, see a few
lines above the lines added by your patch, for example.  And we could
potentially emulate a post-promotion restart point to get a refresh of
the control file as it should, with the correct code paths involved in
the updates of minRecoveryPoint when the checkpointer does the job.
--
Michael


signature.asc
Description: PGP signature