Re: pg_upgrade fails with in-place tablespace

2023-09-04 Thread Michael Paquier
On Tue, Sep 05, 2023 at 10:28:44AM +0800, Rui Zhao wrote:
> It would be highly appreciated if the official PG could also
> incorporate support for this feature.

This is a development option, and documented as such.  You may also
want to be aware of the fact that tablespaces in the data folder
itself are discouraged, because they don't really make sense, and that
we generate a WARNING if trying to do that since 33cb8ff6aa11.

I don't really have more to add, but others interested in this
topic are of course free to share their opinions and/or comments.
--
Michael




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-04 Thread Zhijie Hou (Fujitsu)
On Friday, September 1, 2023 9:05 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 

Hi,

Thanks for updating the patch.
I have a comment about the check related to the wal_status.

Currently, there are a few places where we check the wal_status of slots, e.g.
check_old_cluster_for_valid_slots(), get_loadable_libraries(), and
get_old_cluster_logical_slot_infos().

But as discussed in another thread [1], there are some kinds of WAL records
that may be written while pg_upgrade is checking the old cluster, which could
cause the WAL size to exceed max_slot_wal_keep_size. In this case, a checkpoint
will remove the WAL required by the slots and invalidate those slots (the
wal_status gets changed as well).

Based on this, it's possible that the slots we see each time we check the
wal_status are different, because they may get changed in between these checks.
This may not cause serious problems for now, because we will either copy all
the slots, including the invalidated ones, when upgrading, or we will report an
ERROR. But I feel it's better to get a consistent result each time we check the
slots, to close the door on such problems in the future. So, I feel we could
centralize the wal_status check and the slot fetch, so that even if some slot's
status changes afterwards, it won't risk affecting our check. What do you
think?

[1] 
https://www.postgresql.org/message-id/flat/CAA4eK1LLik2818uzYqS73O%2BHe5LK_%2B%3DkthyZ6hwT6oe9TuxycA%40mail.gmail.com#16efea0a76d623b1335e73fc1e28f5ef
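
Just to illustrate what I mean (all names below are hypothetical, not from the
patch, and this assumes pg_upgrade's usual helpers such as connectToServer()
and executeQueryOrDie()): we could take a single snapshot of the old cluster's
logical slots, including whether each one is already invalidated, and let every
later check consume that snapshot instead of re-querying pg_replication_slots.
Something like:

typedef struct
{
	char		slotname[NAMEDATALEN];
	char		plugin[NAMEDATALEN];
	bool		invalid;		/* wal_status was "lost" at snapshot time */
} LogicalSlotSnap;

static LogicalSlotSnap *slot_snap = NULL;
static int	n_slot_snap = -1;	/* -1 means "not fetched yet" */

static void
fetch_logical_slot_snapshot(ClusterInfo *cluster)
{
	PGconn	   *conn;
	PGresult   *res;

	if (n_slot_snap >= 0)
		return;					/* reuse the snapshot taken earlier */

	conn = connectToServer(cluster, "template1");
	res = executeQueryOrDie(conn,
							"SELECT slot_name, plugin, wal_status = 'lost' "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND NOT temporary");

	n_slot_snap = PQntuples(res);
	slot_snap = pg_malloc(n_slot_snap * sizeof(LogicalSlotSnap));

	for (int i = 0; i < n_slot_snap; i++)
	{
		snprintf(slot_snap[i].slotname, NAMEDATALEN, "%s", PQgetvalue(res, i, 0));
		snprintf(slot_snap[i].plugin, NAMEDATALEN, "%s", PQgetvalue(res, i, 1));
		slot_snap[i].invalid = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
	}

	PQclear(res);
	PQfinish(conn);
}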

Best Regards,
Hou zj


Re: Autogenerate some wait events code and documentation

2023-09-04 Thread Michael Paquier
On Mon, Sep 04, 2023 at 02:14:58PM +0200, Drouvot, Bertrand wrote:
> # Build the descriptions.  There are in camel-case.
> # LWLock and Lock classes do not need any modifications.
> 
> Nit: 2 whitespaces before "There are in camel"

The whitespaces are intentional, the typo in the first line is not.

> +   my $waiteventdescription = '';
> +   if (   $waitclassname eq 'WaitEventLWLock'
> 
> Nit: Too many whitespace after the "if (" ?? (I guess pgperltidy would
> fix it).

Here, perltidy is indeed complaining, but it is adding a few
whitespaces.

> Then, the only diff is:
> 
> < Client,WalSenderWaitWal,Waiting for WAL to be flushed in WAL sender process
> ---
> > Client,WalSenderWaitForWAL,Waiting for WAL to be flushed in WAL sender 
> > process
> 
> That said, it looks good to me.

Ah, good catch.  I did not think about cross-checking the data in the
new view before and after the patch set.  This rename needs to happen
in 0001.

Please find v5 attached.  How does that look?
--
Michael
From 878ecc7a13ef9086ec3dc0cc70bff1f5d8a22eea Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 5 Sep 2023 14:32:37 +0900
Subject: [PATCH v5 1/2] Rename wait events with more consistent camelcase
 style

This will help in the introduction of more simplifications with the
generation of wait event data using wait_event_names.txt.  The event
names used the same case-insensitive characters, hence applying lower()
or upper() to the monitoring queries allows the detection of the same
events as before this change.
---
 src/backend/libpq/pqmq.c  |  2 +-
 src/backend/replication/walsender.c   |  2 +-
 src/backend/storage/ipc/shm_mq.c  |  6 +-
 .../utils/activity/wait_event_names.txt   | 90 +--
 4 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 253181f47a..38b042804c 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 			break;
 
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_PUT_MESSAGE);
+		 WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE);
 		ResetLatch(MyLatch);
 		CHECK_FOR_INTERRUPTS();
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 46e48492ac..e250b0567e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1654,7 +1654,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 		if (pq_is_send_pending())
 			wakeEvents |= WL_SOCKET_WRITEABLE;
 
-		WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+		WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL);
 	}
 
 	/* reactivate latch so WalSndLoop knows to continue */
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d134a09a19..06d6b73527 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			 * cheaper than setting one that has been reset.
 			 */
 			(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-			 WAIT_EVENT_MQ_SEND);
+			 WAIT_EVENT_MESSAGE_QUEUE_SEND);
 
 			/* Reset the latch so we don't spin. */
 			ResetLatch(MyLatch);
@@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 		 * setting one that has been reset.
 		 */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_RECEIVE);
+		 WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
@@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle)
 
 		/* Wait to be signaled. */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_INTERNAL);
+		 WAIT_EVENT_MESSAGE_QUEUE_INTERNAL);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 13774254d2..5287ad5f63 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -45,15 +45,15 @@
 Section: ClassName - WaitEventActivity
 
 WAIT_EVENT_ARCHIVER_MAIN	ArchiverMain	"Waiting in main loop of archiver process."
-WAIT_EVENT_AUTOVACUUM_MAIN	AutoVacuumMain	"Waiting in main loop of autovacuum launcher process."
-WAIT_EVENT_BGWRITER_HIBERNATE	BgWriterHibernate	"Waiting in background writer process, hibernating."
-WAIT_EVENT_BGWRITER_MAIN	BgWriterMain	"Waiting in main loop of background writer process."
+WAIT_EVENT_AUTOVACUUM_MAIN	AutovacuumMain	"Waiting in main loop of autovacuum launcher process."
+WAIT_EVENT_BGWRITER_HIBERNATE	BgwriterHibernate	"Waiting in background writer process, hibernating."

Re: Impact of checkpointer during pg_upgrade

2023-09-04 Thread Amit Kapila
On Tue, Sep 5, 2023 at 10:09 AM Dilip Kumar  wrote:
>
> On Tue, Sep 5, 2023 at 9:38 AM Amit Kapila  wrote:
> >
> > On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar  wrote:
> > >
> > > That said, there is a possibility that some of the slots which got
> > > invalidated even on the previous checkpoint might get the same LSN as
> > > the slot which got invalidated later if there is no activity between
> > > these two checkpoints. So if we go with this approach then there is
> > > some risk of migrating some of the slots which were already
> > > invalidated even before the shutdown checkpoint.
> > >
> >
> > I think even during the shutdown checkpoint, after writing shutdown
> > checkpoint WAL, we can invalidate some slots that in theory are safe
> > to migrate/copy because all the WAL for those slots would also have
> > been sent. So, those would be similar to what we invalidate during the
> > upgrade, no?
>
> That's correct.
>
> > If so, I think it is better to have the same behavior for
> > invalidated slots irrespective of the time it gets invalidated. We can
> > either give an error for such slots during the upgrade (which means
> > disallow the upgrade) or simply ignore such slots during the upgrade.
> > I would prefer ERROR but if we want to ignore such slots, we can
> > probably inform the user in some way about ignored slots, so that she
> > can later drop corresponding subscriptions or recreate such slots and
> > do the required sync-up to continue the replication.
>
> Earlier I was thinking that ERRORing out is better so that the user
> can take the necessary action for the invalidated slots and then retry
> the upgrade.  But thinking again, I could not find any advantage to
> that: if we error out, the user still needs to restart the old cluster
> and drop the corresponding subscriptions; OTOH, if we allow the upgrade
> by ignoring the slots, the user has to take similar actions on the new
> cluster.  So what's the advantage of erroring out over upgrading?
>

The advantage is that we avoid the inconvenience caused to users when
DROP SUBSCRIPTION fails because the corresponding slots are not present:
users would first need to disassociate the slots from the subscriptions
and then drop the subscriptions. Also, I am not sure that leaving behind
some slots doesn't have any other impact; otherwise, why don't we drop
such slots from time to time after they are marked invalidated during
normal operation? If users really want to leave behind such invalidated
slots after the upgrade, we can even think of providing some option like
"exclude_invalid_logical_slots".

-- 
With Regards,
Amit Kapila.




Re: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread Dilip Kumar
On Tue, Sep 5, 2023 at 9:58 AM Amit Kapila  wrote:
>
> On Tue, Sep 5, 2023 at 9:10 AM Dilip Kumar  wrote:
> >
> > The comments don't mention anything about why we are flushing only at
> > the shutdown checkpoint.  I mean, checkpoints are not that frequent,
> > and we already perform a lot of I/O during them, so isn't it wise to
> > flush during every checkpoint?  We may argue that there is no extra
> > advantage to that as we are not optimizing for crash recovery, but
> > OTOH there is no reason not to do so for other checkpoints.  Or are we
> > worried about concurrency with a parallel walsender running during a
> > non-shutdown checkpoint?  If so, better we explain that as well.  If
> > this is already discussed in the thread and we have a conclusion,
> > then maybe we can mention it in the comments.
> >
>
> The point is that at the time of non-shutdown checkpoints, it is not
> clear that there is an extra advantage but we will definitely add
> extra I/O for this. Because at other times, we will already be saving
> the slot from time to time as the replication makes progress. And, we
> also need to flush such slots during shutdown for correctness for some
> use cases like upgrades. We can probably add something like: "At other
> times, the walsender keeps saving the slot from time to time as the
> replication progresses, so there is no clear advantage of flushing
> additional slots at the time of checkpoint". Will that work for you?

Yeah, that comment will work; my only concern was that since we added
an explicit condition that slots should be synced only during the
shutdown checkpoint, the comments should also explicitly explain the
reason.  Anyway, I am fine either way, whether we sync at the shutdown
checkpoint or at every checkpoint, because the I/O for syncing slots at
checkpoint time should not be a real worry.  And if we can avoid
additional code with extra conditions, that is better, because such
code branches will be hit frequently, and from a testability POV we
prefer to add code in the common path unless there is some overhead or
it is specifically meant for that branch only.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Impact of checkpointer during pg_upgrade

2023-09-04 Thread Dilip Kumar
On Tue, Sep 5, 2023 at 9:38 AM Amit Kapila  wrote:
>
> On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar  wrote:
> >
> > That said, there is a possibility that some of the slots which got
> > invalidated even on the previous checkpoint might get the same LSN as
> > the slot which got invalidated later if there is no activity between
> > these two checkpoints. So if we go with this approach then there is
> > some risk of migrating some of the slots which were already
> > invalidated even before the shutdown checkpoint.
> >
>
> I think even during the shutdown checkpoint, after writing shutdown
> checkpoint WAL, we can invalidate some slots that in theory are safe
> to migrate/copy because all the WAL for those slots would also have
> been sent. So, those would be similar to what we invalidate during the
> upgrade, no?

That's correct.

> If so, I think it is better to have the same behavior for
> invalidated slots irrespective of the time it gets invalidated. We can
> either give an error for such slots during the upgrade (which means
> disallow the upgrade) or simply ignore such slots during the upgrade.
> I would prefer ERROR but if we want to ignore such slots, we can
> probably inform the user in some way about ignored slots, so that she
> can later drop corresponding subscriptions or recreate such slots and
> do the required sync-up to continue the replication.

Earlier I was thinking that ERRORing out is better so that the user
can take the necessary action for the invalidated slots and then retry
the upgrade.  But thinking again, I could not find any advantage to
that: if we error out, the user still needs to restart the old cluster
and drop the corresponding subscriptions; OTOH, if we allow the upgrade
by ignoring the slots, the user has to take similar actions on the new
cluster.  So what's the advantage of erroring out over upgrading?  I
see a clear advantage of upgrading: the user wants to upgrade, and it
succeeds without reattempting.  If erroring out gave the user some way
to salvage those invalidated slots so that they could be migrated as
well by retrying the upgrade, then it would make sense to error out and
let the user take some action on the old cluster; but if all they have
to do is drop the subscription or recreate the slot in either case,
then letting the upgrade pass is the better option, at least IMHO.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread Amit Kapila
On Tue, Sep 5, 2023 at 9:10 AM Dilip Kumar  wrote:
>
> On Fri, Sep 1, 2023 at 12:16 PM vignesh C  wrote:
> >
> > On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
> > >
> > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > All but one. Normally, the idea of marking dirty is to indicate that
> > > > > we will actually write/flush the contents at a later point (except
> > > > > when required for correctness) as even indicated in the comments atop
> > > > > ReplicationSlotMarkDirty(). However, I see your point that we use
> > > > > that protocol at all the current places including CreateSlotOnDisk().
> > > > > So, we can probably do it here as well.
> > > >
> > > > yes
> > > >
> > >
> > > I think we should also ensure that slots are not invalidated
> > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > > because we don't allow decoding from such slots, so we shouldn't
> > > include those.
> >
> > Added this check.
> >
> > Apart from this I have also fixed the following issues that were
> > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> > instead of setting it in SaveSlotToPath b) The comments were moved
> > from ReplicationSlot and moved to CheckPointReplicationSlots c) Tests
> > will be run in autovacuum = off d) updating last_saved_confirmed_flush
> > based on cp.slotdata.confirmed_flush rather than
> > slot->data.confirmed_flush.
> > I have also added the commitfest entry for this at [1].
>
> The overall idea looks fine to me
>
> +
> + /*
> + * We won't ensure that the slot is persisted after the
> + * confirmed_flush LSN is updated as that could lead to frequent
> + * writes.  However, we need to ensure that we do persist the slots at
> + * the time of shutdown whose confirmed_flush LSN is changed since we
> + * last saved the slot to disk. This will help in avoiding retreat of
> + * the confirmed_flush LSN after restart.
> + */
> + if (is_shutdown && SlotIsLogical(s))
> + {
> + SpinLockAcquire(&s->mutex);
> + if (s->data.invalidated == RS_INVAL_NONE &&
> + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> + s->dirty = true;
> + SpinLockRelease(&s->mutex);
> + }
>
> The comments don't mention anything about why we are flushing only at
> the shutdown checkpoint.  I mean, checkpoints are not that frequent,
> and we already perform a lot of I/O during them, so isn't it wise to
> flush during every checkpoint?  We may argue that there is no extra
> advantage to that as we are not optimizing for crash recovery, but OTOH
> there is no reason not to do so for other checkpoints.  Or are we
> worried about concurrency with a parallel walsender running during a
> non-shutdown checkpoint?  If so, better we explain that as well.  If
> this is already discussed in the thread and we have a conclusion,
> then maybe we can mention it in the comments.
>

The point is that at the time of non-shutdown checkpoints, it is not
clear that there is an extra advantage but we will definitely add
extra I/O for this. Because at other times, we will already be saving
the slot from time to time as the replication makes progress. And, we
also need to flush such slots during shutdown for correctness for some
use cases like upgrades. We can probably add something like: "At other
times, the walsender keeps saving the slot from time to time as the
replication progresses, so there is no clear advantage of flushing
additional slots at the time of checkpoint". Will that work for you?
Having said that, I am not opposed to doing it for non-shutdown
checkpoints if one makes a separate case for it.

-- 
With Regards,
Amit Kapila.




Re: Impact of checkpointer during pg_upgrade

2023-09-04 Thread Amit Kapila
On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar  wrote:
>
> That said, there is a possibility that some of the slots which got
> invalidated even on the previous checkpoint might get the same LSN as
> the slot which got invalidated later if there is no activity between
> these two checkpoints. So if we go with this approach then there is
> some risk of migrating some of the slots which were already
> invalidated even before the shutdown checkpoint.
>

I think even during the shutdown checkpoint, after writing shutdown
checkpoint WAL, we can invalidate some slots that in theory are safe
to migrate/copy because all the WAL for those slots would also have
been sent. So, those would be similar to what we invalidate during the
upgrade, no? If so, I think it is better to have the same behavior for
invalidated slots irrespective of the time it gets invalidated. We can
either give an error for such slots during the upgrade (which means
disallow the upgrade) or simply ignore such slots during the upgrade.
I would prefer ERROR but if we want to ignore such slots, we can
probably inform the user in some way about ignored slots, so that she
can later drop corresponding subscriptions or recreate such slots and
do the required sync-up to continue the replication.

-- 
With Regards,
Amit Kapila.




Re: proposal: psql: show current user in prompt

2023-09-04 Thread Pavel Stehule
On Mon, Sep 4, 2023 at 14:24 Jelte Fennema  wrote:

> On Sun, 3 Sept 2023 at 20:58, Pavel Stehule 
> wrote:
> > here is an try
>
> Overall it does what I had in mind. Below a few suggestions:
>
> +int
> +PQprotocolSubversion(const PGconn *conn)
>
> Ugh, it's quite annoying that the original PQprotocolVersion only
> returns the major version and thus we need this new function. It
> seems like it would be much nicer if it returned a number similar to
> PQserverVersion. I think it might be nicer to change PQprotocolVersion
> to do that than to add another function. We could do:
>
> return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
> PG_PROTOCOL_MINOR(conn->pversion);
>
> or even:
>
> if (PG_PROTOCOL_MAJOR(conn->pversion) == 3 &&
> PG_PROTOCOL_MINOR(conn->pversion) == 0)
> return 3;
> else
> return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
> PG_PROTOCOL_MINOR(conn->pversion);
>
> The second option would be safest backwards compatibility wise, but in
> practice you would only get another value than 3 (or 0) when
> connecting to pre 7.4 servers. That seems old enough that I don't
> think anyone is actually calling this function. **I'd like some
> feedback from others on this though.**
>

This point is open. I'll wait for a reply from others.


>
> +   /* The protocol 3.0 is required */
> +   if (PG_PROTOCOL_MAJOR(their_version) == 3)
> +   conn->pversion = their_version;
>
> Let's compare against the actual PG_PROTOCOL_EARLIEST and
> PG_PROTOCOL_LATEST to determine if the version is supported or not.
>

changed
From f1975b784627fda06b23303c4f1fb6972d80a0a0 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Mon, 24 Jul 2023 20:18:16 +0200
Subject: [PATCH 4/4] Implementation of %N prompt placeholder

It is based on forcing reporting of the "role" GUC to the client.
---
 doc/src/sgml/ref/psql-ref.sgml | 19 ++-
 src/bin/psql/command.c | 13 +
 src/bin/psql/prompt.c  | 28 
 src/bin/psql/settings.h|  1 +
 src/bin/psql/startup.c | 32 
 5 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d94e3cacfc..8b267a6da6 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4568,7 +4568,24 @@ testdb= INSERT INTO my_table VALUES (:'content');
 The port number at which the database server is listening.
   
 
-  
+  
+%N
+
+ 
+  The database role name. This value is specified by command
+  SET ROLE. Until execution of this command
+  the value is set to the database session user name.
+ 
+
+ 
+  This substitution requires PostgreSQL
+  version 16 and up. When you use older version, the empty string
+  is used instead.
+ 
+
+  
+
+  
 %n
 
  
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index bcd8eb3538..bad0fdf415 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3883,6 +3883,7 @@ SyncVariables(void)
 {
 	char		vbuf[32];
 	const char *server_version;
+	PGresult   *result;
 
 	/* get stuff from connection */
 	pset.encoding = PQclientEncoding(pset.db);
@@ -3912,6 +3913,18 @@ SyncVariables(void)
 	/* send stuff to it, too */
 	PQsetErrorVerbosity(pset.db, pset.verbosity);
 	PQsetErrorContextVisibility(pset.db, pset.show_context);
+
+	/* link role GUC when it is needed for prompt */
+	if (pset.prompt_shows_role)
+		result = PQlinkParameterStatus(pset.db, "role");
+	else
+		result = PQunlinkParameterStatus(pset.db, "role");
+
+	if (PQresultStatus(result) != PGRES_COMMAND_OK)
+		pg_log_info("cannot set REPORT flag on configuration variable \"role\": %s",
+	PQerrorMessage(pset.db));
+
+	PQclear(result);
 }
 
 /*
diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c
index 969cd9908e..d306c91d19 100644
--- a/src/bin/psql/prompt.c
+++ b/src/bin/psql/prompt.c
@@ -165,6 +165,34 @@ get_prompt(promptStatus_t status, ConditionalStack cstack)
 	if (pset.db)
 		strlcpy(buf, session_username(), sizeof(buf));
 	break;
+	/* DB server user role */
+case 'N':
+	if (pset.db)
+	{
+		const char *rolename = NULL;
+
+		/*
+		 * This feature requires GUC "role" to be marked
+		 * by GUC_REPORT flag. This is done by PQlinkParameterStatus
+		 * function. This function requires protocol 3.1 (ReportGUC
+		 * message). Fallback is empty string.
+		 */
+		if (PQprotocolVersion(pset.db) == 3 &&
+			PQprotocolSubversion(pset.db) >= 1)
+		{
+			rolename  = PQparameterStatus(pset.db, "role");
+
+			/* fallback when role is not set yet */
+			if (rolename && strcmp(rolename, "none") == 0)
+rolename = session_username();
+		}
+
+		if (rolename)
+			strlcpy(buf, rolename, sizeof(buf));
+		else
+			buf[0] = '\0';
+	

Re: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread Dilip Kumar
On Fri, Sep 1, 2023 at 12:16 PM vignesh C  wrote:
>
> On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
> >
> > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> >  wrote:
> > >
> > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  
> > > wrote:
> > > >
> > > > All but one. Normally, the idea of marking dirty is to indicate that
> > > > we will actually write/flush the contents at a later point (except
> > > > when required for correctness) as even indicated in the comments atop
> > > > > ReplicationSlotMarkDirty(). However, I see your point that we use
> > > > that protocol at all the current places including CreateSlotOnDisk().
> > > > So, we can probably do it here as well.
> > >
> > > yes
> > >
> >
> > I think we should also ensure that slots are not invalidated
> > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > because we don't allow decoding from such slots, so we shouldn't
> > include those.
>
> Added this check.
>
> Apart from this I have also fixed the following issues that were
> agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> instead of setting it in SaveSlotToPath b) The comments were moved
> from ReplicationSlot and moved to CheckPointReplicationSlots c) Tests
> will be run in autovacuum = off d) updating last_saved_confirmed_flush
> based on cp.slotdata.confirmed_flush rather than
> slot->data.confirmed_flush.
> I have also added the commitfest entry for this at [1].

The overall idea looks fine to me

+
+ /*
+ * We won't ensure that the slot is persisted after the
+ * confirmed_flush LSN is updated as that could lead to frequent
+ * writes.  However, we need to ensure that we do persist the slots at
+ * the time of shutdown whose confirmed_flush LSN is changed since we
+ * last saved the slot to disk. This will help in avoiding retreat of
+ * the confirmed_flush LSN after restart.
+ */
+ if (is_shutdown && SlotIsLogical(s))
+ {
+ SpinLockAcquire(&s->mutex);
+ if (s->data.invalidated == RS_INVAL_NONE &&
+ s->data.confirmed_flush != s->last_saved_confirmed_flush)
+ s->dirty = true;
+ SpinLockRelease(&s->mutex);
+ }

The comments don't mention anything about why we are flushing only at
the shutdown checkpoint.  I mean, checkpoints are not that frequent,
and we already perform a lot of I/O during them, so isn't it wise to
flush during every checkpoint?  We may argue that there is no extra
advantage to that as we are not optimizing for crash recovery, but OTOH
there is no reason not to do so for other checkpoints.  Or are we
worried about concurrency with a parallel walsender running during a
non-shutdown checkpoint?  If so, better we explain that as well.  If
this is already discussed in the thread and we have a conclusion, then
maybe we can mention it in the comments.


/*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)

It seems we have removed two lines from the function header comments,
is this intentional or accidental?

Other than this patch LGTM.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Optionally using a better backtrace library?

2023-09-04 Thread Noah Misch
On Mon, Jul 03, 2023 at 11:58:25AM +0200, Alvaro Herrera wrote:
> On 2023-Jul-02, Andres Freund wrote:
> > I like that we now have a builtin backtrace ability. Unfortunately I think 
> > the
> > backtraces are often not very useful, because only externally visible
> > functions are symbolized.
> 
> Agreed, these backtraces are pretty close to useless.  Not completely,
> but I haven't found a practical way to use them for actual debugging
> of production problems.

For what it's worth, I use the attached script to convert the current
errbacktrace output to a fully-symbolized backtrace.  Nonetheless, ...

> > I hacked it up for ereport() to debug something, and the backtraces are
> > considerably better:
> > 
> > 2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] LOG:  
> > will crash
> > 2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] 
> > BACKTRACE:
> > [0x55fcd03e6143] PostgresMain: 
> > ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
> > [0x55fcd031154c] BackendRun: 
> > ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
> > [0x55fcd0310dd8] BackendStartup: 
> > ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
> > [0x55fcd030ce75] ServerLoop: 
> > ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
> 
> Yeah, this looks much more usable.

... +1 for offering this.
#! /usr/bin/perl

# Usage: errbacktrace2line EXE_FILE ) {
if (/^[[:space:]]*(\/.*\.so)$last2_fields/) {
my($obj, $offset, $return_addr) = ($1, $2, $3);
conditional_flush_addr $obj;
push @addr, ($offset || $return_addr);
} elsif (/$last2_fields/) {
my($offset, $return_addr) = ($1, $2);
conditional_flush_addr $bin;
push @addr, ($offset || $return_addr);
}
}

flush_addr;


Re: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread Amit Kapila
On Tue, Sep 5, 2023 at 7:54 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, September 4, 2023 6:15 PM vignesh C  wrote:
> >
> > On Mon, 4 Sept 2023 at 15:20, Amit Kapila  wrote:
> > >
> > > On Fri, Sep 1, 2023 at 10:50 AM vignesh C  wrote:
> > > >
> > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila
> >  wrote:
> > > > > > >
> > > > > > > All but one. Normally, the idea of marking dirty is to
> > > > > > > indicate that we will actually write/flush the contents at a
> > > > > > > later point (except when required for correctness) as even
> > > > > > > indicated in the comments atop ReplicationSlotMarkDirty().
> > > > > > > However, I see your point that we use that protocol at all the 
> > > > > > > current
> > places including CreateSlotOnDisk().
> > > > > > > So, we can probably do it here as well.
> > > > > >
> > > > > > yes
> > > > > >
> > > > >
> > > > > I think we should also ensure that slots are not invalidated
> > > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > > > > because we don't allow decoding from such slots, so we shouldn't
> > > > > include those.
> > > >
> > > > Added this check.
> > > >
> > > > Apart from this I have also fixed the following issues that were
> > > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> > > > instead of setting it in SaveSlotToPath
> > > >
> > >
> > > + if (is_shutdown && SlotIsLogical(s)) { SpinLockAcquire(&s->mutex);
> > > + if (s->data.invalidated == RS_INVAL_NONE &&
> > > + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> > > + s->dirty = true;
> > >
> > > I think it is better to use ReplicationSlotMarkDirty() as that would
> > > be consistent with all other usages.
> >
> > ReplicationSlotMarkDirty works only on MyReplicationSlot, whereas
> > CheckpointReplicationSlots loops through all the slots and marks the
> > appropriate slot as dirty; we might have to change
> > ReplicationSlotMarkDirty to take the slot as an input parameter, and
> > all callers should pass MyReplicationSlot.
>
> Personally, I feel that if we want to centralize the code for marking a slot
> dirty into a function, we can introduce a new static function
> MarkSlotDirty(slot) that marks the passed slot dirty, and let
> ReplicationSlotMarkDirty and CheckpointReplicationSlots call it. Like:
>
> void
> ReplicationSlotMarkDirty(void)
> {
> MarkSlotDirty(MyReplicationSlot);
> }
>
> +static void
> +MarkSlotDirty(ReplicationSlot *slot)
> +{
> +   Assert(slot != NULL);
> +
> +   SpinLockAcquire(&slot->mutex);
> +   slot->just_dirtied = true;
> +   slot->dirty = true;
> +   SpinLockRelease(&slot->mutex);
> +}
>
> This is somewhat similar to the relation between ReplicationSlotSave() (which
> serializes my backend's replication slot) and SaveSlotToPath() (which saves
> the passed slot).
>
> > Another thing is we have already taken spin lock to access
> > last_confirmed_flush_lsn from CheckpointReplicationSlots, we could set dirty
> > flag here itself, else we will have to release the lock and call
> > ReplicationSlotMarkDirty which will take lock again.
>
> Yes, this is unavoidable, but maybe it's not a big problem as
> we only do it at shutdown.
>

True, but it still doesn't look elegant. I also thought about having
(probably) an inline function that marks both the just_dirtied and dirty
fields. However, that requires us to assert that the caller has already
acquired the spinlock. I see a macro SpinLockFree() that might help, but
it didn't seem to be used anywhere in the code, so I am not sure we can
rely on it.

> > Instead shall we set just_dirtied also in CheckpointReplicationSlots?
> > Thoughts?
>
> I agree we'd better set just_dirtied to true to ensure we serialize the slot
> info here, because if some other process has just serialized the slot, the
> dirty flag would be reset to false if we didn't set just_dirtied to true in
> CheckpointReplicationSlots(). This race condition may not exist for now, but
> it seems better to rule it out completely by setting just_dirtied.
>

Agreed, and it is better to close any such possibility because we
can't say with certainty about manual slots. This seems better than
the other ideas we discussed.
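
For clarity, a rough sketch (not final code) of the
CheckPointReplicationSlots() hunk with both flags set under the slot's
spinlock would be:

	if (is_shutdown && SlotIsLogical(s))
	{
		SpinLockAcquire(&s->mutex);
		if (s->data.invalidated == RS_INVAL_NONE &&
			s->data.confirmed_flush != s->last_saved_confirmed_flush)
		{
			/* set both flags so a concurrent save cannot clear dirty */
			s->just_dirtied = true;
			s->dirty = true;
		}
		SpinLockRelease(&s->mutex);
	}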

-- 
With Regards,
Amit Kapila.




Re: Report planning memory in EXPLAIN ANALYZE

2023-09-04 Thread Lepikhov Andrei
Using your patch I found a case of redundant memory usage in the planner [1].
It may be interesting as an example of how this patch can detect problems.

[1] Optimize planner memory consumption for huge arrays
https://www.postgresql.org/message-id/flat/em9939439a-441a-4b27-a977-ebdf5987dc49%407d14f008.com

-- 
Regards,
Andrei Lepikhov

On Thu, Aug 24, 2023, at 5:31 PM, Ashutosh Bapat wrote:
> Sorry for the late reply. I was working on David's suggestion.
>
> Here's a response to your questions and also a new set of patches.
>
> On Tue, Aug 22, 2023 at 1:16 PM jian he  wrote:
>> Hi. I tested it.
>> not sure if following is desired behavior. first run with explain,
>> then run with explain(summary on).
>> the second time,  Planning Memory: 0 bytes.
>>
>> regression=# PREPARE q4 AS SELECT 1 AS a;
>> explain EXECUTE q4;
>> QUERY PLAN
>> --
>>  Result  (cost=0.00..0.01 rows=1 width=4)
>> (1 row)
>>
>> regression=# explain(summary on) EXECUTE q4;
>> QUERY PLAN
>> --
>>  Result  (cost=0.00..0.01 rows=1 width=4)
>>  Planning Time: 0.009 ms
>>  Planning Memory: 0 bytes
>> (3 rows)
>> -
>
> Yes. This is expected since the plan is already available and no
> memory is required to fetch it from the cache. I imagine, if there
> were parameters to the prepared plan, it would consume some memory to
> evaluate those parameters and some more memory if replanning was
> required.
>
>
>> previously, if you want stats of a given memory context and its
>> children, you can only use MemoryContextStatsDetail.
>> but it will only go to stderr or LOG_SERVER_ONLY.
>> Now, MemoryContextMemUsed is being exposed. I can do something like:
>>
>> mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
>> //do stuff.
>> mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) - mem_consumed;
>>
>> it will give me the NET memory consumed by doing stuff in between. Is
>> my understanding correct?
>
> Yes.
>
> Here are three patches based on the latest master.
>
> 0001
> 
> this is same as the previous patch with few things fixed. 1. Call
> MemoryContextMemUsed() before INSTR_TIME_SET_CURRENT so that the time
> taken by MemoryContextMemUsed() is not counted in planning time. 2. In
> ExplainOnePlan, use a separate code block for reporting memory.
>
> 0002
> 
> This patch reports both memory allocated and memory used in the
> CurrentMemoryContext at the time of planning. It converts "Planning
> Memory" into a section with two values reported as "used" and
> "allocated" as below
>
> #explain (summary on) select * from pg_class c, pg_type t where
> c.reltype = t.oid;
> QUERY PLAN
> --
>  Hash Join  (cost=28.84..47.08 rows=414 width=533)
>... snip ...
>  Planning Time: 9.274 ms
>  Planning Memory: used=80848 bytes allocated=90112 bytes
> (7 rows)
>
> In JSON format
> #explain (summary on, format json) select * from pg_class c, pg_type t
> where c.reltype = t.oid;
>   QUERY PLAN
> ---
>  [+
>{  +
>  "Plan": {+
>   ... snip ...
>  },   +
>  "Planning Time": 0.466,  +
>  "Planning Memory": { +
>"Used": 80608, +
>"Allocated": 90112 +
>  }+
>}  +
>  ]
> (1 row)
>
> PFA explain and explain analyze output in all the formats.
>
> The patch adds MemoryContextMemConsumed() which is similar to
> MemoryContextStats() or MemoryContextStatsDetails() except 1. the
> latter always prints the memory statistics to either stderr or to the
> server error log and 2. it doesn't return MemoryContextCounters that
> it gathered. We should probably change MemoryContextStats or
> MemoryContextStatsDetails() according to those two points and not add
> MemoryContextMemConsumed().
>
> I have not merged this into 0001 yet. But once we agree upon whether
> this is the right thing to do, I will merge it into 0001.
>
> 0003
> 
> When reporting memory allocated, a confusion may arise as to whether
> to report the "net" memory allocated between start and end of planner
> OR only the memory that remains allocated after end. This confusion
> can be avoided by using an exclusive memory context (child of
> CurrentMemoryContext) for planning. That's what 0003 implements as
> suggested by David. As mentioned in one of the comments in the patch,
> in order to measure memory allocated accurately the new MemoryContext
> has to be of the same type as the memory context that will be used
> otherwise by the 
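
As a rough sketch of the measurement pattern described for 0001 above
(simplified and not the patch itself; it assumes the new
MemoryContextMemUsed() function, the existing ExplainOneQuery() timing code,
and that MemoryContextMemUsed() returns a Size):

	Size		mem_consumed;
	instr_time	planstart,
				planduration;

	/* sample memory before the clock, so the sampling cost is not timed */
	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
	INSTR_TIME_SET_CURRENT(planstart);

	plan = pg_plan_query(query, queryString, cursorOptions, params);

	INSTR_TIME_SET_CURRENT(planduration);
	INSTR_TIME_SUBTRACT(planduration, planstart);

	/* net memory used by planning in the current context */
	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) - mem_consumed;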

Re: Sync scan & regression tests

2023-09-04 Thread Tom Lane
Heikki Linnakangas  writes:
> With shared_buffers='20MB', the tests passed. I'm going to change it 
> back to 10MB now, so that we continue to cover that case.

So chipmunk is getting through the core tests now, but instead it
is failing in contrib/pg_visibility [1]:

diff -U3 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out
 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
--- 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out
   2022-10-08 19:00:15.905074105 +0300
+++ 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
2023-09-02 00:25:51.814148116 +0300
@@ -218,7 +218,8 @@
  0 | t   | t
  1 | t   | t
  2 | t   | t
-(3 rows)
+ 3 | f   | f
+(4 rows)
 
 select * from pg_check_frozen('copyfreeze');
  t_ctid 

I find this easily reproducible by setting shared_buffers=10MB.
But I'm confused about why, because the affected test case
dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk
passed many times after that.  Might be worth bisecting in
the interval where chipmunk wasn't reporting?

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-09-01%2015%3A15%3A56




RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-09-04 Thread Zhijie Hou (Fujitsu)
On Monday, September 4, 2023 10:42 PM Masahiko Sawada  
wrote:

Hi,

> On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada
>  wrote:
> >
> > > Thanks for reviewing the patch. Pushed.
> >
> > My colleague Vignesh noticed that the newly added test cases were failing in
> BF animal sungazer[1].
> 
> Thank you for reporting!
> 
> >
> > The test failed to drop the slot which is active on publisher.
> >
> > --error-log--
> > This failure is because pg_drop_replication_slot fails with "replication
> > slot "test_tab2_sub" is active for PID 55771638":
> > 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG:  statement:
> > SELECT pg_drop_replication_slot('test_tab2_sub')
> > 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:
> > replication slot "test_tab2_sub" is active for PID 55771638
> > 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:
> > SELECT pg_drop_replication_slot('test_tab2_sub')
> > -
> >
> > I think the reason is that the test DISABLEd the subscription before
> > dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for
> > the walsender to release the slot, so it's possible that the walsender
> > is still alive when calling
> > pg_drop_replication_slot() to drop the slot on
> > the publisher (pg_drop_xxxslot() doesn't wait for the slot to be released).
> 
> I agree with your analysis.
> 
> >
> > I think we can wait for the slot to become inactive before dropping like:
> > $node_primary->poll_query_until('otherdb',
> > "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots
> WHERE active_pid IS NOT NULL)"
> > )
> >
> 
> I prefer this approach but it would be better to specify the slot name in the
> query.
> 
> > Or we can just not drop the slot as it's the last test case.
> 
> Since we might add other tests after that in the future, I think it's better 
> to drop
> the replication slot (and subscription).
> 
> >
> > Another thing that might be worth considering is that we could add a
> > parameter to pg_drop_replication_slot() to let the user control whether
> > to wait or not, and the test could then be fixed as well by passing
> > nowait=false to the function.
> 
> While it could be useful in general we cannot use this approach for this issue
> since it cannot be backpatched to older branches. We might want to discuss it
> on a new thread.
> 
> I've attached a draft patch to fix this issue. Feedback is very welcome.

Thanks, it looks good to me.

Best Regards,
Hou zj


RE: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread Zhijie Hou (Fujitsu)
On Monday, September 4, 2023 6:15 PM vignesh C  wrote:
> 
> On Mon, 4 Sept 2023 at 15:20, Amit Kapila  wrote:
> >
> > On Fri, Sep 1, 2023 at 10:50 AM vignesh C  wrote:
> > >
> > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
> > > >
> > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> > > >  wrote:
> > > > >
> > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila
>  wrote:
> > > > > >
> > > > > > All but one. Normally, the idea of marking dirty is to
> > > > > > indicate that we will actually write/flush the contents at a
> > > > > > later point (except when required for correctness) as even
> > > > > > indicated in the comments atop ReplicationSlotMarkDirty().
> > > > > > However, I see your point that we use that protocol at all the 
> > > > > > current
> places including CreateSlotOnDisk().
> > > > > > So, we can probably do it here as well.
> > > > >
> > > > > yes
> > > > >
> > > >
> > > > I think we should also ensure that slots are not invalidated
> > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > > > because we don't allow decoding from such slots, so we shouldn't
> > > > include those.
> > >
> > > Added this check.
> > >
> > > Apart from this I have also fixed the following issues that were
> > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> > > instead of setting it in SaveSlotToPath
> > >
> >
> > + if (is_shutdown && SlotIsLogical(s)) { SpinLockAcquire(&s->mutex);
> > + if (s->data.invalidated == RS_INVAL_NONE &&
> > + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> > + s->dirty = true;
> >
> > I think it is better to use ReplicationSlotMarkDirty() as that would
> > be consistent with all other usages.
> 
ReplicationSlotMarkDirty works only on MyReplicationSlot, whereas
CheckpointReplicationSlots loops through all the slots and marks the
appropriate slot as dirty; we might have to change ReplicationSlotMarkDirty to
take the slot as an input parameter, and all callers should pass MyReplicationSlot.

Personally, I feel that if we want to centralize the code for marking a slot
dirty into a function, we can introduce a new static function MarkSlotDirty(slot)
that marks the passed slot dirty, and let ReplicationSlotMarkDirty and
CheckpointReplicationSlots call it. Like:

void
ReplicationSlotMarkDirty(void)
{
MarkSlotDirty(MyReplicationSlot);
}

+static void
+MarkSlotDirty(ReplicationSlot *slot)
+{
+   Assert(slot != NULL);
+
+   SpinLockAcquire(&slot->mutex);
+   slot->just_dirtied = true;
+   slot->dirty = true;
+   SpinLockRelease(&slot->mutex);
+}

This is somewhat similar to the relation between ReplicationSlotSave() (which
serializes my backend's replication slot) and SaveSlotToPath() (which saves the
passed slot).

> Another thing is we have already taken spin lock to access
> last_confirmed_flush_lsn from CheckpointReplicationSlots, we could set dirty
> flag here itself, else we will have to release the lock and call
> ReplicationSlotMarkDirty which will take lock again.

Yes, this is unavoidable, but maybe it's not a big problem as
we only do it at shutdown.

> Instead shall we set just_dirtied also in CheckpointReplicationSlots?
> Thoughts?

I agree we'd better set just_dirtied to true to ensure we serialize the slot
info here, because if some other process has just serialized the slot, the
dirty flag would be reset to false if we didn't set just_dirtied to true in
CheckpointReplicationSlots(). This race condition may not exist for now, but it
seems better to rule it out completely by setting just_dirtied.

Best Regards,
Hou zj



Re: pg_upgrade fails with in-place tablespace

2023-09-04 Thread Rui Zhao
Thank you for your response. It is evident that there is a need
for this feature in our system.
Firstly, our customers express their desire to utilize tablespaces
for table management, without necessarily being concerned about
the directory location of these tablespaces.
Secondly, currently PG only supports absolute-path tablespaces, but
in-place tablespaces are very likely to become popular in the future.
Therefore, it is essential to incorporate support for in-place
tablespaces in the pg_upgrade feature. I intend to implement
this functionality in our system to accommodate our customers'
requirements.
It would be highly appreciated if the official PG could also
incorporate support for this feature.
--
Best regards,
Rui Zhao
--
From: Michael Paquier
Sent At: 2023 Sep. 1 (Fri.) 12:58
To: Mark
Cc: pgsql-hackers
Subject: Re: pg_upgrade fails with in-place tablespace
On Sat, Aug 19, 2023 at 08:11:28PM +0800, Rui Zhao wrote:
> Please refer to the TAP test I have included for a better understanding
> of my suggestion.
Sure, but it seems to me that my original question is not really
answered: what's your use case for being able to support in-place
tablespaces in pg_upgrade? The original use case being such
tablespaces is to ease the introduction of tests with primaries and
standbys, which is not something that really applies to pg_upgrade,
no? Perhaps there is meaning in having more TAP tests with
tablespaces and a combination of primary/standbys, still having
in-place tablespaces does not really make much sense to me because, as
these are in the data folder, we don't use them to test the path
re-creation logic.
I think that we should just add a routine in check.c that scans
pg_tablespace, reporting all the non-absolute paths found with their
associated tablespace names.
--
Michael
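
For illustration only, a check routine of the shape Michael describes might
look roughly like the sketch below; the function name, messages, and the
Unix-style '^/' absolute-path test are all hypothetical, and it leans on
pg_upgrade's existing helpers (connectToServer(), executeQueryOrDie(),
prep_status(), pg_log(), pg_fatal(), check_ok()):

static void
check_for_in_place_tablespaces(ClusterInfo *cluster)
{
	PGconn	   *conn = connectToServer(cluster, "template1");
	PGresult   *res;
	bool		found = false;

	prep_status("Checking for in-place tablespaces");

	/* in-place tablespaces report a relative path such as pg_tblspc/<oid> */
	res = executeQueryOrDie(conn,
							"SELECT spcname, pg_catalog.pg_tablespace_location(oid) "
							"FROM pg_catalog.pg_tablespace "
							"WHERE spcname NOT IN ('pg_default', 'pg_global') "
							"  AND pg_catalog.pg_tablespace_location(oid) !~ '^/'");

	for (int i = 0; i < PQntuples(res); i++)
	{
		found = true;
		pg_log(PG_WARNING, "tablespace \"%s\" uses the non-absolute path \"%s\"",
			   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
	}

	PQclear(res);
	PQfinish(conn);

	if (found)
		pg_fatal("The old cluster contains in-place (non-absolute path) tablespaces.");
	else
		check_ok();
}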


Re: Cleaning up array_in()

2023-09-04 Thread jian he
On Mon, Sep 4, 2023 at 8:00 AM jian he  wrote:
>
> hi.
> attached v4.
> v4, 0001 to 0005 is the same as v3 in
> https://www.postgresql.org/message-id/5859ce4e-2be4-92b0-c85c-e1e24eab57c6%40iki.fi
>
> v4-0006 doing some modifications to address the corner case mentioned
> in the previous thread (like select '{{1,},{1},}'::text[]).
> also fixed all these FIXME, Heikki mentioned in the code.
>
> v4-0007 refactor ReadDimensionInt. to make the array dimension bound
> variables within the INT_MIN and INT_MAX. so it will make select
> '[21474836488:21474836489]={1,2}'::int[]; fail.


Attached is the same as v4, but with the unused variable {nitems_according_to_dims} deleted.
From ca35ce0cba131e6cb65147d4f4dd6ea4277bcf15 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sat, 8 Jul 2023 22:19:48 +0300
Subject: [PATCH v4 5/7] Determine array dimensions and parse the elements in
 one pass.

The logic in ArrayCount() and ReadArrayStr() was hard to follow,
because they mixed low-level escaping and quoting with tracking the
curly braces. Introduce a ReadArrayToken() function that tokenizes the
input, handling quoting and escaping.

Instead of having separate ArrayCount() and ReadArrayStr() functions, move
the logic for determining and checking the structure of the dimensions
to ReadArrayStr(). ReadArrayStr() now determines the dimensions and
parses the elements in one pass.

ReadArrayStr() no longer scribbles on the input. Instead, it allocates
a separate scratch buffer that it uses to hold the string representation
of each element. The scratch buffer must be as large as the input string,
so this doesn't save memory, but palloc(strlen(x)) is cheaper than
pstrdup(x).
---
 src/backend/utils/adt/arrayfuncs.c   | 1026 +++---
 src/test/regress/expected/arrays.out |2 +-
 src/tools/pgindent/typedefs.list |2 +-
 3 files changed, 431 insertions(+), 599 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 7a63db09..19d7af6d 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -53,18 +53,6 @@ bool		Array_nulls = true;
 			PG_FREE_IF_COPY(array, n); \
 	} while (0)
 
-typedef enum
-{
-	ARRAY_NO_LEVEL,
-	ARRAY_LEVEL_STARTED,
-	ARRAY_ELEM_STARTED,
-	ARRAY_QUOTED_ELEM_STARTED,
-	ARRAY_QUOTED_ELEM_COMPLETED,
-	ARRAY_ELEM_DELIMITED,
-	ARRAY_LEVEL_COMPLETED,
-	ARRAY_LEVEL_DELIMITED
-} ArrayParseState;
-
 /* Working state for array_iterate() */
 typedef struct ArrayIteratorData
 {
@@ -89,18 +77,30 @@ typedef struct ArrayIteratorData
 	int			current_item;	/* the item # we're at in the array */
 }			ArrayIteratorData;
 
+/* ReadArrayToken return type */
+typedef enum
+{
+	ATOK_LEVEL_START,
+	ATOK_LEVEL_END,
+	ATOK_DELIM,
+	ATOK_ELEM,
+	ATOK_ELEM_NULL,
+	ATOK_ERROR,
+} ArrayToken;
+
 static bool ReadDimensionInt(char **srcptr, int *result, const char *origStr, Node *escontext);
 static bool ReadArrayDimensions(char **srcptr, int *ndim, int *dim, int *lBound,
 const char *origStr, Node *escontext);
-static int	ArrayCount(const char *str, int *dim, char typdelim,
-	   Node *escontext);
-static bool ReadArrayStr(char *arrayStr, const char *origStr,
-		 int nitems,
+static bool ReadArrayStr(char **srcptr,
 		 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 		 char typdelim,
 		 int typlen, bool typbyval, char typalign,
-		 Datum *values, bool *nulls,
-		 bool *hasnulls, int32 *nbytes, Node *escontext);
+		 int *ndim, int *dim,
+		 int *nitems,
+		 Datum **values, bool **nulls,
+		 const char *origStr, Node *escontext);
+static ArrayToken ReadArrayToken(char **srcptr, char *elembuf, char typdelim,
+ const char *origStr, Node *escontext);
 static void ReadArrayBinary(StringInfo buf, int nitems,
 			FmgrInfo *receiveproc, Oid typioparam, int32 typmod,
 			int typlen, bool typbyval, char typalign,
@@ -186,12 +186,11 @@ array_in(PG_FUNCTION_ARGS)
 	char		typalign;
 	char		typdelim;
 	Oid			typioparam;
-	char	   *string_save,
-			   *p;
-	int			i,
-nitems;
-	Datum	   *dataPtr;
-	bool	   *nullsPtr;
+	char	   *p;
+	int			nitems;
+	int			nitems_according_to_dims;
+	Datum	   *values;
+	bool	   *nulls;
 	bool		hasnulls;
 	int32		nbytes;
 	int32		dataoffset;
@@ -234,15 +233,13 @@ array_in(PG_FUNCTION_ARGS)
 	typdelim = my_extra->typdelim;
 	typioparam = my_extra->typioparam;
 
-	/* Make a modifiable copy of the input */
-	string_save = pstrdup(string);
-
 	/*
+	 * Start processing the input string.
+	 *
 	 * If the input string starts with dimension info, read and use that.
-	 * Otherwise, we require the input to be in curly-brace style, and we
-	 * prescan the input to determine dimensions.
+	 * Otherwise, we determine the dimensions from the curly braces.
 	 */
-	p = string_save;
+	p = string;
	if (!ReadArrayDimensions(&p, &ndim, dim, lBound, string, escontext))
 		return (Datum) 0;
 
@@ -254,17 +251,11 @@ array_in(PG_FUNCTION_ARGS)
 	

Re: brininsert optimization opportunity

2023-09-04 Thread Soumyadeep Chakraborty
Rebased against latest HEAD.

Regards,
Soumyadeep (VMware)
From 94a8acd3125aa4a613c238e435ed78dba9f40625 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Sat, 29 Jul 2023 09:17:49 -0700
Subject: [PATCH v5 1/1] Reuse revmap and brin desc in brininsert

brininsert() used to have code that performed per-tuple initialization
of the revmap. That had some overhead.

To mitigate, we introduce a new AM callback to clean up state at the end
of all inserts, and perform the revmap cleanup there.
---
 contrib/bloom/blutils.c  |  1 +
 doc/src/sgml/indexam.sgml| 14 +-
 src/backend/access/brin/brin.c   | 74 +++-
 src/backend/access/gin/ginutil.c |  1 +
 src/backend/access/gist/gist.c   |  1 +
 src/backend/access/hash/hash.c   |  1 +
 src/backend/access/index/indexam.c   | 15 ++
 src/backend/access/nbtree/nbtree.c   |  1 +
 src/backend/access/spgist/spgutils.c |  1 +
 src/backend/executor/execIndexing.c  |  5 ++
 src/include/access/amapi.h   |  3 ++
 src/include/access/brin_internal.h   |  1 +
 src/include/access/genam.h   |  2 +
 13 files changed, 108 insertions(+), 12 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index f23fbb1d9e0..4830cb3fee6 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -131,6 +131,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = blbuild;
 	amroutine->ambuildempty = blbuildempty;
 	amroutine->aminsert = blinsert;
+	amroutine->aminsertcleanup = NULL;
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index e813e2b620a..17f7d6597f1 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -139,6 +139,7 @@ typedef struct IndexAmRoutine
 ambuild_function ambuild;
 ambuildempty_function ambuildempty;
 aminsert_function aminsert;
+aminsertcleanup_function aminsertcleanup;
 ambulkdelete_function ambulkdelete;
 amvacuumcleanup_function amvacuumcleanup;
 amcanreturn_function amcanreturn;   /* can be NULL */
@@ -355,11 +356,22 @@ aminsert (Relation indexRelation,
within an SQL statement, it can allocate space
in indexInfo-ii_Context and store a pointer to the
data in indexInfo-ii_AmCache (which will be NULL
-   initially).
+   initially). If the data cached contains structures that can be simply pfree'd,
+   they will implicitly be pfree'd. But, if it contains more complex data, such
+   as Buffers or TupleDescs, additional cleanup is necessary. This additional
+   cleanup can be performed in indexinsertcleanup.
   
 
   
 
+bool
+aminsertcleanup (IndexInfo *indexInfo);
+
+   Clean up state that was maintained across successive inserts in
+   indexInfo-ii_AmCache. This is useful if the data
+   contained is complex - like Buffers or TupleDescs which need additional
+   cleanup, unlike simple structures that will be implicitly pfree'd.
+
 IndexBulkDeleteResult *
 ambulkdelete (IndexVacuumInfo *info,
   IndexBulkDeleteResult *stats,
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index d4fec654bb6..76927beb0ec 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -58,6 +58,17 @@ typedef struct BrinBuildState
 	BrinMemTuple *bs_dtuple;
 } BrinBuildState;
 
+/*
+ * We use a BrinInsertState to capture running state spanning multiple
+ * brininsert invocations, within the same command.
+ */
+typedef struct BrinInsertState
+{
+	BrinRevmap *bs_rmAccess;
+	BrinDesc   *bs_desc;
+	BlockNumber bs_pages_per_range;
+}			BrinInsertState;
+
 /*
  * Struct used as "opaque" during index scans
  */
@@ -72,6 +83,7 @@ typedef struct BrinOpaque
 
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
   BrinRevmap *revmap, BlockNumber pagesPerRange);
+static BrinInsertState *initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 		  bool include_partial, double *numSummarized, double *numExisting);
@@ -117,6 +129,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = brinbuild;
 	amroutine->ambuildempty = brinbuildempty;
 	amroutine->aminsert = brininsert;
+	amroutine->aminsertcleanup = brininsertcleanup;
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
@@ -140,6 +153,28 @@ brinhandler(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(amroutine);
 }
 
+/*
+ * Initialize a BrinInsertState to maintain state to be used across multiple
+ * tuple inserts, within the same command.
+ */
+static BrinInsertState *
+initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo)
+{
+	BrinInsertState *bistate;
+	MemoryContext oldcxt;
+
+	oldcxt = 

Re: Inefficiency in parallel pg_restore with many tables

2023-09-04 Thread Nathan Bossart
On Sat, Sep 02, 2023 at 11:55:21AM -0700, Nathan Bossart wrote:
> I ended up hacking together a (nowhere near committable) patch to see how
> hard it would be to allow using any type with binaryheap.  It doesn't seem
> too bad.

I spent some more time on this patch and made the relevant adjustments to
the rest of the set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 3a51f25ae712b792bdde90b133c09b0b940f4198 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 4 Sep 2023 15:04:40 -0700
Subject: [PATCH v7 1/5] Allow binaryheap to use any type.

We would like to make binaryheap available to frontend code in a
future commit, but presently the implementation uses Datum for the
node type, which is inaccessible in the frontend.  This commit
modifies the implementation to allow using any type for the nodes.
binaryheap_allocate() now requires callers to specify the size of
the node, and several of the other functions now use void pointers.
---
 src/backend/executor/nodeGatherMerge.c| 19 ++--
 src/backend/executor/nodeMergeAppend.c| 22 ++---
 src/backend/lib/binaryheap.c  | 99 +++
 src/backend/postmaster/pgarch.c   | 36 ---
 .../replication/logical/reorderbuffer.c   | 21 ++--
 src/backend/storage/buffer/bufmgr.c   | 20 ++--
 src/include/lib/binaryheap.h  | 21 ++--
 7 files changed, 137 insertions(+), 101 deletions(-)

diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..f76406b575 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -52,7 +52,7 @@ typedef struct GMReaderTupleBuffer
 } GMReaderTupleBuffer;
 
 static TupleTableSlot *ExecGatherMerge(PlanState *pstate);
-static int32 heap_compare_slots(Datum a, Datum b, void *arg);
+static int32 heap_compare_slots(const void *a, const void *b, void *arg);
 static TupleTableSlot *gather_merge_getnext(GatherMergeState *gm_state);
 static MinimalTuple gm_readnext_tuple(GatherMergeState *gm_state, int nreader,
 	  bool nowait, bool *done);
@@ -429,6 +429,7 @@ gather_merge_setup(GatherMergeState *gm_state)
 
 	/* Allocate the resources for the merge */
 	gm_state->gm_heap = binaryheap_allocate(nreaders + 1,
+			sizeof(int),
 			heap_compare_slots,
 			gm_state);
 }
@@ -489,7 +490,7 @@ reread:
 /* Don't have a tuple yet, try to get one */
 if (gather_merge_readnext(gm_state, i, nowait))
 	binaryheap_add_unordered(gm_state->gm_heap,
-			 Int32GetDatum(i));
+			 &i);
 			}
 			else
 			{
@@ -565,14 +566,14 @@ gather_merge_getnext(GatherMergeState *gm_state)
 		 * the heap, because it might now compare differently against the
 		 * other elements of the heap.
 		 */
-		i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
+		binaryheap_first(gm_state->gm_heap, &i);
 
 		if (gather_merge_readnext(gm_state, i, false))
-			binaryheap_replace_first(gm_state->gm_heap, Int32GetDatum(i));
+			binaryheap_replace_first(gm_state->gm_heap, &i);
 		else
 		{
 			/* reader exhausted, remove it from heap */
-			(void) binaryheap_remove_first(gm_state->gm_heap);
+			binaryheap_remove_first(gm_state->gm_heap, NULL);
 		}
 	}
 
@@ -585,7 +586,7 @@ gather_merge_getnext(GatherMergeState *gm_state)
 	else
 	{
 		/* Return next tuple from whichever participant has the leading one */
-		i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
+		binaryheap_first(gm_state->gm_heap, &i);
 		return gm_state->gm_slots[i];
 	}
 }
@@ -750,11 +751,11 @@ typedef int32 SlotNumber;
  * Compare the tuples in the two given slots.
  */
 static int32
-heap_compare_slots(Datum a, Datum b, void *arg)
+heap_compare_slots(const void *a, const void *b, void *arg)
 {
 	GatherMergeState *node = (GatherMergeState *) arg;
-	SlotNumber	slot1 = DatumGetInt32(a);
-	SlotNumber	slot2 = DatumGetInt32(b);
+	SlotNumber	slot1 = *((const int *) a);
+	SlotNumber	slot2 = *((const int *) b);
 
 	TupleTableSlot *s1 = node->gm_slots[slot1];
 	TupleTableSlot *s2 = node->gm_slots[slot2];
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..e0ace294a0 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -52,7 +52,7 @@
 typedef int32 SlotNumber;
 
 static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
-static int	heap_compare_slots(Datum a, Datum b, void *arg);
+static int	heap_compare_slots(const void *a, const void *b, void *arg);
 
 
 /* 
@@ -125,8 +125,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	mergestate->ms_nplans = nplans;
 
 	mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
-	mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
-			  mergestate);
+	mergestate->ms_heap = 

Re: backtrace_functions emits trace for any elog

2023-09-04 Thread Noah Misch
On Mon, Sep 04, 2023 at 09:30:32PM +0100, Ilya Gladyshev wrote:
> I used backtrace_functions to debug one of my ideas and found its behavior 
> counter-intuitive and contradictory to it own docs. I think the GUC is 
> supposed to be used to dump backtrace only on elog(ERROR) (should it also be 
> done for higher levels? not sure about this), but, in fact, it does that for 
> any log-level. I have attached a patch that checks log-level before attaching 
> backtrace.

This would make the feature much less useful.  Better to change the docs.




Re: Create shorthand for including all extra tests

2023-09-04 Thread Noah Misch
On Mon, Sep 04, 2023 at 04:30:31PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote:
> >> On 4 Sep 2023, at 17:01, Tom Lane  wrote:
> >>> I think this is a seriously bad idea.  The entire point of not including
> >>> certain tests in check-world by default is that the omitted tests are
> >>> security hazards, so a developer or buildfarm owner should review each
> >>> one before deciding whether to activate it on their machine.
> 
> > Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same 
> > hazard:
> > they treat the loopback interface as private, so anyone with access to
> > loopback interface ports can hijack the test.  I'd be fine with e.g.
> > PG_TEST_EXTRA=private-lo activating all of those.  We don't gain by inviting
> > the tester to review the tests to rediscover this common factor.
> 
> Yeah, I could live with something like that from the security standpoint.
> Not sure if it helps Nazir's use-case though.  Maybe we could invent
> categories that can be used in place of individual test names?
> For now,
> 
>   PG_TEST_EXTRA="needs-private-lo slow"
> 
> would cover the territory of "all", and I think it'd be very seldom
> that we'd have to invent new categories here (though maybe I lack
> imagination today).

I could imagine categories for filesystem bytes and RAM bytes.  Also, while
needs-private-lo has a bounded definition, "slow" doesn't.  If today's one
"slow" test increases check-world duration by 1.1x, we may not let a
100x-increase test use the same keyword.

If one introduced needs-private-lo, the present spelling of "all" would be
"needs-private-lo wal_consistency_checking".  Looks okay to me.  Doing nothing
here wouldn't be ruinous, of course.




Re: information_schema and not-null constraints

2023-09-04 Thread Tom Lane
Alvaro Herrera  writes:
> In 0002, I took the tests added by Peter's proposed patch and put them
> in a separate test file that runs at the end.  There are some issues,
> however.  One is that the ORDER BY clause in the check_constraints view
> is not fully deterministic, because the table name is not part of the
> view definition, so we cannot sort by table name.

I object very very strongly to this proposed test method.  It
completely undoes the work I did in v15 (cc50080a8 and related)
to make the core regression test scripts mostly independent of each
other.  Even without considering the use-case of running a subset of
the tests, the new test's expected output will constantly be needing
updates as side effects of unrelated changes.

regards, tom lane




backtrace_functions emits trace for any elog

2023-09-04 Thread Ilya Gladyshev
Hi,
I used backtrace_functions to debug one of my ideas and found its behavior 
counter-intuitive and contradictory to its own docs. I think the GUC is supposed 
to be used to dump backtrace only on elog(ERROR) (should it also be done for 
higher levels? not sure about this), but, in fact, it does that for any 
log-level. I have attached a patch that checks log-level before attaching 
backtrace.

Regards,
Ilya


0001-check-elevel-before-attaching-backtrace.patch
Description: Binary data


Re: Create shorthand for including all extra tests

2023-09-04 Thread Tom Lane
Noah Misch  writes:
> On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote:
>> On 4 Sep 2023, at 17:01, Tom Lane  wrote:
>>> I think this is a seriously bad idea.  The entire point of not including
>>> certain tests in check-world by default is that the omitted tests are
>>> security hazards, so a developer or buildfarm owner should review each
>>> one before deciding whether to activate it on their machine.

> Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same hazard:
> they treat the loopback interface as private, so anyone with access to
> loopback interface ports can hijack the test.  I'd be fine with e.g.
> PG_TEST_EXTRA=private-lo activating all of those.  We don't gain by inviting
> the tester to review the tests to rediscover this common factor.

Yeah, I could live with something like that from the security standpoint.
Not sure if it helps Nazir's use-case though.  Maybe we could invent
categories that can be used in place of individual test names?
For now,

PG_TEST_EXTRA="needs-private-lo slow"

would cover the territory of "all", and I think it'd be very seldom
that we'd have to invent new categories here (though maybe I lack
imagination today).

regards, tom lane




Re: Show various offset arrays for heap WAL records

2023-09-04 Thread Melanie Plageman
On Mon, Jul 10, 2023 at 3:44 AM Heikki Linnakangas  wrote:
> I'm late to the party, but regarding commit c03c2eae0a, which added the
> guidelines for writing formatting desc functions:
>
> You moved the comment from rmgrdesc_utils.c into rmgrdesc_utils.h, but I
> don't think that was a good idea. Our usual convention is to have the
> function comment in the .c file, not at the declaration in the header
> file. When I want to know what a function does, I jump to the .c file,
> and might miss the comment in the header entirely.
>
> Let's add a src/backend/access/rmgrdesc/README file. We don't currently
> have any explanation anywhere why the rmgr desc functions are in a
> separate directory. The README would be a good place to explain that,
> and to have the formatting guidelines. See attached.

diff --git a/src/backend/access/rmgrdesc/README
b/src/backend/access/rmgrdesc/README
new file mode 100644
index 00..abe84b9f11
--- /dev/null
+++ b/src/backend/access/rmgrdesc/README
@@ -0,0 +1,60 @@
+src/backend/access/rmgrdesc/README
+
+WAL resource manager description functions
+==
+
+For debugging purposes, there is a "description function", or rmgrdesc
+function, for each WAL resource manager. The rmgrdesc function parses the WAL
+record and prints the contents of the WAL record in a somewhat human-readable
+format.
+
+The rmgrdesc functions for all resource managers are gathered in this
+directory, because they are also used in the stand-alone pg_waldump program.

"standalone" seems the more common spelling of this adjective in the
codebase today.

+They could potentially be used by out-of-tree debugging tools too, although
+the the functions or the output format should not be considered a stable API.

You have an extra "the".

I might phrase the last bit as "neither the description functions nor
the output format should be considered part of a stable API"

+Guidelines for rmgrdesc output format
+=

I noticed you used === for both headings and wondered if it was
intentional. Other READMEs I looked at in src/backend/access tend to
have a single heading underlined with  and then subsequent
headings are underlined with -. I could see an argument either way
here, but I just thought I would bring it up in case it was not a
conscious choice.

Otherwise, LGTM.

- Melanie




Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Melanie Plageman
On Mon, Sep 4, 2023 at 1:01 PM Peter Eisentraut  wrote:
>
> I have done several passes to make sure that patch statuses are more
> accurate.  As explained in a nearby message, I have set several patches
> back from "Ready to Committer" to "Needs review" if additional
> discussion happened past the first status change.  I have also in
> several cases removed reviewers from a patch entry if they haven't been
> active on the thread in several months.  (Feel free to sign up again,
> but don't "block" the patch.)

I had originally planned to write thread summaries for all status="Needs
review" patches in the commitfest. However, ISTM these summaries are
largely useful for reaching consensus on changing the status of these
patches (setting them to "waiting on author", "returned with feedback",
etc). Since Peter has already updated all of the statuses, I've decided
not to write summaries and instead spend that time doing review.

However, I thought it might be useful to provide a list of all of the
"Needs review" entries which, at the time of writing, have had zero
reviews or replies (except from the author):

Document efficient self-joins / UPDATE LIMIT techniques.
https://commitfest.postgresql.org/44/4539/

pg_basebackup: Always return valid temporary slot names
https://commitfest.postgresql.org/44/4534/

pg_resetwal tests, logging, and docs update
https://commitfest.postgresql.org/44/4533/

Streaming I/O, vectored I/O
https://commitfest.postgresql.org/44/4532/

Improving the heapgetpage function improves performance in common scenarios
https://commitfest.postgresql.org/44/4524/

CI speed improvements for FreeBSD
https://commitfest.postgresql.org/44/4520/

Implementation of distinct in Window Aggregates: take two
https://commitfest.postgresql.org/44/4519/

Improve pg_restore toc file parsing and format for better performances
https://commitfest.postgresql.org/44/4509/

Fix false report of wraparound in pg_serial
https://commitfest.postgresql.org/44/4516/

Simplify create_merge_append_path a bit for clarity
https://commitfest.postgresql.org/44/4496/

Fix bogus Asserts in calc_non_nestloop_required_outer
https://commitfest.postgresql.org/44/4478/

Retiring is_pushed_down
https://commitfest.postgresql.org/44/4458/

Flush disk write caches by default on macOS and Windows
https://commitfest.postgresql.org/44/4453/

Add last_commit_lsn to pg_stat_database
https://commitfest.postgresql.org/44/4355/

Optimizing "boundary cases" during backward scan B-Tree index descents
https://commitfest.postgresql.org/44/4380/

XLog size reductions: Reduced XLog record header size
https://commitfest.postgresql.org/44/4386/

Unified file access using virtual file descriptors
https://commitfest.postgresql.org/44/4420/

Optimise index range scan by performing first check with upper page boundary
https://commitfest.postgresql.org/44/4434/

Revises for the check of parameterized partial paths
https://commitfest.postgresql.org/44/4425/

Opportunistically pruning page before update
https://commitfest.postgresql.org/44/4384/

Checks in RegisterBackgroundWorker()
https://commitfest.postgresql.org/44/4514/

Allow direct lookups of SpecialJoinInfo by ojrelid
https://commitfest.postgresql.org/44/4313/

Parent/child context relation in pg_get_backend_memory_contexts()
https://commitfest.postgresql.org/44/4379/

Support Right Semi Join
https://commitfest.postgresql.org/44/4284/

bug: ANALYZE progress report with inheritance tables
https://commitfest.postgresql.org/44/4144/

archive modules loose ends
https://commitfest.postgresql.org/44/4192/

- Melanie




Re: Create shorthand for including all extra tests

2023-09-04 Thread Noah Misch
On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote:
> > On 4 Sep 2023, at 17:01, Tom Lane  wrote:
> > Nazir Bilal Yavuz  writes:
> >> I created an 'all' option for PG_TEST_EXTRA to enable all test suites
> >> defined under PG_TEST_EXTRA.
> > 
> > I think this is a seriously bad idea.  The entire point of not including
> > certain tests in check-world by default is that the omitted tests are
> > security hazards, so a developer or buildfarm owner should review each
> > one before deciding whether to activate it on their machine.
> 
> I dunno, I've certainly managed to not run the tests I hoped to multiple 
> times.
> I think it could be useful for sandboxed testrunners which are destroyed after
> each run. There is for sure a foot-gun angle to it, no question about that.

Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same hazard:
they treat the loopback interface as private, so anyone with access to
loopback interface ports can hijack the test.  I'd be fine with e.g.
PG_TEST_EXTRA=private-lo activating all of those.  We don't gain by inviting
the tester to review the tests to rediscover this common factor.




Re: Create shorthand for including all extra tests

2023-09-04 Thread Daniel Gustafsson
> On 4 Sep 2023, at 17:01, Tom Lane  wrote:
> 
> Nazir Bilal Yavuz  writes:
>> I created an 'all' option for PG_TEST_EXTRA to enable all test suites
>> defined under PG_TEST_EXTRA.
> 
> I think this is a seriously bad idea.  The entire point of not including
> certain tests in check-world by default is that the omitted tests are
> security hazards, so a developer or buildfarm owner should review each
> one before deciding whether to activate it on their machine.

I dunno, I've certainly managed to not run the tests I hoped to multiple times.
I think it could be useful for sandboxed testrunners which are destroyed after
each run. There is for sure a foot-gun angle to it, no question about that.

--
Daniel Gustafsson





Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Matthias van de Meent
On Mon, 4 Sept 2023 at 18:19, Aleksander Alekseev
 wrote:
>
> Hi Matthias,
>
> > I'm a bit confused about your use of "consensus". True, there was no
> > objection, but it looks like no patch author or reviewer was informed
> > (cc-ed or directly referenced) that the patch was being discussed
> > before achieving this "consensus", and the "consensus" was reached
> > within 4 days, of which 2 weekend, in a thread that has (until now)
> > involved only you and Peter E.
> >
> > Usually, you'd expect discussion about a patch to happen on the
> > patch's thread before any action is taken (or at least a mention on
> > that thread), but quite clearly that hasn't happened here.
> > Are patch authors expected to follow any and all discussion on threads
> > with "Commitfest" in the title?
> > If so, shouldn't the relevant wiki pages be updated, and/or the
> > -hackers community be updated by mail in a new thread about these
> > policy changes?
>
> I understand your disappointment and assure you that no one is acting
> with bad intentions here. Also please note that English is a second
> language for many of us which represents a challenge when it comes to
> expressing thoughts on the mailing list. We have a common goal here,
> to make PostgreSQL an even better system than it is now.
>
> The patches under question were in "Waiting for Author" state for a
> *long* time and the authors were notified about this. We could toss
> such patches from one CF to another month after month or mark as RwF
> and let the author know that no one is going to review that patch
> until the author takes action. It's been noted that the latter
> approach is more productive in the long run.

This far I agree - we can't keep patches around with issues if they're
not being worked on. And I do appreciate your work on pruning dead or
stale patches. But:

> The discussion can
> continue in the same thread and the same thread can be registered for
> the upcoming CF.

This is one of my major concerns here: Patch resolution is being
discussed on -hackers, but outside of the thread used to discuss that
patch (as indicated in the CF app), and without apparent author
inclusion. To me, that feels like going behind the author's back, and I
don't think that this should be normalized.

As mentioned in the earlier mail, my other concern is the use of
"consensus" in this context. You link to a message on -hackers, with
no visible agreements. As a patch author myself, if a lack of comments
on my patch in an otherwise unrelated thread is "consensus", then I'll
probably move all patches that have yet to be commented on to RfC, as
there'd be "consensus" that they should be included as-is in
PostgreSQL. But I digress.

I think it would be better to just remove the "consensus" part of your
mail, and just put down the real reason why each patch is being RfC-ed
or rejected. That is, don't imply that there are hackers that OK-ed it
when there are none, and inform patch authors directly about the
reasons why the patch is being revoked; so without "see consensus in
[0]".

Kind regards,

Matthias van de Meent




Re: proposal: psql: show current user in prompt

2023-09-04 Thread Pavel Stehule
po 4. 9. 2023 v 14:24 odesílatel Jelte Fennema  napsal:

> On Sun, 3 Sept 2023 at 20:58, Pavel Stehule 
> wrote:
> > here is an try
>
> Overall it does what I had in mind. Below a few suggestions:
>
> +int
> +PQprotocolSubversion(const PGconn *conn)
>
> Ugh, it's quite annoying that the original PQprotocolVersion only
> returns the major version and thus we need this new function. It
> seems like it would be much nicer if it returned a number similar to
> PQserverVersion. I think it might be nicer to change PQprotocolVersion
> to do that than to add another function. We could do:
>
> return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
> PG_PROTOCOL_MINOR(conn->pversion);
>
> or even:
>
> if (PG_PROTOCOL_MAJOR(conn->pversion) == 3 &&
> PG_PROTOCOL_MINOR(conn->pversion))
> return 3;
> else
> return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
> PG_PROTOCOL_MINOR(conn->pversion);
>
> The second option would be safest backwards compatibility wise, but in
> practice you would only get another value than 3 (or 0) when
> connecting to pre 7.4 servers. That seems old enough that I don't
> think anyone is actually calling this function. **I'd like some
> feedback from others on this though.**
>

Both versions look a little bit strange to me. I don't have a strong opinion
about it, but I am not sure it is best to change a contract from 20 years ago:

commit from Jun 2003 efc3a25bb02ada63158fe7006673518b005261ba

I prefer to introduce a new function - it is ten lines of code. The form is
not important - it can be the full number or just the minor number; it doesn't
matter, I think. But my opinion in this area is not strong, and I would like
to see feedback from others too. It is true that this feature and interface
are not fully complete.

Regards

Pavel


> +   /* The protocol 3.0 is required */
> +   if (PG_PROTOCOL_MAJOR(their_version) == 3)
> +   conn->pversion = their_version;
>
> Let's compare against the actual PG_PROTOCOL_EARLIEST and
> PG_PROTOCOL_LATEST to determine if the version is supported or not.
>


Re: PATCH: Add REINDEX tag to event triggers

2023-09-04 Thread Jim Jones

On 01.09.23 18:10, jian he wrote:
Thanks for pointing this out! 

Thanks for the fix!

also due to commit
https://git.postgresql.org/cgit/postgresql.git/commit/?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7
ExecReindex function input arguments also changed. so we need to
rebase this patch.

change to
currentReindexStatement = unconstify(ReindexStmt*, stmt);
seems to work for me. I tested it, no warning. But I am not 100% sure.

anyway I refactored the patch, making it git applyable
also change from "currentReindexStatement = stmt;" to
"currentReindexStatement = unconstify(ReindexStmt*, stmt);" due to
ExecReindex function changes.


LGTM. It applies and builds cleanly, all tests pass and documentation 
also builds ok. The CFbot seems also much happier now :)


I tried this "small stress test" to see if the code was leaking .. but 
it all looks ok to me:


DO $$
BEGIN
  FOR i IN 1..1 LOOP
    REINDEX INDEX reindex_test.reindex_test1_idx1;
    REINDEX TABLE reindex_test.reindex_tester1;
  END LOOP;
END;
$$;


Jim





information_schema and not-null constraints

2023-09-04 Thread Alvaro Herrera
In reference to [1], 0001 attached to this email contains the updated
view definitions that I propose.

In 0002, I took the tests added by Peter's proposed patch and put them
in a separate test file that runs at the end.  There are some issues,
however.  One is that the ORDER BY clause in the check_constraints view
is not fully deterministic, because the table name is not part of the
view definition, so we cannot sort by table name.  In the current
regression database there is only one case[2] where two constraints have
the same name and different definition:

  inh_check_constraint   │ 2 │ ((f1 > 0)) NOT VALID ↵
 │   │ ((f1 > 0))

(on tables invalid_check_con and invalid_check_con_child).  I assume
this is going to bite us at some point.  We could just add a WHERE
clause to omit that one constraint.

Another issue I noticed while eyeballing the results is that foreign keys on
partitioned tables are listing the rows used to implement the
constraints on partitions, which are sort-of "internal" constraints (and
are not displayed by psql's \d).  I hope this is a relatively simple fix
that we could extract from the code used by psql.

Anyway, I think I'm going to get 0001 committed sometime tomorrow, and
then play a bit more with 0002 to try and get it pushed soon also.

Thanks

[1] https://postgr.es/m/81b461c4-edab-5d8c-2f88-203108425...@enterprisedb.com

[2]
select constraint_name, count(*),
   string_agg(distinct check_clause, E'\n')
from information_schema.check_constraints
group by constraint_name
having count(*) > 1;

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You don't solve a bad join with SELECT DISTINCT" #CupsOfFail
https://twitter.com/connor_mc_d/status/1431240081726115845
>From d8a5f8103934fe65a83a2ca44f6af72449cb6aa9 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 4 Sep 2023 18:05:50 +0200
Subject: [PATCH v2 1/2] update information_schema definition

---
 src/backend/catalog/information_schema.sql | 74 --
 1 file changed, 27 insertions(+), 47 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index a06ec7a0a8..c402cca7f4 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -444,22 +444,19 @@ CREATE VIEW check_constraints AS
 WHERE pg_has_role(coalesce(c.relowner, t.typowner), 'USAGE')
   AND con.contype = 'c'
 
-UNION
+UNION ALL
 -- not-null constraints
-
-SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog,
-   CAST(n.nspname AS sql_identifier) AS constraint_schema,
-   CAST(CAST(n.oid AS text) || '_' || CAST(r.oid AS text) || '_' || CAST(a.attnum AS text) || '_not_null' AS sql_identifier) AS constraint_name, -- XXX
-   CAST(a.attname || ' IS NOT NULL' AS character_data)
- AS check_clause
-FROM pg_namespace n, pg_class r, pg_attribute a
-WHERE n.oid = r.relnamespace
-  AND r.oid = a.attrelid
-  AND a.attnum > 0
-  AND NOT a.attisdropped
-  AND a.attnotnull
-  AND r.relkind IN ('r', 'p')
-  AND pg_has_role(r.relowner, 'USAGE');
+SELECT current_database()::information_schema.sql_identifier AS constraint_catalog,
+   rs.nspname::information_schema.sql_identifier AS constraint_schema,
+   con.conname::information_schema.sql_identifier AS constraint_name,
+   format('CHECK (%s IS NOT NULL)', at.attname)::information_schema.character_data AS check_clause
+ FROM pg_constraint con
+LEFT JOIN pg_namespace rs ON rs.oid = con.connamespace
+LEFT JOIN pg_class c ON c.oid = con.conrelid
+LEFT JOIN pg_type t ON t.oid = con.contypid
+LEFT JOIN pg_attribute at ON (con.conrelid = at.attrelid AND con.conkey[1] = at.attnum)
+ WHERE pg_has_role(coalesce(c.relowner, t.typowner), 'USAGE'::text)
+   AND con.contype = 'n';
 
 GRANT SELECT ON check_constraints TO PUBLIC;
 
@@ -826,6 +823,20 @@ CREATE VIEW constraint_column_usage AS
 AND r.relkind IN ('r', 'p')
 AND NOT a.attisdropped
 
+	UNION ALL
+
+	/* not-null constraints */
+SELECT DISTINCT nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname
+  FROM pg_namespace nr, pg_class r, pg_attribute a, pg_namespace nc, pg_constraint c
+  WHERE nr.oid = r.relnamespace
+AND r.oid = a.attrelid
+AND r.oid = c.conrelid
+AND a.attnum = c.conkey[1]
+AND c.connamespace = nc.oid
+AND c.contype = 'n'
+AND r.relkind in ('r', 'p')
+AND not a.attisdropped
+
 UNION ALL
 
 /* unique/primary key/foreign key constraints */
@@ -1828,6 +1839,7 @@ CREATE VIEW table_constraints AS
CAST(r.relname AS sql_identifier) AS table_name,
CAST(
  CASE c.contype WHEN 'c' THEN 'CHECK'
+  

Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-09-04 Thread Aleksander Alekseev
Hi,

> > Reviewing this now. I think it's almost ready to be committed.
> >
> > There's another big effort going on to move SLRUs to the regular buffer
> > cache (https://commitfest.postgresql.org/43/3514/). I wonder how moving
> > to 64 bit page numbers will affect that. BlockNumber is still 32 bits,
> > after all.
>
> Somehow I didn't pay too much attention to this effort, thanks. I will
> familiarize myself with the patch. Intuitively I don't think that the
> patches should block each other.
>
> > This patch makes the filenames always 12 characters wide (for SLRUs that
> > opt-in to the new naming). That's actually not enough for the full range
> > that a 64 bit page number can represent. Should we make it 16 characters
> > now, to avoid repeating the same mistake we made earlier? Or make it
> > more configurable, on a per-SLRU basis. One reason I don't want to just
> > make it 16 characters is that WAL segment filenames are also 16 hex
> > characters, which could cause confusion.
>
> Good catch. I propose the following solution:
>
> ```
>  SlruFileName(SlruCtl ctl, char *path, int64 segno)
>  {
>  if (ctl->long_segment_names)
> -return snprintf(path, MAXPGPATH, "%s/%012llX", ctl->Dir,
> +/*
> + * We could use 16 characters here but the disadvantage would be that
> + * the SLRU segments will be hard to distinguish from WAL segments.
> + *
> + * For this reason we use 15 characters. It is enough but also means
> + * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT 
> easily.
> + */
> +return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir,
>  (long long) segno);
>  else
>  return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir,
> ```
>
> PFE the corrected patchset v58.

While triaging the patches for the September CF [1] a consensus was
reached that the patchset needs another round of review. Also I
removed myself from the list of reviewers in order to make it clear
that a review from somebody else would be appreciated.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org
-- 
Best regards,
Aleksander Alekseev




Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Aleksander Alekseev
Hi Matthias,

> I'm a bit confused about your use of "consensus". True, there was no
> objection, but it looks like no patch author or reviewer was informed
> (cc-ed or directly referenced) that the patch was being discussed
> before achieving this "consensus", and the "consensus" was reached
> within 4 days, of which 2 weekend, in a thread that has (until now)
> involved only you and Peter E.
>
> Usually, you'd expect discussion about a patch to happen on the
> patch's thread before any action is taken (or at least a mention on
> that thread), but quite clearly that hasn't happened here.
> Are patch authors expected to follow any and all discussion on threads
> with "Commitfest" in the title?
> If so, shouldn't the relevant wiki pages be updated, and/or the
> -hackers community be updated by mail in a new thread about these
> policy changes?

I understand your disappointment and assure you that no one is acting
with bad intentions here. Also please note that English is a second
language for many of us which represents a challenge when it comes to
expressing thoughts on the mailing list. We have a common goal here,
to make PostgreSQL an even better system than it is now.

The patches under question were in "Waiting for Author" state for a
*long* time and the authors were notified about this. We could toss
such patches from one CF to another month after month or mark as RwF
and let the author know that no one is going to review that patch
until the author takes action. It's been noted that the latter
approach is more productive in the long run. The discussion can
continue in the same thread and the same thread can be registered for
the upcoming CF.

This being said, Peter is the CF manager, so he has every right to
change the status of the patches under questions if he disagrees.

-- 
Best regards,
Aleksander Alekseev




Re: SLRUs in the main buffer pool, redux

2023-09-04 Thread Aleksander Alekseev
Hi,

> > Unfortunately the patchset rotted quite a bit since February and needs a 
> > rebase.
>
> A consensus was reached to mark this patch as RwF for now. There
> are many patches to be reviewed and this one doesn't seem to be in the
> best shape, so we have to prioritise. Please feel free re-submitting
> the patch for the next commitfest.

See also [1]

"""
[...]
Also, please consider joining the efforts and having one thread
with a single patchset rather than different threads with different
competing patches. This will simplify the work of the reviewers a lot.

Personally I would suggest taking one step back and agree on a
particular RFC first and then continue working on a single patchset
according to this RFC. We did it in the past in similar cases and this
approach proved to be productive.
[...]
"""

[1]: 
https://postgr.es/m/CAJ7c6TME5Z8k4undYUmKavD_dQFL0ujA%2BzFCK1eTH0_pzM%3DXrA%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: SLRUs in the main buffer pool - Page Header definitions

2023-09-04 Thread Aleksander Alekseev
Hi,

> Agreed that we'd certainly want to make sure it's all correct and to do
> performance testing but in terms of how many buffers... isn't much of
> the point of this that we have data in memory because it's being used
> and if it's not then we evict it?  That is, I wouldn't think we'd have
> set parts of the buffer pool for SLRUs vs. regular data but would let
> the actual demand drive what pages are in the cache and I thought that
> was part of this proposal and part of the reasoning behind making this
> change.
>
> There's certainly an argument to be made that our current cache
> management strategy doesn't account very well for the value of pages
> (eg: btree root pages vs. random heap pages, perhaps) and that'd
> certainly be a good thing to improve on, but that's independent of this.
> If it's not, then that's certainly moving the goal posts a very long way
> in terms of this effort.

During the triage of the patches submitted for the September CF [1] I
noticed that the corresponding CF entry [2] has two threads linked.
Only the first thread was used by CF bot [3], and Heikki and Thomas
were listed as the authors. The patch from the first thread rotted and
was not updated for some time, which resulted in marking the patch as
RwF for now [4].

It looks like the patch in *this* thread was never registered on the
commitfest and/or tested by CF bot, unless I'm missing something.
Unfortunately it's a bit late to register it for the September CF
especially considering the fact that it doesn't apply at the moment.

This being said, please consider submitting the patch for the upcoming
CF. Also, please consider joining the efforts and having one thread
with a single patchset rather than different threads with different
competing patches. This will simplify the work of the reviewers a lot.

Personally I would suggest taking one step back and agree on a
particular RFC first and then continue working on a single patchset
according to this RFC. We did it in the past in similar cases and this
approach proved to be productive.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org
[2]: https://commitfest.postgresql.org/44/3514/
[3]: http://cfbot.cputube.org/
[4]: 
https://postgr.es/m/CAJ7c6TN%3D1EF1bTA6w8W%3D0e_Bj%2B-jsiHK0ap1uC_ZUGjwu%2B4JGw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Matthias van de Meent
On Thu, 31 Aug 2023 at 14:35, Aleksander Alekseev
 wrote:
>
> Hi,
> > On Thu, 31 Aug 2023 at 11:37, Peter Eisentraut  wrote:
> > > There are a number of patches carried over from the PG16 development
> > > cycle that have been in "Waiting on author" for several months.  I will
> > > aggressively prune those after the start of this commitfest if there
> > > hasn't been any author activity by then.
> >
> > [1 patch]
>
> This was the one that I could name off the top of my head.
>
> [5 more patches]
>
> I will apply corresponding status changes if there will be no objections.

On Mon, 4 Sept 2023 at [various], Aleksander Alekseev
 wrote:
>
> Hi,
>
> > [various patches]
>
> A consensus was reached [1] to mark this patch as RwF for now. There
> are many patches to be reviewed and this one doesn't seem to be in the
> best shape, so we have to prioritise. Please feel free re-submitting
> the patch for the next commitfest.

I'm a bit confused about your use of "consensus". True, there was no
objection, but it looks like no patch author or reviewer was informed
(cc-ed or directly referenced) that the patch was being discussed
before achieving this "consensus", and the "consensus" was reached
within 4 days, of which 2 weekend, in a thread that has (until now)
involved only you and Peter E.

Usually, you'd expect discussion about a patch to happen on the
patch's thread before any action is taken (or at least a mention on
that thread), but quite clearly that hasn't happened here.
Are patch authors expected to follow any and all discussion on threads
with "Commitfest" in the title?
If so, shouldn't the relevant wiki pages be updated, and/or the
-hackers community be updated by mail in a new thread about these
policy changes?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://wiki.postgresql.org/wiki/Submitting_a_Patch




Re: Query execution in Perl TAP tests needs work

2023-09-04 Thread Andrew Dunstan


On 2023-09-02 Sa 20:17, Thomas Munro wrote:

On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan  wrote:


I confess I'm a little reluctant to impose this burden on buildfarm owners. We 
should think about some sort of fallback in case this isn't supported on some 
platform, either due to technological barriers or buildfarm owner reluctance.

I guess you're thinking that it could be done in such a way that it is
automatically used for $node->safe_psql() and various other things if
Platypus is detected, but otherwise forking psql as now, for a
transition period?  Then we could nag build farm owners, and
eventually turn off the fallback stuff after N months.  After that it
would begin to be possible to use this in more interesting and
advanced ways.



Yeah, that would be ideal. I'll prep a version of the module that 
doesn't fail if FFI::Platypus isn't available, but instead sets a flag 
we can test.
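
For illustration, a minimal sketch of that kind of guarded load (the variable
name here is a placeholder, not the actual module code):

# Try to load FFI::Platypus, but remember the outcome instead of dying
# when it is not installed.
our $ffi_platypus_available = eval
{
	require FFI::Platypus;
	1;
} ? 1 : 0;

Tests (or the module itself) could then check that flag and fall back to
forking psql, as happens today, when it is false.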



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Suppressing compiler warning on Debian 12/gcc 12.2.0

2023-09-04 Thread Bruce Momjian
On Thu, Aug 31, 2023 at 03:25:09PM -0400, Bruce Momjian wrote:
> Being a new user of Debian 12/gcc 12.2.0, I wrote the following shell
> script to conditionally add gmake rules with compiler flags to
> src/Makefile.custom to suppress warnings for certain files.  This allows
> me to compile all supported Postgres releases without warnings.
> 
> I actually didn't how simple it was to add per-file compile flags until
> I read:
> 
>   
> https://stackoverflow.com/questions/6546162/how-to-add-different-rules-for-specific-files

This might be simpler for people to modify since it abstracts out the
version checking.

---

# gmake per-file options:  
https://stackoverflow.com/questions/6546162/how-to-add-different-rules-for-specific-files

for FILE in configure.in configure.ac
do      if [ -e "$FILE" ]
        then    VERSION=$(sed -n 's/^AC_INIT(\[PostgreSQL\], \[\([0-9]\+\).*$/\1/p' "$FILE")
                break
        fi
done

[ -z "$VERSION" ] && echo 'Could not find Postgres version' 1>&2 && exit 1

if [ "$VERSION" -eq 11 ]
then    cat >> src/Makefile.custom <> src/Makefile.custom 

Re: Create shorthand for including all extra tests

2023-09-04 Thread Tom Lane
Nazir Bilal Yavuz  writes:
> I created an 'all' option for PG_TEST_EXTRA to enable all test suites
> defined under PG_TEST_EXTRA.

I think this is a seriously bad idea.  The entire point of not including
certain tests in check-world by default is that the omitted tests are
security hazards, so a developer or buildfarm owner should review each
one before deciding whether to activate it on their machine.

regards, tom lane




Create shorthand for including all extra tests

2023-09-04 Thread Nazir Bilal Yavuz
Hi,

I realized that I forgot to add the new extra test to my test scripts.
So, I thought maybe we could use a shorthand for including all extra
tests. With that, running the full test suite is easier without having to
keep up with new tests and updates.

I created an 'all' option for PG_TEST_EXTRA to enable all test suites
defined under PG_TEST_EXTRA. I created the check_extra_tests_enabled()
function in the Test/Utils.pm file. This function takes the test's
name as an input and checks if PG_TEST_EXTRA contains 'all' or this
test's name.
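
For illustration, a minimal sketch of such a helper (using the name from the
description above; the attached patch has the real implementation):

sub check_extra_tests_enabled
{
	my ($test_name) = @_;
	my $extra = $ENV{PG_TEST_EXTRA} || '';

	# 'all' turns on every extra test suite; otherwise look for the
	# individual suite name as a whole word.
	return 1 if $extra =~ /\ball\b/;
	return $extra =~ /\b\Q$test_name\E\b/ ? 1 : 0;
}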

I thought another advantage could be that this can be used in CI. But
when 'wal_consistency_checking' is enabled, CI times get longer since
it does resource intensive operations.

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From be91a0aaf926c83bf266f92e0523f41ca333b048 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 4 Sep 2023 16:58:42 +0300
Subject: [PATCH v1] Create shorthand for including all extra tests

This patch aims to make running full testsuite easier without having to
keep up with new tests and updates.

Create 'all' option for PG_TEST_EXTRA to enable all test suites defined
under PG_TEST_EXTRA. That is achieved by creating
check_extra_tests_enabled() function in Test/Utils.pm file. This
function takes the test's name as an input and checks if PG_TEST_EXTRA
contains 'all' or this test's name.
---
 doc/src/sgml/regress.sgml|  9 +
 src/interfaces/libpq/t/004_load_balance_dns.pl   |  2 +-
 src/test/kerberos/t/001_auth.pl  |  2 +-
 src/test/ldap/t/001_auth.pl  |  2 +-
 src/test/ldap/t/002_bindpasswd.pl|  2 +-
 src/test/modules/Makefile|  2 +-
 .../t/001_mutated_bindpasswd.pl  |  2 +-
 src/test/perl/PostgreSQL/Test/Utils.pm   | 16 
 src/test/recovery/t/027_stream_regress.pl|  3 +--
 src/test/ssl/t/001_ssltests.pl   |  2 +-
 src/test/ssl/t/002_scram.pl  |  2 +-
 src/test/ssl/t/003_sslinfo.pl|  2 +-
 12 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 675db86e4d7..40269c258ef 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -262,6 +262,15 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance'
 
The following values are currently supported:

+
+ all
+ 
+  
+   Enables all extra tests.
+  
+ 
+
+
 
  kerberos
  
diff --git a/src/interfaces/libpq/t/004_load_balance_dns.pl b/src/interfaces/libpq/t/004_load_balance_dns.pl
index 875070e2120..62eeb21843e 100644
--- a/src/interfaces/libpq/t/004_load_balance_dns.pl
+++ b/src/interfaces/libpq/t/004_load_balance_dns.pl
@@ -6,7 +6,7 @@ use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
 use Test::More;
 
-if ($ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
+if (!PostgreSQL::Test::Utils::check_extra_text_enabled('load_balance'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 0deb9bffc8d..59574178afc 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -28,7 +28,7 @@ if ($ENV{with_gssapi} ne 'yes')
 {
 	plan skip_all => 'GSSAPI/Kerberos not supported by this build';
 }
-elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/)
+elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('kerberos'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA';
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 3e113fd6ebb..9db07e801e9 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes')
 {
 	plan skip_all => 'LDAP not supported by this build';
 }
-elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/)
+elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA';
diff --git a/src/test/ldap/t/002_bindpasswd.pl b/src/test/ldap/t/002_bindpasswd.pl
index bcd4aa2b742..a1b1bd8c22f 100644
--- a/src/test/ldap/t/002_bindpasswd.pl
+++ b/src/test/ldap/t/002_bindpasswd.pl
@@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes')
 {
 	plan skip_all => 'LDAP not supported by this build';
 }
-elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/)
+elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap'))
 {
 	plan skip_all =>
 	  'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA';
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 6331c976dcb..2fdcff24785 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -43,7 +43,7 @@ endif
 
 # Test runs an LDAP server, so only run if ldap is in PG_TEST_EXTRA
 ifeq ($(with_ldap),yes)
-ifneq 

Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-09-04 Thread Masahiko Sawada
Hi,

On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada  
> wrote:
>
> Hi,
>
> > Thanks for reviewing the patch. Pushed.
>
> My colleague Vignesh noticed that the newly added test cases were failing in 
> BF animal sungazer[1].

Thank you for reporting!

>
> The test failed to drop the slot which is active on publisher.
>
> --error-log--
> This failure is because pg_drop_replication_slot fails with "replication slot 
> "test_tab2_sub" is active for PID 55771638":
> 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG:  statement: SELECT 
> pg_drop_replication_slot('test_tab2_sub')
> 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:  replication 
> slot "test_tab2_sub" is active for PID 55771638
> 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:  SELECT 
> pg_drop_replication_slot('test_tab2_sub')
> -
>
> I think the reason is that the test DISABLEd the subscription before dropping the
> slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for the walsender to
> release the slot, so it's possible that the walsender is still alive when 
> calling
> pg_drop_replication_slot() to drop the slot on the publisher (pg_drop_xxxslot()
> doesn't wait for slot to be released).

I agree with your analysis.

>
> I think we can wait for the slot to become inactive before dropping like:
> $node_primary->poll_query_until('otherdb',
> "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE 
> active_pid IS NOT NULL)"
> )
>

I prefer this approach but it would be better to specify the slot name
in the query.
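
Something along these lines (a sketch only, reusing the node, database and
slot name from the quoted suggestion):

$node_primary->poll_query_until('otherdb',
	"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'test_tab2_sub' AND active_pid IS NOT NULL)"
);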

> Or we can just not drop the slot as it’s the last testcase.

Since we might add other tests after that in the future, I think it's
better to drop the replication slot (and subscription).

>
> Another thing that might be worth considering is that we can add one parameter
> in pg_drop_replication_slot() to let the user control whether to wait or not, and the
> test can be fixed as well by passing nowait=false to the func.

While it could be useful in general we cannot use this approach for
this issue since it cannot be backpatched to older branches. We might
want to discuss it on a new thread.

I've attached a draft patch to fix this issue. Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix.patch
Description: Binary data


Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Aleksander Alekseev
Hi Peter,

> The patch was first set to "Ready for Committer" on 2023-03-29, and if I
> pull up the thread in the web archive view, that is in the middle of the
> page.  So as a committer, I would expect that someone would review
> whatever happened in the second half of that thread before turning it
> over to committer.
>
> As a general rule, if significant additional discussion or patch posting
> happens after a patch is set to "Ready for Committer", I'm setting it
> back to "Needs review" until someone actually re-reviews it.

OK, fair enough.

> I also notice that you are listed as both author and reviewer of that
> patch, which I think shouldn't be done.  It appears that you are in fact
> the author, so I would recommend that you remove yourself from the
> reviewers.

Sometimes I start as a reviewer and then for instance add my own
patches to the thread. In cases like this I end up being both an
author and a reviewer, but it doesn't mean that I review my own
patches :)

In this particular case IMO it would be appropriate to remove myself
from the list of reviewers. So I did.

Thanks!

-- 
Best regards,
Aleksander Alekseev




Re: Extract numeric filed in JSONB more effectively

2023-09-04 Thread Andy Fan
Hi,

  v13 attached.  Changes include:

1.  fix the bug Jian reported.
2.  reduce more code duplication without DirectFunctionCall.
3.  add the overlooked jsonb_path_query and jsonb_path_query_first as
candidates


--
Best Regards
Andy Fan


v13-0001-optimize-casting-jsonb-to-a-given-type.patch
Description: Binary data


Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Peter Eisentraut
I have done several passes to make sure that patch statuses are more 
accurate.  As explained in a nearby message, I have set several patches 
back from "Ready to Committer" to "Needs review" if additional 
discussion happened past the first status change.  I have also in 
several cases removed reviewers from a patch entry if they haven't been 
active on the thread in several months.  (Feel free to sign up again, 
but don't "block" the patch.)


I was going to say, unfortunately that means that there is now even more 
work to do, but in fact, it seems this has all kind of balanced out, and 
I hope it's now more accurate for someone wanting to pick up some work:


Status summary: Needs review: 224. Waiting on Author: 24. Ready for 
Committer: 31. Committed: 41. Returned with Feedback: 14. Withdrawn: 2. 
Rejected: 2. Total: 338.


(And yes, the total number of patches has grown since the commitfest has 
started!?!)



On 01.09.23 14:55, Peter Eisentraut wrote:

Okay, here we go, starting with:

Status summary: Needs review: 227. Waiting on Author: 37. Ready for 
Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1. 
Withdrawn: 1. Total: 337.


(which is less than CF 2023-07)

I have also already applied one round of the waiting-on-author-pruning 
described below (not included in the above figures).



On 31.08.23 10:36, Peter Eisentraut wrote:
Commitfest 2023-09 (https://commitfest.postgresql.org/44/) starts in 
less than 28 hours.


If you have any patches you would like considered, be sure to add them 
in good time.


All patch authors, and especially experienced hackers, are requested 
to make sure the patch status is up to date.  If the patch is still 
being worked on, it might not need to be in "Needs review".  
Conversely, if you are hoping for a review but the status is "Waiting 
on author", then it will likely be ignored by reviewers.


There are a number of patches carried over from the PG16 development 
cycle that have been in "Waiting on author" for several months.  I 
will aggressively prune those after the start of this commitfest if 
there hasn't been any author activity by then.













Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Peter Eisentraut

On 04.09.23 15:22, Aleksander Alekseev wrote:

Hi Peter,


Okay, here we go, starting with:

Status summary: Needs review: 227. Waiting on Author: 37. Ready for
Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1.
Withdrawn: 1. Total: 337.

(which is less than CF 2023-07)

I have also already applied one round of the waiting-on-author-pruning
described below (not included in the above figures).


* Index SLRUs by 64-bit integers rather than by 32-bit integers
https://commitfest.postgresql.org/44/3489/

The status here was changed to "Needs Review". These patches are in
good shape and previously were marked as "Ready for Committer".
Actually I thought Heikki would commit them to PG16, but it didn't
happen. If there are no objections, I will return the RfC status in a
bit since it seems to be more appropriate in this case.


The patch was first set to "Ready for Committer" on 2023-03-29, and if I 
pull up the thread in the web archive view, that is in the middle of the 
page.  So as a committer, I would expect that someone would review 
whatever happened in the second half of that thread before turning it 
over to committer.


As a general rule, if significant additional discussion or patch posting 
happens after a patch is set to "Ready for Committer", I'm setting it 
back to "Needs review" until someone actually re-reviews it.


I also notice that you are listed as both author and reviewer of that 
patch, which I think shouldn't be done.  It appears that you are in fact 
the author, so I would recommend that you remove yourself from the 
reviewers.






Re: Unlogged relation copy is not fsync'd

2023-09-04 Thread Melanie Plageman
On Fri, Aug 25, 2023 at 8:47 AM Heikki Linnakangas  wrote:
>
> I noticed another missing fsync() with unlogged tables, similar to the
> one at [1].
>
> RelationCopyStorage does this:
>
> >   /*
> >* When we WAL-logged rel pages, we must nonetheless fsync them.  The
> >* reason is that since we're copying outside shared buffers, a 
> > CHECKPOINT
> >* occurring during the copy has no way to flush the previously 
> > written
> >* data to disk (indeed it won't know the new rel even exists).  A 
> > crash
> >* later on would replay WAL from the checkpoint, therefore it 
> > wouldn't
> >* replay our earlier WAL entries. If we do not fsync those pages 
> > here,
> >* they might still not be on disk when the crash occurs.
> >*/
> >   if (use_wal ||  copying_initfork)
> >   smgrimmedsync(dst, forkNum);
>
> That 'copying_initfork' condition is wrong. The first hint of that is
> that 'use_wal' is always true for an init fork. I believe this was meant
> to check for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise,
> this bad thing can happen:
>
> 1. Create an unlogged table
> 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls
> RelationCopyStorage
> 3. a checkpoint happens while the command is running
> 4. After the ALTER TABLE has finished, shut down postgres cleanly.
> 5. Lose power
>
> When you recover, the unlogged table is not reset, because it was a
> clean postgres shutdown. But the part of the file that was copied after
> the checkpoint at step 3 was never fsync'd, so part of the file is lost.
> I was able to reproduce with a VM that I kill to simulate step 5.
>
> This is the same scenario that the smgrimmedsync() call above protects
> from for WAL-logged relations. But we got the condition wrong: instead
> of worrying about the init fork, we need to call smgrimmedsync() for all
> *other* forks of an unlogged relation.
>
> Fortunately the fix is trivial, see attached. I suspect we have similar
> problems in a few other places, though. end_heap_rewrite(), _bt_load(),
> and gist_indexsortbuild have a similar-looking smgrimmedsync() calls,
> with no consideration for unlogged relations at all. I haven't tested
> those, but they look wrong to me.

The patch looks reasonable to me. Is the [1] case in hash index build
that I reported but didn't take the time to reproduce similar to this?

- Melanie

[1] 
https://www.postgresql.org/message-id/CAAKRu_bPc81M121pOEU7W%3D%2BwSWEebiLnrie4NpaFC%2BkWATFtSA%40mail.gmail.com




Re: Commitfest 2023-09 starts soon

2023-09-04 Thread Aleksander Alekseev
Hi Peter,

> Okay, here we go, starting with:
>
> Status summary: Needs review: 227. Waiting on Author: 37. Ready for
> Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1.
> Withdrawn: 1. Total: 337.
>
> (which is less than CF 2023-07)
>
> I have also already applied one round of the waiting-on-author-pruning
> described below (not included in the above figures).

* Index SLRUs by 64-bit integers rather than by 32-bit integers
https://commitfest.postgresql.org/44/3489/

The status here was changed to "Needs Review". These patches are in
good shape and previously were marked as "Ready for Committer".
Actually I thought Heikki would commit them to PG16, but it didn't
happen. If there are no objections, I will return the RfC status in a
bit since it seems to be more appropriate in this case.

-- 
Best regards,
Aleksander Alekseev




Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-09-04 Thread Amit Kapila
On Tue, Aug 22, 2023 at 10:55 PM Jeff Davis  wrote:
>
> On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> > I have investigated this.  My assessment is that how PostgreSQL
> > interfaces with ICU is correct.  Whether what ICU does is correct
> > might
> > be debatable.  I have filed a bug with ICU about this:
> > https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> > response yet.
>
> Is everything other than the language and region simply discarded when
> a rules string is present, or are some attributes preserved, or is
> there some other nuance?
>
> > You can work around this by including the desired attributes in the
> > rules string, for example
> >
> >  create collation c3 (provider=icu,
> >locale='und-u-ka-shifted-ks-level1',
> >rules='[alternate shifted][strength 1]',
> >deterministic=false);
> >
> > So I don't think there is anything we need to do here for PostgreSQL
> > 16.
>
> Is there some way we can warn a user that some attributes will be
> discarded, or improve the documentation? Letting the user figure this
> out for themselves doesn't seem right.
>
> Are you sure we want to allow rules for the database default collation
> in 16, or should we start with just allowing them in CREATE COLLATION
> and then expand to the database default collation later? I'm still a
> bit concerned about users getting too fancy with daticurules, and
> ending up not being able to connect to their database anymore.
>

There is still an Open Item corresponding to this. Does anyone else
want to weigh in?

-- 
With Regards,
Amit Kapila.




Re: Speaker Bureau

2023-09-04 Thread Ashutosh Bapat
Added my name to the list.

I am available to travel for meetups.


-- 
Best Wishes,
Ashutosh Bapat

On Sat, Sep 2, 2023 at 12:30 AM Dave Cramer  wrote:
>
> Greetings,
>
> If you are on the speaker list can you send an email to ugc...@postgresql.us 
> indicating whether you are available to travel for meetups?
>
> This serves the obvious purpose but also provides your email address to us.
>
> Thanks,
>
> Dave Cramer




Re: trying again to get incremental backup

2023-09-04 Thread Dilip Kumar
On Wed, Aug 30, 2023 at 9:20 PM Robert Haas  wrote:

> Unless someone has a brilliant idea that I lack, this suggests to me
> that this whole line of testing is a dead end. I can, of course, write
> tests that compare clusters *logically* -- do the correct relations
> exist, are they accessible, do they have the right contents?

Can't we think of comparing at the block level, like we can compare
each block but ignore the content of the hole?
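
For illustration, a minimal sketch of such a hole-ignoring page comparison
(the helper name and calling convention are made up; only the bufpage.h
macros are real):

#include "postgres.h"
#include "storage/bufpage.h"

/* Compare two heap pages, ignoring the hole between pd_lower and pd_upper. */
static bool
pages_equal_ignoring_hole(Page a, Page b)
{
	PageHeader	pha = (PageHeader) a;
	PageHeader	phb = (PageHeader) b;

	/* If the used regions differ in size, the pages cannot match. */
	if (pha->pd_lower != phb->pd_lower || pha->pd_upper != phb->pd_upper)
		return false;

	/* Compare the page header plus the line-pointer array. */
	if (memcmp(a, b, pha->pd_lower) != 0)
		return false;

	/* Compare the tuple area, skipping the hole in between. */
	return memcmp((char *) a + pha->pd_upper,
				  (char *) b + phb->pd_upper,
				  BLCKSZ - pha->pd_upper) == 0;
}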

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-09-04 Thread Zhijie Hou (Fujitsu)
On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada  
wrote:

Hi,

> Thanks for reviewing the patch. Pushed.

My colleague Vignesh noticed that the newly added test cases were failing in BF 
animal sungazer[1].

The test failed to drop the slot which is active on publisher.

--error-log--
This failure is because pg_drop_replication_slot fails with "replication slot 
"test_tab2_sub" is active for PID 55771638":
2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG:  statement: SELECT 
pg_drop_replication_slot('test_tab2_sub')
2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:  replication slot 
"test_tab2_sub" is active for PID 55771638
2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:  SELECT 
pg_drop_replication_slot('test_tab2_sub')
-

I think the reason is that the test DISABLEd the subscription before dropping the
slot, while "ALTER SUBSCRIPTION ... DISABLE" doesn't wait for the walsender to
release the slot, so it's possible that the walsender is still alive when calling
pg_drop_replication_slot() to drop the slot on the publisher (pg_drop_xxxslot()
doesn't wait for the slot to be released).

I think we can wait for the slot to become inactive before dropping it, like:

$node_primary->poll_query_until('otherdb',
	"SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE active_pid IS NOT NULL)"
);

Or we can just not drop the slot as it's the last testcase.

Another thing might be worth considering is we can add one parameter in
pg_drop_replication_slot() to let user control whether to wait or not, and the
test can be fixed as well by passing nowait=false to the func.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2023-09-02%2002%3A17%3A01

Best Regards,
Hou zj


Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-04 Thread Aleksander Alekseev
Hi,

>> I see the reasoning behind the proposed change, but I'm not convinced
>> that there will be any measurable performance improvements. Firstly,
>> compare_path_costs() is rather cheap. Secondly, require_parallel_safe
>> is `false` in most of the cases. Last but not least, one should prove
>> that this particular place is a bottleneck under given loads. I doubt
>> it is. Most of the time it's a network, disk I/O or locks.
>>
>> So unless the author can provide benchmarks that show measurable
>> benefits of the change I suggest rejecting it.
>
> Hmm, I doubt that there would be any measurable performance gains from
> this minor tweak.  I think this tweak is more about being cosmetic.  But
> I'm OK if it is deemed unnecessary and thus rejected.

During the triage of the patches submitted for the September CF a
consensus was reached [1] to mark this patch as Rejected.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org

-- 
Best regards,
Aleksander Alekseev




Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-09-04 Thread Ashutosh Bapat
On Sat, Sep 2, 2023 at 1:41 AM Robert Haas  wrote:
>
> On Fri, Sep 1, 2023 at 4:04 PM Jeff Davis  wrote:
> > On Thu, 2023-08-31 at 17:17 -0400, Joe Conway wrote:
> > > Maybe move postgres_fdw to be a first class built in feature instead
> > > of
> > > an extension?
> >
> > That could make sense, but we still have to solve the problem of how to
> > present a built-in FDW.
> >
> > FDWs don't have a schema, so it can't be inside pg_catalog. So we'd
> > need some special logic somewhere to make pg_dump and psql \dew work as
> > expected, and I'm not quite sure what to do there.
>
> I'm worried that an approach based on postgres_fdw would have security
> problems. I think that we don't want postgres_fdw installed in every
> PostgreSQL cluster for security reasons. And I think that the set of
> people who should be permitted to manage connection strings for
> logical replication subscriptions could be different from the set of
> people who are entitled to use postgres_fdw.

If postgres_fdw was the only way to specify a connection to be used
with subscriptions, what you are saying makes sense. But it's not. We
will continue to support current mechanism which doesn't require
postgres_fdw to be installed on every PostgreSQL cluster.

What security problems do you foresee if postgres_fdw is used in
addition to the current mechanism?

-- 
Best Wishes,
Ashutosh Bapat




Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2023-09-04 Thread Peter Eisentraut

On 30.06.23 20:48, Karina Litskevich wrote:
So as for me calling LockRelationForExtension() and 
UnlockRelationForExtension()
without testing eb.rel first looks more like a bug here. However, they 
are never
actually called with eb.rel=NULL because of the EB_* flags, so there is 
no bug
here. I believe we should add Assert checking that when eb.rel is NULL, 
flags

are such that we won't use eb.rel. And yes, we can remove unnecessary checks
where the flags value guarantees that eb.rel is not NULL.

And another minor observation. It seems to me that we don't need a "could have
been closed while waiting for lock" comment in ExtendBufferedRelShared(), because I
believe the comment above already explains why eb.smgr is updated:

  * Note that another backend might have extended the relation by the time
  * we get the lock.

I attached the new version of the patch as I see it.


This patch version looks the most sensible to me.  But as commented 
further downthread, some explanation around the added assertions would 
be good.





Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-09-04 Thread Ashutosh Bapat
On Sat, Sep 2, 2023 at 12:24 AM Jeff Davis  wrote:
>
> On Fri, 2023-09-01 at 12:28 +0530, Ashutosh Bapat wrote:
> > Thinking larger, how about we allow any FDW to be used here.
>
> That's a possibility, but I think that means the subscription would
> need to constantly re-check the parameters rather than relying on the
> FDW's validator.
>
> Otherwise it might be the wrong kind of FDW, and the user might be able
> to circumvent the password_required protection. It might not even be a
> postgres-related FDW at all, which would be a bit strange.
>
> If it's constantly re-checking the parameters then it raises the
> possibility that some "ALTER SERVER" or "ALTER USER MAPPING" succeeds
> but then subscriptions to that foreign server start failing, which
> would not be ideal. But I could be fine with that.

Why do we need to re-check parameters constantly? We will need to
restart subscriptions which are using the user mapping of FDW when
user mapping or server options change. If that mechanism isn't there,
we will need to build it. But that's doable.

I didn't understand your worry about circumventing password_required protection.

>
> > But I think there's some value in bringing
> > together these two subsystems which deal with foreign data logically
> > (as in logical vs physical view of data).
>
> I still don't understand how a core dependency on an extension would
> work.

We don't need to if we allow any FDW (even if non-postgreSQL) to be
specified there. For non-postgresql FDW the receiver will need to
construct the appropriate command and use appropriate protocol to get
the changes and apply locally. The  server at the other end may not
even have logical replication capability. The extension or "input
plugin" (as against output plugin) would decide whether it can deal
with the foreign server specific logical replication protocol. We add
another callback to FDW which can return whether the given foreign
server supports logical replication or not. If the users
misconfigured, their subscriptions will throw errors.

But with this change we open a built-in way to "replicate in" as we
have today to "replicate out".

-- 
Best Wishes,
Ashutosh Bapat




Re: SLRUs in the main buffer pool, redux

2023-09-04 Thread Aleksander Alekseev
Hi,

> Unfortunately the patchset rotted quite a bit since February and needs a 
> rebase.

A consensus was reached [1] to mark this patch as RwF for now. There
are many patches to be reviewed and this one doesn't seem to be in the
best shape, so we have to prioritise. Please feel free to re-submit
the patch for the next commitfest.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] Infinite loop while acquiring new TOAST Oid

2023-09-04 Thread Aleksander Alekseev
Hi,

> Aleksander, thank you for reminding me of this patch, try to do it in a few 
> days.

A consensus was reached [1] to mark this patch as RwF for now. There
are many patches to be reviewed and this one doesn't seem to be in the
best shape, so we have to prioritise. Please feel free to re-submit
the patch for the next commitfest.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org

-- 
Best regards,
Aleksander Alekseev




Re: Flush SLRU counters in checkpointer process

2023-09-04 Thread Aleksander Alekseev
Hi,

> I think I've managed to reproduce the issue. The test I've added to check 
> slru flush was the one failing in the regression suite.

A consensus was reached [1] to mark this patch as RwF for now. There
are many patches to be reviewed and this one doesn't seem to be in the
best shape, so we have to prioritise. Please feel free to re-submit
the patch for the next commitfest.

[1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org
-- 
Best regards,
Aleksander Alekseev




Re: proposal: psql: show current user in prompt

2023-09-04 Thread Jelte Fennema
On Sun, 3 Sept 2023 at 20:58, Pavel Stehule  wrote:
> here is an try

Overall it does what I had in mind. Below a few suggestions:

+int
+PQprotocolSubversion(const PGconn *conn)

Ugh, it's quite annoying that the original PQprotocolVersion only
returns the major version and thus we need this new function. It
seems like it would be much nicer if it returned a number similar to
PQserverVersion. I think it might be nicer to change PQprotocolVersion
to do that than to add another function. We could do:

return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
PG_PROTOCOL_MINOR(conn->pversion);

or even:

if (PG_PROTOCOL_MAJOR(conn->pversion) == 3 && PG_PROTOCOL_MINOR(conn->pversion) == 0)
return 3;
else
return PG_PROTOCOL_MAJOR(conn->pversion) * 100 +
PG_PROTOCOL_MINOR(conn->pversion);

The second option would be safest backwards compatibility wise, but in
practice you would only get another value than 3 (or 0) when
connecting to pre 7.4 servers. That seems old enough that I don't
think anyone is actually calling this function. **I'd like some
feedback from others on this though.**

+   /* The protocol 3.0 is required */
+   if (PG_PROTOCOL_MAJOR(their_version) == 3)
+   conn->pversion = their_version;

Let's compare against the actual PG_PROTOCOL_EARLIEST and
PG_PROTOCOL_LATEST to determine if the version is supported or not.
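
For example, a minimal sketch of that range check could look like the
following (the error-reporting call is illustrative only, not taken from
the patch):

/* Accept any protocol version libpq was built to support. */
if (their_version >= PG_PROTOCOL_EARLIEST &&
	their_version <= PG_PROTOCOL_LATEST)
	conn->pversion = their_version;
else
	libpq_append_conn_error(conn, "protocol version %u.%u is not supported",
							PG_PROTOCOL_MAJOR(their_version),
							PG_PROTOCOL_MINOR(their_version));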




Re: Autogenerate some wait events code and documentation

2023-09-04 Thread Drouvot, Bertrand

Hi,

On 8/29/23 8:41 AM, Michael Paquier wrote:

On Tue, Aug 29, 2023 at 08:17:10AM +0200, Drouvot, Bertrand wrote:
That may be a bug in the matrix because of bb90022, as git am can be
easily pissed. 


git am does not complain anymore.


+   # Generate the element name for the enums based on the
+   # description.  The C symbols are prefixed with "WAIT_EVENT_".

Nit: 2 whitespaces before "The C"

# Build the descriptions.  There are in camel-case.
# LWLock and Lock classes do not need any modifications.

Nit: 2 whitespaces before "There are in camel"

+   my $waiteventdescription = '';
+   if (   $waitclassname eq 'WaitEventLWLock'

Nit: Too many whitespace after the "if (" ?? (I guess pgperltidy would
fix it).


I have double-checked the docs generated, while on it, and I am not
seeing anything missing, particularly for the LWLock and Lock parts..


I did compare the output of select * from pg_wait_events order by 1,2 and
ignored the case (with and without the patch series).

Then, the only diff is:

< Client,WalSenderWaitWal,Waiting for WAL to be flushed in WAL sender process
---
> Client,WalSenderWaitForWAL,Waiting for WAL to be flushed in WAL sender process


That said, it looks good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Extract numeric filed in JSONB more effectively

2023-09-04 Thread Andy Fan
Hi Jian,

 SELECT (test_json -> 'field1')::int4 FROM test_jsonb WHERE json_type
> = 'object';
> -ERROR:  cannot cast jsonb string to type integer
> +ERROR:  unknown jsonb type: 1125096840
>

Thanks for the report!  The reason is I return the address of a local
variable.

jsonb_object_field_start(PG_FUNCTION_ARGS)
{
	JsonbValue *v;
	JsonbValue	vbuf;

	v = getKeyJsonValueFromContainer(&jb->root,
									 VARDATA_ANY(key),
									 VARSIZE_ANY_EXHDR(key),
									 &vbuf);
	PG_RETURN_POINTER(v);
}

Here v points to vbuf, which is a local variable on the stack.  I'm confused
about why it works on my local machine and also works for most queries
in cfbot.  The fix is below:

v = getKeyJsonValueFromContainer(&jb->root,
                                 VARDATA_ANY(key),
                                 VARSIZE_ANY_EXHDR(key),
                                 NULL);


I will send an updated version soon.

-- 
Best Regards
Andy Fan


Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-04 Thread Ranier Vilela
On Sun, Sep 3, 2023 at 22:01, Michael Paquier wrote:

> On Sat, Sep 02, 2023 at 09:13:11AM -0300, Ranier Vilela wrote:
> > I tried to keep the same behavior as before.
> > Note that if the locale equals COLLPROVIDER_LIBC,
> > the message to the user will be the same.
>
> -/* shouldn't happen */
> -elog(ERROR, "unsupported collprovider: %c", locale->provider);
> +elog(ERROR, "collprovider '%c' does not support pg_strnxfrm_prefix()",
> + locale->provider);
>
> Based on what's written here, these messages could be better because
> full sentences are not encouraged in error messages, even for
> non-translated elogs:
> https://www.postgresql.org/docs/current/error-style-guide.html
>
> Perhaps something like "could not use collprovider %c: no support for
> %s", where the function name is used, would be more consistent.
>
Sure.
I have no objection to the wording of the message.
If there is consensus, I can send another patch.

best regards,
Ranier Vilela


Re: cataloguing NOT NULL constraints

2023-09-04 Thread Alvaro Herrera
Looking at your 0001 patch, which adds tests for some of the
information_schema views, I think it's a bad idea to put them in
whatever other regression .sql files; they would be subject to
concurrent changes depending on what other tests are being executed in
the same parallel test.  I suggest to put them all in a separate .sql
file, and schedule that to run in the last concurrent group, together
with the tablespace test.  This way, it would capture all the objects
left over by other test files.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Optimize planner memory consumption for huge arrays

2023-09-04 Thread Dilip Kumar
On Mon, Sep 4, 2023 at 3:49 PM Lepikhov Andrei
 wrote:
>
> > + Const *c = makeConst(nominal_element_type,
> > + -1,
> > + nominal_element_collation,
> > + elmlen,
> > + elem_values[i],
> > + elem_nulls[i],
> > + elmbyval);
> > +
> > + args = list_make2(leftop, c);
> >   if (is_join_clause)
> >   s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
> > clause->inputcollid,
> > @@ -1984,7 +1985,8 @@ scalararraysel(PlannerInfo *root,
> > ObjectIdGetDatum(operator),
> > PointerGetDatum(args),
> > Int32GetDatum(varRelid)));
> > -
> > + list_free(args);
> > + pfree(c);
> >
> > Maybe you can just use list_free_deep, instead of storing the constant
> > in a separate variable.
> As I see, the first element in the array is leftop, which is used on other 
> iterations. So, we can't use list_free_deep here.

Yeah you are right, thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[PATCH] Add inline comments to the pg_hba_file_rules view

2023-09-04 Thread Jim Jones


Hi,

This patch proposes the column "comment" to the pg_hba_file_rules view. 
It basically parses the inline comment (if any) of a valid pg_hba.conf 
entry and displays it in the new column.


For such pg_hba entries ...

host db jim 127.0.0.1/32 md5 # foo
host db jim 127.0.0.1/32 md5 #bar
host db jim 127.0.0.1/32 md5 # #foo#

... it returns the following pg_hba_file_rules records:

postgres=#  SELECT type, database, user_name, address, comment
 FROM pg_hba_file_rules
 WHERE user_name[1]='jim';

 type | database | user_name |  address  | comment
--+--+---+---+-
 host | {db} | {jim} | 127.0.0.1 | foo
 host | {db} | {jim} | 127.0.0.1 | bar
 host | {db} | {jim} | 127.0.0.1 | #foo#
(3 rows)


This feature can come in quite handy when we need to read important 
comments from the hba entries without having access to the pg_hba.conf 
file directly.


The patch slightly changes the test 004_file_inclusion.pl to accommodate 
the new column and the hba comments.


Discussion: 
https://www.postgresql.org/message-id/flat/3fec6550-93b0-b542-b203-b0054aaee83b%40uni-muenster.de


Best regards,
Jim
From bb795fae29a0f714c590f94176a4675d7ae85a2f Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 4 Sep 2023 12:32:04 +0200
Subject: [PATCH v1] Add inline comments to the pg_hba_file_rules view

This patch proposes the column "comment" to the pg_hba_file_rules view.
It basically parses the inline comment (if any) of a valid pg_hba.conf
entry and displays it in the new column. The patch slightly changes the
test 004_file_inclusion.pl to accommodate the new column and the hba comments.
The view's documentation at system-views.sgml is extended accordingly.

The function GetInlineComment() was added to conffiles.c to avoid adding more
complexity directly to hba.c. It also enables this feature to be used by other
configuration files, if necessary.
---
 doc/src/sgml/system-views.sgml|  9 +
 src/backend/utils/adt/hbafuncs.c  | 11 +-
 src/backend/utils/misc/conffiles.c| 26 +
 src/include/catalog/pg_proc.dat   |  6 +--
 src/include/libpq/hba.h   |  1 +
 src/include/utils/conffiles.h |  1 +
 .../authentication/t/004_file_inclusion.pl| 39 ---
 src/test/regress/expected/rules.out   |  3 +-
 8 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 2b35c2f91b..d4951372a5 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -1090,6 +1090,15 @@
   
  
 
+
+  
+   comment text
+  
+  
+   Text after the # comment character in the end of a valid pg_hba.conf entry, if any
+  
+ 
+
  
   
error text
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 73d3ad1dad..929678e97e 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -22,6 +22,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
+#include "utils/conffiles.h"
 
 
 static ArrayType *get_hba_options(HbaLine *hba);
@@ -159,7 +160,7 @@ get_hba_options(HbaLine *hba)
 }
 
 /* Number of columns in pg_hba_file_rules view */
-#define NUM_PG_HBA_FILE_RULES_ATTS	 11
+#define NUM_PG_HBA_FILE_RULES_ATTS	 12
 
 /*
  * fill_hba_line
@@ -191,6 +192,7 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 	const char *addrstr;
 	const char *maskstr;
 	ArrayType  *options;
+	char   *comment;
 
 	Assert(tupdesc->natts == NUM_PG_HBA_FILE_RULES_ATTS);
 
@@ -346,6 +348,13 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 			values[index++] = PointerGetDatum(options);
 		else
 			nulls[index++] = true;
+
+		/* comments */
+		comment = GetInlineComment(hba->rawline);
+		if(comment)
+			values[index++] = CStringGetTextDatum(comment);
+		else
+			nulls[index++] = true;
 	}
 	else
 	{
diff --git a/src/backend/utils/misc/conffiles.c b/src/backend/utils/misc/conffiles.c
index 376a5c885b..0f4169dea3 100644
--- a/src/backend/utils/misc/conffiles.c
+++ b/src/backend/utils/misc/conffiles.c
@@ -25,6 +25,32 @@
 #include "storage/fd.h"
 #include "utils/conffiles.h"
 
+/*
+ * GetInlineComment
+ *
+ * This function returns comments of a given config file line,
+ * if there is any. A comment is any text after the # character.
+ */
+char *
+GetInlineComment(char *line)
+{
+	size_t len;
+	char *comment = strchr(line, '#');
+
+	if(!comment)
+		return NULL;
+
+	/* ignoring '#' character, as we don't need it in the comment itself. */
+	comment++;
+	len = strlen(comment);
+
+	/* trim leading and trailing whitespaces */
+	while(isspace(comment[len - 1])) --len;
+	while(*comment && isspace(*comment)) ++comment, --len;
+
+	return pnstrdup(comment, len);
+}
+
 /*
  * AbsoluteConfigLocation
  *
diff --git a/src/include/catalog/pg_proc.dat 

Re: Impact of checkpointer during pg_upgrade

2023-09-04 Thread Dilip Kumar
On Mon, Sep 4, 2023 at 1:41 PM Dilip Kumar  wrote:
>
> > > I think we can do better, like we can just read the latest
> > > checkpoint's LSN before starting the old cluster.  And now while
> > > checking the slot can't we check if the the slot is invalidated then
> > > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved
> > > before starting the cluster because if so then those slot might have
> > > got invalidated during the upgrade no?
> > >
> >
> > Isn't that possible only if we update confirmed_flush LSN while
> > invalidating? Otherwise, how can the check you are proposing succeed?
>
> I am not suggesting to compare the confirmed_flush_lsn to the latest
> checkpoint LSN instead I am suggesting that before starting the
> cluster we get the location of the latest checkpoint LSN that should
> be the shutdown checkpoint LSN.  So now also in [1] we check that
> confirmed flush lsn should be equal to the latest checkpoint lsn.   So
> the only problem is that after we restart the cluster during the
> upgrade we might invalidate some of the slots which are perfectly fine
> to migrate and we want to identify those slots.  So if we know the
> LSN of the shutdown checkpoint before the cluster started then we can
> perform additional checks on all the invalidated slots that their
> confirmed lsn >= shutdown checkpoint lsn we preserved before
> restarting the cluster (not the latest checkpoint lsn) then those
> slots got invalidated only after we started the cluster for upgrade?
> Is there any loophole in this theory?   This theory is based on the
> assumption that the confirmed flush lsn are not moving forward for the
> already invalidated slots that means the slot which got invalidated
> before we shutdown for upgrade will have confirm flush lsn value <
> shutdown checkpoint and the slots which got invalidated during the
> upgrade will have confirm flush lsn at least equal to the shutdown
> checkpoint.

That said, there is a possibility that some of the slots that got
invalidated even on a previous checkpoint might have the same LSN as
a slot that got invalidated later, if there was no activity between
these two checkpoints. So if we go with this approach, there is
some risk of migrating slots that were already invalidated even
before the shutdown checkpoint.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: More new SQL/JSON item methods

2023-09-04 Thread Jeevan Chalke
>
> Looking at the SQL standard itself, in the 2023 edition section 9.46
> "SQL/JSON path language: syntax and semantics", it shows this:
>
>  <JSON item method> ::=
> type <left paren> <right paren>
> | size <left paren> <right paren>
> | double <left paren> <right paren>
> | ceiling <left paren> <right paren>
> | floor <left paren> <right paren>
> | abs <left paren> <right paren>
> | datetime <left paren> [ <JSON datetime template> ] <right paren>
> | keyvalue <left paren> <right paren>
> | bigint <left paren> <right paren>
> | boolean <left paren> <right paren>
> | date <left paren> <right paren>
> | decimal <left paren> [ <precision> [ <comma> <scale> ] ] <right paren>
> | integer <left paren> <right paren>
> | number <left paren> <right paren>
> | string <left paren> <right paren>
> | time <left paren> [ <time precision> ] <right paren>
> | time_tz <left paren> [ <time precision> ] <right paren>
> | timestamp <left paren> [ <timestamp precision> ] <right paren>
> | timestamp_tz <left paren> [ <timestamp precision> ] <right paren>
>
> and then details, for each of those, rules like
>
> III) If JM specifies double <left paren> <right paren>, then:
>  1) For all j, 1 (one) ≤ j ≤ n,
> Case:
> a) If I_j is not a number or character string, then let ST be data
>exception — non-numeric SQL/JSON item (22036).
> b) Otherwise, let X be an SQL variable whose value is I_j.
>Let V_j be the result of
>CAST (X AS DOUBLE PRECISION)
>If this conversion results in an exception condition, then
>let ST be that exception condition.
>  2) Case:
> a) If ST is not successful completion, then the result of JAE
>is ST.
> b) Otherwise, the result of JAE is the SQL/JSON sequence V_1,
>..., V_n.
>
> so at least superficially our implementation is constrained by what the
> SQL standard says to do, and we should verify that this implementation
> matches those rules.  We don't necessarily need to watch what do other
> specs such as jsonpath itself.
>

I believe our current implementation of the .double() method is in line with
this. And these new methods are following the same suit.



> > - surely there's a more direct way to make boolean from numeric
> >   than to serialize the numeric and parse an int?
>

Yeah, we can directly check the value = 0 for false, true otherwise.
But looking at the PostgreSQL conversion to bool, it doesn't allow floating
point values to be converted to boolean and only accepts int4. That's why I
did the int4 conversion.
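
For what it's worth, a direct zero test on the numeric value is possible
with the existing fmgr entry points, roughly like the sketch below (num is
assumed to be the Numeric extracted from the jsonb item; this is not what
the patch does today):

Datum	zero = DirectFunctionCall1(int4_numeric, Int32GetDatum(0));
bool	bval = !DatumGetBool(DirectFunctionCall2(numeric_eq,
												 NumericGetDatum(num),
												 zero));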

Thanks

-- 
Jeevan Chalke

*Senior Staff SDE, Database Architect, and ManagerProduct Development*



edbpostgres.com


Re: Optimize planner memory consumption for huge arrays

2023-09-04 Thread Lepikhov Andrei



On Mon, Sep 4, 2023, at 3:37 PM, Dilip Kumar wrote:
> On Mon, Sep 4, 2023 at 11:58 AM Lepikhov Andrei
>  wrote:
>>
>> Hi, hackers,
>>
>> Looking at the planner behaviour with the memory consumption patch [1], I 
>> figured out that arrays increase memory consumption by the optimizer 
>> significantly. See init.sql in attachment.
>> The point here is that the planner does small memory allocations for each 
>> element during estimation. As a result, it looks like the planner consumes 
>> about 250 bytes for each integer element.
>>
>> It is maybe not a problem most of the time. However, in the case of 
>> partitions, memory consumption multiplies by each partition. Such a corner 
>> case looks weird, but the fix is simple. So, why not?
>>
>> The diff in the attachment is proof of concept showing how to reduce wasting 
>> of memory. Having benchmarked a bit, I didn't find any overhead.
>
> + Const *c = makeConst(nominal_element_type,
> + -1,
> + nominal_element_collation,
> + elmlen,
> + elem_values[i],
> + elem_nulls[i],
> + elmbyval);
> +
> + args = list_make2(leftop, c);
>   if (is_join_clause)
>   s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
> clause->inputcollid,
> @@ -1984,7 +1985,8 @@ scalararraysel(PlannerInfo *root,
> ObjectIdGetDatum(operator),
> PointerGetDatum(args),
> Int32GetDatum(varRelid)));
> -
> + list_free(args);
> + pfree(c);
>
> Maybe you can just use list_free_deep, instead of storing the constant
> in a separate variable.
As I see, the first element in the array is leftop, which is used on other 
iterations. So, we can't use list_free_deep here.

-- 
Regards,
Andrei Lepikhov




Re: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread vignesh C
On Mon, 4 Sept 2023 at 15:20, Amit Kapila  wrote:
>
> On Fri, Sep 1, 2023 at 10:50 AM vignesh C  wrote:
> >
> > On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
> > >
> > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > All but one. Normally, the idea of marking dirty is to indicate that
> > > > > we will actually write/flush the contents at a later point (except
> > > > > when required for correctness) as even indicated in the comments atop
> > > > > ReplicatioinSlotMarkDirty(). However, I see your point that we use
> > > > > that protocol at all the current places including CreateSlotOnDisk().
> > > > > So, we can probably do it here as well.
> > > >
> > > > yes
> > > >
> > >
> > > I think we should also ensure that slots are not invalidated
> > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > > because we don't allow decoding from such slots, so we shouldn't
> > > include those.
> >
> > Added this check.
> >
> > Apart from this I have also fixed the following issues that were
> > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> > instead of setting it in SaveSlotToPath
> >
>
> + if (is_shutdown && SlotIsLogical(s))
> + {
+ SpinLockAcquire(&s->mutex);
> + if (s->data.invalidated == RS_INVAL_NONE &&
> + s->data.confirmed_flush != s->last_saved_confirmed_flush)
> + s->dirty = true;
>
> I think it is better to use ReplicationSlotMarkDirty() as that would
> be consistent with all other usages.

ReplicationSlotMarkDirty works only on MyReplicationSlot, whereas
CheckpointReplicationSlots loops through all the slots and marks the
appropriate slot as dirty, so we might have to change
ReplicationSlotMarkDirty to take the slot as an input parameter and have
all callers pass MyReplicationSlot. Another thing is that we have already
taken the spinlock to access last_confirmed_flush_lsn from
CheckpointReplicationSlots, so we could set the dirty flag there itself;
else we will have to release the lock and call ReplicationSlotMarkDirty,
which will take the lock again. Instead, shall we set just_dirtied as well
in CheckpointReplicationSlots?
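Something like the sketch below is what I have in mind (field names follow
the quoted patch; this is only an illustration, not the actual change):

if (is_shutdown && SlotIsLogical(s))
{
	SpinLockAcquire(&s->mutex);
	if (s->data.invalidated == RS_INVAL_NONE &&
		s->data.confirmed_flush != s->last_saved_confirmed_flush)
	{
		s->just_dirtied = true;
		s->dirty = true;
	}
	SpinLockRelease(&s->mutex);
}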
Thoughts?

Regards,
Vignesh




Re: persist logical slots to disk during shutdown checkpoint

2023-09-04 Thread Amit Kapila
On Fri, Sep 1, 2023 at 10:50 AM vignesh C  wrote:
>
> On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
> >
> > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
> >  wrote:
> > >
> > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  
> > > wrote:
> > > >
> > > > All but one. Normally, the idea of marking dirty is to indicate that
> > > > we will actually write/flush the contents at a later point (except
> > > > when required for correctness) as even indicated in the comments atop
> > > > ReplicatioinSlotMarkDirty(). However, I see your point that we use
> > > > that protocol at all the current places including CreateSlotOnDisk().
> > > > So, we can probably do it here as well.
> > >
> > > yes
> > >
> >
> > I think we should also ensure that slots are not invalidated
> > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> > because we don't allow decoding from such slots, so we shouldn't
> > include those.
>
> Added this check.
>
> Apart from this I have also fixed the following issues that were
> agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
> instead of setting it in SaveSlotToPath
>

+ if (is_shutdown && SlotIsLogical(s))
+ {
+ SpinLockAcquire(&s->mutex);
+ if (s->data.invalidated == RS_INVAL_NONE &&
+ s->data.confirmed_flush != s->last_saved_confirmed_flush)
+ s->dirty = true;

I think it is better to use ReplicationSlotMarkDirty() as that would
be consistent with all other usages.

-- 
With Regards,
Amit Kapila.




Re: Incremental View Maintenance, take 2

2023-09-04 Thread jian he
On Sat, Sep 2, 2023 at 7:46 PM Tatsuo Ishii  wrote:
>
> > attached is my refactor. there is some whitespace errors in the
> > patches, you need use
> > git apply --reject --whitespace=fix
> > basedon_v29_matview_c_refactor_update_set_clause.patch
> >
> > Also you patch cannot use git apply, i finally found out bulk apply
>
> I have no problem with applying Yugo's v29 patches using git apply, no
> white space errors.
>

thanks. I downloaded the patches from the postgres website, then the
problem was solved.

other ideas based on v29.

src/include/utils/rel.h
680: #define RelationIsIVM(relation) ((relation)->rd_rel->relisivm)
I guess it would be better to add some comments to address the usage.
Since all peer macros all have some comments.

pg_class change, I guess we need to bump CATALOG_VERSION_NO?

small issue. makeIvmAggColumn and calc_delta need to add an empty
return statement?

style issue. in gram.y, "incremental" upper case?
+   CREATE OptNoLog incremental MATERIALIZED VIEW
create_mv_target AS SelectStmt opt_with_data

I don't know how pgindent works; do you need to add some keywords to
src/tools/pgindent/typedefs.list to make indentation work?

in
/* If this is not the last AFTER trigger call, immediately exit. */
Assert (entry->before_trig_count >= entry->after_trig_count);
if (entry->before_trig_count != entry->after_trig_count)
return PointerGetDatum(NULL);

before returning NULL, do you also need clean_up_IVM_hash_entry? (I
don't know when this case will happen)

in
/* Replace the modified table with the new delta table and
calculate the new view delta*/
replace_rte_with_delta(rte, table, true, queryEnv);
refresh_matview_datafill(dest_new, query, queryEnv, tupdesc_new, "");

replace_rte_with_delta does not change the arguments table and
queryEnv. refresh_matview_datafill just uses some of the arguments of
the function calc_delta. So I guess I am confused by the usage of
replace_rte_with_delta. Also, I think it should return void, since you
just modify the input argument. Here refresh_matview_datafill is just
persisting the new delta content to dest_new?




Re: pg_upgrade and logical replication

2023-09-04 Thread Amit Kapila
On Mon, Sep 4, 2023 at 12:15 PM Michael Paquier  wrote:
>
> On Mon, Sep 04, 2023 at 11:51:14AM +0530, Amit Kapila wrote:
> > +1 for doing it via function (something like
> > binary_upgrade_create_sub_rel_state). We already have the internal
> > function AddSubscriptionRelState() that can do the core work.
>
> It is one of these patches that I have let aside for too long, and it
> solves a use-case of its own.  I think that I could hack that pretty
> quickly given that Julien has done a bunch of the ground work.  Would
> you agree with that?
>

Yeah, I agree that could be hacked quickly but note I haven't reviewed
in detail if there are other design issues in this patch. Note that we
thought first to support the upgrade of the publisher node, otherwise,
immediately after upgrading the subscriber and publisher, the
subscriptions won't work and start giving errors as they are dependent
on slots in the publisher. One other point that needs some thought is
that the LSN positions we are going to copy in the catalog may no
longer be valid after the upgrade (of the publisher) because we reset
WAL. Does that need some special consideration or are we okay with
that in all cases?  As of now, things are quite safe as documented in
pg_dump doc page that it will be the user's responsibility to set up
replication after dump/restore. I think it would be really helpful if
you could share your thoughts on the publisher-side matter as we are
facing a few tricky questions to be answered. For example, see a new
thread [1].

> > Like the publisher-side upgrade patch [1], I think we should allow
> > upgrading subscriptions by default instead with some flag like
> > --preserve-subscription-state. If required, we can introduce --exclude
> > option for upgrade. Having it just for pg_dump sounds reasonable to
> > me.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
>
> In the interface of the publisher for pg_upgrade agreed on and set in
> stone?  I certainly agree to have a consistent upgrade experience for
> the two sides of logical replication, publications and subscriptions.
> Also, I'd rather have a filtering option at the same time as the
> upgrade option to give more control to users from the start.
>

The point raised by Jonathan for not having an option for pg_upgrade
is that it will be easy for users, otherwise, users always need to
enable this option. Consider a replication setup, wouldn't users want
by default it to be upgraded? Asking them to do that via an option
would be an inconvenience. So, that was the reason, we wanted to have
an --exclude option and by default allow slots to be upgraded. I think
the same theory applies here.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LV3%2B76CSOAk0h8Kv0AKb-OETsJHe6Sq6172-7DZXf0Qg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: In-place persistance change of a relation

2023-09-04 Thread Kyotaro Horiguchi
At Thu, 24 Aug 2023 11:22:32 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I could turn this into something like undo logs in a simple form, but
> I'd rather not craft a general-purpose undo log system for this unless
> it's absolutely necessary.

This is a patch for a basic undo log implementation. It looks like it
works well for some orphan-files-after-a-crash and data-loss-on-reinit
cases.  However, it is far from complete and likely has issues with
crash-safety and the durability of undo log files (and memory leaks
and performance and..).

I'm posting this to move the discussion forward.

(This doesn't contain the third file "ALTER TABLE ..ALL IN TABLESPACE" part.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From da5696b9026fa916ae991f7da616062c5b19e705 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 31 Aug 2023 11:49:10 +0900
Subject: [PATCH v29 1/2] Introduce undo log implementation

This patch adds a simple implementation of UNDO log feature.
---
 src/backend/access/transam/Makefile|   1 +
 src/backend/access/transam/rmgr.c  |   4 +-
 src/backend/access/transam/simpleundolog.c | 343 +
 src/backend/access/transam/twophase.c  |   3 +
 src/backend/access/transam/xact.c  |  24 ++
 src/backend/access/transam/xlog.c  |  20 +-
 src/backend/catalog/storage.c  | 171 ++
 src/backend/storage/file/reinit.c  |  78 +
 src/backend/storage/smgr/smgr.c|   9 +
 src/bin/initdb/initdb.c|  17 +
 src/bin/pg_rewind/parsexlog.c  |   2 +-
 src/bin/pg_waldump/rmgrdesc.c  |   2 +-
 src/include/access/rmgr.h  |   2 +-
 src/include/access/rmgrlist.h  |  44 +--
 src/include/access/simpleundolog.h |  36 +++
 src/include/catalog/storage.h  |   3 +
 src/include/catalog/storage_ulog.h |  35 +++
 src/include/catalog/storage_xlog.h |   9 +
 src/include/storage/reinit.h   |   2 +
 src/include/storage/smgr.h |   1 +
 src/tools/pgindent/typedefs.list   |   6 +
 21 files changed, 780 insertions(+), 32 deletions(-)
 create mode 100644 src/backend/access/transam/simpleundolog.c
 create mode 100644 src/include/access/simpleundolog.h
 create mode 100644 src/include/catalog/storage_ulog.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db..531505cbbd 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	rmgr.o \
 	slru.o \
 	subtrans.o \
+	simpleundolog.o \
 	timeline.o \
 	transam.o \
 	twophase.o \
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 7d67eda5f7..840cbdecd3 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -35,8 +35,8 @@
 #include "utils/relmapper.h"
 
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
-#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
-	{ name, redo, desc, identify, startup, cleanup, mask, decode },
+#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,undo) \
+	{ name, redo, desc, identify, startup, cleanup, mask, decode},
 
 RmgrData	RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
diff --git a/src/backend/access/transam/simpleundolog.c b/src/backend/access/transam/simpleundolog.c
new file mode 100644
index 00..ebbacce298
--- /dev/null
+++ b/src/backend/access/transam/simpleundolog.c
@@ -0,0 +1,343 @@
+/*-
+ *
+ * simpleundolog.c
+ *		Simple implementation of PostgreSQL transaction-undo-log manager
+ *
+ * In this module, procedures required during a transaction abort are
+ * logged. Persisting this information becomes crucial, particularly for
+ * ensuring reliable post-processing during the restart following a transaction
+ * crash. At present, in this module, logging of information is performed by
+ * simply appending data to a created file.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/backend/access/transam/clog.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "access/simpleundolog.h"
+#include "access/twophase_rmgr.h"
+#include "access/parallel.h"
+#include "access/xact.h"
+#include "catalog/storage_ulog.h"
+#include "storage/fd.h"
+
+#define ULOG_FILE_MAGIC 0x12345678
+
+typedef struct UndoLogFileHeader
+{
+	int32 magic;
+	bool  prepared;
+} UndoLogFileHeader;
+
+typedef struct UndoDescData
+{
+	const char *name;
+	void	(*rm_undo) (SimpleUndoLogRecord *record, bool prepared);
+} UndoDescData;
+
+/* must be kept in sync with RmgrData definition in xlog_internal.h */
+#define 

Re: Optimize planner memory consumption for huge arrays

2023-09-04 Thread Dilip Kumar
On Mon, Sep 4, 2023 at 11:58 AM Lepikhov Andrei
 wrote:
>
> Hi, hackers,
>
> Looking at the planner behaviour with the memory consumption patch [1], I 
> figured out that arrays increase memory consumption by the optimizer 
> significantly. See init.sql in attachment.
> The point here is that the planner does small memory allocations for each 
> element during estimation. As a result, it looks like the planner consumes 
> about 250 bytes for each integer element.
>
> It is maybe not a problem most of the time. However, in the case of 
> partitions, memory consumption multiplies by each partition. Such a corner 
> case looks weird, but the fix is simple. So, why not?
>
> The diff in the attachment is proof of concept showing how to reduce wasting 
> of memory. Having benchmarked a bit, I didn't find any overhead.

+ Const *c = makeConst(nominal_element_type,
+ -1,
+ nominal_element_collation,
+ elmlen,
+ elem_values[i],
+ elem_nulls[i],
+ elmbyval);
+
+ args = list_make2(leftop, c);
  if (is_join_clause)
  s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
clause->inputcollid,
@@ -1984,7 +1985,8 @@ scalararraysel(PlannerInfo *root,
ObjectIdGetDatum(operator),
PointerGetDatum(args),
Int32GetDatum(varRelid)));
-
+ list_free(args);
+ pfree(c);

Maybe you can just use list_free_deep, instead of storing the constant
in a separate variable.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Impact of checkpointer during pg_upgrade

2023-09-04 Thread Dilip Kumar
On Mon, Sep 4, 2023 at 11:18 AM Amit Kapila  wrote:
>
> On Mon, Sep 4, 2023 at 10:33 AM Dilip Kumar  wrote:
> >
> > On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila  wrote:
> > >
> > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar  wrote:
> > > >
> > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila  
> > > > wrote:
> > > >
> > > > > The other possibilities apart from not allowing an upgrade in such a
> > > > > case could be (a) Before starting the old cluster, we fetch the slots
> > > > > directly from the disk using some tool like [2] and make the decisions
> > > > > based on that state;
> > > >
> > > > Okay, so IIUC along with dumping the slot data we also need to dump
> > > > the latest checkpoint LSN because during upgrade we do check that the
> > > > confirmed flush lsn for all the slots should be the same as the latest
> > > > checkpoint.  Yeah but I think we could work this out.
> > > >
> > > We already have the latest checkpoint LSN information from
> > > pg_controldata. I think we can use that as the patch proposed in the
> > > thread [1] is doing now. Do you have something else in mind?
> >
> > I think I did not understood the complete proposal.  And what I meant
> > is that if we dump the slot before we start the cluster thats fine.
> > But then later after starting the old cluster if we run some query
> > like we do in check_old_cluster_for_valid_slots() then thats not
> > right, because there is gap between the status of the slots what we
> > dumped before starting the cluster and what we are checking after the
> > cluster, so there is not point of that check right?
> >
>
> That's right but if we do read slots from disk, we preserve those in
> the memory and use that information instead of querying it again in
> check_old_cluster_for_valid_slots().
>
> > > >  (b) During the upgrade, we don't allow WAL to be
> > > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> > > > > as well but for that, we need to expose an API to invalidate the
> > > > > slots;
> > > >
> > > >  (d) somehow distinguish the slots that are invalidated during
> > > > > an upgrade and then simply copy such slots because anyway we ensure
> > > > > that all the WAL required by slot is sent before shutdown.
> > > >
> > > > Yeah this could also be an option, although we need to think the
> > > > mechanism of distinguishing those slots looks clean and fit well with
> > > > other architecture.
> > > >
> > >
> > > If we want to do this we probably need to maintain a flag in the slot
> > > indicating that it was invalidated during an upgrade and then use the
> > > same flag in the upgrade to check the validity of slots. I think such
> > > a flag needs to be maintained at the same level as
> > > ReplicationSlotInvalidationCause to avoid any inconsistency among
> > > those.
> >
> > I think we can do better, like we can just read the latest
> > checkpoint's LSN before starting the old cluster.  And now while
> > checking the slot can't we check if the the slot is invalidated then
> > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved
> > before starting the cluster because if so then those slot might have
> > got invalidated during the upgrade no?
> >
>
> Isn't that possible only if we update confirmed_flush LSN while
> invalidating? Otherwise, how can the check you are proposing succeed?

I am not suggesting to compare the confirmed_flush_lsn to the latest
checkpoint LSN instead I am suggesting that before starting the
cluster we get the location of the latest checkpoint LSN that should
be the shutdown checkpoint LSN.  So now also in [1] we check that
confirmed flush lsn should be equal to the latest checkpoint lsn.   So
the only problem is that after we restart the cluster during the
upgrade we might invalidate some of the slots which are perfectly fine
to migrate and we want to identify those slots.  So if we know the
LSN of the shutdown checkpoint before the cluster started then we can
perform additional checks on all the invalidated slots that their
confirmed lsn >= shutdown checkpoint lsn we preserved before
restarting the cluster (not the latest checkpoint lsn) then those
slots got invalidated only after we started the cluster for upgrade?
Is there any loophole in this theory?   This theory is based on the
assumption that the confirmed flush lsn are not moving forward for the
already invalidated slots that means the slot which got invalidated
before we shutdown for upgrade will have confirm flush lsn value <
shutdown checkpoint and the slots which got invalidated during the
upgrade will have confirm flush lsn at least equal to the shutdown
checkpoint.
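
To make that concrete, the check could conceptually look something like
the query below, where the LSN literal stands for the shutdown checkpoint
LSN we captured (e.g. from pg_controldata) before starting the old cluster;
the predicate is only a sketch, not an actual patch:

SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND NOT temporary
  AND wal_status = 'lost'
  AND confirmed_flush_lsn >= '0/3000028'::pg_lsn;   -- placeholder LSN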

[1] 
https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Unlogged relation copy is not fsync'd

2023-09-04 Thread Michael Paquier
On Fri, Aug 25, 2023 at 03:47:27PM +0300, Heikki Linnakangas wrote:
> That 'copying_initfork' condition is wrong. The first hint of that is that
> 'use_wal' is always true for an init fork. I believe this was meant to check
> for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise, this bad thing
> can happen:
> 
> 1. Create an unlogged table
> 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls
> RelationCopyStorage
> 3. a checkpoint happens while the command is running
> 4. After the ALTER TABLE has finished, shut down postgres cleanly.
> 5. Lose power
> 
> When you recover, the unlogged table is not reset, because it was a clean
> postgres shutdown. But the part of the file that was copied after the
> checkpoint at step 3 was never fsync'd, so part of the file is lost. I was
> able to reproduce with a VM that I kill to simulate step 5.

Oops.

The comment at the top of smgrimmedsync() looks incorrect to me now.
When copying the data from something other than an init fork, the
relation pages are not WAL-logged for an unlogged relation.

> Fortunately the fix is trivial, see attached. I suspect we have similar
> problems in a few other places, though. end_heap_rewrite(), _bt_load(), and
> gist_indexsortbuild have a similar-looking smgrimmedsync() calls, with no
> consideration for unlogged relations at all. I haven't tested those, but
> they look wrong to me.

Oops ^ N.  And end_heap_rewrite() mentions directly
RelationCopyStorage().. 
--
Michael


signature.asc
Description: PGP signature


Re: Missing free_var() at end of accum_sum_final()?

2023-09-04 Thread Peter Eisentraut

On 05.03.23 09:53, Joel Jacobson wrote:

On Fri, Mar 3, 2023, at 16:11, Dean Rasheed wrote:

Attachments:
* make-result-using-vars-buf-v2.patch


One suggestion: maybe add a comment explaining why the allocated buffer
which size is based on strlen(cp) for the decimal digit values,
is guaranteed to be large enough also for the result's digit buffer?

I.e. some kind of proof why

(NUMERIC_HDRSZ + strlen(cp) + DEC_DIGITS * 2) >= ((ndigits + 1) * sizeof(NumericDigit))

holds true in general.


It looks like this thread has fizzled out, and no one is really working 
on the various proposed patch variants.  Unless someone indicates that 
they are still seriously pursuing this, I will mark this patch as 
"Returned with feedback".






Re: pg_upgrade and logical replication

2023-09-04 Thread Amit Kapila
On Mon, Sep 4, 2023 at 11:51 AM Amit Kapila  wrote:
>
> On Wed, Jul 19, 2023 at 12:47 PM Michael Paquier  wrote:
> >
> > On Wed, May 10, 2023 at 05:59:24PM +1000, Peter Smith wrote:
> > > 1. ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 
> > > 'X/Y'])
> > >
> > > I was a bit confused by this relation 'state' mentioned in multiple
> > > places. IIUC the pg_upgrade logic is going to reject anything with a
> > > non-READY (not 'r') state anyhow, so what is the point of having all
> > > the extra grammar/parse_subscription_options etc to handle setting the
> > > state when only possible value must be 'r'?
> >
> > We are just talking about the handling of an extra DefElem in an
> > extensible grammar pattern, so adding the state field does not
> > represent much maintenance work.  I'm OK with the addition of this
> > field in the data set dumped, FWIW, on the ground that it can be
> > useful for debugging purposes when looking at --binary-upgrade dumps,
> > and because we aim at copying catalog contents from one cluster to
> > another.
> >
> > Anyway, I am not convinced that we have any need for a parse-able
> > grammar at all, because anything that's presented on this thread is
> > aimed at being used only for the internal purpose of an upgrade in a
> > --binary-upgrade dump with a direct catalog copy in mind, and having a
> > grammar would encourage abuses of it outside of this context.  I think
> > that we should aim for simpler than what's proposed by the patch,
> > actually, with either a single SQL function à-la-binary_upgrade() that
> > adds the contents of a relation.  Or we can be crazier and just create
> > INSERT queries for pg_subscription_rel to provide an exact copy of the
> > catalog contents.  A SQL function would be more consistent with other
> > object types that use similar tricks, see
> > binary_upgrade_create_empty_extension() that does something similar
> > for some pg_extension records.  So, this function would require
> > 4 input arguments:
> > - The subscription name or OID.
> > - The relation OID.
> > - Its LSN.
> > - Its sync state.
> >
>
> +1 for doing it via function (something like
> binary_upgrade_create_sub_rel_state). We already have the internal
> function AddSubscriptionRelState() that can do the core work.
>

One more related point:
@@ -4814,9 +4923,31 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
  if (strcmp(subinfo->subpasswordrequired, "t") != 0)
  appendPQExpBuffer(query, ", password_required = false");

+ if (dopt->binary_upgrade && dopt->preserve_subscriptions &&
+ subinfo->suboriginremotelsn)
+ {
+ appendPQExpBuffer(query, ", lsn = '%s'", subinfo->suboriginremotelsn);
+ }

Even during Create Subscription, we can use an existing function
(pg_replication_origin_advance()) or a set of functions to advance the
origin instead of introducing a new option.
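
As a rough sketch (the origin name and LSN below are made-up placeholders,
not values taken from the patch), the dump could then simply emit
something like this after the dumped CREATE SUBSCRIPTION:

  SELECT pg_replication_origin_advance('pg_16394', '0/15D68D0');

where 'pg_16394' stands for the subscription's replication origin
('pg_' || suboid).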

-- 
With Regards,
Amit Kapila.




Re: pg_upgrade and logical replication

2023-09-04 Thread Michael Paquier
On Mon, Sep 04, 2023 at 11:51:14AM +0530, Amit Kapila wrote:
> +1 for doing it via function (something like
> binary_upgrade_create_sub_rel_state). We already have the internal
> function AddSubscriptionRelState() that can do the core work.

It is one of those patches that I have set aside for too long, and it
solves a use case of its own.  I think that I could hack on that pretty
quickly given that Julien has done a bunch of the groundwork.  Would
you agree with that?

> Like the publisher-side upgrade patch [1], I think we should allow
> upgrading subscriptions by default instead of requiring a flag like
> --preserve-subscription-state. If required, we can introduce an
> --exclude option for the upgrade. Having it just for pg_dump sounds
> reasonable to me.
> 
> [1] - 
> https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Is the interface of the publisher for pg_upgrade agreed on and set in
stone?  I certainly agree with having a consistent upgrade experience
for the two sides of logical replication, publications and subscriptions.
Also, I'd rather have a filtering option at the same time as the
upgrade option to give more control to users from the start.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade and logical replication

2023-09-04 Thread Amit Kapila
On Thu, Apr 27, 2023 at 1:18 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> 03. main
>
> Currently --preserve-subscription-state and --no-subscriptions can be used
> together, but the combination is quite unnatural. Shouldn't we make them
> mutually exclusive?
>

Right, that makes sense to me.

-- 
With Regards,
Amit Kapila.




Re: pg_upgrade and logical replication

2023-09-04 Thread Amit Kapila
On Wed, Jul 19, 2023 at 12:47 PM Michael Paquier  wrote:
>
> On Wed, May 10, 2023 at 05:59:24PM +1000, Peter Smith wrote:
> > 1. ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 
> > 'X/Y'])
> >
> > I was a bit confused by this relation 'state' mentioned in multiple
> > places. IIUC the pg_upgrade logic is going to reject anything with a
> > non-READY (not 'r') state anyhow, so what is the point of having all
> > the extra grammar/parse_subscription_options etc to handle setting the
> > state when the only possible value must be 'r'?
>
> We are just talking about the handling of an extra DefElem in an
> extensible grammar pattern, so adding the state field does not
> represent much maintenance work.  I'm OK with the addition of this
> field in the data set dumped, FWIW, on the ground that it can be
> useful for debugging purposes when looking at --binary-upgrade dumps,
> and because we aim at copying catalog contents from one cluster to
> another.
>
> Anyway, I am not convinced that we have any need for a parse-able
> grammar at all, because anything that's presented on this thread is
> aimed at being used only for the internal purpose of an upgrade in a
> --binary-upgrade dump with a direct catalog copy in mind, and having a
> grammar would encourage abuses of it outside of this context.  I think
> that we should aim for simpler than what's proposed by the patch,
> actually, with either a single SQL function à-la-binary_upgrade() that
> adds the contents of a relation.  Or we can be crazier and just create
> INSERT queries for pg_subscription_rel to provide an exact copy of the
> catalog contents.  A SQL function would be more consistent with other
> objects types that use similar tricks, see
> binary_upgrade_create_empty_extension() that does something similar
> for some pg_extension records.  So, this function would require in
> input 4 arguments:
> - The subscription name or OID.
> - The relation OID.
> - Its LSN.
> - Its sync state.
>

+1 for doing it via function (something like
binary_upgrade_create_sub_rel_state). We already have the internal
function AddSubscriptionRelState() that can do the core work.
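
As a very rough sketch, a --binary-upgrade dump could then emit one such
call per relation along these lines (the function name, argument order,
and values are illustrative assumptions only, not an existing API):

  SELECT pg_catalog.binary_upgrade_create_sub_rel_state('mysub', 16384,
                                                         '0/15D68D0', 'r');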

Like the publisher-side upgrade patch [1], I think we should allow
upgrading subscriptions by default instead of requiring a flag like
--preserve-subscription-state. If required, we can introduce an
--exclude option for the upgrade. Having it just for pg_dump sounds
reasonable to me.

[1] - 
https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-09-04 Thread Will Mortensen
I realized that for our use case, we'd ideally wait for holders of
RowExclusiveLock only, and not e.g. VACUUM holding
ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
possible by generalizing/duplicating WaitForLockersMultiple() and
GetLockConflicts(), but I'd love to have a sanity check before
attempting that. Also, I imagine those semantics might be too
different to make sense as part of the LOCK command.

Alternatively, I had originally been trying to use the pg_locks view,
which obviously provides flexibility in identifying existing lock
holders. But I couldn't find a way to wait for the locks to be
released / transactions to finish, and I was a little concerned about
the performance impact of selecting from it frequently when we only
care about a subset of the locks, although I didn't try to assess that
in our particular application.
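
For what it's worth, a minimal sketch of that kind of pg_locks query
(the table name is just a placeholder) identifies the current holders of
RowExclusiveLock, but offers no built-in way to wait for them to finish:

  SELECT pid
    FROM pg_locks
   WHERE locktype = 'relation'
     AND relation = 'my_table'::regclass
     AND mode = 'RowExclusiveLock'
     AND granted;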

In any case, I'm looking forward to hearing more feedback from
reviewers and potential users. :-)