Re: POC: GROUP BY optimization

2024-01-14 Thread Alexander Korotkov
On Mon, Jan 15, 2024 at 8:42 AM Richard Guo  wrote:
> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov  
> wrote:
>>
>> Thank you for providing the test case relevant for this code change.
>> The revised patch incorporating this change is attached.  Now the
>> patchset looks good to me.  I'm going to push it if there are no
>> objections.
>
> Seems I'm late for the party.  Can we hold for several more days?  I'd
> like to have a review on this patch.

Sure, NP.  I'll hold it till your review.

--
Regards,
Alexander Korotkov




Re: In-place persistence change of a relation

2024-01-14 Thread Kyotaro Horiguchi
At Tue, 9 Jan 2024 15:07:20 +0530, vignesh C  wrote in 
> CFBot shows compilation issues at [1] with:

Thanks!

The reason for those errors was that I didn't consider Meson at the
time. Additionally, the signature change of reindex_index() caused the
build failure. I fixed both issues. While addressing these issues, I
modified the simpleundolog module to honor
wal_sync_method. Previously, the sync method for undo logs was
determined independently, separate from xlog.c. However, I'm still not
satisfied with the method for handling PG_O_DIRECT.

In this version, I have added the changes to enable the use of
wal_sync_method outside of xlog.c as the first part of the patchset.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 40749357f24adf89dc79db9b34f5c053288489bb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 15 Jan 2024 15:57:53 +0900
Subject: [PATCH v31 1/3] Export wal_sync_method related functions

Export several functions related to wal_sync_method for use in
subsequent commits. Since PG_O_DIRECT cannot be used in those commits,
the new function XLogGetSyncBit() will mask PG_O_DIRECT.
---
 src/backend/access/transam/xlog.c | 73 +--
 src/include/access/xlog.h |  2 +
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..c5f51849ee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8403,21 +8403,29 @@ assign_wal_sync_method(int new_wal_sync_method, void *extra)
 	}
 }
 
+/*
+ * Exported version of get_sync_bit()
+ *
+ * Do not expose PG_O_DIRECT for uses outside xlog.c.
+ */
+int
+XLogGetSyncBit(void)
+{
+	return get_sync_bit(wal_sync_method) & ~PG_O_DIRECT;
+}
+
 
 /*
- * Issue appropriate kind of fsync (if any) for an XLOG output file.
+ * Issue appropriate kind of fsync (if any) according to wal_sync_method.
  *
- * 'fd' is a file descriptor for the XLOG file to be fsync'd.
- * 'segno' is for error reporting purposes.
+ * 'fd' is a file descriptor for the file to be fsync'd.
  */
-void
-issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
+const char *
+XLogFsyncFile(int fd)
 {
-	char	   *msg = NULL;
+	const char *msg = NULL;
 	instr_time	start;
 
-	Assert(tli != 0);
-
 	/*
 	 * Quick exit if fsync is disabled or write() has already synced the WAL
 	 * file.
@@ -8425,7 +8433,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	if (!enableFsync ||
 		wal_sync_method == WAL_SYNC_METHOD_OPEN ||
 		wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
-		return;
+		return NULL;
 
 	/* Measure I/O timing to sync the WAL file */
 	if (track_wal_io_timing)
@@ -8460,19 +8468,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 			break;
 	}
 
-	/* PANIC if failed to fsync */
-	if (msg)
-	{
-		char		xlogfname[MAXFNAMELEN];
-		int			save_errno = errno;
-
-		XLogFileName(xlogfname, tli, segno, wal_segment_size);
-		errno = save_errno;
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg(msg, xlogfname)));
-	}
-
 	pgstat_report_wait_end();
 
 	/*
@@ -8486,7 +8481,39 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
 	}
 
-	PendingWalStats.wal_sync++;
+	if (msg == NULL)
+		PendingWalStats.wal_sync++;
+
+	return msg;
+}
+
+/*
+ * Issue appropriate kind of fsync (if any) for an XLOG output file.
+ *
+ * 'fd' is a file descriptor for the XLOG file to be fsync'd.
+ * 'segno' is for error reporting purposes.
+ */
+void
+issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
+{
+	const char	*msg;
+	
+	Assert(tli != 0);
+
+	msg = XLogFsyncFile(fd);
+
+	/* PANIC if failed to fsync */
+	if (msg)
+	{
+		char		xlogfname[MAXFNAMELEN];
+		int			save_errno = errno;
+
+		XLogFileName(xlogfname, tli, segno, wal_segment_size);
+		errno = save_errno;
+		ereport(PANIC,
+(errcode_for_file_access(),
+ errmsg(msg, xlogfname)));
+	}
 }
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 301c5fa11f..2a0d65b537 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -217,6 +217,8 @@ extern void xlog_redo(struct XLogReaderState *record);
 extern void xlog_desc(StringInfo buf, struct XLogReaderState *record);
 extern const char *xlog_identify(uint8 info);
 
+extern int XLogGetSyncBit(void);
+extern const char *XLogFsyncFile(int fd);
 extern void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli);
 
 extern bool RecoveryInProgress(void);
-- 
2.39.3

From 5c120b94c407b971485ab52133399305e5e81a88 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 31 Aug 2023 11:49:10 +0900
Subject: [PATCH v31 2/3] Introduce undo log implementation

This patch adds a simple implementation of UNDO log feature.
---
 src/backend/access/transam/Makefile|   1 +
 src/backend/access/transam/meson.build |   1 +
 src/backend/access/transam/rmgr.c  |   4 +-
 

Re: Add PQsendSyncMessage() to libpq

2024-01-14 Thread Michael Paquier
On Wed, Jan 10, 2024 at 03:40:36PM +0900, Michael Paquier wrote:
> Hence, as a whole, wouldn't it be more consistent if the new
> PQsendPipelineSync() and the existing PQpipelineSync() call an
> internal static routine (PQPipelineSyncInternal?) that can switch
> between both modes?  Let's just make the extra argument a boolean.

Yeah, I'll go with that after a second look.  Attached is what I am
finishing with, and I have reproduced some numbers with the pgbench
metacommand mentioned upthread, which is reeeaaally nice.

I have also made a few edits to the tests.
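
To make the intended call pattern concrete, here is a minimal usage
sketch (my illustration, not code from the patch; error paths are
collapsed to exit()):

    #include <stdlib.h>
    #include "libpq-fe.h"

    /* Queue 100 one-statement pipeline batches, then flush once. */
    static void
    send_batches(PGconn *conn)
    {
        if (PQenterPipelineMode(conn) != 1)
            exit(1);

        for (int i = 0; i < 100; i++)
        {
            if (PQsendQuery(conn, "SELECT 1") != 1)
                exit(1);
            /* Marks a pipeline boundary but does not write() to the socket. */
            if (PQsendPipelineSync(conn) != 1)
                exit(1);
        }

        /* One flush instead of the 100 that PQpipelineSync() would issue. */
        if (PQflush(conn) < 0)
            exit(1);

        /* ... then consume results with PQgetResult() as usual ... */
    }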
--
Michael
From b653759f8dfdfe83d0d8bc1d4a0ac9d4e272a061 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 15 Jan 2024 16:15:00 +0900
Subject: [PATCH v6] Add PQsendPipelineSync() to libpq

This new function is equivalent to PQpipelineSync(), except
that it does not flush anything to the server; the user must
subsequently call PQflush() instead. Its purpose is to reduce
the system call overhead of pipeline mode.
---
 src/interfaces/libpq/exports.txt  |  1 +
 src/interfaces/libpq/fe-exec.c| 41 +++--
 src/interfaces/libpq/libpq-fe.h   |  1 +
 .../modules/libpq_pipeline/libpq_pipeline.c   | 43 ++
 .../traces/multi_pipelines.trace  | 11 +
 doc/src/sgml/libpq.sgml   | 45 ---
 6 files changed, 131 insertions(+), 11 deletions(-)

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 28b861fd93..088592deb1 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -192,3 +192,4 @@ PQclosePortal 189
 PQsendClosePrepared   190
 PQsendClosePortal 191
 PQchangePassword  192
+PQsendPipelineSync193
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 106d14e6ee..9e7e670921 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -81,6 +81,7 @@ static int	PQsendTypedCommand(PGconn *conn, char command, char type,
 			   const char *target);
 static int	check_field_number(const PGresult *res, int field_num);
 static void pqPipelineProcessQueue(PGconn *conn);
+static int	pqPipelineSyncInternal(PGconn *conn, bool immediate_flush);
 static int	pqPipelineFlush(PGconn *conn);
 
 
@@ -3224,6 +3225,25 @@ pqPipelineProcessQueue(PGconn *conn)
 /*
  * PQpipelineSync
  *		Send a Sync message as part of a pipeline, and flush to server
+ */
+int
+PQpipelineSync(PGconn *conn)
+{
+	return pqPipelineSyncInternal(conn, true);
+}
+
+/*
+ * PQsendPipelineSync
+ *		Send a Sync message as part of a pipeline, without flushing to server
+ */
+int
+PQsendPipelineSync(PGconn *conn)
+{
+	return pqPipelineSyncInternal(conn, false);
+}
+
+/*
+ * Wrapper for PQpipelineSync and PQsendPipelineSync.
  *
  * It's legal to start submitting more commands in the pipeline immediately,
  * without waiting for the results of the current pipeline. There's no need to
@@ -3240,9 +3260,12 @@ pqPipelineProcessQueue(PGconn *conn)
  * The connection will remain in pipeline mode and unavailable for new
  * synchronous command execution functions until all results from the pipeline
  * are processed by the client.
+ *
+ * immediate_flush controls if the flush happens immediately after sending the
+ * Sync message or not.
  */
-int
-PQpipelineSync(PGconn *conn)
+static int
+pqPipelineSyncInternal(PGconn *conn, bool immediate_flush)
 {
 	PGcmdQueueEntry *entry;
 
@@ -3288,9 +3311,19 @@ PQpipelineSync(PGconn *conn)
 	/*
 	 * Give the data a push.  In nonblock mode, don't complain if we're unable
 	 * to send it all; PQgetResult() will do any additional flushing needed.
+	 * If immediate_flush is disabled, the data is pushed if we are past the
+	 * size threshold.
 	 */
-	if (PQflush(conn) < 0)
-		goto sendFailed;
+	if (immediate_flush)
+	{
+		if (pqFlush(conn) < 0)
+			goto sendFailed;
+	}
+	else
+	{
+		if (pqPipelineFlush(conn) < 0)
+			goto sendFailed;
+	}
 
 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index f0ec660cb6..defc415fa3 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -474,6 +474,7 @@ extern int	PQenterPipelineMode(PGconn *conn);
 extern int	PQexitPipelineMode(PGconn *conn);
 extern int	PQpipelineSync(PGconn *conn);
 extern int	PQsendFlushRequest(PGconn *conn);
+extern int	PQsendPipelineSync(PGconn *conn);
 
 /* LISTEN/NOTIFY support */
 extern PGnotify *PQnotifies(PGconn *conn);
diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index 71cd04c5f2..c68e20d0b5 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -162,6 +162,7 @@ test_multi_pipelines(PGconn *conn)
 	if (PQenterPipelineMode(conn) != 1)
 		pg_fatal("failed to enter pipeline mode: %s", 

Re: pgbench: allow to cancel queries during benchmark

2024-01-14 Thread Tatsuo Ishii
> On Wed, 6 Sep 2023 20:13:34 +0900
> Yugo NAGATA  wrote:
>  
>> I attached the updated patch v3. The changes since the previous
>> patch includes the following;
>> 
>> I removed the unnecessary condition (&& false) that you
>> pointed out in [1].
>> 
>> The test was rewritten by using IPC::Run signal() and integrated
>> into "001_pgbench_with_server.pl". This test is skipped on Windows
>> because SIGINT would terminate the test itself, as discussed in [2]
>> about the query cancellation test in psql.
>> 
>> I added some comments to describe how query cancellation is
>> handled as I explained in [1].
>> 
>> Also, I found the previous patch didn't work on Windows, so I fixed it.
>> On non-Windows systems, a thread waiting for the response to a long query
>> can be interrupted by SIGINT, but on Windows, threads do not return from
>> waiting until the queries they are running are cancelled. This is because,
>> when the signal is received, the system just creates a new thread to
>> execute the callback function specified by setup_cancel_handler, and the
>> other threads continue to run[3]. Therefore, the queries have to be
>> cancelled in the callback function.
>> 
>> [1] 
>> https://www.postgresql.org/message-id/a58388ac-5411-4760-ea46-71324d8324cb%40mines-paristech.fr
>> [2] 
>> https://www.postgresql.org/message-id/20230906004524.2fd6ee049f8a6c6f2690b99c%40sraoss.co.jp
>> [3] https://learn.microsoft.com/en-us/windows/console/handlerroutine
> 
> I found that the --disable-thread-safety option was removed in 68a4b58eca0329.
> So, I removed codes involving ENABLE_THREAD_SAFETY from the patch.
> 
> Also, I wrote a commit log draft.

> Previously, Ctrl+C during benchmark killed pgbench immediately,
> but queries running at that time were not cancelled.

Better to mention explicitly that queries keep on running on the
backend. What about this?

Previously, Ctrl+C during benchmark killed pgbench immediately, but
queries were not canceled and kept running on the backend until they
tried to send the result to pgbench.

> The commit
> fixes this so that cancel requests are sent for all connections
> before pgbench exits.

"sent for" -> "sent to"

> Attached is the updated version, v4.

+/* send cancel requests to all connections */
+static void
+cancel_all()
+{
+   for (int i = 0; i < nclients; i++)
+   {
+   char errbuf[1];
+   if (client_states[i].cancel != NULL)
+   (void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
+   }
+}
+

Why is the error message from PQcancel() discarded in case of error? I
think it's better to print out the error message when an error occurs.
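
Something along these lines, say (just a sketch; the buffer size and
the message wording are mine):

    static void
    cancel_all(void)
    {
        for (int i = 0; i < nclients; i++)
        {
            char        errbuf[256];

            if (client_states[i].cancel != NULL &&
                !PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf)))
                fprintf(stderr, "could not send cancel request: %s\n", errbuf);
        }
    }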

+* On non-Windows, any callback function is not set. When SIGINT is
+* received, CancelRequested is just set, and only thread #0 is interrupted
+* and returns from waiting input from the backend. After that, the thread
+* sends cancel requests to all benchmark queries.

The second line is a little bit long according to the coding
standard. Fix like this?

 * On non-Windows, any callback function is not set. When SIGINT is
 * received, CancelRequested is just set, and only thread #0 is
 * interrupted and returns from waiting input from the backend. After
 * that, the thread sends cancel requests to all benchmark queries.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-01-14 Thread Bertrand Drouvot
Hi hackers,

While working on [1], we discovered (thanks Alexander for the testing) that a
conflicting active logical slot on a standby could be "terminated" without
leading to an "obsolete" message (see [2]).

Indeed, in case of an active slot we proceed in 2 steps in
InvalidatePossiblyObsoleteSlot():

- terminate the backend holding the slot
- report the slot as obsolete

This is racy because between the two we release the mutex on the slot, which
means that the slot's effective_xmin and effective_catalog_xmin could advance
during that time (leading to exit the loop).

I think that holding the mutex longer is not an option (given what we'd have
to do while holding it), so the attached proposal records the effective_xmin
and effective_catalog_xmin that were in effect during the backend termination
and uses those instead.

[1]: 
https://www.postgresql.org/message-id/flat/bf67e076-b163-9ba3-4ade-b9fc51a3a8f6%40gmail.com
[2]: 
https://www.postgresql.org/message-id/7c025095-5763-17a6-8c80-899b35bd0459%40gmail.com

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 2e4d580af4a1e6b2cb000d4e9db2c42e40aa4cd2 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Thu, 11 Jan 2024 13:57:53 +
Subject: [PATCH v1] Fix race condition in InvalidatePossiblyObsoleteSlot()

In case of an active slot we proceed in 2 steps:
- terminate the backend holding the slot
- report the slot as obsolete

This is racy because between the two we release the mutex on the slot, which
means the effective_xmin and effective_catalog_xmin could advance during that time.

Holding the mutex longer is not an option (given what we'd have to do while
holding it), so record the effective_xmin and effective_catalog_xmin that were
in effect during the backend termination and use those instead.
---
 src/backend/replication/slot.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)
 100.0% src/backend/replication/

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 52da694c79..2e34cca0e8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1352,6 +1352,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 {
 	int			last_signaled_pid = 0;
 	bool		released_lock = false;
+	bool		terminated = false;
+	TransactionId	initial_effective_xmin;
+	TransactionId	initial_catalog_effective_xmin;
 
 	for (;;)
 	{
@@ -1396,15 +1399,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 case RS_INVAL_HORIZON:
 	if (!SlotIsLogical(s))
 		break;
+	if (!terminated)
+	{
+		initial_effective_xmin = s->effective_xmin;
+		initial_catalog_effective_xmin = s->effective_catalog_xmin;
+	}
 	/* invalid DB oid signals a shared relation */
 	if (dboid != InvalidOid && dboid != s->data.database)
 		break;
-	if (TransactionIdIsValid(s->effective_xmin) &&
-		TransactionIdPrecedesOrEquals(s->effective_xmin,
+	if (TransactionIdIsValid(initial_effective_xmin) &&
+		TransactionIdPrecedesOrEquals(initial_effective_xmin,
 	  snapshotConflictHorizon))
 		conflict = cause;
-	else if (TransactionIdIsValid(s->effective_catalog_xmin) &&
-			 TransactionIdPrecedesOrEquals(s->effective_catalog_xmin,
+	else if (TransactionIdIsValid(initial_catalog_effective_xmin) &&
+			 TransactionIdPrecedesOrEquals(initial_catalog_effective_xmin,
 		   snapshotConflictHorizon))
 		conflict = cause;
 	break;
@@ -1499,6 +1507,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 	(void) kill(active_pid, SIGTERM);
 
 last_signaled_pid = active_pid;
+terminated = true;
 			}
 
 			/* Wait until the slot is released. */
-- 
2.34.1



Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Masahiko Sawada
On Thu, Jan 11, 2024 at 10:24 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Wed, 10 Jan 2024 16:53:48 +0900,
>   Masahiko Sawada  wrote:
>
> >> Interesting. But I feel that it introduces another (a bit)
> >> tricky mechanism...
> >
> > Right. On the other hand, I don't think the idea 3 is good for the
> > same reason Michael-san pointed out before[1][2].
> >
> > [1] https://www.postgresql.org/message-id/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> > [2] https://www.postgresql.org/message-id/ZXKm9tmnSPIVrqZz%40paquier.xyz
>
> I think that the important part of Michael-san's opinion
> is "keep COPY TO implementation and COPY FROM implementation
> separated for maintainability".
>
> The patch focused on in [1][2] uses one routine for both
> COPY TO and COPY FROM. If we use that approach, we need to
> change one common routine from copyto.c and copyfrom.c (or
> export callbacks from copyto.c and copyfrom.c and use them
> in copyto.c to construct one common routine). That's
> the problem.
>
> The idea 3 still has separated routines for COPY TO and COPY
> FROM. So I think that it still keeps COPY TO implementation
> and COPY FROM implementation separated.
>
> >> BTW, we also need to set .type:
> >>
> >>  .routine = COPYTO_ROUTINE (
> >>  .type = T_CopyToFormatRoutine,
> >>  .start_fn = testfmt_copyto_start,
> >>  .onerow_fn = testfmt_copyto_onerow,
> >>  .end_fn = testfmt_copyto_end
> >>  )
> >
> > I think it's fine as the same is true for table AM.
>
> Ah, sorry. I should have said it explicitly. I don't think that
> it's a problem. I just wanted to say that it's missing.

Thank you for pointing it out.

>
>
> Defining one more static const struct instead of providing a
> convenient (but a bit tricky) macro may be straightforward:
>
> static const CopyToFormatRoutine testfmt_copyto_routine = {
> .type = T_CopyToFormatRoutine,
> .start_fn = testfmt_copyto_start,
> .onerow_fn = testfmt_copyto_onerow,
> .end_fn = testfmt_copyto_end
> };
>
> static const CopyFormatRoutine testfmt_copyto_handler = {
> .type = T_CopyFormatRoutine,
> .is_from = false,
> .routine = (Node *) &testfmt_copyto_routine
> };

Yeah, IIUC this is the option 2 you mentioned[1]. I think we can go
with this idea as it's the simplest. If we find a better way, we can
change it later. That is CopyFormatRoutine will be like:

typedef struct CopyFormatRoutine
{
NodeTag type;

/* either CopyToFormatRoutine or CopyFromFormatRoutine */
Node   *routine;
}   CopyFormatRoutine;

And the core can check the node type of the 'routine' in the
CopyFormatRoutine returned by extensions.
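
For example, the core-side check could look roughly like this (a sketch
only; the function name and error wording are made up):

    static CopyToFormatRoutine *
    GetCopyToFormatRoutine(CopyFormatRoutine *cfr)
    {
        if (cfr == NULL || !IsA(cfr, CopyFormatRoutine))
            elog(ERROR, "COPY handler did not return a CopyFormatRoutine");

        /* COPY TO must get a CopyToFormatRoutine, not a COPY FROM one */
        if (!IsA(cfr->routine, CopyToFormatRoutine))
            elog(ERROR, "COPY handler returned an unexpected routine type");

        return (CopyToFormatRoutine *) cfr->routine;
    }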

[1]
https://www.postgresql.org/message-id/20240110.120034.501385498034538233.kou%40clear-code.com

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Oom on temp (un-analyzed table caused by JIT) V16.1

2024-01-14 Thread Pavel Stehule
Hi

po 15. 1. 2024 v 7:24 odesílatel Kirk Wolak  napsal:

> Daniel,
>   You have a commit [1] that MIGHT fix this.
> I have a script that recreates the problem, using random data in pg_temp.
> And a nested cursor.
>
>   It took me a few days to reduce this from actual code that was
> experiencing this.  If I turn off JIT, the problem goes away.  (if I don't
> FETCH the first row, the memory loss does not happen.  Maybe because
> opening a cursor is more decoration/prepare)
>
>   I don't have an easy way to test this script right now against the
> commit.
> I am hopeful that your fix fixes this.
>
>   This was my first OOM issue in PG in 3yrs of working with it.
>
>   The problem goes away if the TABLE is analyzed, or JIT is disabled.
>
>   The current script, if run, will consume about 25% of my system memory
> (10GB).
> Just call the function below until it dies if that's what you need.  The
> only way to get the memory back down is to close the connection.
>
> SELECT pg_temp.fx(497);
>
> Surprisingly, to me, the report from pg_get_backend_memory_contexts()
> doesn't really show "missing memory", which  I thought it would.  (FWIW, we
> caught this with multiple rounds of testing our code, slowing down, then
> crashing...  Is there ANY way to interrogate that we are above X% of system
> memory so we know to let this backend go?)
>

I wrote simple extension that can show memory allocation from system
perspective

https://github.com/okbob/pgmeminfo



>
> It takes about 18 minutes to run on my 4 CPU VM.
>
> For now, we are going to add some ANALYZE statements to our code.
>

Remember - don't run anything without VACUUM ANALYZE.

Without it, the queries can be slow - ANALYZE sets the stats, VACUUM prepares
the visibility maps - without visibility maps an index-only scan cannot be used.

Autovacuum doesn't see into opened transactions, and autovacuum is executed
in one-minute cycles. Autovacuum doesn't see temporary tables either; temporary
table data is visible only from the owner process.




> We will consider disabling JIT.
>

That makes sense only for bigger analytic queries.

Regards

Pavel


>
> Thanks,
> Kirk
> [1] = 2cf50585e54a7b0c6bc62a087c69043ae57e4252


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-14 Thread Masahiko Sawada
On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov  wrote:
>
> On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  
> wrote:
> > Thank you for updating the patch. Here are two comments:
> >
> > ---
> > +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> > +   cstate->num_errors > 0)
> > +   ereport(WARNING,
> > +   errmsg("%zd rows were skipped due to data type incompatibility",
> > +  cstate->num_errors));
> > +
> > /* Done, clean up */
> > error_context_stack = errcallback.previous;
> >
> > If a malformed input is not the last data, the context message seems odd:
> >
> > postgres(1:1769258)=# create table test (a int);
> > CREATE TABLE
> > postgres(1:1769258)=# copy test from stdin (save_error_to none);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >> a
> > >> 1
> > >>
> > 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> > due to data type incompatibility
> > 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> > COPY 1
> >
> > I think it's better to report the WARNING after resetting the
> > error_context_stack. Or is a WARNING really appropriate here? The
> > v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> > the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> > WARNING without explanation.
>
> Thank you for noticing this.  I think NOTICE is more appropriate here.
> There is nothing to "worry" about: the user asked to ignore the errors
> and we did.  And yes, it doesn't make sense to use the last line as
> the context.  Fixed.
>
> > ---
> > +-- test missing data: should fail
> > +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> > +1  {1}
> > +\.
> >
> > We might want to cover the extra data cases too.
>
> Agreed, the relevant test is added.

Thank you for updating the patch. I have one minor point:

+   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+   cstate->num_errors > 0)
+   ereport(NOTICE,
+   errmsg("%zd rows were skipped due to data type incompatibility",
+  cstate->num_errors));
+

We can use errmsg_plural() instead.
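
I.e. something like this (a sketch; the singular wording is mine):

    if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
        cstate->num_errors > 0)
        ereport(NOTICE,
                errmsg_plural("%zd row was skipped due to data type incompatibility",
                              "%zd rows were skipped due to data type incompatibility",
                              cstate->num_errors,
                              cstate->num_errors));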

I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: POC: GROUP BY optimization

2024-01-14 Thread Richard Guo
On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov 
wrote:

> Thank you for providing the test case relevant for this code change.
> The revised patch incorporating this change is attached.  Now the
> patchset looks good to me.  I'm going to push it if there are no
> objections.


Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.

Thanks
Richard


Re: Show WAL write and fsync stats in pg_stat_io

2024-01-14 Thread Michael Paquier
On Fri, Jan 12, 2024 at 04:23:26PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 11 Jan 2024 at 17:28, Melanie Plageman  
> wrote:
>> Even if we made a separate view for WAL I/O stats, we would still have
>> this issue of variable sized I/O vs block sized I/O and would probably
>> end up solving it with separate columns for the number of bytes and
>> number of operations.
> 
> Yes, I think it is more about flexibility and not changing the already
> published pg_stat_io view.

I don't know.  Adding more columns or changing op_bytes with an extra
mode that reflects on what the other columns mean is kind of the same
thing to me: we want pg_stat_io to report more modes so as all I/O can
be evaluated from a single view, but the complication is now that
everything is tied to BLCKSZ.

IMHO, perhaps we'd better put this patch aside until we are absolutely
*sure* of what we want to achieve when it comes to WAL, and I am
afraid that this cannot happen until we're happy with the way we
handle WAL reads *and* writes, including WAL receiver or anything that
has the idea of pulling its own page callback with
XLogReaderAllocate() in the backend.  Well, writes should be
relatively "easy" as things happen with XLOG_BLCKSZ, mainly, but
reads are the unknown part.

That also seems furiously related to the work happening with async I/O
or the fact that we may want to have in the view a separate meaning
for cached pages or pages read directly from disk.  The worst thing
that we would do is rush something into the tree and then have to deal
with the aftermath of what we'd need to deal with in terms of
compatibility depending on the state of the other I/O related work
when the new view is released.  That would not be fun for the users
and any hackers who would have to deal with that (aka mainly me if I
were to commit something), because pg_stat_io could mean something in
version N and something entirely different in version N+1.
--
Michael




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Sutou Kouhei
Hi,

If there are no more comments for the current design, I'll
start implementing this feature with the following
approaches for "Discussing" items:

> 3.1 Should we use one function(internal) for COPY TO/FROM
> handlers or two function(internal)s (one is for COPY TO
> handler and another is for COPY FROM handler)?
> [4]

I'll choose "one function(internal) for COPY TO/FROM handlers".

> 3.4 Should we export Copy{To,From}State? Or should we just
> provide getters/setters to access Copy{To,From}State
> internal?
> [10]

I'll export Copy{To,From}State.


Thanks,
-- 
kou

In <20240112.144615.157925223373344229@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 12 Jan 2024 14:46:15 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> Here is the current summary of this discussion to make
> COPY format extendable. It's for reaching consensus and
> starting implementing the feature. (I'll start implementing
> the feature once we reach consensus.) If you have any
> opinion, please share it.
> 
> Confirmed:
> 
> 1.1 Making COPY format extendable will not reduce performance.
> [1]
> 
> Decisions:
> 
> 2.1 Use separated handler for COPY TO and COPY FROM because
> our COPY TO implementation (copyto.c) and COPY FROM
> implementation (copyfrom.c) are separated.
> [2]
> 
> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
> just use a function(internal) that returns a handler instead.
> [3]
> 
> 2.3 The implementation must include documentation.
> [5]
> 
> 2.4 The implementation must include test.
> [6]
> 
> 2.5 The implementation should consist of small patches
> for ease of review.
> [6]
> 
> 2.7 Copy{To,From}State must have an opaque space for
> handlers.
> [8]
> 
> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
> handlers.
> [8]
> 
> 2.9 Make "format" in PgMsg_CopyOutResponse message
> extendable.
> [9]
> 
> 2.10 Make makeNode() call avoidable in function(internal)
>  that returns COPY TO/FROM handler.
>  [9]
> 
> 2.11 Custom COPY TO/FROM handlers must be able to parse
>  their options.
>  [11]
> 
> Discussing:
> 
> 3.1 Should we use one function(internal) for COPY TO/FROM
> handlers or two function(internal)s (one is for COPY TO
> handler and another is for COPY FROM handler)?
> [4]
> 
> 3.2 If we use separated function(internal) for COPY TO/FROM
> handlers, we need to define naming rule. For example,
> _to(internal) for COPY TO handler and
> _from(internal) for COPY FROM handler.
> [4]
> 
> 3.3 Should we use prefix or suffix for function(internal)
> name to avoid name conflict with other handlers such as
> tablesample handlers?
> [7]
> 
> 3.4 Should we export Copy{To,From}State? Or should we just
> provide getters/setters to access Copy{To,From}State
> internal?
> [10]
> 
> 
> [1] 
> https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
> [2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> [3] 
> https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
> [4] 
> https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
> [5] 
> https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
> [7] 
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
> [8] 
> https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
> [9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
> [10] 
> https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
> [11] 
> https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com
> 
> 
> Thanks,
> -- 
> kou
> 
> 




Oom on temp (un-analyzed table caused by JIT) V16.1

2024-01-14 Thread Kirk Wolak
Daniel,
  You have a commit [1] that MIGHT fix this.
I have a script that recreates the problem, using random data in pg_temp.
And a nested cursor.

  It took me a few days to reduce this from actual code that was
experiencing this.  If I turn off JIT, the problem goes away.  (if I don't
FETCH the first row, the memory loss does not happen.  Maybe because
opening a cursor is more decoration/prepare)

  I don't have an easy way to test this script right now against the commit.
I am hopeful that your fix fixes this.

  This was my first OOM issue in PG in 3yrs of working with it.

  The problem goes away if the TABLE is analyzed, or JIT is disabled.

  The current script, if run, will consume about 25% of my system memory
(10GB).
Just call the function below until it dies if that's what you need.  The
only way to get the memory back down is to close the connection.

SELECT pg_temp.fx(497);

Surprisingly, to me, the report from pg_get_backend_memory_contexts()
doesn't really show "missing memory", which  I thought it would.  (FWIW, we
caught this with multiple rounds of testing our code, slowing down, then
crashing...  Is there ANY way to interrogate that we are above X% of system
memory so we know to let this backend go?)

It takes about 18 minutes to run on my 4 CPU VM.

For now, we are going to add some ANALYZE statements to our code.
We will consider disabling JIT.

Thanks,
Kirk
[1] = 2cf50585e54a7b0c6bc62a087c69043ae57e4252

CREATE TABLE pg_temp.parts
(
seid bigint,
r_field_name_1   smallint,
fr_field_namesmallint   NOT NULL,
p1_field_namevarchar(4),
qty_field_name   integer,
p5_field_namevarchar(30),
partnum  varchar(30),
st_field_namesmallint DEFAULT 0 NOT NULL
); -- drop table pg_temp.parts;

INSERT INTO pg_temp.parts (seid, partnum, qty_field_name, fr_field_name, st_field_name)
SELECT (RANDOM() * 3821 + 1)::bigint AS seid,
       (RANDOM() * 123456789)::text  AS partnum,
       CASE
           WHEN q.rnd BETWEEN 0 AND 0.45 THEN FLOOR(RANDOM() * 900) + 100 -- Random number in the range [100, 999]
           WHEN q.rnd BETWEEN 0.46 AND 0.96 THEN LEAST(TRUNC(FLOOR(RANDOM() * 99) + 1000)::int, 99::int) -- Random number in the range [1000, ]
           ELSE FLOOR(RANDOM() * 900) + 100 -- Random number in the range [10, 99]
       END AS qty_field_name,
       CASE WHEN RANDOM() < 0.72 THEN 0::smallint ELSE 1::smallint END AS fr_field_name,
       CASE WHEN RANDOM() < 0.46 THEN 1::smallint ELSE 2::smallint END AS st_field_name
  FROM (SELECT RANDOM() AS rnd, x FROM GENERATE_SERIES(1, 90_000_000) x) q;

CREATE INDEX idx_parts_supid ON pg_temp.parts USING btree (seid, p1_field_name, partnum, st_field_name, r_field_name_1, qty_field_name);
CREATE INDEX idx_parts_p5 ON pg_temp.parts USING btree (p5_field_name, seid, st_field_name, r_field_name_1, p1_field_name);
CREATE INDEX idx_parts_partnum ON pg_temp.parts USING btree (partnum, seid, st_field_name, r_field_name_1, p1_field_name);

CREATE OR REPLACE FUNCTION pg_temp.fx(asupplier bigint = 497 )
RETURNS void
LANGUAGE plpgsql
AS
$function$
DECLARE
supplier_parts   CURSOR (sid bigint) FOR  -- Again, selecting with COUNT() would reduce 1 query per row!
SELECT
partnum, qty_field_name, st_field_name, sum(qty_field_name) as qty
FROM pg_temp.parts
WHERE seid = sid AND (st_field_name = 1)
GROUP BY partnum, qty_field_name, st_field_name
ORDER BY partnum, qty_field_name, st_field_name;
supplier_part_qty_matches CURSOR (sid bigint, pnum varchar(30), pqty bigint) FOR
SELECT DISTINCT
seid, fr_field_name, partnum, st_field_name
FROM pg_temp.parts
WHERE seid <> sid AND partnum = pnum AND qty_field_name = pqty
ORDER BY seid, partnum;

a_partnum varchar(30);
a_qty integer;
a_st  smallint;
a_cnt integer = 0;
b_partnum varchar(30);
b_fr  smallint;
b_seidbigint;
b_st  smallint;
b_cnt bigint = 0;
BEGIN
RAISE NOTICE '%', (SELECT (PG_SIZE_PRETTY(SUM(used_bytes)), PG_SIZE_PRETTY(SUM(total_bytes)), PG_SIZE_PRETTY(SUM(free_bytes))) FROM pg_get_backend_memory_contexts());
OPEN supplier_parts (asupplier);
LOOP
FETCH supplier_parts INTO a_partnum, a_qty, a_st, a_qty;
EXIT WHEN NOT FOUND;
a_cnt := a_cnt + 1;
OPEN supplier_part_qty_matches (sid := asupplier, pnum := a_partnum, pqty := a_qty);
LOOP
FETCH supplier_part_qty_matches INTO b_seid, b_fr, b_partnum, b_st;
b_cnt := b_cnt + 1;
EXIT WHEN TRUE;  -- no Need to 

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 12 Jan 2024 14:40:41 +0800,
  Junwang Zhao  wrote:

>> Could you clarify what should we discuss? We should require
>> that COPY TO/FROM handlers should use PostgreSQL's memory
>> context for all internal memory allocations?
> 
> Yes, handlers should use PostgreSQL's memory context, and I think
> creating other memory context under CopyToStateData.copycontext
> should be suggested for handler creators, so I proposed exporting
> CopyToStateData to public header.

I see.

We can provide a getter for CopyToStateData::copycontext if
we don't want to export CopyToStateData. Note that I don't
have a strong opinion whether we should export
CopyToStateData or not.
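
Such a getter could be as small as this (a sketch; the function name is
an assumption):

    MemoryContext
    CopyToStateGetMemoryContext(CopyToState cstate)
    {
        return cstate->copycontext;
    }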


Thanks,
-- 
kou




Commitfest 2024-01 second week update

2024-01-14 Thread vignesh C
Hi,

Here's a quick status report after the second week:
Status summary:
status                  |  w1 |  w2
------------------------+-----+-----
Needs review:           | 238 | 213
Waiting on Author:      |  44 |  46
Ready for Committer:    |  27 |  27
Committed:              |  36 |  46
Moved to next CF:       |   1 |   3
Withdrawn:              |   2 |   4
Returned with Feedback: |   3 |  12
Rejected:               |   1 |   1
Total:                  | 352 | 352

If you have submitted a patch and it's in "Waiting for author" state,
please aim to get it to "Needs review" state soon if you can, as
that's where people are most likely to be looking for things to
review.
I have pinged most threads that are in "Needs review" state and don't
apply, don't compile warning-free, or don't pass check-world.  I'll do
some more of that sort of thing. I have also returned a few patches as
their threads have been inactive for a long time. I'll continue doing
this while threads remain inactive.
I have sent a private mail through commitfest to patch owners who have
submitted one or more patches but have not picked any of the patches
for review.

Regards,
Vignesh




Re: An inefficient query caused by unnecessary PlaceHolderVar

2024-01-14 Thread Richard Guo
Updated this patch over 29f114b6ff, which indicates that we should apply
the same rules for PHVs.

Thanks
Richard


v3-0001-Avoid-unnecessary-PlaceHolderVars-for-Vars-PHVs.patch
Description: Binary data


Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2024-01-14 Thread vignesh C
On Mon, 15 Jan 2024 at 04:45, Tomas Vondra
 wrote:
>
> On 1/14/24 12:18, vignesh C wrote:
> > On Fri, 14 Jul 2023 at 20:17, Tomas Vondra
> >  wrote:
> >>
> >> On 7/9/23 23:44, Tomas Vondra wrote:
> >>> ...
> > Yes, my previous message was mostly about backwards compatibility, and
> > this may seem a bit like an argument against it. But that message was
> > more a question "If we do this, is it actually backwards compatible the
> > way we want/need?")
> >
> > Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
> > doing it that way and report back in a couple days.
> 
>  Cool. In 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230702.patch, you
>  used the preprocess function to pre-calculate the scankey's hash, even
>  for scalars. You could use the scratch space in BrinDesc for that,
>  before doing anything with SEARCHARRAYs.
> 
> >>>
> >>> Yeah, that's a good idea.
> >>>
> >>
> >> I started looking at this (the scratch space in BrinDesc), and it's not
> >> as straightforward. The trouble is BrinDesc is "per attribute" but the
> >> scratch space is "per scankey" (because we'd like to sort values from
> >> the scankey array).
> >>
> >> With the "new" consistent functions (that get all scan keys at once)
> >> this probably is not an issue, because we know which scan key we're
> >> processing and so we can map it to the scratch space. But with the old
> >> consistent function that's not the case. Maybe we should support this
> >> only with the "new" consistent function variant?
> >>
> >> This would however conflict with the idea to have a separate consistent
> >> function for arrays, which "splits" the scankeys into multiple groups
> >> again. There could be multiple SAOP scan keys, and then what?
> >>
> >> I wonder if the scratch space should be in the ScanKey instead?
> >
> > Are we planning to post an updated patch for this? If the interest has
> > gone down and if there are no plans to handle this I'm thinking of
> > returning this commitfest entry in this commitfest and can be opened
> > when there is more interest.
> >
>
> I still think the patch is a good idea and plan to get back to it, but
probably not in this CF. Given that the last update is from July, it's
> fair to bump it - either RWF or just move to the next CF. Up to you.

I have changed the status to RWF, feel free to update the commitfest
after handling the comments.

Regards,
Vignesh




Re: the s_lock_stuck on perform_spin_delay

2024-01-14 Thread Andy Fan

Robert Haas  writes:

> On Wed, Jan 10, 2024 at 10:17 PM Andy Fan  wrote:
>> fixed in v2.
>
> Timing the spinlock wait seems like a separate patch from the new sanity 
> checks.

Yes, a separate patch would be better, so removed it from v4.

> I suspect that the new sanity checks should only be armed in
> assert-enabled builds.

There are 2 changes in v4: a) Make sure the checks are only armed in
assert-enabled builds; previously there was some counter++ code in
non-assert-enabled builds. b) Record the location of the spin lock so
that whenever the Assert fails, we know which spin lock it is. In our
internal testing, that helps a lot.
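
To illustrate the shape of the checks (a hypothetical sketch; the real
definitions live in the miscadmin.h hunk of the patch, which is cut off
below):

    #ifdef USE_ASSERT_CHECKING

    extern int          spin_lock_held;     /* # of spin locks held by this backend */
    extern const char  *last_spin_lock_file;
    extern int          last_spin_lock_lineno;

    #define START_SPIN_LOCK() \
        do { \
            spin_lock_held++; \
            last_spin_lock_file = __FILE__; \
            last_spin_lock_lineno = __LINE__; \
        } while (0)

    /* the recorded file/lineno can be reported when this trips */
    #define ASSERT_NO_SPIN_LOCK() Assert(spin_lock_held == 0)

    #else
    #define START_SPIN_LOCK()       ((void) 0)
    #define ASSERT_NO_SPIN_LOCK()   ((void) 0)
    #endif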

-- 
Best Regards
Andy Fan
From 1caa02079c8cf689b73503d2c79d9814b33082aa Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Mon, 15 Jan 2024 13:18:34 +0800
Subject: [PATCH v4 1/1] Detect more misuse of spin lock automatically

Spin locks are intended for *very* short-term locks, but they can be
misused in many ways, e.g. by acquiring another LWLock or regular
lock, or by allocating memory, while holding one. All such cases
increase the time the spin lock is held. By convention, if a spin lock
can't be acquired for some time, a crash should happen. See the
s_lock_stuck in perform_spin_delay.

CHECK_FOR_INTERRUPTS should be avoided as well when holding a spin lock.
Depending on which signals are left to handle, PG may raise ERROR/FATAL,
which would cause the code to jump somewhere else that is unlikely to
release the spin lock. Because of this, elog(...) is not allowed since
it calls CHECK_FOR_INTERRUPTS.
---
 src/backend/access/table/tableam.c  |  1 +
 src/backend/storage/buffer/bufmgr.c |  1 +
 src/backend/storage/ipc/barrier.c   |  1 +
 src/backend/storage/ipc/shm_toc.c   |  1 +
 src/backend/storage/lmgr/lock.c |  2 ++
 src/backend/storage/lmgr/lwlock.c   |  2 ++
 src/backend/utils/hash/dynahash.c   |  1 +
 src/backend/utils/init/globals.c|  1 +
 src/backend/utils/mmgr/mcxt.c   |  8 +
 src/include/miscadmin.h | 56 +
 src/include/storage/buf_internals.h |  1 +
 src/include/storage/spin.h  | 16 +++--
 12 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 6ed8cca05a..6c6a65764c 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -24,6 +24,7 @@
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
+#include "miscadmin.h"
 #include "optimizer/plancat.h"
 #include "port/pg_bitutils.h"
 #include "storage/bufmgr.h"
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7d601bef6d..c600a113cf 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -5409,6 +5409,7 @@ LockBufHdr(BufferDesc *desc)
 
 	init_local_spin_delay();
 
+	START_SPIN_LOCK();
 	while (true)
 	{
 		/* set BM_LOCKED flag */
diff --git a/src/backend/storage/ipc/barrier.c b/src/backend/storage/ipc/barrier.c
index 5b52e060ba..63167a04bb 100644
--- a/src/backend/storage/ipc/barrier.c
+++ b/src/backend/storage/ipc/barrier.c
@@ -85,6 +85,7 @@
 
 #include "postgres.h"
 #include "storage/barrier.h"
+#include "miscadmin.h"
 
 static inline bool BarrierDetachImpl(Barrier *barrier, bool arrive);
 
diff --git a/src/backend/storage/ipc/shm_toc.c b/src/backend/storage/ipc/shm_toc.c
index 8db9d25aac..9cdec41054 100644
--- a/src/backend/storage/ipc/shm_toc.c
+++ b/src/backend/storage/ipc/shm_toc.c
@@ -13,6 +13,7 @@
 
 #include "postgres.h"
 
+#include "miscadmin.h"
 #include "port/atomics.h"
 #include "storage/shm_toc.h"
 #include "storage/spin.h"
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index c70a1adb9a..3b069b233a 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found_conflict;
 	bool		log_lock = false;
 
+	ASSERT_NO_SPIN_LOCK();
+
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b4b989ac56..6ef753f670 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1205,6 +1205,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 
 	Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);
 
+	ASSERT_NO_SPIN_LOCK();
+
 	PRINT_LWDEBUG("LWLockAcquire", lock, mode);
 
 #ifdef LWLOCK_STATS
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index a4152080b5..504dae4d6c 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -98,6 +98,7 @@
 
 #include "access/xact.h"
 #include "common/hashfn.h"
+#include "miscadmin.h"
 #include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include 

Re: Compile warnings in dbcommands.c building with meson

2024-01-14 Thread jian he
Hi. One more piece of feedback.

I tested the original repo setup, but it does not generate a warning
on my local setup.
meson setup  --reconfigure ${BUILD} \
-Dprefix=${PG_PREFIX} \
-Dpgport=5463 \
-Dplpython=enabled \
-Dcassert=true \
-Dtap_tests=enabled \
-Dicu=enabled  \
-Ddebug=true  \
-Dnls=disabled

It generates the warning when I add -Dbuildtype=release:

meson setup  --reconfigure ${BUILD} \
-Dprefix=${PG_PREFIX} \
-Dpgport=5463 \
-Dplpython=enabled \
-Dcassert=true \
-Dtap_tests=enabled \
-Dicu=enabled  \
-Dbuildtype=release \
-Ddebug=true  \
-Dnls=disabled

After applying the patch, the warning disappeared, so it fixed the
problem.




Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

2024-01-14 Thread Michael Paquier
On Sun, Jan 14, 2024 at 11:08:39PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
>> While thinking about that, a second idea came into my mind: a
>> superuser-settable developer GUC to disable such WAL records to be
>> generated within certain areas of the test.  This requires a small
>> implementation, but nothing really huge, while being portable
>> everywhere.  And it is not the first time I've been annoyed with these
> records when wanting a predictable set of WAL records for some test
>> case.
> 
> Hmm ... I see what you are after, but to what extent would this mean
> that what we are testing is not our real-world behavior?

Don't think so.  We don't care much about these records when it comes
to checking slot invalidation scenarios with a predictable XID
horizon, AFAIK.
--
Michael


signature.asc
Description: PGP signature


Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

2024-01-14 Thread Tom Lane
Michael Paquier  writes:
> While thinking about that, a second idea came into my mind: a
> superuser-settable developer GUC to disable such WAL records to be
> generated within certain areas of the test.  This requires a small
> implementation, but nothing really huge, while being portable
> everywhere.  And it is not the first time I've been annoyed with these
> records when wanting a predictable set of WAL records for some test
> case.

Hmm ... I see what you are after, but to what extent would this mean
that what we are testing is not our real-world behavior?

regards, tom lane




Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

2024-01-14 Thread Michael Paquier
On Fri, Jan 12, 2024 at 01:46:08PM +, Bertrand Drouvot wrote:
> 1) Michael's proposal up-thread (means tweak the test with a retry logic, 
> retrying
> things if such a standby snapshot is found).
> 
> 2) Don't report a test error for active slots in case their catalog_xmin
> advanced.
> 
> I'd vote for 2) as:
> 
> - this is a corner case and the vast majority of the animals don't report any
> issues (meaning the active slot conflict detection is already well covered).
> 
> - even on the same animal it should be pretty rare for the active slot
> conflict detection to not be covered at all (and the "failing" one would
> probably move over time).
> 
> - It may be possible that 1) ends up failing (as we'd need to put a limit
> on the retry logic anyhow).
> 
> What do you think?
> 
> And BTW, looking closely at wait_until_vacuum_can_remove(), I'm not sure it's
> fully correct, so I'll give it another look.

The WAL records related to standby snapshots are playing a lot with
the randomness of the failures we are seeing.  Alexander has mentioned
offlist something else: using SIGSTOP on the bgwriter to avoid these
records and make the test more stable.  That would not be workable for
Windows, but I could live with that knowing that logical decoding for
standbys has no platform-speficic tweak for the code paths we're
testing here, and that would put as limitation to skip the test for
$windows_os.

While thinking about that, a second idea came into my mind: a
superuser-settable developer GUC to disable such WAL records to be
generated within certain areas of the test.  This requires a small
implementation, but nothing really huge, while being portable
everywhere.  And it is not the first time I've been annoyed with these
records when wanting a predictable set of WAL records for some test
case.
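
For what it's worth, such a developer GUC could be wired up roughly like
this (a hypothetical sketch, including the name):

    /* guc_tables.c: bool GUC whose boot value keeps today's behavior */
    {
        {"wal_log_standby_snapshots", PGC_SUSET, DEVELOPER_OPTIONS,
            gettext_noop("Emits standby snapshot WAL records."),
            NULL,
            GUC_NOT_IN_SAMPLE
        },
        &wal_log_standby_snapshots,
        true,
        NULL, NULL, NULL
    },

    /* bgwriter.c, guarding the existing call */
    if (wal_log_standby_snapshots && XLogStandbyInfoActive())
        LogStandbySnapshot();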

Another possibility would be to move these records elsewhere, outside
of the bgwriter, but we need such records at a good frequency for the
availability of read-only standbys.  And surely we'd want an on/off
switch anyway to get a full control for test sequences.
--
Michael


signature.asc
Description: PGP signature


Re: Documentation to upgrade logical replication cluster

2024-01-14 Thread Peter Smith
Hi Vignesh, here are some review comments for patch v2-0001.

==
doc/src/sgml/ref/pgupgrade.sgml

1.
+   
+Upgrade logical replication clusters
+
+
+ Refer logical
replication upgrade section
+ for details on upgrading logical replication clusters.
+
+
+   
+

This renders like:
Refer logical replication upgrade section for details on upgrading
logical replication clusters.

~

IMO it would be better to use xref instead of link, which will render
more normally like:
See Section 30.11 for details on upgrading logical replication clusters.

SUGGESTION

 See 
 for details on upgrading logical replication clusters.


==
doc/src/sgml/logical-replication.sgml

2. GENERAL - blurb

+ 
+  Upgrade
+
+  
+   
+Prepare for publisher upgrades

I felt there should be a short (1 or 2 sentence) general blurb about
pub/sub upgrade before jumping straight into:

"1. Prepare for publisher upgrades"
"2. Prepare for subscriber upgrades"
"3. Upgrading logical replication cluster"

~

Specifically, at first, it looks strange that the HTML renders as
steps 1,2,3 instead of sub-sections (30.11.1, 30.11.2, 30.11.3). Maybe
"steps" are fine, but then at least there needs to be some intro
sentence saying something like "follow these steps:"

~~~

3.
+   
+Upgrading logical replication cluster

/cluster/clusters/

~~~

4.
+
+ The steps to upgrade the following logical replication clusters are
+ detailed below:
+ 
+  
+   
+Two-node
logical replication cluster.
+   
+  
+  
+   
+Cascaded
logical replication cluster.
+   
+  
+  
+   
+Two-node
circular logical replication cluster.
+   
+  
+ 
+

Isn't there a better way to accomplish this by using xref and
'xreflabel' so you don't have to type the link text here?


//
Steps to upgrade a two-node logical replication cluster
//

5.
+  
+   Let's say publisher is in node1 and subscriber is
+   in node2. The subscriber node2 has
+   two subscriptions sub1_node1_node2 and sub2_node1_node2 which is
+   subscribing the changes from node1.
+  

5a
Those subscription names should also be rendered as literals.

~

5b
/which is/which are/

~~~

6.
+   
+
+ Initialize data1_upgraded instance by using the required newer
+ version.
+
+   

data1_upgraded should be rendered as literal.

~~~

7.
+
+   
+
+ Initialize data2_upgraded instance by using the required newer
+ version.
+
+   

data2_upgraded should be rendered as literal.

~~~

8.
+
+   
+
+ On node2, create any tables that were created in
+ the upgraded publisher node1 server between
+ 
+ when the subscriptions where disabled in
node2
+ and now, e.g.:
+
+node2=# CREATE TABLE distributors (did integer PRIMARY KEY, name varchar(40));
+CREATE TABLE
+
+
+   

8a.
This link to the earlier step renders badly like:
On node2, create any tables that were created in the upgraded
publisher node1 server between when the subscriptions where disabled
in node2 and now, e.g.:

IMO this link should be like "Step N", not some words -- maybe it is
another opportunity for using xreflabel?

~

8b.
Also has typos "when the subscriptions where disabled" (??)

//
Steps to upgrade a cascaded logical replication clusters
//

9.
+
+ 
+  Steps to upgrade a cascaded logical replication clusters

The title has a strange mix of singular "a" and plural "clusters"

~~~

10.
+  
+   Let's say we have a cascaded logical replication setup
+   
node1->node2->node3.
+   Here node2 is subscribing the changes from
+   node1 and node3 is subscribing
+   the changes from node2. The node2
+   has two subscriptions sub1_node1_node2 and sub2_node1_node2 which is
+   subscribing the changes from node1. The
+   node3 has two subscriptions sub1_node2_node3 and
+   sub2_node2_node3 which is subscribing the changes from
+   node2.
+  

10a.
Those subscription names should also be rendered as literals.

~

10b.
/which is/which are/ (occurs 2x)

~~~

11.
+
+   
+
+ Initialize data1_upgraded instance by using the required
newer version.
+
+   

data1_upgraded should be rendered as literal.

~~~

12.
+
+   
+
+ Initialize data2_upgraded instance by using the required
newer version.
+
+   

data2_upgraded should be rendered as literal.

~~~

13.
+
+   
+
+ On node2, create any tables that were created in
+ the upgraded publisher node1 server between
+ 
+ when the subscriptions where disabled in
node2
+ and now, e.g.:
+
+node2=# CREATE TABLE distributors (did integer PRIMARY KEY, name varchar(40));
+CREATE TABLE
+
+
+   

13a.
This link to the earlier step renders badly like:
On 

Re: Recovering from detoast-related catcache invalidations

2024-01-14 Thread Xiaoran Wang
This is an interesting idea.
Although some catalog tables, such as pg_depend, are not in catcaches,
when scanning them, if there is any SharedInvalidationMessage, the
CatalogSnapshot will be invalidated and recreated
("RelationInvalidatesSnapshotsOnly" in syscache.c).
Maybe during the systable scan, it receives the SharedInvalidationMessages
and returns tuples which are out of date. systable_recheck_tuple is used
in dependency.c for such cases.



Tom Lane wrote on Sun, Jan 14, 2024 at 03:12:

> I wrote:
> > Xiaoran Wang  writes:
> >> Hmm, how about first checking if any invalidated shared messages have
> been
> >> accepted, then rechecking the tuple's visibility?
> >> If there is no invalidated shared message accepted during
> >> 'toast_flatten_tuple',
> >> there is no need to do then visibility check, then it can save several
> >> CPU cycles.
>
> > Meh, I'd just as soon not add the additional dependency/risk of bugs.
> > This is an expensive and seldom-taken code path, so I don't think
> > shaving a few cycles is really important.
>
> It occurred to me that this idea might be more interesting if we
> could encapsulate it right into systable_recheck_tuple: something
> like having systable_beginscan capture the current
> SharedInvalidMessageCounter and save it in the SysScanDesc struct,
> then compare in systable_recheck_tuple to possibly short-circuit
> that work.  This'd eliminate one of the main bug hazards in the
> idea, namely that you might capture SharedInvalidMessageCounter too
> late, after something's already happened.  However, the whole idea
> only works for catalogs that have catcaches, and the other users of
> systable_recheck_tuple are interested in pg_depend which doesn't.
> So that put a damper on my enthusiasm for the idea.
>
> regards, tom lane
>


Re: ALTER ROLE documentation improvement

2024-01-14 Thread Nathan Bossart
On Sun, Jan 14, 2024 at 04:17:41PM +0530, vignesh C wrote:
> The attached v3 version patch has the changes for the same.

LGTM.  I'll wait a little while longer for additional feedback, but if none
materializes, I'll commit this soon.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: doc: add LITERAL tag to RETURNING

2024-01-14 Thread torikoshia

On 2024-01-12 20:56, Alvaro Herrera wrote:

On 2024-Jan-12, Ashutosh Bapat wrote:

On Fri, Jan 12, 2024 at 11:27 AM torikoshia 
 wrote:

>
> RETURNING is usually tagged with appropriate tags, such as ,
> but not in the 'query' section of COPY.



The patch looks good.


Good catch, pushed.  It has user-visible effect, so I backpatched it.


Thanks for your review and push.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: On login trigger: take three

2024-01-14 Thread Alexander Korotkov
Hi, Alexander!

On Sat, Jan 13, 2024 at 6:00 PM Alexander Lakhin  wrote:
> I've discovered one more instability in the event_trigger_login test.
> Please look for example at case [1]:
> ok 213   + event_trigger   28946 ms
> not ok 214   - event_trigger_login  6430 ms
> ok 215   - fast_default19349 ms
> ok 216   - tablespace  44789 ms
> 1..216
> # 1 of 216 tests failed.

Thank you for reporting this.

I'm going to take a closer look at this tomorrow.  But I doubt I would
find a solution other than removing the flaky parts of the test.

--
Regards,
Alexander Korotkov




Re: POC: GROUP BY optimization

2024-01-14 Thread Alexander Korotkov
On Sun, Jan 14, 2024 at 2:14 PM Andrei Lepikhov
 wrote:
> On 13/1/2024 22:00, Alexander Korotkov wrote:
> > On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
> >  wrote:
> >> On 11/1/2024 18:30, Alexander Korotkov wrote:
> >>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  
> >>> wrote:
> > Hmm, I don't see this old code in these patches. Resend 0002-* because
> > of trailing spaces.
> 
> 
>  AFAIK, cfbot does not seek old versions of patchset parts in previous 
>  messages. So for it to run correctly, a new version of the whole 
>  patchset should be sent even if most patches are unchanged.
> >>>
> >>> Please, find the revised patchset with some refactoring and comments
> >>> improvement from me.  I'll continue looking into this.
> >> The patch looks better, thanks to your refactoring.
> >> I propose additional comments and tests to make the code more
> >> understandable (see attachment).
> >> I intended to look into this part of the code more, but the tests show a
> >> difference between PostgreSQL v.15 and v.16, which causes changes in the
> >> code of this feature.
> >
> > Makes sense.  I've incorporated your changes into the patchset.
> One more improvement. To underpin the code change:
>
> -   return cur_ec;  /* Match! */
> +   {
> +   /*
> +* Match!
> +*
> +* Copy the sortref if it wasn't set yet. That may happen if
> +* the ec was constructed from WHERE clause, i.e. it doesn't
> +* have a target reference at all.
> +*/
> +   if (cur_ec->ec_sortref == 0 && sortref > 0)
> +   cur_ec->ec_sortref = sortref;
> +   return cur_ec;
> +   }
>
> I propose the test (see attachment). It shows why we introduce this
> change: GROUP-BY should juggle not only pathkeys generated by explicit
> sort operators but also those made by the planner itself, as by the
> MergeJoin in this example.

Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached.  Now the
patchset looks good to me.  I'm going to push it if there are no
objections.

--
Regards,
Alexander Korotkov


0002-Explore-alternative-orderings-of-group-by-p-20240115.patch
Description: Binary data


0001-Generalize-common-code-of-adding-sort-befor-20240115.patch
Description: Binary data


Re: Built-in CTYPE provider

2024-01-14 Thread Michael Paquier
On Fri, Jan 12, 2024 at 01:13:04PM -0500, Robert Haas wrote:
> On Fri, Jan 12, 2024 at 1:00 PM Daniel Verite  wrote:
>> ISTM that in general the behavior of old psql vs new server does
>> not weight much against choosing optimal catalog changes.
> 
> +1.

+1.  There is a good amount of effort put in maintaining downward
compatibility in psql.  Upward compatibility would require more
manipulations of the stable branches to make older versions of psql
compatible with newer server versions.  Brr.
--
Michael


signature.asc
Description: PGP signature


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-14 Thread Alexander Korotkov
On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  wrote:
> Thank you for updating the patch. Here are two comments:
>
> ---
> +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> +   cstate->num_errors > 0)
> +   ereport(WARNING,
> +   errmsg("%zd rows were skipped due to data type 
> incompatibility",
> +  cstate->num_errors));
> +
> /* Done, clean up */
> error_context_stack = errcallback.previous;
>
> If a malformed input is not the last data, the context message seems odd:
>
> postgres(1:1769258)=# create table test (a int);
> CREATE TABLE
> postgres(1:1769258)=# copy test from stdin (save_error_to none);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> a
> >> 1
> >>
> 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> due to data type incompatibility
> 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> COPY 1
>
> I think it's better to report the WARNING after resetting the
> error_context_stack. Or is a WARNING really appropriate here? The
> v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> WARNING without explanation.

Thank you for noticing this.  I think NOTICE is more appropriate here.
There is nothing to "worry" about: the user asked to ignore the errors
and we did.  And yes, it doesn't make sense to use the last line as
the context.  Fixed.

> ---
> +-- test missing data: should fail
> +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> +1  {1}
> +\.
>
> We might want to cover the extra data cases too.

Agreed, the relevant test is added.
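
For reference, such a test presumably mirrors the missing-data case
quoted above; a minimal sketch, with an illustrative table and the
option spelling used by this patch version:

-- test extra data: should fail
CREATE TABLE check_extra_err (a int, b int);
COPY check_extra_err FROM STDIN WITH (save_error_to none);
1	2	99
\.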

--
Regards,
Alexander Korotkov


0001-Add-new-COPY-option-SAVE_ERROR_TO-v4.patch
Description: Binary data


Re: Printing backtrace of postgres processes

2024-01-14 Thread Maciek Sakrejda
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I'm not sure if this actually still needs review, but it's marked as such in 
the CF app, so I'm reviewing it in the hopes of moving it along.

The feature works as documented. The docs say "This function is supported only 
if PostgreSQL was built with the ability to capture backtraces, otherwise it 
will emit a warning." I'm not sure what building with the ability to capture 
backtraces is, but it worked with no special config on my machine. I don't have 
much C experience, so I don't know if this is something that should have more 
context in a README somewhere, or if it's likely someone who's interested in 
this will already know what to do. The code looks fine to me.

I tried running make installcheck-world, but it failed on 17 tests. However, 
master also fails here on 17 tests. A normal make check-world passes on both 
branches. I assume I'm doing something wrong and would appreciate any pointers 
[0].

Based on my review and Daniel's comment above, I'm marking this as Ready for 
Committer.

Thanks,
Maciek

[0] My regression.diffs has errors like

```
diff -U3 
/home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out 
/home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out 
2023-01-02 12:21:10.792646101 -0800
+++ /home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out  
2024-01-14 15:04:07.513887866 -0800
@@ -131,11 +131,6 @@
 2
  ?column? 
 --
-3
-(1 row)
-
- ?column? 
---
 4
 (1 row)
```

and

```
diff -U3 
/home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out 
/home/maciek/code/aux/postgres/src/test/regress/results/create_table.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out   
2023-10-02 22:14:02.583377845 -0700
+++ /home/maciek/code/aux/postgres/src/test/regress/results/create_table.out
2024-01-14 15:04:09.037890710 -0800
@@ -854,8 +854,6 @@
  b  | integer |   | not null | 1   | plain|  | 
 Partition of: parted FOR VALUES IN ('b')
 Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
-Not-null constraints:
-"part_b_b_not_null" NOT NULL "b" (local, inherited)
 
 -- Both partition bound and partition key in describe output
 \d+ part_c
``` 

I'm on Ubuntu 22.04 with Postgres 11, 12, 13, and 16 installed from PGDG.

The new status of this patch is: Ready for Committer


Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2024-01-14 Thread Tomas Vondra
On 1/14/24 12:18, vignesh C wrote:
> On Fri, 14 Jul 2023 at 20:17, Tomas Vondra
>  wrote:
>>
>> On 7/9/23 23:44, Tomas Vondra wrote:
>>> ...
> Yes, my previous message was mostly about backwards compatibility, and
> this may seem a bit like an argument against it. But that message was
> more a question "If we do this, is it actually backwards compatible the
> way we want/need?")
>
> Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
> doing it that way and report back in a couple days.

 Cool. In 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230702.patch, you
 used the preprocess function to pre-calculate the scankey's hash, even
 for scalars. You could use the scratch space in BrinDesc for that,
 before doing anything with SEARCHARRAYs.

>>>
>>> Yeah, that's a good idea.
>>>
>>
>> I started looking at this (the scratch space in BrinDesc), and it's not
>> as straightforward. The trouble is BrinDesc is "per attribute" but the
>> scratch space is "per scankey" (because we'd like to sort values from
>> the scankey array).
>>
>> With the "new" consistent functions (that get all scan keys at once)
>> this probably is not an issue, because we know which scan key we're
>> processing and so we can map it to the scratch space. But with the old
>> consistent function that's not the case. Maybe we should support this
>> only with the "new" consistent function variant?
>>
>> This would however conflict with the idea to have a separate consistent
>> function for arrays, which "splits" the scankeys into multiple groups
>> again. There could be multiple SAOP scan keys, and then what?
>>
>> I wonder if the scratch space should be in the ScanKey instead?
> 
> Are we planning to post an updated patch for this? If the interest has
> gone down and there are no plans to handle this, I'm thinking of
> returning this entry in the current commitfest; it can be reopened
> when there is more interest.
> 

I still think the patch is a good idea and plan to get back to it, but
probably not in this CF. Given that the last update is from July, it's
fair to bump it - either RWF or just move to the next CF. Up to you.

As for the patch, I wonder if Heikki has some idea what to do about the
scratch space? I got stuck on thinking about how to do this with the two
types of consistent functions we support/allow.

To articulate the issue more clearly - the scratch space is "per index"
but we need scratch space "per index key". That's fine - we can simply
have pointers to multiple scratch spaces, I think.

But we have two ways to do consistent functions - the "old" gets scan
keys one attribute at a time, "new" gets all at once. For the "new" it's
not a problem, it's simple to identify the right scratch space. But for
the "old" one, how would that happen? The consistent function has no
idea which index key it's operating on, and how to identify the correct
scratch space.

I can think of two ways to deal with this:

1) Only allow SK_SEARCHARRAY for indexes supporting new-style consistent
functions (but I'm not sure how, considering amsearcharray is set way
before we know what the opclass does, or whether it implements the old
or new consistent function).

2) Allow SK_SEARCHARRAY even with old consistent function, but do some
dance in bringetbitmap() to set the scratch space accordingly before
the call.

Now that I read Heikki's messages again, I see he suggested assigning a
new procnum to a consistent function supporting SK_SEARCHARRAY, which
seems to be very close to (1). Except that we'd now have 3 ways to
define a consistent function, and that sounds a bit ... too much?

Anyway, thinking about (1), I'm still not sure what to do about existing
opclasses - it'd be nice to have some backwards compatible solution,
without breaking everything and forcing everyone to implement all the
new stuff. Which is kinda why we already have two ways to do consistent
functions. Presumably we'd have to implement some "default" handling by
translating the SK_SEARCHARRAY key into simple equality keys ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom explain options

2024-01-14 Thread Tomas Vondra
On 1/13/24 17:13, Konstantin Knizhnik wrote:
> 
> On 13/01/2024 4:51 pm, Tomas Vondra wrote:
>>
>> On 1/12/24 20:30, Konstantin Knizhnik wrote:
>>> On 12/01/2024 7:03 pm, Tomas Vondra wrote:
 On 10/21/23 14:16, Konstantin Knizhnik wrote:
> Hi hackers,
>
> EXPLAIN statement has a list of options (i.e. ANALYZE, BUFFERS,
> COST,...) which help to provide useful details of query execution.
> In Neon we have added PREFETCH option which shows information about
> page
> prefetching during query execution (prefetching is more critical for
> Neon
> architecture because of separation of compute and storage, so it is
> implemented not only for bitmap heap scan as in Vanilla Postgres, but
> also for seqscan, indexscan and indexonly scan). Another possible
> candidate  for explain options is local file cache (extra caching
> layer
> above shared buffers which is used to somehow replace file system
> cache
> in standalone Postgres).
 Not quite related to this patch about EXPLAIN options, but can you
 share
 some details how you implemented prefetching for the other nodes?

 I'm asking because I've been working on prefetching for index scans, so
 I'm wondering if there's a better way to do this, or how to do it in a
 way that would allow neon to maybe leverage that too.

 regards

>>> Yes, I am looking at your PR. What we have implemented in Neon is more
>>> specific to Neon architecture where storage is separated from compute.
>>> So each page not found in shared buffers has to be downloaded from page
>>> server. It adds quite noticeable latency, because of network roundtrip.
>>> While vanilla Postgres can rely on OS file system cache when page is not
>>> found in shared buffer (access to OS file cache is certainly slower than
>>> to shared buffers
>>> because of syscall and copying of page, but the performance penalty is not
>>> very large - less than 15%), Neon has no local files and so has to send
>>> request to the socket.
>>>
>>> This is why we have to perform aggressive prefetching whenever it is
>>> possible (when it is possible to predict the order of subsequent pages).
>>> Unlike vanilla Postgres which implements prefetch only for bitmap heap
>>> scan, we have implemented it for seqscan, index scan, indexonly scan,
>>> bitmap heap scan, vacuum, pg_prewarm.
>>> The main difference between Neon prefetch and vanilla Postgres prefetch
>>> is that the first one is backend-specific. So each backend prefetches only
>>> pages which it needs.
>>> This is why we have to rewrite prefetch for bitmap heap scan, which is
>>> using `fadvise` and assumes that pages prefetched by one backend into
>>> the file cache can be used by any other backend.
>>>
>> I do understand why prefetching is important in neon (likely more than
>> for core postgres). I'm interested in how it's actually implemented,
>> whether it's somehow similar to how my patch does things or in some
>> different (perhaps neon-specific way), and if the approaches are
>> different then what are the pros/cons. And so on.
>>
>> So is it implemented in the neon-specific storage, somehow, or where/how
>> does neon issue the prefetch requests?
> 
> Neon mostly preserves the Postgres prefetch mechanism, so we are using
> PrefetchBuffer, which checks if the page is present in shared buffers
> and, if not, calls smgrprefetch. We are using our own storage manager
> implementation which, instead of reading pages from local disk,
> downloads them from the page server.
> And the prefetch implementation in the Neon storage manager is obviously
> also different from the one in vanilla Postgres, which uses posix_fadvise.
> Neon's prefetch implementation inserts a prefetch request into a ring
> buffer and sends it to the server. When a read operation is performed,
> we check if there is a corresponding prefetch request in the ring buffer
> and, if so, wait for its completion.
> 

Thanks. Sure, neon has to use some custom prefetch implementation
(not posix_fadvise), considering there's no local page cache
in the architecture.

The thing that was not clear to me is who decides what to prefetch,
which code issues the prefetch requests etc. In the github links you
shared I see it happens in the index AM code (in nbtsearch.c).

That's interesting, because that's what my first prefetching patch did
too - not the same way, ofc, but in the same layer. Simply because it
seemed like the simplest way to do that. But the feedback was that's the
wrong layer, and that it should happen in the executor. And I agree with
that - the reasons are somewhere in the other thread.

Based on what I saw in the neon code, I think it should be possible for
neon to use "my" approach too, but that only works for the index scans,
ofc. Not sure what to do about the other places.

> As I already wrote - prefetch is done locally for each backend. And each
> backend has its own connection to the page server. It can be changed in
> future when we implement 

Re: plperl and perl 5.38

2024-01-14 Thread Christoph Berg
Re: Andrew Dunstan
> > +WARNING:  could not determine encoding for locale "C.utf8": codeset is 
> > "ANSI_X3.4-1968"
> 
> I can't reproduce this on my Ubuntu 22.04 ARM64 instance with perl 5.38.2
> installed via perlbrew, nor on a fresh Debian unstable with its perl
> 5.38.2. In both instances my LANG is set to en_US.UTF-8.

It was a problem on the perl side:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1060456

(Fixed on Jan 12th)

Thanks for trying,
Christoph




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-14 Thread Masahiko Sawada
On Sun, Jan 14, 2024 at 10:30 AM Alexander Korotkov
 wrote:
>
> Hi!
>
> I think this is an in-demand and long-awaited feature.  The thread is
> pretty long, but mostly it was disputes about how to save the errors.
> The present patch includes the basic infrastructure and the ability to
> ignore errors, thus it's pretty simple.
>
> On Sat, Jan 13, 2024 at 4:20 PM jian he  wrote:
> > On Fri, Jan 12, 2024 at 10:59 AM torikoshia  
> > wrote:
> > >
> > >
> > > Thanks for reviewing!
> > >
> > > Updated the patch merging your suggestions except below points:
> > >
> > > > +   cstate->num_errors = 0;
> > >
> > > Since cstate is already initialized in below lines, this may be
> > > redundant.
> > >
> > > | /* Allocate workspace and zero all fields */
> > > | cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
> > >
> > >
> > > >  +   Assert(!cstate->escontext->details_wanted);
> > >
> > > I'm not sure this is necessary, considering we're going to add other
> > > options like 'table' and 'log', which need details_wanted soon.
> > >
> > >
> > > --
> > > Regards,
> >
> > make save_error_to option cannot be used with COPY TO.
> > add redundant test, save_error_to with COPY TO test.
>
> I've incorporated these changes.  Also, I've changed
> CopyFormatOptions.save_error_to to enum and made some edits in
> comments and the commit message.  I'm going to push this if there are
> no objections.

Thank you for updating the patch. Here are two comments:

---
+   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+   cstate->num_errors > 0)
+   ereport(WARNING,
+   errmsg("%zd rows were skipped due to data type incompatibility",
+  cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;

If a malformed input is not the last data, the context message seems odd:

postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> a
>> 1
>>
2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
COPY 1

I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.

---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1  {1}
+\.

We might want to cover the extra data cases too.

Regards,

[1] 
https://www.postgresql.org/message-id/CACJufxEkkqnozdnvNMGxVAA94KZaCPkYw_Cx4JKG9ueNaZma_A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/3d0b349ddbd4ae5f605f77b491697158%40oss.nttdata.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Recovering from detoast-related catcache invalidations

2024-01-14 Thread Noah Misch
On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
> I wrote:
> > This is uncomfortably much in bed with the tuple table slot code,
> > perhaps, but I don't see a way to do it more cleanly unless we want
> > to add some new provisions to that API.  Andres, do you have any
> > thoughts about that?
> 
> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose.  So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bg2mgqecerqr5gg...@mail.gmail.com).




Re: A failure in t/038_save_logical_slots_shutdown.pl

2024-01-14 Thread Bharath Rupireddy
On Sat, Jan 13, 2024 at 4:43 PM Amit Kapila  wrote:
>
> > > The current test tries to ensure that
> > > during shutdown after we shutdown walsender and ensures that it sends
> > > all the wal records and receipts an ack for the same, there is no
> > > other WAL except shutdown_checkpoint. Vignesh's suggestion (a) makes
> > > the test robust enough that it shouldn't show spurious failures like
> > > the current one reported by you, so shall we try to proceed with that?
> >
> > Do you mean something like [1]? It ensures the test passes unless any
> > writes are added (in future) before the publisher restarts in the test
> > which can again make the tests flaky. How do we ensure no one adds
> > anything before the publisher restart in
> > 038_save_logical_slots_shutdown.pl? A note before the restart perhaps?
> >
>
> I am fine with adding the note.

Okay. Please see the attached patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Fix-test-failure-in-038_save_logical_slots_shutdo.patch
Description: Binary data


Re: plperl and perl 5.38

2024-01-14 Thread Andrew Dunstan



On 2024-01-12 Fr 05:14, Christoph Berg wrote:

Perl 5.38 has landed in Debian unstable, and plperl doesn't like it:

diff -U3 
/home/myon/projects/postgresql/pg/postgresql/src/pl/plperl/expected/plperl_elog_1.out
 
/home/myon/projects/postgresql/pg/postgresql/build/testrun/plperl/regress/results/plperl_elog.out
--- 
/home/myon/projects/postgresql/pg/postgresql/src/pl/plperl/expected/plperl_elog_1.out
   2023-07-24 12:47:52.124583553 +
+++ 
/home/myon/projects/postgresql/pg/postgresql/build/testrun/plperl/regress/results/plperl_elog.out
   2024-01-12 10:09:51.065265341 +
@@ -76,6 +76,7 @@
RETURN 1;
  END;
  $$;
+WARNING:  could not determine encoding for locale "C.utf8": codeset is 
"ANSI_X3.4-1968"
  select die_caller();
  NOTICE:  caught die
   die_caller
diff -U3 
/home/myon/projects/postgresql/pg/postgresql/src/pl/plperl/expected/plperl_call.out
 
/home/myon/projects/postgresql/pg/postgresql/build/testrun/plperl/regress/results/plperl_call.out
--- 
/home/myon/projects/postgresql/pg/postgresql/src/pl/plperl/expected/plperl_call.out
 2023-10-17 09:40:01.365865484 +
+++ 
/home/myon/projects/postgresql/pg/postgresql/build/testrun/plperl/regress/results/plperl_call.out
   2024-01-12 10:09:51.413278511 +
@@ -64,6 +64,7 @@
RAISE NOTICE '_a: %, _b: %', _a, _b;
  END
  $$;
+WARNING:  could not determine encoding for locale "C.utf8": codeset is 
"ANSI_X3.4-1968"
  NOTICE:  a: 10, b:
  NOTICE:  _a: 10, _b: 20
  DROP PROCEDURE test_proc1;

Same problem in 17devel and 16. (Did not try the older branches yet.)



I can't reproduce this on my Ubuntu 22.04 ARM64 instance with perl 
5.38.2 installed via perlbrew, nor on a fresh Debian unstable with its 
perl 5.38.2. In both instances my LANG is set to en_US.UTF-8.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

2024-01-14 Thread John Naylor
On Sat, Jan 13, 2024 at 9:36 PM Ranier Vilela  wrote:
>
> On Tue, Jan 9, 2024 at 06:31, John Naylor 
> wrote:

>> This just moves an operation from one place to the other, while
>> obliterating the explanatory comment, so I don't see an advantage.
>
> Well, I think that is precisely the case for using memset.
> The way the initialization is currently done is slower and unfriendly
> to branch prediction.
> Of course, the gain should be small, but it still justifies switching
> to memset.

We haven't seen any evidence or reasoning for that. Simple
rules-of-thumb are not enough.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-14 Thread John Naylor
On Fri, Jan 12, 2024 at 3:49 PM Masahiko Sawada  wrote:
>
> On Thu, Jan 11, 2024 at 9:28 AM Masahiko Sawada  wrote:
> > So I agree to remove both max_bytes and num_items from the control
> > object.Also, as you mentioned, we can remove the tidstore control
> > object itself. TidStoreGetHandle() returns a radix tree handle, and we
> > can pass it to TidStoreAttach().  I'll try it.

Thanks. It's worth looking closely here.

> I realized that if we remove the whole tidstore control object
> including max_bytes, processes that attached the shared tidstore cannot
> actually use TidStoreIsFull() as it always returns true.

I imagine that we'd replace that with a function (maybe an earlier
version had it?) to report the memory usage to the caller, which
should know where to find max_bytes.

> Also, they
> cannot use TidStoreReset(), since it needs to pass max_bytes to
> RT_CREATE(). It might not be a problem in terms of lazy vacuum, but it
> could be problematic for general use.

HEAD has no problem finding the necessary values, and I don't think
it'd be difficult to maintain that ability. I'm not actually sure what
"general use" needs to have, and I'm not sure anyone can guess.
There's the future possibility of parallel heap-scanning, but I'm
guessing a *lot* more needs to happen for that to work, so I'm not
sure how much it buys us to immediately start putting those two fields
in a special abstraction. The only other concrete use case mentioned
in this thread that I remember is bitmap heap scan, and I believe that
would never need to reset, only free the whole thing when finished.

I spent some more time studying parallel vacuum, and have some
thoughts. In HEAD, we have

-/*
- * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- */
-typedef struct VacDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} VacDeadItems;

...which has the tids, plus two fields that function _very similarly_
to the two extra fields in the tidstore control object. It's a bit
strange to me that the patch doesn't have this struct anymore.

I suspect if we keep it around (just change "items" to be the local
tidstore struct), the patch would have a bit less churn and look/work
more like the current code. I think it might be easier to read if the
v17 commits are suited to the current needs of vacuum, rather than try
to anticipate all uses. Richer abstractions can come later if needed.
Another stanza:

- /* Prepare the dead_items space */
- dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
-est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
- pvs->dead_items = dead_items;

With s/max_items/max_bytes/, I wonder if we can still use some of
this, and parallel workers would have no problem getting the necessary
info, as they do today. If not, I don't really understand why. I'm not
very familiar with working with shared memory, and I know the tree
itself needs some different setup, so it's quite possible I'm missing
something.

I find it difficult to keep these four things straight:

- radix tree
- radix tree control object
- tidstore
- tidstore control object

Even with the code in front of me, it's hard to reason about how these
concepts fit together. It'd be much more readable if this was
simplified.




Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2024-01-14 Thread Cédric Villemain

Hi Palak,

there is currently even more interest in your patch, as it should help
build tests for ongoing development around cache/read management and
its effects.


Do you expect to be able to follow up in the near future?

Thank you,
Cédric

On 04/01/2024 00:15, Jim Nasby wrote:

On 1/3/24 10:25 AM, Cédric Villemain wrote:

Hi Palak,

I did a quick review of the patch:

+CREATE FUNCTION pg_buffercache_invalidate(IN int, IN bool default true)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_buffercache_invalidate'
+LANGUAGE C PARALLEL SAFE;

--> Not enforced anywhere, but you can also add a comment to the 
function, for end users...


The arguments should also have names...



+    force = PG_GETARG_BOOL(1);

I think you also need to test PG_ARGISNULL with force parameter.
Actually, that's true for the first argument as well. Or, just mark the 
function as STRICT.


--
Jim Nasby, Data Architect, Austin TX
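
Putting both suggestions together, the declaration could become
something like this (a sketch; the argument names are illustrative):

CREATE FUNCTION pg_buffercache_invalidate(IN buffer int, IN force bool DEFAULT true)
RETURNS bool
AS 'MODULE_PATHNAME', 'pg_buffercache_invalidate'
LANGUAGE C STRICT PARALLEL SAFE;

With STRICT, the PG_ARGISNULL tests in the C code become unnecessary,
since the function is then never called with NULL arguments.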



--
---
Cédric Villemain +33 (0)6 20 30 22 52
https://Data-Bene.io
PostgreSQL Expertise, Support, Training, R&D





Re: POC: GROUP BY optimization

2024-01-14 Thread Andrei Lepikhov

On 13/1/2024 22:00, Alexander Korotkov wrote:

On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
 wrote:

On 11/1/2024 18:30, Alexander Korotkov wrote:

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:

Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.



AFAIK, cfbot does not seek old versions of patchset parts in previous messages. 
So for it to run correctly, a new version of the whole patchset should be sent 
even if most patches are unchanged.


Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

The patch looks better, thanks to your refactoring.
I propose additional comments and tests to make the code more
understandable (see attachment).
I intended to look into this part of the code more, but the tests show a
difference between PostgreSQL v.15 and v.16, which causes changes in the
code of this feature.


Makes sense.  I've incorporated your changes into the patchset.

One more improvement. To underpin the code change:

-   return cur_ec;  /* Match! */
+   {
+   /*
+* Match!
+*
+* Copy the sortref if it wasn't set yet. That may happen if
+* the ec was constructed from WHERE clause, i.e. it doesn't
+* have a target reference at all.
+*/
+   if (cur_ec->ec_sortref == 0 && sortref > 0)
+   cur_ec->ec_sortref = sortref;
+   return cur_ec;
+   }

I propose the test (see attachment). It shows why we introduce this 
change: GROUP-BY should juggle not only pathkeys generated by explicit 
sort operators but also those made by the planner itself, as by the 
MergeJoin in this example.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 0d46e096e5..ca38e78f21 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2879,6 +2879,37 @@ FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
 (10 rows)
 
 DROP TABLE t1 CASCADE;
+-- Check, that GROUP-BY reordering optimization can operate with pathkeys, 
built
+-- by planner itself. For example, by MergeJoin.
+SET enable_hashjoin = off;
+SET enable_nestloop = off;
+explain (COSTS OFF)
+SELECT c1.relname,c1.relpages
+FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
+GROUP BY c1.reltuples,c1.relpages,c1.relname
+ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+ QUERY PLAN
  
+-
+ Incremental Sort
+   Sort Key: c1.relpages, c1.relname, ((c1.relpages * c1.relpages))
+   Presorted Key: c1.relpages, c1.relname
+   ->  Group
+ Group Key: c1.relpages, c1.relname, c1.reltuples
+ ->  Incremental Sort
+   Sort Key: c1.relpages, c1.relname, c1.reltuples
+   Presorted Key: c1.relpages, c1.relname
+   ->  Merge Join
+ Merge Cond: ((c1.relpages = c2.relpages) AND (c1.relname 
= c2.relname))
+ ->  Sort
+   Sort Key: c1.relpages, c1.relname
+   ->  Seq Scan on pg_class c1
+ ->  Sort
+   Sort Key: c2.relpages, c2.relname
+   ->  Seq Scan on pg_class c2
+(16 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index f99167ac9e..cf87b5d5dd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1233,6 +1233,18 @@ SELECT array_agg(c1 ORDER BY c2),c2
 FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
 DROP TABLE t1 CASCADE;
 
+-- Check, that GROUP-BY reordering optimization can operate with pathkeys, 
built
+-- by planner itself. For example, by MergeJoin.
+SET enable_hashjoin = off;
+SET enable_nestloop = off;
+explain (COSTS OFF)
+SELECT c1.relname,c1.relpages
+FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
+GROUP BY c1.reltuples,c1.relpages,c1.relname
+ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+RESET enable_hashjoin;
+RESET enable_nestloop;
+
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;


Re: Improvements in pg_dump/pg_restore toc format and performances

2024-01-14 Thread vignesh C
On Fri, 10 Nov 2023 at 23:20, Nathan Bossart  wrote:
>
> On Tue, Oct 03, 2023 at 03:17:57PM +0530, vignesh C wrote:
> > Few comments:
>
> Pierre, do you plan to submit a new revision of this patch set for the
> November commitfest?  If not, the commitfest entry may be marked as
> returned-with-feedback soon.

I have changed the status of the commitfest entry to "Returned with
Feedback" as the comments have not yet been resolved. Please handle
the comments and update the commitfest entry accordingly.

Regards,
Vignesh




Re: Document efficient self-joins / UPDATE LIMIT techniques.

2024-01-14 Thread vignesh C
On Tue, 31 Oct 2023 at 23:42, Corey Huinker  wrote:
>>
>>
>> I think the SQL statements should end with semicolons.  Our SQL examples
>> are usually written like that.
>
>
> ok
>
>
>>
>>
>> Our general style with CTEs seems to be (according to
>> https://www.postgresql.org/docs/current/queries-with.html):
>>
>>  WITH quaxi AS (
>>  SELECT ...
>>  )
>>  SELECT ...;
>
>
> done
>
>>
>>
>> About the DELETE example:
>> -
>>
>> The text suggests that a single, big DELETE operation can consume
>> too many resources.  That may be true, but the sum of your DELETEs
>> will consume even more resources.
>>
>> In my experience, the bigger problem with bulk deletes like that is
>> that you can run into deadlocks easily, so maybe that would be a
>> better rationale to give.  You could say that with this technique,
>> you can force the lock to be taken in a certain order, which will
>> avoid the possibility of deadlock with other such DELETEs.
>
>
> I've changed the wording to address your concerns:
>
>While doing this will actually increase the total amount of work 
> performed, it can break the work into chunks that have a more acceptable 
> impact on other workloads.
>
>
>>
>>
>> About the SELECT example:
>> -
>>
>> That example belongs to UPDATE, I'd say, because that is the main
>> operation.
>
>
> I'm iffy on that suggestion. A big part of putting it in SELECT was the fact 
> that it shows usage of SKIP LOCKED and FOR UPDATE.
>
>>
>>
>> The reason you give (avoid excessive locking) is good.
>> Perhaps you could mention that updating in batches also avoids
>> excessive bload (if you VACUUM between the batches).
>
>
> I went with:
>
>This technique has the additional benefit that it can reduce the overall 
> bloat of the updated table if the table can be vacuumed in between batch 
> updates.
>
>>
>>
>> About the UPDATE example:
>> -
>>
>> I think that could go, because it is pretty similar to the previous
>> one.  You even use ctid in both examples.
>
>
> It is similar, but the idea here is to aid in discovery. A user might miss 
> the technique for update if it's only documented in delete, and even if they 
> did see it there, they might not realize that it works for both UPDATE and 
> DELETE. We could make reference links from one to the other, but that seems 
> like extra work for the reader.
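
For context, the batched-UPDATE technique being documented looks
roughly like this (a sketch; table and column names are illustrative):

WITH batch AS (
    SELECT ctid FROM orders
    WHERE processed = false
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
)
UPDATE orders SET processed = true
FROM batch WHERE orders.ctid = batch.ctid;

The DELETE variant works the same way, and adding an ORDER BY inside
the CTE (with plain FOR UPDATE) forces a deterministic lock order,
which addresses the deadlock concern raised above. Repeating the
statement until it touches no rows, with a VACUUM between rounds,
bounds both locking impact and bloat.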

I have changed the status of the commitfest entry to "Returned with
Feedback" as Laurenz's comments have not yet been resolved. Please
handle the comments and update the commitfest entry accordingly.

Regards,
Vignesh




Re: Build versionless .so for Android

2024-01-14 Thread Matthias Kuhn
What I am trying to do is package an app with androiddeployqt, which fails
with an error:

The bundled library lib/libpq.so.5 doesn't end with .so. Android only
supports versionless libraries ending with the .so suffix.

This error was introduced in response to this issue which contains hints
about the underlying problem:

https://bugreports.qt.io/plugins/servlet/mobile#issue/QTBUG-101346

I hope this sheds some light
Matthias

On Fri, Jan 5, 2024, 21:57 Peter Eisentraut  wrote:

> On 05.01.24 01:00, Matthias Kuhn wrote:
> > Attached a patch with a (hopefully) better wording of the comment.
> >
> > I have unsuccessfully tried to find an official source for this policy.
> > So for reference some discussions about the topic:
> >
> > -
> >
> https://stackoverflow.com/questions/11491065/linking-with-versioned-shared-library-in-android-ndk
> <
> https://stackoverflow.com/questions/11491065/linking-with-versioned-shared-library-in-android-ndk
> >
> > -
> >
> https://stackoverflow.com/questions/18681401/how-can-i-remove-all-versioning-information-from-shared-object-files
> <
> https://stackoverflow.com/questions/18681401/how-can-i-remove-all-versioning-information-from-shared-object-files
> >
>   What I would like to see is a specific thing that you are trying to do
> that doesn't work.  Probably, you are writing a program that is meant to
> run on Android, and you are linking it (provide command line), and then
> what happens?  The linking fails?  It fails to run?  What is the error?
> Can you provide a minimal example?  And so on.
>
>


Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2024-01-14 Thread vignesh C
On Fri, 14 Jul 2023 at 20:17, Tomas Vondra
 wrote:
>
> On 7/9/23 23:44, Tomas Vondra wrote:
> > ...
> >>> Yes, my previous message was mostly about backwards compatibility, and
> >>> this may seem a bit like an argument against it. But that message was
> >>> more a question "If we do this, is it actually backwards compatible the
> >>> way we want/need?")
> >>>
> >>> Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
> >>> doing it that way and report back in a couple days.
> >>
> >> Cool. In 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230702.patch, you
> >> used the preprocess function to pre-calculate the scankey's hash, even
> >> for scalars. You could use the scratch space in BrinDesc for that,
> >> before doing anything with SEARCHARRAYs.
> >>
> >
> > Yeah, that's a good idea.
> >
>
> I started looking at this (the scratch space in BrinDesc), and it's not
> as straightforward. The trouble is BrinDesc is "per attribute" but the
> scratch space is "per scankey" (because we'd like to sort values from
> the scankey array).
>
> With the "new" consistent functions (that get all scan keys at once)
> this probably is not an issue, because we know which scan key we're
> processing and so we can map it to the scratch space. But with the old
> consistent function that's not the case. Maybe we should support this
> only with the "new" consistent function variant?
>
> This would however conflict with the idea to have a separate consistent
> function for arrays, which "splits" the scankeys into multiple groups
> again. There could be multiple SAOP scan keys, and then what?
>
> I wonder if the scratch space should be in the ScanKey instead?

Are we planning to post an updated patch for this? If the interest has
gone down and there are no plans to handle this, I'm thinking of
returning this entry in the current commitfest; it can be reopened
when there is more interest.

Regards,
Vignesh




Re: Add test module for Table Access Method

2024-01-14 Thread vignesh C
On Thu, 28 Sept 2023 at 10:23, Michael Paquier  wrote:
>
> On Sat, Jun 03, 2023 at 07:42:36PM -0400, Fabrízio de Royes Mello wrote:
> > So in order to improve things a bit in this area I'm proposing to add a
> > test module for Table Access Method similar what we already have for Index
> > Access Method.
> >
> > This code is based on the "blackhole_am" implemented by Michael Paquier:
> > https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
>
> dummy_index_am has, from the start, included additional coverage for the
> various internal add_*_reloption routines that were never covered in
> the core tree.  Unless I am missing something, I am not seeing that
> kind of extra usefulness in the patch you've sent here.

I have changed the status of the commitfest entry to "Returned with
Feedback" as Michael's comments have not yet been resolved. Please
handle the comments and update the commitfest entry accordingly.

Regards,
Vignesh




Re: Add connection active, idle time to pg_stat_activity

2024-01-14 Thread vignesh C
On Wed, 25 Oct 2023 at 19:06, Andrei Zubkov  wrote:
>
> Hi Aleksander,
>
> On Wed, 2023-10-25 at 16:17 +0300, Aleksander Alekseev wrote:
> > On top of that not sure if I see the patch on the November commitfest
> > [1]. Please make sure it's there so that cfbot will check the patch.
>
> Yes, this patch is listed on the November commitfest. cfbot says rebase
> needed since 2023-08-21.

I have changed the status of the commitfest entry to "Returned with
Feedback" as Andrei Zubkov's comments have not yet been resolved.
Please feel free to post an updated version of the patch and update
the commitfest entry accordingly.

Regards,
Vignesh




Re: Add last_commit_lsn to pg_stat_database

2024-01-14 Thread vignesh C
On Sat, 10 Jun 2023 at 07:57, James Coleman  wrote:
>
> I've previously noted in "Add last commit LSN to
> pg_last_committed_xact()" [1] that it's not possible to monitor how
> many bytes of WAL a logical replication slot is behind (computing this
> is obviously trivial for physical slots) because the slot doesn't need
> to replicate beyond the last commit. In some cases it's possible for
> the current WAL location to be far beyond the last commit. A few
> examples:
>
> - An idle server with checkpoint_timeout at a lower value (so lots of
> empty WAL advance).
> - Very large transactions: particularly insidious because committing a
> 1 GB transaction after a small transaction may show almost zero time
> lag even though quite a bit of data needs to be processed and sent
> over the wire (so time to replay is significantly different from
> current lag).
> - A cluster with multiple databases complicates matters further,
> because while physical replication is cluster-wide, the LSNs that
> matter for logical replication are database specific.
>
> Since we don't expose the most recent commit's LSN there's no way to
> say "the WAL is currently 1250, the last commit happened at 1000, the
> slot has flushed up to 800, therefore there are at most 200 bytes
> replication needs to read through to catch up".
>
> In the aforementioned thread [1] I'd proposed a patch that added a SQL
> function pg_last_commit_lsn() to expose the most recent commit's LSN.
> Robert Haas didn't think the initial version's modifications to
> commit_ts made sense, and a subsequent approach adding the value to
> PGPROC didn't have strong objections, from what I can see, but it also
> didn't generate any enthusiasm.
>
> As I was thinking about how to improve things, I realized that this
> information (since it's for monitoring anyway) fits more naturally
> into the stats system. I'd originally thought of exposing it in
> pg_stat_wal, but that's per-cluster rather than per-database (indeed,
> this is a flaw I hadn't considered in the original patch), so I think
> pg_stat_database is the correct location.
>
> I've attached a patch to track the latest commit's LSN in pg_stat_database.
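
For illustration, the monitoring query this would enable might look
like the following (a sketch: last_commit_lsn is the proposed column,
everything else already exists):

SELECT s.slot_name,
       pg_wal_lsn_diff(d.last_commit_lsn, s.confirmed_flush_lsn)
           AS max_bytes_behind
FROM pg_replication_slots s
JOIN pg_stat_database d ON d.datname = s.database
WHERE s.slot_type = 'logical';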

I have changed the status of the commitfest entry to "Returned with
Feedback" as Aleksander's comments have not yet been resolved. Please
feel free to post an updated version of the patch and update the
commitfest entry accordingly.

Regards,
Vignesh




Re: There should be a way to use the force flag when restoring databases

2024-01-14 Thread vignesh C
On Wed, 20 Sept 2023 at 17:27, Daniel Gustafsson  wrote:
>
> > On 20 Sep 2023, at 11:24, Peter Eisentraut  wrote:
> >
> > On 06.08.23 21:39, Ahmed Ibrahim wrote:
> >> I have addressed the pg version compatibility with the FORCE option in 
> >> drop. Here is the last version of the patch
> >
> > The patch is pretty small, but I think there is some disagreement whether 
> > we want this option at all?  Maybe some more people can make their opinions 
> > more explicit?
>
> My concern is that a --force parameter conveys to the user that it's a big
> hammer to override things and get them done, when in reality this doesn't do
> that.  Taking the example from the pg_restore documentation which currently 
> has
> a dropdb step:
>
> 
> :~ $ ./bin/createdb foo
> :~ $ ./bin/psql -c "create table t(a int);" foo
> CREATE TABLE
> :~ $ ./bin/pg_dump --format=custom -f foo.dump foo
> :~ $ ./bin/pg_restore -d foo -C -c --force foo.dump
> pg_restore: error: could not execute query: ERROR:  cannot drop the currently 
> open database
> Command was: DROP DATABASE foo WITH(FORCE);
> pg_restore: error: could not execute query: ERROR:  database "foo" already 
> exists
> Command was: CREATE DATABASE foo WITH TEMPLATE = template0 ENCODING = 'UTF8' 
> LOCALE_PROVIDER = libc LOCALE = 'en_US.UTF-8';
>
>
> pg_restore: error: could not execute query: ERROR:  relation "t" already 
> exists
> Command was: CREATE TABLE public.t (
> a integer
> );
>
>
> pg_restore: warning: errors ignored on restore: 3
> 
>
> Without knowing internals, I would expect an option named --force to make that
> just work, especially given the documentation provided in this patch.  I think
> the risk for user confusion outweighs the benefits, or maybe I'm just not 
> smart
> enough to see all the benefits?  If so, I would argue that more documentation
> is required.
>
> Skimming the patch itself, it updates the --help output with --force for
> pg_dump and not for pg_restore.  Additionally it produces a compiler warning:
>
> pg_restore.c:127:26: warning: incompatible pointer types initializing 'int *' 
> with an expression of type 'bool *' [-Wincompatible-pointer-types]
> {"force", no_argument, , 1},
>^~
> 1 warning generated.

I have changed the status of the patch to "Returned with Feedback" as
the comments have not been addressed for some time. Please feel free
to address these issues and update the commitfest entry accordingly.

Regards,
Vignesh




Re: ALTER ROLE documentation improvement

2024-01-14 Thread vignesh C
On Tue, 26 Sept 2023 at 04:38, Nathan Bossart  wrote:
>
> On Fri, Sep 15, 2023 at 02:25:38PM -0700, Yurii Rashkovskii wrote:
> > Thank you for the feedback. I've updated the glossary and updated the
> > terminology to be consistent. Please see the new patch attached.
>
> Thanks for the new version of the patch.
>
>   This user owns all system catalog tables in each database.  It is also 
> the role
>   from which all granted permissions originate.  Because of these things, 
> this
> - role may not be dropped.
> + role may not be dropped. This role must always be a superuser, it can't 
> be changed
> + to be a non-superuser.
>
> I think we should move this note to the sentence just below that mentions
> its superuserness.  Maybe it could look something like this:
>
> This role also behaves as a normal database superuser, and its
> superuser status cannot be revoked.

Modified

> +   Database superusers can change any of these settings for any role, except 
> for
> +   changing SUPERUSER to NOSUPERUSER
> +   for a bootstrap 
> superuser.
>
> nitpick: s/a bootstrap superuser/the bootstrap superuser

Modified

>  #: commands/user.c:871
>  #, c-format
> -msgid "The bootstrap user must have the %s attribute."
> +msgid "The bootstrap superuser must have the %s attribute."
>  msgstr "Der Bootstrap-Benutzer muss das %s-Attribut haben."
>
> No need to update the translation files.  Those are updated separately in
> the pgtranslation repo.

Removed the translation changes

The attached v3 version patch has the changes for the same.

Regards,
Vignesh
From 0b92a0abfbd759a295eb8d5ec80d9203486a0e46 Mon Sep 17 00:00:00 2001
From: Yurii Rashkovskii 
Date: Fri, 15 Sep 2023 11:41:46 -0700
Subject: [PATCH v3] Improve ALTER ROLE documentation to document current
 behavior.

Previously, this was possible (assuming current_user is a bootstrap user):

```
ALTER ROLE current_user NOSUPERUSER
```

However, as of 16.0 this is no longer allowed:

```
ERROR:  permission denied to alter role
DETAIL:  The bootstrap user must have the SUPERUSER attribute.
```

Also, update the term across the board to use "bootstrap superuser"
---
 doc/src/sgml/glossary.sgml   | 3 ++-
 doc/src/sgml/ref/alter_role.sgml | 4 +++-
 doc/src/sgml/user-manag.sgml | 2 +-
 src/backend/commands/user.c  | 2 +-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 5815fa4471..5839094fce 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -247,7 +247,8 @@
 
 
  This role also behaves as a normal
- database superuser.
+ database superuser,
+ and its superuser status cannot be revoked.
 

   
diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml
index ab1ee45d54..361bae27c3 100644
--- a/doc/src/sgml/ref/alter_role.sgml
+++ b/doc/src/sgml/ref/alter_role.sgml
@@ -69,7 +69,9 @@ ALTER ROLE { role_specification | A
GRANT and
REVOKE for that.)
Attributes not mentioned in the command retain their previous settings.
-   Database superusers can change any of these settings for any role.
+   Database superusers can change any of these settings for any role, except for
+   changing SUPERUSER to NOSUPERUSER
+   for the bootstrap superuser.
Non-superuser roles having CREATEROLE privilege can
change most of these properties, but only for non-superuser and
non-replication roles for which they have been granted
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 92a299d2d3..1c011ac62b 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -350,7 +350,7 @@ ALTER ROLE myname SET enable_indexscan TO off;
options. Thus, the fact that privileges are not inherited by default nor
is SET ROLE granted by default is a safeguard against
accidents, not a security feature. Also note that, because this automatic
-   grant is granted by the bootstrap user, it cannot be removed or changed by
+   grant is granted by the bootstrap superuser, it cannot be removed or changed by
the CREATEROLE user; however, any superuser could
revoke it, modify it, and/or issue additional such grants to other
CREATEROLE users. Whichever CREATEROLE
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 7e81589711..7a9c177b21 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -868,7 +868,7 @@ AlterRole(ParseState *pstate, AlterRoleStmt *stmt)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("permission denied to alter role"),
-	 errdetail("The bootstrap user must have the %s attribute.",
+	 errdetail("The bootstrap superuser must have the %s attribute.",
 			   "SUPERUSER")));
 
 		new_record[Anum_pg_authid_rolsuper - 1] = BoolGetDatum(should_be_super);
-- 
2.34.1